Saturday, October 10, 2009

Enabling Your Database to perform hash joins

In cases where a very small table is being joined to a large table, the Oracle hash join will often dramatically speed-up the query. Hash joins are far faster than nested loop joins in some cases, but it is not easy to get your database to use hash joins.



Unfortunately, the hash join is more memory intensive than a nested loop join. To be faster than a nested loop join, we must set the hash_area_size large enough to hold the entire hash table in memory (about 1.6 times the sum of the rows in the table). If the hash join overflows the hash_area_size memory, the hash join will page into the TEMP tablespace, severely degrading the performance of the hash join. You can use the following script, hash_area.sql, to dynamically allocate the proper hash_area_size for your SQL query in terms of the size of your target table.



hash_area.sql
set heading off;
set feedback off;
set verify off;
set pages 999;

spool run_hash.sql

select
'alter session set hash_area_size='||trunc(sum(bytes)*1.6)||';'
from
dba_segments
where
segment_name = upper('&1');

spool off;

@run_hash


Here is the output from this script. As you see, we pass the driving table name, and the script generates the appropriate alter session command to ensure that we have enough space in hash_area_size RAM to hold the driving table.



SQL> @hash_area customer

alter session set hash_area_size=3774873;

In addition to seeing the hash_area_size, we must also be able to adjust the degree of parallelism in cases where we use a full-table scan to access the tables in a hash join.



This short tip does not have complete information about the hash join, but you can get complete information from Oracle High Performance SQL Tuning by Oracle Press. The hash join is very finicky, and there are many conditions that must be satisfied. It is not uncommon to find that a use_hash hint is ignored, and here are some common causes of this problem.



Check initialization parameters Make sure that you have the proper settings for optimizer_index_cost_adj and optimizer_max_permutations to limit the number of table join evaluations. Also check your values for hash_area_size and hash_multiblock_io_count.


Verify the driving Table Make sure that the smaller table is the driving table (the first table in the from clause when using the ordered hint). This is because a hash join builds the memory array using the driving table.


Analyze CBO statistics Check that tables and/or columns of the join tables are appropriately analyzed.


Check for skewed columns Histograms are recommended only for nonuniform column distributions. If necessary, you can override the join order chosen by the cost-based optimizer using the ordered hint.


Check RAM region Ensure that hash_area_size is large enough to hold the smaller table in memory. Otherwise, Oracle must write to the TEMP tablespace, slowing down the hash join. Oracle recommends that the hash_area_size for the driving table be set at 1.6 times the sum of bytes for the driving table, and you can use the hash_area.sql script to set the hash_area_size for your query session.


If you have followed these steps, you should be able to add a use_hash hint ton your SQL to invoke a hash join. Remember, it is always a good idea to run a explain plan on your SQL to ensure that the hash join is being invoked.

No comments:

Post a Comment