Monday, October 5, 2009

Running hash joins in parallel

Oracle hash joins are notoriously hard to set up within Oracle, and the Oracle Metalink archives are full of problem reports regarding successful hash join invocation. I will discuss hash joins in detail in Chapter 16, but for now, just be aware that Oracle reads the driving table into a RAM array of hash_area_size and uses a special hashing technique to join the memory array with the larger table.



For equi-join operations, hash joins can outperform nested loop joins, especially in cases where the driving table is small enough to fit entirely into the hash_area_size. If the driving table is too large, the hash join will write temporary segments into the TEMP tablespace, slowing down the query. Since the reading of the table rows for a hash join is the most time-consuming operation in a hash join, setting parallelism on the table can dramatically improve the performance and throughput of the query.



Here is an example of a query that forces a parallel hash join. Note that the emp table is set as the driving table:



select /*+ use_hash(e,b) parallel(e, 4) parallel(b, 4) */
e.ename,
hiredate,
b.comm
from
emp e,
bonus b
where
e.ename = b.ename
;

Here is the execution plan for the hash join. Note that both tables in this join are using parallel query to obtain their rows:

OPERATION
----------------------------------------------------------------------
OPTIONS OBJECT_NAME POSITION
------------------------------ ---------------------------- ----------
OTHER_TAG
----------------------------------------------------------------------
SELECT STATEMENT
3
HASH JOIN
1
PARALLEL_TO_SERIAL
TABLE ACCESS
FULL EMP 1
PARALLEL_TO_PARALLEL
TABLE ACCESS
FULL BONUS 2


For equi-join SQL, hash joins are often faster than nested loop joins, especially in cases where the driving table is filtered into a small number of rows in the query’s where clause.

No comments:

Post a Comment