Data and Database: Joins in Explain Plans

There are 3 join methods used by the Cost-Based Optimizer (CBO):

Nested Loops
Merge Join
Hash Join

Nested Loops

A nested loop takes a row from the outer table and processes the inner table once.

The outer table should be significantly larger (row count by the where clause predicate) than the inner table. In fact, the fewer rows the inner table works with, the faster a nested loop will perform.

Notice in this example that the inner table is using a UNIQUE index.

The inner table processes 1 row for each row returned from the outer table. This is a pretty tight loop!

***Note*** whenever you see the word unique in an explain plan, it means 1 row.

The Use_NL hint will tell a query to use nested loops.

You can see the order of the explain plan processing clearly using TOAD. Step 2 is the outer table. Step 3 is the inner table on this join condition. These 2 steps produce a row set that becomes the outer table to step 6. Step 5 is the inner table to this nested loop.

Also notice that step 2 has a cardinality of 1,000 and step 3 has a cardinality of 10…larger to smaller. You can see the CBO estimated size of the row set at step 6 of 10,000 rows. This is the outer table (the row set from the first nested loop) and is being compared to step 5 with a row set of 10. Again…larger to smaller.

What Oracle does with two nested loops is basically the following:

Iterate through the inner nested loop to obtain the requested outer source rows.
Cache the data in the PGA.
Find the matching ROWID from the inner loop’s inner row source.
Cache the data in the PGA.
Organize the ROWIDs for more efficient access in the cache.
Iterate through the outer loop to retrieve the data based on the cached ROWIDs; the result set of the inner loop becomes the outer row source of the outer nested loop

Hash Join

This join condition also has the outer and inner table concept.

A hash table is a memory structure where some math is applied to the key value and this produces a unique location in the table.
Oracle hashes lots of things, for example, the library cache now is a great big hash table where it uses a check sum on the SQL text as the hash key value…

This join method uses these hash tables to produce its row sets.

The outer table is read while applying any access predicates (where clause join conditions), its key value is the where clause predicate…is hashed and loaded into this structure.

The inner table is then read, its rows hashed by same key value one at a time and looked (probed into) the hashed memory structure for a match.

IF there is a match, the row is moved to the row set. If no match, it is discarded.

You can see in this illustration that the outer table is the COMPANIES table and the inner table is the CONTACTS table.

This join condition works best if the outer table is the smaller of the 2 tables, size determined by the where clause selectivity.

SO, where the nested loop goes larger to smaller, this join condition goes smaller to larger.

Notice the smaller to larger nature of the hash join. Step 3 is the outer table with a row count of 100 and is being joined to step 4 (inner table) with a row count of 1,000. smaller to larger. It comes out of this join with an estimated row count of 1,000 (visible at step 5) and compares that to the 1,000 rows at step 1

Merge Join

The merge join also has the concept of the outer table and the inner table but in this join, it really doesn’t matter which table goes first.
Oracle will select rows based on the join condition of both the outer and the inner tables. It will sort the result then like fanning your fingers together, will put the rows, in order, into the row set passed to the next explain plan step.
This feature is seldom seen unless a lot of data is being accessed from both tables.

The outer table is the CONTACTS table and the inner table is the COMPANIES.
There is a hint that will force this join condition upon a SQL, the USE_MERGE hint.

Table b is the outer table and is being joined (merged with) at line 2 with table A at line 4. This join will then produce a row set that is passed to step 9 as the outer table and merged with the C table seen at step 7.

Which join condition is best?

Oracle will select these based on perceived row counts of each side of the join condition.

First, hash joins will only happen on equality conditions, i.e: ‘=’ conditions. It cannot look up anything but an equal condition in its hash tables.

SO most between, greater than, etc, will use the nested loops.

Oracle will use a merge join if both sides of the join condition is perceived to produce a lot of rows.
Oracle will tend to use a nested loop when the rows from one table is perceived to be a large difference from the other, and the other is a very small amount of rows
Oracle will tend to use a hash join when there are not a god-zillion rows involved, there are equal conditions and one perceived result from one table is larger than the other.

Data and Database

Monday, 6 November 2017

Joins in Explain Plans

Nested Loops

No comments:

Total Pageviews

Labels