Wednesday, 29 November 2017

SQL Optimization Techniques

Before you start fidgeting with individual SQL statements, it is important to note that hints are probably the last thing you should consider adding when attempting to optimize your code. There are several levels of optimization and it is recommended that you start with the server, then the database instance, and finally go down through the database objects to individual statements. After all, the effect of any changes made to individual statements, particularly hints, may be lost when the database is fine-tuned later on.
As a database developer/architect you may not want to tread the path that leads to the desk of the DBA. Fortunately, there is a bunch of things you can do to improve the runtime performance of your statements:
  • Optimize access structures:
    • Database design and normalization.
    • Tables: heap or index-organized tables, and table or indexed clusters.
    • Indexes.
    • Constraints.
    • Materialized views.
    • Partitioning schemes.
    • Statistics, including a comprehensive refresh strategy.
  • Rewrite SQL statements:
    • Exclude projections that are not required.
    • Minimize the amount of work done more than once.
    • Factor subqueries that are used multiple times in the same statement.
    • Use EXISTS instead of IN because the former stops processing once it has found a match.
    • Use CASE and/or DECODE to avoid having to scan the same rows over and over again, especially for aggregation functions that act on different subsets of the same data.
    • Use analytic functions to do multiple or moving/rolling aggregations with a single pass through the data.
    • Avoid scalar subqueries in the SELECT-list.
    • Use joins instead of subqueries, as it gives the optimizer more room to play around in.
    • Say what you mean and pick the right join: if you only need an inner join don’t write an outer join.
    • Add logically superfluous predicates that may still aid in the search for an optimal execution plan, particularly for outer joins.
    • Avoid implicit conversions of data types, especially in the WHERE clause.
    • Write WHERE clause predicates with a close eye on the indexes available, including the leading edge of a composite index.
    • Avoid, whenever possible, comparison operators such as <>, NOT IN, NOT EXISTS, and LIKE without a leading '%' for indexed columns in predicates.
    • Do not apply functions on indexed columns in the WHERE clause when there is no corresponding function-based index.
    • Don’t abuse HAVING to filter rows before aggregating.
    • Avoid unnecessary sorts, including when UNION ALL rather than UNION is applicable.
    • Avoid DISTINCT unless you have to use it.
    • Use PL/SQL, especially packages with stored procedures (and bind variables) and shared cursors to provide a clean interface through which all data requests are handled.
    • Add hints once you have determined that it is right and necessary to do so.
The advantage of PL/SQL packages to provide all data to users is that there is, when set up properly, exactly one place where a query is written, and that’s the only place where you have to go to to change anything, should you ever wish or need to modify the code. PL/SQL will be in our sights in the next part but suffice to say it is the key to maintainable code on Oracle. Obviously, ad-hoc queries cannot benefit from packages, but at least they profit from having solid access structures, which are of course important to PL/SQL too.

One important thing to keep in mind is that you should always strive to write efficient, legible code, but that premature optimization is not the way to go. Premature optimization involves tinkering with access structures and execution plans; it does not include simplifying, refactoring and rewriting queries in ways that enable Oracle to optimally use the database objects involved.

Rewriting queries with or without hints and studying the corresponding execution plans is tedious and best left for high-impact SQL only: queries that process many rows, have a high number of buffer gets, require many disk reads, consume a lot of memory or CPU time, perform many sorts, and/or are executed frequently. You can identify such queries from the dynamic performance views. Whatever you, the database developer, do, be consistent and document your findings, so that all developers on your team may benefit from your experiences.

Hints in Performance Tuning

Hints are actually comments in SQL statements that are read by the optimizer. They are indicated by the plus sign in /*+ SOME_HINT */.

Types of Hints

  • Optimization goals and approaches
  • Access path hints;
  • In-memory column store hints;
  • Join order hints;
  • Join operation hints;
  • Parallel execution hints;
  • Online application upgrade hints;
  • Query tranformation hints;
  • XML hints;
  • Other hints.
In Oracle Database 12c Performance Tuning Recipes, the authors provide two additional types of hints:
  • Data warehousing hints;
  • Optimizer hints
Optimization goals and approaches

Oracle only lists two hints in this category: ALL_ROWS and FIRST_ROWS( number_of_rows ).

Note
Oracle ignores FIRST_ROWS in all DELETE and UPDATE statements and in SELECT statement blocks that include sorts and/or groupings, as it needs to fetch all relevant data anyway.
  
Optimizer Hints

GATHER_PLAN_STATISTICS hint, which can be used to obtain statistics about the execution plan during the execution of a statement. It is especially helpful when you intend to diagnose performance issues with a particular statement

GATHER_OPTIMIZER_STATISTICS, which Oracle lists under ‘Other hints’. It can be used to collect bulk-load statistics for CTAS statements and INSERT INTO ... SELECT statements that use a direct-path insert, which is accomplished with the APPEND hint, but more on that later.
 
Access Path Hints

FULL( tab_name ) hint. It instructs the optimizer to access a table by means of a full table scan. If the table you want Oracle to access with a full table scan has an alias in the SQL statement, you have to use the alias rather than the table name (without the schema name) as the parameter to FULL

  • INDEX / NO_INDEX
  • INDEX_ASC / INDEX_DESC
  • INDEX_FFS / NO_INDEX_FFS
  • INDEX_SS / NO_INDEX_SS
  • INDEX_SS_ASC / INDEX_SS_DESC
  • INDEX_COMBINE / INDEX_JOIN
All these hints take at least one parameter: the table name or alias in the SQL statement. A second parameter, the index name(s), is optional but often provided

Join Order Hints

ORDERED takes no parameters and instructs the optimizer to join the tables in the order as they appear in the FROM clause. Because the ORDERED hint is so basic and you do not want to move around tables in the FROM clause

LEADING hint. It takes the table names or aliases (if specified) as parameters, separated by spaces.

ORDERED beats LEADING when both are specified for the same statement

Join Operation Hints
  • USE_HASH / NO_USE_HASH
  • USE_MERGE / NO_USE_MERGE
  • USE_NL / NO_USE_NL
Parallel Execution Hints

All DML statements, including subqueries, can be run in parallel, which means that multiple blocks can be selected, inserted, deleted, or updated simultaneously. For parallelized DDL statements, multiple blocks are being created/altered and written in parallel. The DDL statements that can be run in parallel are:
  • CREATE INDEX
  • CREATE TABLE ... AS SELECT
  • ALTER INDEX ... REBUILD
  • ALTER INDEX ... [ REBUILD | SPLIT ] PARTITION
  • ALTER TABLE ... MOVE
  • ALTER TABLE ... [ MOVE | SPLIT | COALESCE ] PARTITION 
Query Transformation Hints
  • FACT / NO_FACT
  • MERGE / NO_MERGE
  • NO_EXPAND / USE_CONCAT
  • REWRITE / NO_REWRITE
  • UNNEST / NO_UNNEST
  • STAR_TRANSFORMATION / NO_STAR_TRANSFORMATION

Data Access methods in Explain plan

An access method  or access path shows the way the data will be accesses from each table/index. This can be seen in the operation column tab in EXPLAIN PLAN.

Oracle Supports the below access methods.
  1. Full Table SCAN (FTS)
  2. Table Access by ROW-ID
  3. Index Unique Scan
  4. Index Range Scan
  5. Full Index Scan
  6. Fast Full Index Scans
  7. Index Skip Scan
  8. Index Joins
  9. Hash Access
  10. Cluster Access
  11. Bit Map Index

 Full Table SCAN (FTS)



In a FTS operation, the whole table is read up to the high water mark (HWM)

The HWM marks the last block in the table that has ever had data written to it. If you have deleted all the rows then you will still read up to the HWM. Truncate resets the HWM back to the start of the table. FTS uses multi-block i/o to read the blocks from disk.

Buffers from FTS operations are placed on the Least Recently Used (LRU) end of the buffer cache so will be quickly aged out. 

In most cases FTS is not recommended for large tables unless you are reading >5-10% of it (or so) or you intend to run in parallel simply because there is likely to be a more economical way of accessing the data.

A multi-block read can be performed to speed up the table scan: several adjacent blocks are combined into a single I/O call. 

The DB_FILE_MULTIBLOCK_READ_COUNT parameter in the init.ora file defines the number of blocks that can be lumped into one multi-block read.


A full table scan is typically employed if
  • no index exists;
  • the index cannot be used because the query predicate has a function applied to an indexed column (in a non-function-based index);
  • the optimizer decides against an index skip scan because the query predicate does not include the leading edge of a (B-tree) index;
  • SELECT COUNT(*) is issued and the index contains nulls;
  • the table statistics are stale and the table has grown considerably since the statistics were last computed;
  • the query is not selective, so that a large portion of the rows must be accessed;
  • the cost of a full table scan is the lowest because the table is small, in particular the number of formatted blocks under the high water mark is smaller than DB_FILE_MULTIBLOCK_READ_COUNT;
  • the table has a high degree of parallelism, which makes the optimizer biased towards full table scans;
  • the query uses the FULL hint.

SQL> set autotrace explain

SQL> select * from dual;
Execution Plan
----------------------------------------------------------
Plan hash value: 3543395131

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     2 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |

 

 Table Access by ROW-ID


Rowid of a row specifies datafile,data block within the file and location of row within that block.
Oracle initially obtains the rowid’s either from a where clause predicate or through an index scan of one or more of table’s indexes.Once these  indexes are obtained,the required rows are selected based on the rowids and does a row by row access.

When the Optimizer Uses Rowids:
 
This is generally the second step after retrieving the rowid from an index. The table access might be required for any columns in the statement not present in the index.
Access by rowid does not need to follow every index scan. If the index contains all the columns needed for the statement, then table access by rowid might not occur



Data is accessed by looking up key values in an index and returning rowids. A rowid uniquely identifies an individual row in a particular data block. This block is read via single block i/o.
In this example an index is used to find the relevant row(s) and then the table is accessed to look up the ename column (which is not included in the index).

If the execution plan shows a line TABLE ACCESS BY INDEX ROWID BATCHED it means that Oracle retrieves a bunch of ROWIDs from the index and then tries to access rows in block order to reduce the number of times each block needs to be accessed.


SQL> select empno,ename from emp where empno=10;

Execution Plan
----------------------------------------------------------
Plan hash value: 4066871323

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    10 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP    |     1 |    10 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | PK_EMP |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=10)

  
Notice the ‘TABLE ACCESS BY ROWID’ section. This indicates that the table data is not being accessed via a FTS operation but rather by a rowid lookup. 

In this case the rowid has been produced by looking up values in the index first. The index name in this case is PK_EMP and the index is being accessed by an ‘INDEX UNIQUE SCAN’ operation.

If all the required data resides in the index then a table lookup may be unnecessary and all you will see is an index access with no table access.

Index Unique Scan:

As the name says, it returns at most, a single rowid only.Oracle performs a unique scan if a statement contains a UNIQUE or a PRIMARY KEY constraint that guarantees that only a single row is accessed.

For an index unique scan only one row of a (B-tree) index or index-organized table will be returned because of an equality predicate on a unique index or a primary key constraint.
The database stops looking for more rows once it has found its match because there cannot be any more matches thanks to the UNIQUE constraint.


SQL> select empno from emp where empno=10;
Execution Plan
----------------------------------------------------------
Plan hash value: 4008335093
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     1 |     4 |     0   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| PK_EMP |     1 |     4 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("EMPNO"=10)

If all the required data resides in the index then a table lookup may be unnecessary and all you will see is an index access with no table access. 
In the above example all the columns (empno) are in the index. Notice that no table access takes place
 
 
 Index Range Scan:

Index range scan is a method for accessing a range values of a particular column. AT LEAST the leading column of the index must be supplied to access data via the index. It can be used for range operations (e.g. > < >= <= between  ) or where the data to be returned is not unique.  

n.b: A non-unique index may return multiple values for the predicate mgr = 5 and will use an index range scan.
 
An index range scan is a common operation for accessing selective data. It can be bounded (bounded on both sides) or unbounded (on one or both sides). Data is returned in the ascending order of index columns. Multiple rows with identical values are sorted in ascending order by rowid.

Index Range scan descending: It’s conceptually the same as an index range scan but used when it’s needed a situation alike of .. “order by descending”.

When the Optimizer Uses Index range scan :

The optimizer uses a range scan when it finds one or more leading columns of an index specified in conditions, such as the following:
  1.     col1 = :b1
  2.     col1 < :b1
  3.     col1 > :b1
AND combination of the preceding conditions for leading columns in the index col1 like ‘ASD%’ wild-card searches should not be in a leading position otherwise the condition col1 like ‘%ASD’ does not result in a range scan.

Range scans can use unique or nonunique indexes. Range scans avoid sorting when index columns constitute the ORDER BY/GROUP BY clause.

select empno,ename from emp where empno > 7876 order by empno;

Execution Plan
----------------------------------------------------------
Plan hash value: 2449469783
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    10 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP    |     1 |    10 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | PK_EMP |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("EMPNO">7876)

n.b: A non-unique index may return multiple values for the predicate mgr = 5 and will use an index range scan.

SQL> create index emp_mgr on emp(mgr);

SQL> select mgr from emp where mgr = 5;

Execution Plan
----------------------------------------------------------
Plan hash value: 1542557660
----------------------------------------------------------------------------
| Id  | Operation        | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |         |     1 |     4 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| EMP_MGR |     1 |     4 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MGR"=5)

  

Index Full Scan


A full index scan does not read every block in the index though. Instead it reads the root block and goes down the left-hand side (ascending scan) or right-hand side (descending scan) of the branch blocks until it reaches a leaf block. From there it reads the index on a block-by-block basis

The index blocks are scanned one by one, not using multi-block I/O (like a FTS or Index FFS). 

In the following example, Index E_CIX is a concatenated index on emp (empno,ename). A select with no predicates results in an index full scan since it can satisfy the whole query without need to visit the table


 
SQL> create index E_CIX on emp (empno,ename);

Index created.

SQL> select empno,ename from emp order by empno,ename;

Execution Plan
----------------------------------------------------------
Plan hash value: 2418964722

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |    14 |   140 |     1   (0)| 00:00:01 |
|   1 |  INDEX FULL SCAN | E_CIX |    14 |   140 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Index Full Scan can also be used to access second column of concatenated indexes because the whole index is being retrieved as compared to a range scan which may not retrieve all the blocks

SQL> select ename from emp;

Execution Plan
----------------------------------------------------------
Plan hash value: 2418964722

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |    14 |    84 |     1   (0)| 00:00:01 |
|   1 |  INDEX FULL SCAN | E_CIX |    14 |    84 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

The full index scan is used when one of the following situations arises:
  • a predicate references a (non-leading) column in an index;
  • no predicate is specified but all columns in the table as well as query are in the index, and at least one indexed column is not null;
  • the query has an ORDER BY clause on indexed non-null columns; means order of the columns referenced in the query matches the order of the leading index columns
  • the query has a GROUP BY clause where all aggregation columns are present in the index. means A group by clause is present in the query and the columns in the group by clause are present in the index

Fast Full Index Scans


Fast full index scans are an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the NOT NULL constraint.

A fast full scan accesses the data in the index itself, without accessing the table.

It cannot be used to eliminate a sort operation, because the data is not ordered by the index key.

It reads the entire index using multiblock reads, unlike a full index scan, and can be parallelized.

Fast full index scans cannot be performed against bitmap indexes.

A fast full index scan reads index blocks as they exist on disk. The index (entries on the leaf blocks) rather than the table is read in multi-block I/O mode.



SQL> select /*+ Index_FFS(emp) */ empno,ename from emp;

Execution Plan
----------------------------------------------------------
Plan hash value: 2100043038
------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |    14 |   140 |     2   (0)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| E_CIX |    14 |   140 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------

Index Skip Scan:  
 
If all the other columns in an index are referenced in the query except the first (leading one)…. Oracle prefers index skip scan.
The index skip scan is also an option if the leading column has few distinct values.


SQL> create index E_CIX on emp (empno,ename);

Index created.


SQL>  select /*+ index_ss(emp i_emp)*/ job from emp where ename='SMITH';

Execution Plan
----------------------------------------------------------
Plan hash value: 112616935
---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    14 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| EMP   |     1 |    14 |     2   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN                   | E_CIX |     1 |       |     1   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ENAME"='SMITH')

filter("ENAME"='SMITH')

Index Joins:
 
It’s a join of several indexes on the same table that collectively contain all of the columns that are referenced in the query from that of the table. Whenever Index Join is used no table access is required as all the relevant columns are retrieved from the joined indexes.

This can never be used to eliminate a sort operation.

Hash Access:

A hash scan is used to locate rows in a hash cluster, based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data block.
To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement.
Oracle then scans the data blocks containing rows with that hash value.

Cluster Access:
 
A cluster scan is used to retrieve, from a table stored in an indexed cluster, all rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data block.
To perform a cluster scan, Oracle first obtains the rowid of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this rowid.

Bit Map Index access:

If the predicate on a bitmap-indexed column contains an equality operator, the query optimizer considers the bitmap index single value access path to look up a single key value. A single bitmap is scanned for all positions containing a value of 1. All matching values are converted into ROWIDs, which in turn are used to find the corresponding rows
A bitmap merge is typically preferred by the optimizer when bitmaps generated from a bitmap index range scan are combined with an OR operation between two bitmaps