Wednesday, 29 November 2017

Data load using SQL*Loader



SQL*Loader is the primary method for quickly populating Oracle tables with data from external files. It has a powerful data parsing engine that puts little limitation on the format of the data in the data file. SQL*Loader is invoked when you specify the sqlldr command or use the Enterprise Manager interface.
SQL*Loader is an integral feature of Oracle databases and is available in all configurations.

Key Features

SQL*Loader can be used to do the following:

  • ·       Load data across a network. This means that a SQL*Loader client can be run on a different system from the one that is running the SQL*Loader server.
  • ·       Load data from multiple data files during the same load session
  • ·       Load data into multiple tables during the same load session
  • ·       Specify the character set of the data
  • ·       Selectively load data
  • ·       Load data from disk, tape, or named pipe
  • ·       Specify the character set of the data
  • ·       Generate sophisticated error reports, which greatly aid troubleshooting
  • ·       Load arbitrarily complex object-relational data
  • ·       Use either conventional or direct path loading.


File Types

·       SQL*Loader Control File

The control file is a text file written in a language that SQL*Loader understands. The control file tells SQL*Loader where to find the data, how to parse and interpret the data, and where to insert the data.
·       Input Data and Datafiles

SQL*Loader reads data from one or more files specified in the control file. From SQL*Loader's perspective, the data in the data file is organized as records. A particular data file can be in fixed record format, variable record format, or stream record format. The chosen format depends on the data and depends on the flexibility and performance necessary for the job.

LOBFILEs
LOB data can be lengthy enough that it makes sense to load it from a LOBFILE. LOB data instances are still considered to be in fields, but these fields are not organized into records. Therefore, the processing overhead of dealing with records is avoided. This type of or organization of data is ideal for LOB loading.

Bulk Loads
You can use SQL*Loader to bulk load objects, collections, and LOBs. SQL*Loader supports the following bulk loads:

·       Two object types: column objects and row objects.
·       Load data from multiple data files during the same load session.
·       Load data into multiple tables during the same load session.
·       Two collection types: nested tables and VARRAYS.

·       Four LOB types: BLOBs, CLOBs, NCLOBs, and BFILEs.

Load Methods
SQL*Loader provides three methods to load data: Conventional Path, Direct Path, and External Table.

·       Conventional Path Load
Conventional path load builds an array of rows to be inserted and uses the SQL INSERT statement to load the data. During conventional path loads, input records are parsed according to the field specifications, and each data field is copied to its corresponding bind array. When the bind array is full (or no more data is left to read), an array insert is executed.

·       Direct Path Load
A direct path load builds blocks of data in memory and saves these blocks directly into the extents allocated for the table being loaded. A direct path load uses the field specifications to build whole Oracle blocks of data, and write the blocks directly to Oracle data files, bypassing much of the data processing that normally takes place. Direct path load is much faster than conventional load, but entails some restrictions.
A parallel direct path load allows multiple direct path load sessions to concurrently load the same data segments. Parallel direct path is more restrictive than direct path.

·       External Table Load
An external table load creates an external table for data in a data file and executes INSERT statements to insert the data from the data file into the target table.
There are two advantages of using external table loads over conventional path and direct path loads:
·       An external table load attempts to load data files in parallel. If a data file is big enough, it will attempt to load that file in parallel.
·       An external table load allows modification of the data being loaded by using SQL functions and PL/SQL functions as part of the INSERT statement that is used to create the external table.

Summary

SQL*Loader is a high-speed data loading utility that loads data from external files into tables in an Oracle database. It provides database administrators with the fast performance and flexibility required to get load jobs conducted as quickly and efficiently as possible

sqlldr username/password@server control=loader.ctl

v  This sample control file (loader.ctl) will load an external data file containing delimited data:

load data
 infile 'c:\data\mydata.csv'
 into table emp
 fields terminated by "," optionally enclosed by '"'                             
 ( empno, empname, sal, deptno )

$ vi emp.ctl

LOAD DATA
INFILE   ‘/u01/oracle/emp.csv’
BADFILE              ‘/u01/oracle/emp.bad’
DISCARDFILE     ‘/u01/oracle/emp.dsc’
INSERT INTO TABLE emp
FIELDS TERMINATED BY “,” OPTIONALLY ENCLOSED BY ‘”’ TRAILING NULLCOLS
(empno,name,sal,jdate date ‘mm/dd/yyyy’)

Ø  The LOAD DATA statement is required at the beginning of the control file
Ø  The INFILE option specifies where the input file is located
Ø  Specifying BADFILE is optional. If you specify,  then bad records found during loading will be stored in this file
Ø  Specifying DISCARDFILE is optional. If you specify, then records which do not meet a WHEN condition will be written to this file.
Ø  You can use any of the following loading option
         i.            INSERT : Loads rows only if the target table is empty
       ii.            APPEND: Load rows if the target table is empty or not.
     iii.            REPLACE: First deletes all the rows in the existing table and then, load rows.
     iv.            TRUNCATE: First truncates the table and then load rows.
Ø  This line indicates how the fields are separated in input file. Since in our case the fields are separated by “,” so we have specified “,” as the terminating char for fields. You can replace this by any char which is used to terminate fields. Some of the popularly use terminating characters are semicolon “;”, colon “:”, pipe “|” etc. TRAILING NULLCOLS means if the last column is null then treat this as null value, otherwise, SQL LOADER will treat the record as bad if the last column is null.

v  You can use SQL*Plus to select and format your data and then spool it to a file.
This example spools out a CSV (comma separated values) file that can be imported into MS-Excel:

set echo off newpage 0 space 0 pagesize 0 feed off head off trimspool on
spool oradata.txt
select col1 || ',' || col2 || ',' || col3
  from tab1
 where col2 = 'XYZ';
spool off
v   
You can also use the "set colsep" command if you don't want to put the commas in by hand.
Example:

set colsep ','
set echo off newpage 0 space 0 pagesize 0 feed off head off trimspool on
spool oradata.txt
select col1, col2, col3
  from tab1
 where col2 = 'XYZ';
spool off

v  PL/SQL's UTL_FILE package can also be used to unload data. 

Example:
declare
  fp utl_file.file_type;
begin
  fp := utl_file.fopen('c:\oradata','tab1.txt','w');
  utl_file.putf(fp, '%s, %sn', 'TextField', 55);
  utl_file.fclose(fp);
end;
/
  Data can be modified as it loads into the Oracle Database. One can also populate columns with static or derived values.
  However, this only applies for the conventional load path (and not for direct path loads)

LOAD DATA
  INFILE *
  INTO TABLE modified_data
  (  rec_no                      "my_db_sequence.nextval",
     region                      CONSTANT '31',
     time_loaded                 "to_char(SYSDATE, 'HH24:MI')",
     data1        POSITION(1:5)  ":data1/100",
     data2        POSITION(6:15) "upper(:data2)",
     data3        POSITION(16:22)"to_date(:data3, 'YYMMDD')"
  )

BEGINDATA
11111AAAAAAAAAA991201
22222BBBBBBBBBB990112

LOAD DATA
  INFILE 'mail_orders.txt'
  BADFILE 'bad_orders.txt'
  APPEND
  INTO TABLE mailing_list
  FIELDS TERMINATED BY ","
  (  addr,
     city,
     state,
     zipcode,
     mailing_addr   "decode(:mailing_addr, null, :addr, :mailing_addr)",
     mailing_city   "decode(:mailing_city, null, :city, :mailing_city)",
     mailing_state,
     move_date      "substr(:move_date, 3, 2) || substr(:move_date, 7, 2)"
  )

Loading from multiple input files.

One can load from multiple input files provided they use the same record format by repeating the INFILE clause.

LOAD DATA
  INFILE file1.dat
  INFILE file2.dat
  INFILE file3.dat
 
APPEND
  INTO TABLE emp
  ( empno  POSITION(1:4)   INTEGER EXTERNAL,
    ename  POSITION(6:15)  CHAR,
    deptno POSITION(17:18) CHAR,
    mgr    POSITION(20:23) INTEGER EXTERNAL
  )

Loading into multiple tables 

One can also specify multiple "INTO TABLE" clauses in the SQL*Loader control file to load into multiple tables

LOAD DATA
 INFILE *
 INTO TABLE tab1 WHEN tab = 'tab1'
   ( tab  FILLER CHAR(4),
     col1 INTEGER
   )
 INTO TABLE tab2 WHEN tab = 'tab2'
   ( tab  FILLER POSITION(1:4),
     col1 INTEGER
   )

BEGINDATA
tab1|1
tab1|2
tab2|2
tab3|3

Can one selectively load only the records that one needs?
 
Look at this example, (01) is the first character, (30:37) are characters 30 to 37:

LOAD DATA
  INFILE  'mydata.dat' BADFILE  'mydata.bad' DISCARDFILE 'mydata.dis'
  APPEND
  INTO TABLE my_selective_table
  WHEN (01) <> 'H' and (01) <> 'T' and (30:37) = '20031217'
  (
     region              CONSTANT '31',
     service_key         POSITION(01:11)   INTEGER EXTERNAL,
     call_b_no           POSITION(12:29)   CHAR
  )

NOTE: SQL*Loader does not allow the use of OR in the WHEN clause. You can only use AND as in the example above!
To workaround this problem, code multiple "INTO TABLE ... WHEN" clauses. Here is an example:

LOAD DATA
  INFILE  'mydata.dat' BADFILE  'mydata.bad' DISCARDFILE 'mydata.dis'
  APPEND

  INTO TABLE my_selective_table
  WHEN (01) <> 'H' and (01) <> 'T'
  (
     region              CONSTANT '31',
     service_key         POSITION(01:11)   INTEGER EXTERNAL,
     call_b_no           POSITION(12:29)   CHAR
  )
  INTO TABLE my_selective_table
  WHEN (30:37) = '20031217'
  (
     region              CONSTANT '31',
     service_key         POSITION(01:11)   INTEGER EXTERNAL,
     call_b_no           POSITION(12:29)   CHAR
  )

Can one skip certain columns while loading data?
 
One cannot use POSITION(x:y) with delimited data. Luckily, from Oracle 8i one can specify FILLER columns. FILLER columns are used to skip columns/fields in the load file, ignoring fields that one does not want.

LOAD DATA
TRUNCATE INTO TABLE T1
FIELDS TERMINATED BY ','
( field1,
  field2 FILLER,
  field3
)

BOUNDFILLER (available with Oracle 9i and above) can be used if the skipped column's value will be required later again

LOAD DATA
INFILE *
TRUNCATE INTO TABLE sometable
FIELDS TERMINATED BY ","  trailing nullcols
(
 c1,
 field2 BOUNDFILLER,
 field3 BOUNDFILLER,
 field4 BOUNDFILLER,
 field5 BOUNDFILLER,
 c2     ":field2 || :field3",
 c3     ":field4 + :field5"
)

 Conventional Path Load and Direct Path Load
 
Conventional Path

Conventional path load (the default) uses the SQL INSERT statement and a bind array buffer to load data into database tables.

When SQL*Loader performs a conventional path load, it competes equally with all other processes for buffer resources. This can slow the load significantly. Extra overhead is added as SQL statements are generated, passed to Oracle, and executed.

The Oracle database looks for partially filled blocks and attempts to fill them on each insert. Although appropriate during normal use, this can slow bulk loads dramatically.

Direct Path

In Direct Path Loading, Oracle will not use SQL INSERT statement for loading rows. Instead it directly writes the rows, into fresh blocks beyond High Water Mark; in data files i.e. it does not scan for free blocks before high water mark. Direct Path load is very fast because

o   Partial blocks are not used, so no reads are needed to find them, and fewer writes are performed.
o   SQL*Loader need not execute any SQL INSERT statements; therefore, the processing load on the Oracle database is reduced.
o   A direct path load calls on Oracle to lock tables and indexes at the start of the load and releases them when the load is finished. A conventional path load calls Oracle once for each array of rows to process a SQL INSERT statement.
o   A direct path load uses multi block asynchronous I/O for writes to the database files.
o   During a direct path load, processes perform their own write I/O, instead of using Oracle's buffer cache. This minimizes contention with other Oracle users.
o   Restrictions on Using Direct Path Loads

The following conditions must be satisfied for you to use the direct path load method:

o   Tables are not clustered.
o   Tables to be loaded do not have any active transactions pending.
o   Loading a parent table together with a child Table
o   Loading BFILE columns

No comments: