SQL*Loader is the primary method for quickly populating
Oracle tables with data from external files. It has a powerful data parsing
engine that puts little limitation on the format of the data in the data file.
SQL*Loader is invoked when you specify the sqlldr command or use the Enterprise
Manager interface.
SQL*Loader is an integral feature of Oracle databases and
is available in all configurations.
Key Features
SQL*Loader can be
used to do the following:
- · Load data across a network. This means that a SQL*Loader client can be run on a different system from the one that is running the SQL*Loader server.
- · Load data from multiple data files during the same load session
- · Load data into multiple tables during the same load session
- · Specify the character set of the data
- · Selectively load data
- · Load data from disk, tape, or named pipe
- · Specify the character set of the data
- · Generate sophisticated error reports, which greatly aid troubleshooting
- · Load arbitrarily complex object-relational data
- · Use either conventional or direct path loading.
File Types
·
SQL*Loader Control
File
The control file is a text file written in a language
that SQL*Loader understands. The control file tells SQL*Loader where to find
the data, how to parse and interpret the data, and where to insert the data.
·
Input
Data and Datafiles
SQL*Loader reads data from one or more files
specified in the control file. From SQL*Loader's perspective, the data in the data
file is organized as records. A particular data file can be in fixed record
format, variable record format, or stream record format. The chosen format
depends on the data and depends on the flexibility and performance necessary
for the job.
LOBFILEs
LOB data can be lengthy enough that it makes sense to
load it from a LOBFILE. LOB data instances are still considered to be in
fields, but these fields are not organized into records. Therefore, the
processing overhead of dealing with records is avoided. This type of or
organization of data is ideal for LOB loading.
Bulk Loads
You can use SQL*Loader to bulk load objects, collections,
and LOBs. SQL*Loader supports the following bulk loads:
·
Two
object types: column objects and row objects.
·
Load data from multiple data files during the
same load session.
·
Load data into multiple tables during the same
load session.
·
Two
collection types: nested tables and VARRAYS.
·
Four LOB types: BLOBs, CLOBs, NCLOBs, and
BFILEs.
Load Methods
SQL*Loader provides three methods to load data: Conventional
Path, Direct Path, and External Table.
·
Conventional
Path Load
Conventional path load builds an array
of rows to be inserted and uses the SQL INSERT statement to load the data.
During conventional path loads, input records are parsed according to the field
specifications, and each data field is copied to its corresponding bind array.
When the bind array is full (or no more data is left to read), an array insert
is executed.
·
Direct
Path Load
A direct path load builds blocks of data
in memory and saves these blocks directly into the extents allocated for the
table being loaded. A direct path load uses the field specifications to
build whole Oracle blocks of data, and write the blocks directly to Oracle data
files, bypassing much of the data processing that normally takes place. Direct
path load is much faster than conventional load, but entails some restrictions.
A parallel direct path load allows multiple
direct path load sessions to concurrently load the same data segments. Parallel
direct path is more restrictive than direct path.
·
External
Table Load
An external table load creates an external
table for data in a data file and executes INSERT statements to insert the
data from the data file into the target table.
There are two advantages of using external table loads
over conventional path and direct path loads:
·
An external table load attempts to load data files
in parallel. If a data file is big enough, it will attempt to load that file in
parallel.
·
An external table load allows modification of
the data being loaded by using SQL functions and PL/SQL functions as part of
the INSERT statement that is used to create the external table.
Summary
SQL*Loader is a high-speed data loading utility that loads
data from external files into tables in an Oracle database. It provides
database administrators with the fast performance and flexibility required to
get load jobs conducted as quickly and efficiently as possible
sqlldr
username/password@server control=loader.ctl
v
This sample control file (loader.ctl)
will load an external data file containing delimited data:
load data
infile 'c:\data\mydata.csv'
into table emp
fields terminated by "," optionally
enclosed by '"'
( empno, empname, sal, deptno )
$ vi emp.ctl
LOAD DATA
INFILE ‘/u01/oracle/emp.csv’
BADFILE ‘/u01/oracle/emp.bad’
DISCARDFILE ‘/u01/oracle/emp.dsc’
INSERT INTO TABLE emp
FIELDS TERMINATED BY “,” OPTIONALLY ENCLOSED BY ‘”’ TRAILING NULLCOLS
(empno,name,sal,jdate
date ‘mm/dd/yyyy’)
Ø
The LOAD DATA
statement is required at the beginning of the control file
Ø
The INFILE option specifies where the input
file is located
Ø
Specifying BADFILE is optional. If you
specify, then bad records found during
loading will be stored in this file
Ø
Specifying DISCARDFILE is optional. If you
specify, then records which do not meet a WHEN condition will be written
to this file.
Ø
You can use any
of the following loading option
i.
INSERT :
Loads rows only if the target table is empty
ii.
APPEND: Load
rows if the target table is empty or not.
iii.
REPLACE:
First deletes all the rows in the existing table and then, load rows.
iv.
TRUNCATE:
First truncates the table and then load rows.
Ø
This line
indicates how the fields are separated in input file. Since in our case the
fields are separated by “,” so we have specified “,” as the terminating char
for fields. You can replace this by any char which is used to terminate fields.
Some of the popularly use terminating characters are semicolon “;”, colon “:”,
pipe “|” etc. TRAILING NULLCOLS
means if the last column is null then treat this as null value, otherwise, SQL
LOADER will treat the record as bad if the last column is null.
v
You can use SQL*Plus to select and format your
data and then spool it to a file.
This example spools out a CSV (comma separated values)
file that can be imported into MS-Excel:
set echo off newpage 0 space 0 pagesize 0
feed off head off trimspool on
spool oradata.txt
select col1 || ','
|| col2 || ',' || col3
from tab1
where col2 = 'XYZ';
spool off
v
You can also use the "set colsep"
command if you don't want to put the commas in by hand.
Example:
set colsep ','
set echo off newpage 0 space 0 pagesize 0 feed off head
off trimspool on
spool oradata.txt
select col1, col2, col3
from tab1
where col2 =
'XYZ';
spool off
v
PL/SQL's UTL_FILE
package can also be used to unload data.
Example:
declare
fp utl_file.file_type;
begin
fp := utl_file.fopen('c:\oradata','tab1.txt','w');
utl_file.putf(fp, '%s, %sn',
'TextField', 55);
utl_file.fclose(fp);
end;
/
Data can be modified as it loads into the Oracle
Database. One can also populate columns with static or derived values.
However, this only applies for the conventional load path (and not for
direct path loads)
LOAD DATA
INFILE *
INTO TABLE
modified_data
( rec_no
"my_db_sequence.nextval",
region CONSTANT '31',
time_loaded
"to_char(SYSDATE, 'HH24:MI')",
data1 POSITION(1:5) ":data1/100",
data2 POSITION(6:15)
"upper(:data2)",
data3 POSITION(16:22)"to_date(:data3,
'YYMMDD')"
)
BEGINDATA
11111AAAAAAAAAA991201
22222BBBBBBBBBB990112
LOAD DATA
INFILE 'mail_orders.txt'
BADFILE
'bad_orders.txt'
APPEND
INTO TABLE
mailing_list
FIELDS TERMINATED
BY ","
( addr,
city,
state,
zipcode,
mailing_addr
"decode(:mailing_addr, null, :addr, :mailing_addr)",
mailing_city "decode(:mailing_city,
null, :city, :mailing_city)",
mailing_state,
move_date "substr(:move_date, 3, 2) ||
substr(:move_date, 7, 2)"
)
One can load from multiple input files provided they use the
same record format by repeating the INFILE clause.
LOAD DATA
INFILE file1.dat
INFILE file2.dat
INFILE file3.dat
APPEND
INTO TABLE emp
( empno POSITION(1:4) INTEGER EXTERNAL,
ename POSITION(6:15) CHAR,
deptno
POSITION(17:18) CHAR,
mgr POSITION(20:23) INTEGER EXTERNAL
)
One can also specify multiple "INTO TABLE"
clauses in the SQL*Loader control file to load into multiple tables
LOAD DATA
INFILE *
INTO TABLE tab1
WHEN tab = 'tab1'
( tab FILLER CHAR(4),
col1 INTEGER
)
INTO TABLE tab2
WHEN tab = 'tab2'
( tab FILLER POSITION(1:4),
col1 INTEGER
)
BEGINDATA
tab1|1
tab1|2
tab2|2
tab3|3
Look at this example, (01) is the first character, (30:37)
are characters 30 to 37:
LOAD DATA
INFILE 'mydata.dat' BADFILE 'mydata.bad' DISCARDFILE 'mydata.dis'
APPEND
INTO TABLE
my_selective_table
WHEN (01)
<> 'H' and (01) <> 'T' and (30:37) = '20031217'
(
region CONSTANT '31',
service_key
POSITION(01:11) INTEGER
EXTERNAL,
call_b_no POSITION(12:29) CHAR
)
NOTE:
SQL*Loader does not allow the use of OR in the WHEN clause. You can only
use AND as in the example above!
To workaround this problem, code multiple "INTO
TABLE ... WHEN" clauses. Here is an example:
LOAD DATA
INFILE 'mydata.dat' BADFILE 'mydata.bad' DISCARDFILE 'mydata.dis'
APPEND
INTO TABLE
my_selective_table
WHEN (01)
<> 'H' and (01) <> 'T'
(
region CONSTANT '31',
service_key
POSITION(01:11) INTEGER
EXTERNAL,
call_b_no POSITION(12:29) CHAR
)
INTO TABLE
my_selective_table
WHEN (30:37) =
'20031217'
(
region CONSTANT '31',
service_key POSITION(01:11) INTEGER EXTERNAL,
call_b_no POSITION(12:29) CHAR
)
One cannot use POSITION(x:y) with delimited data. Luckily,
from Oracle 8i one can specify FILLER columns. FILLER columns are used to skip
columns/fields in the load file, ignoring fields that one does not want.
LOAD DATA
TRUNCATE INTO TABLE T1
FIELDS TERMINATED BY ','
( field1,
field2 FILLER,
field3
)
BOUNDFILLER
(available with Oracle 9i and above) can be used if the skipped column's value
will be required later again
LOAD DATA
INFILE *
TRUNCATE INTO TABLE sometable
FIELDS TERMINATED BY "," trailing nullcols
(
c1,
field2
BOUNDFILLER,
field3
BOUNDFILLER,
field4
BOUNDFILLER,
field5
BOUNDFILLER,
c2 ":field2 || :field3",
c3 ":field4 + :field5"
)
Conventional Path Load and Direct Path Load
Conventional Path
Conventional path load (the default) uses the SQL INSERT
statement and a bind array buffer to load data into database tables.
When SQL*Loader performs a conventional path load, it
competes equally with all other processes for buffer resources. This can slow
the load significantly. Extra overhead is added as SQL statements are
generated, passed to Oracle, and executed.
The Oracle database looks for partially filled blocks and
attempts to fill them on each insert. Although appropriate during normal use,
this can slow bulk loads dramatically.
Direct Path
In Direct Path Loading, Oracle will not use SQL INSERT
statement for loading rows. Instead it directly writes the rows, into fresh
blocks beyond High Water Mark; in data files i.e. it does not scan for free
blocks before high water mark. Direct Path load is very fast because
o
Partial blocks are not used, so no reads are
needed to find them, and fewer writes are performed.
o
SQL*Loader need not execute any SQL INSERT
statements; therefore, the processing load on the Oracle database is reduced.
o
A direct path load calls on Oracle to lock
tables and indexes at the start of the load and releases them when the load is
finished. A conventional path load calls Oracle once for each array of rows to
process a SQL INSERT statement.
o
A direct path load uses multi block asynchronous
I/O for writes to the database files.
o
During a direct path load, processes perform their
own write I/O, instead of using Oracle's buffer cache. This minimizes
contention with other Oracle users.
o
Restrictions on Using Direct Path Loads
The following conditions must be satisfied for you to
use the direct path load method:
o
Tables are not clustered.
o
Tables to be loaded do not have any active
transactions pending.
o
Loading a parent table together with a child
Table
o
Loading BFILE columns
No comments:
Post a Comment