Monday, August 25, 2008

Expdp / Impdp

Data Pump is a new feature in Oracle 10g that provides fast, parallel data load and unload. With direct path and parallel execution, Data Pump is several times faster than traditional exp/imp. Traditional exp/imp runs on the client side, whereas expdp/impdp runs on the server side, so we have much more control over expdp/impdp than over traditional exp/imp. Compared to exp/imp, Data Pump startup time is longer because it has to set up the jobs, queues, and master table. Also, at the end of an export operation the master table data is written to the dump file set, and at the beginning of an import job the master table is located and loaded into the schema of the user.

The following processes are involved in a Data Pump operation:

Client Process : This process is initiated by the client utility (expdp or impdp) and makes calls to the Data Pump API. Once the Data Pump job is initiated, this process is not necessary for the progress of the job.

Shadow Process : When a client logs into the database, a foreground (shadow) process is created. It services the client's Data Pump API requests, creates the master table, and creates the Advanced Queuing queues used for communication. Once the client process ends, the shadow process also goes away.

Master Control Process : MCP controls the execution of the data pump job. There is one MCP per job. MCP divides the data pump job into various metadata and data load or unload jobs and hands them over to the worker processes.

Worker Process : The MCP creates worker processes based on the value of the PARALLEL parameter. The worker processes perform the tasks requested by the MCP.
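
The Data Pump API that the client process calls is the DBMS_DATAPUMP package, so the same kind of job can be started from PL/SQL without expdp. The block below is only a minimal sketch of a schema-mode export. The job name SCOTT_EXP_API and the file names are made up for illustration, and it assumes a directory object called DUMPLOCATION already exists and is writable by the user running the block.

```sql
-- Sketch: schema-mode export through the DBMS_DATAPUMP API.
-- Job name, file names, and the DUMPLOCATION directory object are assumptions.
DECLARE
  h NUMBER;  -- job handle returned by OPEN
BEGIN
  h := DBMS_DATAPUMP.OPEN(
         operation => 'EXPORT',
         job_mode  => 'SCHEMA',
         job_name  => 'SCOTT_EXP_API');

  -- Dump file and log file are both written through the directory object.
  DBMS_DATAPUMP.ADD_FILE(
    handle    => h,
    filename  => 'scott_api.dmp',
    directory => 'DUMPLOCATION',
    filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => h,
    filename  => 'scott_api.log',
    directory => 'DUMPLOCATION',
    filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

  -- Restrict the job to the SCOTT schema.
  DBMS_DATAPUMP.METADATA_FILTER(
    handle => h,
    name   => 'SCHEMA_EXPR',
    value  => 'IN (''SCOTT'')');

  DBMS_DATAPUMP.START_JOB(h);

  -- Detach this session; the job keeps running on the server.
  DBMS_DATAPUMP.DETACH(h);
END;
/
```

Because the session detaches right after START_JOB, this also illustrates the point above that the client is not needed once the job has been initiated.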

Advantages of Data Pump

1. We can perform the export in parallel and write to multiple files on different disks. (Specify PARALLEL=2 and two directory objects in the file specification, for example DUMPFILE=ddir1:file1.dmp,ddir2:file2.dmp)

2. Has the ability to attach to and detach from a job, and to monitor job progress remotely.

3. Has more options to filter metadata objects, for example EXCLUDE and INCLUDE.

4. The ESTIMATE_ONLY option can be used to estimate disk space requirements before performing the job.

5. Data can be exported from a remote database by using a database link.

6. Explicit DB version can be specified, so only supported object types are exported.

7. During impdp, we can change the target data file names, schema, and tablespace, using REMAP_DATAFILE, REMAP_SCHEMA, and REMAP_TABLESPACE.

8. Has the option to filter data rows during impdp. In traditional exp/imp, this filter option exists only in exp. Here the filter option is available in both expdp and impdp.

9. Data can be imported from one DB to another without writing to dump file, using NETWORK_LINK parameter.

10. Data access methods are decided automatically. In traditional exp/imp, we specify the value of the DIRECT parameter. Here, Data Pump decides for itself: where direct path cannot be used, the external tables method is used instead.

11. Job status can be queried directly from the data dictionary (for example, dba_datapump_jobs, dba_datapump_sessions, etc.)
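
As a sketch of point 11, the two dictionary views can be queried as below. The join to v$session is one common way to see which database sessions belong to a job. It assumes the querying user can read the DBA views and v$session.

```sql
-- Jobs known to the database, with their state and degree of parallelism.
SELECT owner_name, job_name, operation, job_mode, state, degree, attached_sessions
FROM   dba_datapump_jobs;

-- Database sessions that belong to Data Pump jobs.
SELECT d.owner_name, d.job_name, s.sid, s.serial#, s.status
FROM   dba_datapump_sessions d
JOIN   v$session s ON s.saddr = d.saddr;
```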

Exp & Expdp common parameters: The following parameters exist in both the traditional exp and the expdp utilities.

FILESIZE
FLASHBACK_SCN
FLASHBACK_TIME
FULL
HELP
PARFILE
QUERY
TABLES
TABLESPACES
TRANSPORT_TABLESPACES (in exp the parameter is TRANSPORT_TABLESPACE and takes Y/N; in expdp the value is the list of tablespace names)

Comparing exp & expdp parameters: The following exp parameters have equivalent expdp parameters. Each line shows the exp parameter followed by the corresponding expdp parameter.

FEEDBACK => STATUS
FILE => DUMPFILE
LOG => LOGFILE
OWNER => SCHEMAS
TTS_FULL_CHECK => TRANSPORT_FULL_CHECK

New parameters in expdp Utility

ATTACH Attach the client session to existing data pump jobs

CONTENT Specify what to export(ALL, DATA_ONLY, METADATA_ONLY)

DIRECTORY Location to write the dump file and log file.

ESTIMATE Show how much disk space each table in the export job consumes.

ESTIMATE_ONLY Estimates the space the export would need, but does not perform the export

EXCLUDE List of objects to be excluded

INCLUDE List of objects to be included

JOB_NAME Name of the export job

KEEP_MASTER Specify Y not to drop the master table after export

NETWORK_LINK Specify dblink to export from remote database

NOLOGFILE Specify Y if you do not want to create log file

PARALLEL Specify the maximum number of threads for the export job

VERSION DB objects that are incompatible with the specified version will not be exported.

ENCRYPTION_PASSWORD If a table column is encrypted and no password is specified, the column data is written as clear text in the dump file set. Any string can be used as the password for this parameter.

COMPRESSION Specifies whether to compress metadata before writing to the dump file set. The default is METADATA_ONLY. The two valid values are METADATA_ONLY and NONE; use NONE to disable compression during the export.

SAMPLE Allows you to specify a percentage of data to be sampled and unloaded from the source database. The sample_percent indicates the probability that a block of rows will be selected as part of the sample.
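
Before running an export with ESTIMATE or ESTIMATE_ONLY, a rough idea of the table sizes can also be taken from the data dictionary. The query below is only a sketch for cross-checking. It reports allocated segment space, which is not necessarily the same figure that Data Pump prints, and it uses the SCOTT schema only because that is the schema used in the scenarios further down.

```sql
-- Allocated space per table segment in the SCOTT schema, largest first.
SELECT segment_name,
       segment_type,
       ROUND(bytes / 1024 / 1024, 1) AS size_mb
FROM   dba_segments
WHERE  owner = 'SCOTT'
AND    segment_type LIKE 'TABLE%'
ORDER  BY bytes DESC;
```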

Imp & Impdp common parameters: The following parameters exist in both the traditional imp and the impdp utilities.

FULL
HELP
PARFILE
QUERY
SKIP_UNUSABLE_INDEXES
TABLES
TABLESPACES

Comparing imp & impdp parameters: The following imp parameters have equivalent impdp parameters. Each line shows the imp parameter followed by the corresponding impdp parameter.

DATAFILES => TRANSPORT_DATAFILES
DESTROY => REUSE_DATAFILES
FEEDBACK => STATUS
FILE => DUMPFILE
FROMUSER => SCHEMAS, REMAP_SCHEMA
IGNORE => TABLE_EXISTS_ACTION (SKIP, APPEND, TRUNCATE, REPLACE)
INDEXFILE, SHOW => SQLFILE
LOG => LOGFILE
TOUSER => REMAP_SCHEMA

New parameters in impdp Utility

FLASHBACK_SCN Performs import operation that is consistent with the SCN specified from the source database. Valid only when NETWORK_LINK parameter is used.

FLASHBACK_TIME Similar to FLASHBACK_SCN, but Oracle finds the SCN closest to the time specified.

NETWORK_LINK Performs the import directly from a source database using the database link name specified in the parameter. No dump file is created on the server when we use this parameter. To get a consistent export from the source database, we can use the FLASHBACK_SCN or FLASHBACK_TIME parameters. For impdp, these two parameters are only valid when the NETWORK_LINK parameter is used.

REMAP_DATAFILE Changes name of the source DB data file to a different name in the target.

REMAP_SCHEMA Loads objects to a different target schema name.

REMAP_TABLESPACE Changes name of the source tablespace to a different name in the target.

TRANSFORM We can specify that the storage clause should not be generated in the DDL for import. This is useful if the storage characteristics of the source and target database are different. The valid values are SEGMENT_ATTRIBUTES, STORAGE. STORAGE removes the storage clause from the CREATE statement DDL, whereas SEGMENT_ATTRIBUTES removes physical attributes, tablespace, logging, and storage attributes.

TRANSFORM = name:boolean_value[:object_type], where boolean_value is Y or N.

For instance, TRANSFORM=storage:N:table

ENCRYPTION_PASSWORD It is required on an import operation if an encryption password was specified on the export operation.

CONTENT, INCLUDE, and EXCLUDE are the same as in the expdp utility.
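
To pick a value for FLASHBACK_SCN, the current SCN can be read on the source database just before the job is started. Either of the two queries below should work on 10g. They are shown as a sketch and assume the user has access to v$database or execute rights on DBMS_FLASHBACK.

```sql
-- Run on the source database; use the returned number as FLASHBACK_SCN.
SELECT current_scn FROM v$database;

-- Alternative using the DBMS_FLASHBACK package.
SELECT dbms_flashback.get_system_change_number AS current_scn FROM dual;
```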

Prerequisite for expdp/impdp:

Set up the dump location in the database.

system@orcl> create directory dumplocation
2 as 'c:/dumplocation';

Directory created.

system@orcl> grant read,write on directory dumplocation to scott;

Grant succeeded.

system@orcl>
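
A quick way to confirm the setup is to look the directory object up in the data dictionary. This is a small sketch; it assumes the directory was created with an unquoted name, so it is stored in upper case as DUMPLOCATION.

```sql
-- The directory object and the operating system path it points to.
SELECT directory_name, directory_path
FROM   dba_directories
WHERE  directory_name = 'DUMPLOCATION';

-- Who has READ / WRITE on the directory object.
SELECT grantee, privilege
FROM   dba_tab_privs
WHERE  table_name = 'DUMPLOCATION';
```

Note that the path must exist on the database server, since expdp and impdp write their files on the server side.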

Let us try the expdp and impdp utilities in different scenarios. We have two databases, orcl and ordb. All the scenarios below were tested on Oracle 10g R2.


Scenario1 Export the whole orcl database.

Export Parfile content:

userid=system/password@orcl
dumpfile=expfulldp.dmp
logfile=expfulldp.log
full=y
directory=dumplocation

Scenario2 Export the scott schema from orcl and import it into the ordb database. During import, exclude some objects (sequences, views, packages, clusters, and the table LOAD_EXT). Load the objects that came from the RES tablespace into the USERS tablespace in the target database.

Export Parfile content:

userid=system/password@orcl
dumpfile=schemaexpdb.dmp
logfile=schemaexpdb.log
directory=dumplocation
schemas=scott

Import parfile content:

userid=system/password@ordb
dumpfile=schemaexpdb.dmp
logfile=schemaimpdb.log
directory=dumplocation
table_exists_action=replace
remap_tablespace=res:users
exclude=sequence,view,package,cluster,table:"in('LOAD_EXT')"
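
Before choosing the remap_tablespace value, it helps to check where the scott segments actually live on the source and that the target tablespace exists. The queries below are a sketch for that check; RES and USERS are the tablespace names from this scenario.

```sql
-- On orcl (source): tablespaces that hold SCOTT's segments.
SELECT tablespace_name, COUNT(*) AS segments
FROM   dba_segments
WHERE  owner = 'SCOTT'
GROUP  BY tablespace_name;

-- On ordb (target): confirm the tablespace we remap into exists and is online.
SELECT tablespace_name, status
FROM   dba_tablespaces
WHERE  tablespace_name = 'USERS';
```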

Scenario3 Export the part_emp table from the scott schema in the orcl instance and import it into the ordb instance.

Expdp parfile content:

userid=system/password@orcl
logfile=tableexpdb.log
directory=dumplocation
tables=scott.part_emp
dumpfile=tableexpdb.dmp

Impdp parfile content:

userid=system/password@ordb
dumpfile=tableexpdb.dmp
logfile=tabimpdb.log
directory=dumplocation
table_exists_action=REPLACE

Scenario4 Export only specific partitions of the part_emp table from the scott schema in orcl and import them into the ordb database.

Expdp parfile content:

userid=system/password@orcl
dumpfile=partexpdb.dmp
logfile=partexpdb.log
directory=dumplocation
tables=scott.part_emp:part10,scott.part_emp:part20

If we want to overwrite the exported data in the target database, we first need to delete the rows for deptno in (10,20) from part_emp there, because table_exists_action=append only adds rows to the existing table.

scott@ordb> delete part_emp where deptno=10;

786432 rows deleted.

scott@ordb> delete part_emp where deptno=20;

1310720 rows deleted.

scott@ordb> commit;

Commit complete.

Impdp parfile content:

userid=system/password@ordb
dumpfile=partexpdb.dmp
logfile=tabimpdb.log
directory=dumplocation
table_exists_action=append
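
The partition names used in the tables parameter can be looked up in the data dictionary. As a possible alternative to the row-by-row deletes, the target partitions could also be truncated. This is only a sketch and not what was tested above: it assumes the target table has partitions with the same names and that part10 and part20 hold exactly the deptno 10 and 20 rows. Truncate is DDL, so it cannot be rolled back.

```sql
-- On either database: list the partitions of PART_EMP.
SELECT partition_name, high_value
FROM   dba_tab_partitions
WHERE  table_owner = 'SCOTT'
AND    table_name  = 'PART_EMP'
ORDER  BY partition_position;

-- On ordb (target), as an alternative to DELETE: empty the two partitions.
ALTER TABLE scott.part_emp TRUNCATE PARTITION part10;
ALTER TABLE scott.part_emp TRUNCATE PARTITION part20;
```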

Scenario5 Export only tables in scott schema at orcl and import into ordb database.

Expdp parfile content:

userid=system/password@orcl
dumpfile=schemaexpdb.dmp
logfile=schemaexpdb.log
directory=dumplocation
include=table
schemas=scott

Impdp parfile content:

userid=system/password@ordb
dumpfile=schemaexpdb.dmp
logfile=schemaimpdb.log
directory=dumplocation
table_exists_action=replace

Scenario6 Export only the rows belonging to departments 10 and 20 in the emp and dept tables from the orcl database. Import the dump file into the ordb database. While importing, load only deptno 10 into the target database.

Expdp parfile content:

userid=system/password@orcl
dumpfile=data_filter_expdb.dmp
logfile=data_filter_expdb.log
directory=dumplocation
content=data_only
schemas=scott
include=table:"in('EMP','DEPT')"
query="where deptno in(10,20)"

Impdp parfile content:

userid=system/password@ordb
dumpfile=data_filter_expdb.dmp
logfile=data_filter_impdb.log
directory=dumplocation
schemas=scott
query="where deptno = 10"
table_exists_action=APPEND

Scenario7 Export the scott schema from the orcl database and split the dump file into 50M pieces. Import the dump files into the ordb database.

Expdp parfile content:

userid=system/password@orcl
logfile=schemaexp_split.log
directory=dumplocation
dumpfile=schemaexp_split_%U.dmp
filesize=50M
schemas=scott
include=table

As per the above expdp parfile, schemaexp_split_01.dmp is created first. Once that file reaches 50MB, the next file, schemaexp_split_02.dmp, is created. If the total dump size is 500MB, for example, 10 dump files of 50MB each are created.

Impdp parfile content:

userid=system/password@ordb
logfile=schemaimp_split.log
directory=dumplocation
dumpfile=schemaexp_split_%U.dmp
table_exists_action=replace
remap_tablespace=res:users
exclude=grant

Scenario8 Export the scott schema from the orcl database and split the dump file into four files. Import the dump files into the ordb database.

Expdp parfile content:

userid=system/password@orcl
logfile=schemaexp_split.log
directory=dumplocation
dumpfile=schemaexp_split_%U.dmp
parallel=4
schemas=scott
include=table

As per the above parfile, four files are created initially: schemaexp_split_01.dmp, schemaexp_split_02.dmp, schemaexp_split_03.dmp, and schemaexp_split_04.dmp. Notice that the substitution variable %U is incremented for each new file. Since there is no FILESIZE parameter, no more files will be created.

Impdp parfile content:

userid=system/password@ordb
logfile=schemaimp_split.log
directory=dumplocation
dumpfile=schemaexp_split_%U.dmp
table_exists_action=replace
remap_tablespace=res:users
exclude=grant

Scenario9 Export the scott schema from the orcl database and spread the dump files over three different locations. This method is especially useful if you do not have enough space in one file system to perform the complete expdp job. After the export is successful, import the dump files into the ordb database.

Expdp parfile content:

userid=system/password@orcl
logfile=schemaexp_split.log
directory=dumplocation
dumpfile=dump1:schemaexp_%U.dmp,dump2:schemaexp_%U.dmp,dump3:schemaexp_%U.dmp
filesize=50M
schemas=scott
include=table

As per the above expdp parfile, the dump files are placed in three different locations. If the entire dump is 1500MB, for example, 30 dump files of 50MB each are created, with 10 files placed in each file system.

Impdp parfile content:

userid=system/password@ordb
logfile=schemaimp_split.log
directory=dumplocation
dumpfile=dump1:schemaexp_%U.dmp,dump2:schemaexp_%U.dmp,dump3:schemaexp_%U.dmp
table_exists_action=replace

Scenario10 We are on the orcl database server. Export the ordb data and place the dump file on the orcl database server. After expdp is successful, import the dump file into the orcl database. When we use network_link, the user running expdp and the user in the source database should have identical privileges. If the privileges are not identical, we get the error below.

C:\impexpdp>expdp parfile=networkexp1.par

Export: Release 10.2.0.1.0 - Production on Sunday, 17 May, 2009 12:06:40

Copyright (c) 2003, 2005, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
ORA-31631: privileges are required
ORA-39149: cannot link privileged user to non-privileged user

Expdp parfile content:

userid=scott/tiger@orcl
logfile=netwrokexp1.log
directory=dumplocation
dumpfile=networkexp1.dmp
schemas=scott
include=table
network_link=ordb

As per the above parfile, the expdp utility exports the ordb database data and places the dump file on the orcl server, since we are running expdp on the orcl server. This is basically exporting data from a remote database.

Impdp parfile content:

userid=system/password@orcl
logfile=networkimp1.log
directory=dumplocation
dumpfile=networkexp1.dmp
table_exists_action=replace

Scenario11 Export the scott schema in orcl and import it into ordb without writing a dump file on the server. Here we do not need a separate export step: impdp can import the data without creating a dump file.

Here we run impdp on the ordb server; it contacts the orcl database, extracts the data, and imports it into the ordb database. If we do not have enough space in the file system to hold the dump file, we can use this option to load the data.

Impdp parfile content:

userid=scott/tiger@ordb
network_link=orcl
logfile=networkimp2.log
directory=dumplocation
table_exists_action=replace
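
The network_link value is the name of a database link that must already exist in the database where impdp runs. The statement below is a sketch of how such a link might be created in ordb. It assumes that 'orcl' in the USING clause is a TNS alias that resolves from the ordb server and that scott has the CREATE DATABASE LINK privilege; the credentials are the demo ones used in this post.

```sql
-- Run as scott in ordb: link that points at the scott schema in orcl.
CREATE DATABASE LINK orcl
  CONNECT TO scott IDENTIFIED BY tiger
  USING 'orcl';

-- Quick test that the link works before running impdp.
SELECT * FROM dual@orcl;
```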

Scenario12 Expdp the scott schema in orcl and impdp the dump file into the training schema in the ordb database.

Expdp parfile content:

userid=scott/tiger@orcl
logfile=netwrokexp1.log
directory=dumplocation
dumpfile=networkexp1.dmp
schemas=scott
include=table

Impdp parfile content:

userid=system/password@ordb
logfile=networkimp1.log
directory=dumplocation
dumpfile=networkexp1.dmp
table_exists_action=replace
remap_schema=scott:training

Scenario13 Expdp a table on the orcl database and impdp it into ordb. When we export the data, export only 20 percent of the table data. We use the SAMPLE parameter to accomplish this task.

The SAMPLE parameter allows you to export subsets of data by specifying the percentage of data to be sampled and exported. The sample_percent indicates the probability that a block of rows will be selected as part of the sample. It does not mean that the database will retrieve exactly that number of rows from the table. The value you supply for sample_percent can be anywhere from .000001 up to, but not including, 100.

If no table is specified, then the sample_percent value applies to the entire export job. The SAMPLE parameter is not valid for network exports.

Expdp parfile content:

userid=system/password@orcl
dumpfile=schemaexpdb.dmp
logfile=schemaexpdb.log
directory=dumplocation
tables=scott.part_emp
SAMPLE=20

As per the above expdp parfile, it exports only 20 percent of the data in part_emp table.

Impdp parfile content:

userid=system/password@ordb
dumpfile=schemaexpdb.dmp
logfile=schemaimpdb.log
directory=dumplocation
table_exists_action=replace
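
Since SAMPLE works on probabilities, the imported row count will only be roughly 20 percent of the source. A simple way to see this is to compare counts on both sides. The last query uses the SQL SAMPLE BLOCK clause purely as an analogy to show how block sampling varies from run to run; whether Data Pump uses this clause internally is not something verified here.

```sql
-- On orcl (source): total rows in the table.
SELECT COUNT(*) FROM scott.part_emp;

-- On ordb (target), after the import: expect roughly 20 percent of the above.
SELECT COUNT(*) FROM scott.part_emp;

-- On orcl: block sampling in plain SQL gives a similar, non-exact subset.
SELECT COUNT(*) FROM scott.part_emp SAMPLE BLOCK (20);
```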

Managing Data Pump jobs

The Data Pump clients expdp and impdp provide an interactive command interface. Since each expdp and impdp operation has a job name, you can attach to that job from any computer and monitor or adjust the job.

Here are the data pump interactive commands.

ADD_FILE Adds another file or a file set to the DUMPFILE set.

CONTINUE_CLIENT Changes mode from interactive client to logging mode

EXIT_CLIENT Leaves the client session and discontinues logging but leaves the current job running.

KILL_JOB Detaches all currently attached client sessions and terminates the job

PARALLEL Increase or decrease the number of threads

START_JOB Starts (or resumes) a job that is not currently running. The SKIP_CURRENT option skips the most recent failed DDL statement that caused the job to stop.

STOP_JOB Stops the current job; the job can be restarted later.

STATUS Displays detailed status of the job, the refresh interval can be specified in seconds. The detailed status is displayed to the output screen but not written to the log file.
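
The interactive commands have counterparts in the DBMS_DATAPUMP package, so a job can also be controlled from a SQL*Plus session. The block below is a minimal sketch of attaching to a job and stopping it. The job name SYS_IMPORT_FULL_01 and owner SYSTEM are taken from the example run in this post and would need to be replaced with the real values from dba_datapump_jobs.

```sql
-- Sketch: attach to an existing job and stop it through the API.
DECLARE
  h NUMBER;  -- handle for the attached job
BEGIN
  h := DBMS_DATAPUMP.ATTACH(
         job_name  => 'SYS_IMPORT_FULL_01',
         job_owner => 'SYSTEM');

  -- immediate => 1 stops the workers right away;
  -- keep_master => 1 keeps the master table so the job can be restarted later.
  DBMS_DATAPUMP.STOP_JOB(
    handle      => h,
    immediate   => 1,
    keep_master => 1);
END;
/
```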

Scenario14 Let us start a job, stop it midway, and then resume it. After some time, we kill the job, checking the job status after each step.

We can find the jobs currently running in the database by using the query below.

SQL> select state,job_name from dba_datapump_jobs;

STATE JOB_NAME
------------------------------ ------------------------------
EXECUTING SYS_IMPORT_FULL_01

SQL>

C:\impexpdp>impdp parfile=schemaimp1.par

Import: Release 10.2.0.1.0 - Production on Sunday, 17 May, 2009 14:06:51

Copyright (c) 2003, 2005, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Master table "SYSTEM"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "SYSTEM"."SYS_IMPORT_FULL_01": parfile=schemaimp1.par
Processing object type SCHEMA_EXPORT/TABLE/TABLE

Import> stop_job
Are you sure you wish to stop this job ([yes]/no): yes

C:\impexpdp>

To stop the job, we need to press Control-C to return to the Import> prompt. Once the Import> prompt is shown, we can stop the job with the stop_job command, as above.

After the job is stopped, here is the job status.

SQL> select state,job_name from dba_datapump_jobs;

STATE JOB_NAME
------------------------------ ------------------------------
NOT RUNNING SYS_IMPORT_FULL_01

SQL>
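
A stopped job can be restarted because its master table is still present in the schema of the job owner, under the same name as the job. The query below is a sketch of how to confirm that, using the owner and job name from this run.

```sql
-- The master table of the stopped job, owned by the user who started it.
SELECT owner, object_name, object_type, created
FROM   dba_objects
WHERE  owner       = 'SYSTEM'
AND    object_name = 'SYS_IMPORT_FULL_01';

-- Only if the job will never be resumed, dropping the master table
-- removes the leftover job entry. Left commented out on purpose.
-- DROP TABLE system.sys_import_full_01;
```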

Now we attach to the job again. Attaching to the job does not restart it.

C:\impexpdp>impdp system/password@ordb attach=SYS_IMPORT_FULL_01

Import: Release 10.2.0.1.0 - Production on Sunday, 17 May, 2009 14:17:11

Copyright (c) 2003, 2005, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

Job: SYS_IMPORT_FULL_01
Owner: SYSTEM
Operation: IMPORT
Creator Privs: FALSE
GUID: 54AD9D6CF9B54FC4823B1AF09C2DC723
Start Time: Sunday, 17 May, 2009 14:17:12
Mode: FULL
Instance: ordb
Max Parallelism: 1
EXPORT Job Parameters:
CLIENT_COMMAND parfile=schemaexp1.par
IMPORT Job Parameters:
Parameter Name Parameter Value:
CLIENT_COMMAND parfile=schemaimp1.par
TABLE_EXISTS_ACTION REPLACE
State: IDLING
Bytes Processed: 1,086,333,016
Percent Done: 44
Current Parallelism: 1
Job Error Count: 0
Dump File: c:/impexpdp\networkexp1.dmp

Worker 1 Status:
State: UNDEFINED
Import>

After attaching the job, here is the job status.

SQL> select state,job_name from dba_datapump_jobs;

STATE JOB_NAME
------------------------------ ------------------------------
IDLING SYS_IMPORT_FULL_01

SQL>

Attaching to the job does not resume it. Now we resume the job with continue_client, which returns to logging mode and restarts an idle job.

Import> continue_client
Job SYS_IMPORT_FULL_01 has been reopened at Sunday, 17 May, 2009 14:17
Restarting "SYSTEM"."SYS_IMPORT_FULL_01": parfile=schemaimp1.par

SQL> select state,job_name from dba_datapump_jobs;

STATE JOB_NAME
------------------------------ ------------------------------
EXECUTING SYS_IMPORT_FULL_01

SQL>

Now we kill the same job. Before we kill it, we need to press Control-C to return to the Import> prompt.

Import> kill_job
Are you sure you wish to stop this job ([yes]/no): yes

C:\impexpdp>

Now the job has disappeared from the database.

SQL> select state,job_name from dba_datapump_jobs;

no rows selected

SQL>
