How to:
Apache Sqoop is designed for bulk data movement from structured data sources, such as relational databases, into HDFS. Sqoop reads records row by row in parallel and writes them to HDFS in the selected file format. Sqoop usually splits the work by the table's primary key, or by a column that you define if no primary key is present.
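To illustrate the behavior described above, the following sketch shows a comparable parallel import expressed through Sqoop's Java entry point. This is an illustration only: the connection string, table, split column, mapper count, and target directory are hypothetical placeholders, and iBDI normally generates the actual import from the Sqoop configuration rather than from hand-written code.

```java
// A minimal sketch of the kind of bulk import a Sqoop configuration performs.
// All connection details, names, and paths below are hypothetical examples.
import org.apache.sqoop.Sqoop;

public class SqoopImportSketch {
    public static void main(String[] args) throws Exception {
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://dbhost:3306/sales",  // hypothetical source database
            "--username", "etl_user",
            "--password-file", "/user/etl/.db.password",
            "--table", "orders",
            "--split-by", "order_id",   // primary key (or chosen column) used to split the parallel reads
            "--num-mappers", "4",       // four parallel map tasks, one per key range
            "--target-dir", "/data/raw/orders",
            "--as-avrodatafile"         // selected file format written to HDFS
        };
        int exitCode = Sqoop.runTool(importArgs);
        System.exit(exitCode);
    }
}
```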
Note: A Sqoop is a non-Pipeline type of data configuration.
The New dialog opens, as shown in the following image.
The New Sqoop dialog opens.
The Sqoop opens as a new tab (mysqoop.sqoop) in the iBDI workspace, as shown in the following image.
To remove a table, select the table and then click Delete (the red X icon).
Note: If no database connections have been defined, these fields appear empty. Go to the Data Source Explorer pane and define the connections to the source and target. Close and then reopen the Sqoop tab to continue.
More than one table can be selected, as shown in the following image.
The selected tables are now populated in the Source Tables area, as shown in the following image.
The following table lists and describes the available settings that you can select for each table in the Options column.
| Option | Description |
|---|---|
| Replace | Replaces the data if it already exists in HDFS and Hive. Subsequent Sqoops of the same table replace the data. |
| CDC | A key feature in iBDI that executes Change Data Capture (CDC) logic, so that subsequent executions of this Sqoop configuration append only the changes to the data. Two extra columns are added to the target table to enable CDC. |
| Native | Allows advanced users to specify their own Sqoop options. |
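For example, an advanced user choosing the Native option might supply incremental import flags that approximate the append-only behavior described for CDC. The following sketch is illustrative only; the check column, last value, connection details, and paths are assumptions, not values produced by iBDI.

```java
// Illustrative only: native Sqoop options an advanced user might supply for an
// append-only (incremental) import. All identifiers and values are hypothetical.
import org.apache.sqoop.Sqoop;

public class NativeOptionsSketch {
    public static void main(String[] args) throws Exception {
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://dbhost:3306/sales",
            "--username", "etl_user",
            "--password-file", "/user/etl/.db.password",
            "--table", "orders",
            "--incremental", "append",     // fetch only rows added since the last run
            "--check-column", "order_id",  // column used to detect new rows
            "--last-value", "1048576",     // high-water mark from the previous import
            "--target-dir", "/data/raw/orders"
        };
        System.exit(Sqoop.runTool(importArgs));
    }
}
```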
Once you have created a Sqoop configuration, a Sqoop file with a .sqoop extension is added to your Sqoops directory in your iWay Big Data Integrator (iBDI) project.
For more information, see Defining Run Configurations.