Overview

The Big Data Mapper facility in iBDI is used to transform data contained in Hive tables. A mapping configuration created with the Big Data Mapper is a single-stage procedure, whereas a configured pipeline enables multiple processing stages. Spark pipelines are therefore recommended for transforming data because of the variety of source inputs, outputs, and operations that they permit.

Understanding the Big Data Mapper

The Big Data Mapper is a column-oriented data mapping engine. The Transformer in the Pipeline is a table-oriented mapping engine that uses newer features of Spark to perform transformations.

In the Big Data Mapper, a source table is dropped onto the workspace and all of that table's columns appear with it. Drag a Target object onto the workspace, and then map individual fields from the source to the target.
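Conceptually, the field-by-field mapping described above projects each source column onto a target column. The following stdlib-only Python sketch illustrates that idea; the table name, column names, and the `map_columns` helper are hypothetical examples, not iBDI output or API.

```python
# Conceptual sketch of a column-oriented source-to-target mapping.
# All names here are hypothetical illustrations, not part of iBDI.

def map_columns(source_rows, column_map):
    """Project each source row onto target columns via a source->target name map."""
    return [
        {target: row[source] for source, target in column_map.items()}
        for row in source_rows
    ]

# A "source table" represented as rows, plus the field mapping.
customers = [
    {"cust_id": 1, "cust_name": "Acme", "region": "EMEA"},
    {"cust_id": 2, "cust_name": "Globex", "region": "APAC"},
]
mapping = {"cust_id": "id", "cust_name": "name"}

target = map_columns(customers, mapping)
# Each target row now contains only the mapped fields, renamed:
# [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
```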

You can join two sources by first dragging both Source objects onto the workspace, then using the Join icon from the tool menu or the Palette to connect the matching fields.
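The join operation above combines rows from two sources wherever the connected fields match, much like an inner join in Hive. The sketch below shows that behavior with plain Python; the sources, key names, and the `join_sources` helper are hypothetical and stand in for what the Join configuration produces.

```python
# Conceptual sketch of joining two sources on matching fields, as the
# Join icon does in the workspace. All names are hypothetical examples.

def join_sources(left_rows, right_rows, left_key, right_key):
    """Inner-join two row lists where left[left_key] == right[right_key]."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for left in left_rows:
        for right in index.get(left[left_key], []):
            # Merge matched rows; overlapping keys hold the same join value.
            joined.append({**left, **right})
    return joined

customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
orders = [{"order_id": 10, "cust_id": 1}, {"order_id": 11, "cust_id": 1}]

result = join_sources(customers, orders, "cust_id", "cust_id")
# Only customer 1 has orders, so the result holds its two joined rows.
```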

The Big Data Mapper offers the following target table options:

Note: Mappings are a non-Pipeline type of data configuration, and the Big Data Mapper transforms Hive tables only.