How to:
The wrangler is a bridge between unstructured Hadoop Distributed File System (HDFS) data and structured or defined data sources. Before you create a wrangler configuration, you must ensure that a connection to an HDFS server is available.
HDFS is a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
Note: Wranglers are a non-Pipeline type of data configuration.
You can also click the following icon on the toolbar:
The New dialog opens, as shown in the following image.
The HDFS Server Location pane opens, as shown in the following image.
For more information on user identity and usage, see the Hadoop Distributed File System Permissions Guide.
Your new connection to an HDFS server is listed as a new node in the Project Explorer tab.
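
If the connection cannot be created, it can help to confirm outside the tool that the HDFS server is reachable from your machine. The following is a minimal sketch, not part of the product: the function name `namenode_reachable`, the example host name, and the default port 8020 (a common NameNode RPC port; your cluster may use a different one) are all assumptions for illustration. It only tests whether a TCP connection can be opened. It does not speak the HDFS protocol or check permissions.

```python
import socket


def namenode_reachable(host: str, port: int = 8020, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    The default port 8020 is an assumption; use the port configured
    for your cluster's NameNode.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (hypothetical host name):
#   if not namenode_reachable("namenode.example.com"):
#       print("HDFS server is not reachable; check host, port, and firewall.")
```

A result of `False` points to a network, host name, or port problem rather than a permissions issue. A result of `True` means only that the port accepts connections, so user identity problems can still occur afterward.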