Avro File Iterator Service

Syntax:

com.ibi.agents.XDIterAvroFile

Description:

The Avro File Iterator service opens an Avro container, either in the input document or in a file, and returns the objects it contains one at a time. Each object is converted to XML or JSON, depending on the Conversion Format property. The conversion rules are described in Avro File Read Service. Because each iterated document contains only one object, the container wrapper is absent. For example, when converting to XML, the root element is always av:item.

Avro requires a schema. The Avro File Iterator service can use the schema that is always stored in the container, or you can specify a reader schema, in which case Avro attempts to reconcile the two schemas. The effective schema is stored in the output document, so it can serve as a default for the Avro File Emit service.
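The reconciliation of a stored schema with a reader schema can be pictured with a small sketch. The following Python fragment is a simplified illustration of record-level schema resolution as defined by Avro, not the code used by the service. It assumes the datum has already been decoded into a dictionary, and it ignores aliases, type promotion, unions, and nested records. The function name resolve_record and the sample schema are hypothetical.

```python
def resolve_record(writer_datum, reader_schema):
    """Project a decoded record onto the fields of a reader schema.

    writer_datum:  dict decoded with the schema stored in the container
    reader_schema: Avro record schema in its JSON (dict) form
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_datum:
            # Field exists in both schemas: keep the stored value.
            resolved[name] = writer_datum[name]
        elif "default" in field:
            # Field only exists in the reader schema: use its default.
            resolved[name] = field["default"]
        else:
            # No stored value and no default: the schemas cannot be reconciled.
            raise ValueError(f"no value or default for reader field {name!r}")
    # Fields that exist only in the stored data are silently dropped.
    return resolved


# Hypothetical reader schema and datum, for illustration only.
reader_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string", "default": ""},
    ],
}
stored_datum = {"id": 7, "legacy_flag": True}
resolved = resolve_record(stored_datum, reader_schema)
```

In this sketch, the id value is kept, email is filled from its default, and legacy_flag is dropped because the reader schema does not declare it.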

The path to the Avro Schema or the Avro Data File can be a regular path in the file system, or a URL starting with hdfs://, which indicates that the file is in the Hadoop file system. When the Hadoop file system is used, you can optionally specify the Hadoop Configuration and Default File System parameters. Otherwise, these parameters are ignored.
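The path rule above can be sketched as a small decision function. This Python fragment is only an illustration of the rule as described here. The function name resolve_location, the dictionary keys, and the sample paths are hypothetical and are not part of the service.

```python
def resolve_location(path, hadoop_configuration=None, default_file_system=None):
    """Classify a path and keep the Hadoop settings only when they apply."""
    if path.startswith("hdfs://"):
        # Hadoop file system: the two optional parameters are taken into account.
        return {
            "file_system": "hdfs",
            "path": path,
            "hadoop_configuration": hadoop_configuration,
            "default_file_system": default_file_system,
        }
    # Regular file system path: the Hadoop parameters are ignored.
    return {"file_system": "local", "path": path}


# Hypothetical sample values, for illustration only.
remote = resolve_location(
    "hdfs://namenode/data/events.avro",
    hadoop_configuration="/etc/hadoop/core-site.xml",
)
local = resolve_location(
    "/data/events.avro",
    hadoop_configuration="/etc/hadoop/core-site.xml",
)
```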

The Output Document parameter determines the final document. It can be a status document, the original document, the result of the last iteration, or an accumulation. The format of the first accumulated value determines the format of the whole accumulation. If the first accumulated value is XML, the whole accumulation is XML. If the first accumulated value is JSON, the whole accumulation is JSON.
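The rule that the first accumulated value fixes the format can be sketched as follows. This Python fragment is an illustration under stated assumptions: XML values are modeled as xml.etree.ElementTree elements, and parsed JSON values are modeled as ordinary Python objects. The function name accumulation_format is hypothetical.

```python
import xml.etree.ElementTree as ET


def accumulation_format(values):
    """Return 'xml' or 'json' according to the first accumulated value."""
    if not values:
        raise ValueError("nothing was accumulated")
    first = values[0]
    # Only the first value is inspected; later values do not change the result.
    return "xml" if isinstance(first, ET.Element) else "json"


xml_first = [ET.Element("item"), ET.Element("item")]
json_first = [{"id": 1}, {"id": 2}]

xml_result = accumulation_format(xml_first)
json_result = accumulation_format(json_first)
```

The sketch only models how the format is chosen. It does not model how the service treats a later value whose format differs from the first, because this section does not describe that case.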

When accumulating XML, the XML Accumulation Root parameter determines the root element. Select av:avro when accumulating av:item elements into an XML document that represents an Avro container. This is the case with Iterated Accumulation (because the iterator always produces av:item elements) or with Loop Accumulation when the loop does not modify the root element. Selecting av:avro will also set the schema in the final document. That schema can serve as a default in an Avro File Emit service.
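As an illustration of accumulating av:item elements under an av:avro root, the following Python sketch builds such a document with the standard library. The namespace URI bound to the av prefix is a placeholder, because the real URI is not given in this section. The helper names make_item and accumulate_xml are hypothetical, and the schema that the service stores in the final document is not modeled.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; the real URI for the av prefix is not stated here.
AV_NS = "urn:example:avro"
ET.register_namespace("av", AV_NS)


def make_item(text):
    """Build a stand-in for one iterated av:item element."""
    item = ET.Element(f"{{{AV_NS}}}item")
    item.text = text
    return item


def accumulate_xml(items):
    """Wrap the accumulated av:item elements in a single av:avro root."""
    root = ET.Element(f"{{{AV_NS}}}avro")
    for item in items:
        root.append(item)
    return root


container = accumulate_xml([make_item("first"), make_item("second")])
serialized = ET.tostring(container, encoding="unicode")
```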

When accumulating JSON, the accumulated values are stored in a JSON array.
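A minimal sketch of JSON accumulation, assuming each iteration yields one parsed JSON value, is shown below. The function name accumulate_json and the sample records are hypothetical.

```python
import json


def accumulate_json(values):
    """Serialize the accumulated values as one JSON array, in iteration order."""
    return json.dumps(list(values))


# Hypothetical iteration results, for illustration only.
records = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
accumulated = accumulate_json(records)
```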

Parameters:

The following table lists and describes the parameters of the Avro File Iterator service.

Parameter

Description

Avro Schema

Path to the Avro Schema file. If absent, the schema stored with the data will be used.

Conversion Format

Format to which the incoming Avro data is converted. The choices are XML or JSON.

Input Source

Whether the Avro data is in the input document or in a file.

Avro Data File

Path to the Avro data file. Ignored if the Input Source is Input Document.

Hadoop Configuration

Path to the Hadoop configuration file, normally core-site.xml.

Default File System

In some Hadoop environments, this should be specified as the URI of the namenode, for example:

hdfs://[your namenode]

Output Document

The final document emitted. It can be a status document, the original document, the result of the last iteration, or an accumulation. Accumulations are memory intensive.

XML Accumulation Root

Determines the root element of an accumulation. Select av:avro when accumulating av:item elements. Otherwise, select accumulation. This is ignored if the Output Document is not accumulating or if the first accumulated item is parsed JSON.