Avro File Read Service

Syntax:

com.ibi.agents.XDAvroFileReadAgent

Description:

This service reads an Avro container in binary format and returns the objects it contains. The objects are converted to XML or JSON depending on the Conversion Format property. The conversion rules are documented below. The Avro data may come from the input document or a file.

Avro requires the presence of a schema. The Avro File Read service can use the writer schema, which is always stored in the container, or it can be given a reader schema, in which case Avro does its best to reconcile the two schemas. The effective schema is stored in the output document so that it can serve as a default for the Avro File Emit service.
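To illustrate where the stored schema lives: according to the Avro specification, an object container file starts with the four bytes "Obj" and 0x01, followed by a metadata map whose avro.schema entry holds the schema as JSON text. The following is a minimal sketch in Python (standard library only) that extracts that schema. The function names are illustrative assumptions and are not part of the service, and this is not a description of how the service itself is implemented.

```python
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (zigzag, variable-length) from a binary stream."""
    shift, accum = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of Avro header")
        byte = chunk[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:      # high bit clear: last byte of this number
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo the zigzag encoding


def read_header_metadata(stream):
    """Return the header metadata map (str -> bytes) of an Avro container."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:           # a zero block count ends the map
            return meta
        if count < 0:            # negative count: a block byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = stream.read(read_long(stream)).decode("utf-8")
            meta[key] = stream.read(read_long(stream))


def stored_schema(stream):
    """Parse the schema that is embedded in the container header."""
    return json.loads(read_header_metadata(stream)["avro.schema"])
```

For a local file, this could be called as stored_schema(open(path, "rb")), where path is whatever Avro data file is being inspected.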

The path to the Avro Schema or the Avro Data File can be a regular path in the file system or a URL starting with hdfs://, which indicates that the file is in the Hadoop file system. The Hadoop Configuration and Default File System parameters are optional and apply only when the Hadoop file system is used. Otherwise, they are ignored.
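The rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior: the function names are hypothetical, and the dictionary keys simply reuse the parameter display names from this document.

```python
def is_hdfs_location(location):
    """Return True when the location is a URL starting with hdfs://."""
    return location.startswith("hdfs://")


def effective_hadoop_settings(location, hadoop_configuration=None,
                              default_file_system=None):
    """Keep the two Hadoop parameters only for hdfs:// locations.

    For a regular file system path both parameters are ignored, so an
    empty dictionary is returned. Unset parameters are left out.
    """
    if not is_hdfs_location(location):
        return {}
    settings = {
        "Hadoop Configuration": hadoop_configuration,
        "Default File System": default_file_system,
    }
    return {name: value for name, value in settings.items() if value is not None}
```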

Parameters:

The following table describes the parameters of the Avro File Read service.

Parameter

Description

Avro Schema

Path to the Avro Schema file. If absent, the schema stored with the data will be used.

Conversion Format

Format to which the incoming Avro data is converted. The choices are XML or JSON.

Input Source

Whether the Avro data is in the Input Document or in a File.

Avro Data File

Path to the Avro data file. This is ignored if the Input Source is the Input Document.

Hadoop Configuration

Path to the Hadoop configuration file, normally core-site.xml.

Default File System

In some Hadoop environments, this should be specified as the URI of the namenode, for example hdfs://[your namenode].

Edges:

The following table describes the edges that are returned by the Avro File Read service.

Edge

Description

success

The Avro data was successfully converted to XML or JSON, depending on the Conversion Format property.

fail_parse

An iFL expression could not be evaluated.

fail_notfound

A file path was specified but the file does not exist.

fail_operation

The operation could not be completed successfully.

XML Conversion Format:

When converting an Avro container to XML, the resulting document has the following format:

<av:avro xmlns:av="http://iwaysoftware.com/avro">
    <av:item>
        ...
    </av:item>
    <av:item>
        ...
    </av:item>
    ...
</av:avro>

The actual document is not indented. It is pretty-printed here for display purposes only.

The av:avro element represents the Avro container. Each av:item child element represents one Avro object in the container. The contents of an av:item element vary depending on its type.
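A consumer of the output document can walk this structure with any namespace-aware XML parser. The following Python sketch uses the standard xml.etree.ElementTree module. The sample document and the function name are made up for illustration. Only direct children of av:avro are selected, because nested arrays also use av:item elements.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the format shown above.
NS = {"av": "http://iwaysoftware.com/avro"}

# Hypothetical output for a container holding two int objects.
sample = (
    '<av:avro xmlns:av="http://iwaysoftware.com/avro">'
    "<av:item>123</av:item>"
    "<av:item>456</av:item>"
    "</av:avro>"
)


def container_items(xml_text):
    """Return the top-level av:item elements, one per Avro object."""
    root = ET.fromstring(xml_text)
    return root.findall("av:item", NS)


values = [item.text for item in container_items(sample)]
print(values)  # prints ['123', '456']
```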

The following table describes how the various Avro types are converted to XML:

Avro Type

XML Representation

null

The element has an xsi:nil attribute set to true and no contents. For example:

<av:item xsi:nil="true"/>

boolean

The string true or false. For example:

<av:item>true</av:item>

int

A numeric string. For example:

<av:item>123</av:item>

long

A numeric string. For example:

<av:item>123</av:item>

float

A numeric string in fixed point or scientific notation. For example:

<av:item>12.34</av:item>

double

A numeric string in fixed point or scientific notation. For example:

<av:item>1.23E-12</av:item>

string

The string. For example:

<av:item>abc</av:item>

enum

The symbol string. For example:

<av:item>SPADES</av:item>

bytes

A string of hexadecimal digits, each byte taking exactly two digits. For example:

<av:item>040AFCFF</av:item>

fixed

A fixed-length string of hexadecimal digits, each byte taking exactly two digits. For example:

<av:item>040AFCFF</av:item>

record

Each field becomes an unqualified sub-element with the same name as the field and no XML namespace. For example:

<av:item>
    <name>John Smith</name>
    <address>123 Main Street</address>
    <city>New York</city>
    <state>NY</state>
</av:item>

array

Each item in the array becomes an av:item sub-element. For example:

<av:item>
    <av:item>10</av:item>
    <av:item>42</av:item>
    <av:item>99</av:item>
</av:item>

The actual document is not indented.

map

Each entry in the map becomes an av:entry sub-element with the key attribute set to the key, and the contents set to the entry value. For example:

<av:item>
    <av:entry key="k1">val1</av:entry>
    <av:entry key="k2">val2</av:entry>
    <av:entry key="k3">val3</av:entry>
</av:item>

The actual document is not indented.

union

The element has an xsi:type attribute set to the selected type, and its content is the union value itself, as if the union did not exist. For example:

<av:item xsi:type="int">123</av:item>

The xsi:type attribute is omitted if the union has only two possible types, one of which is null. For example:

<av:item>123</av:item>

or else:

<av:item xsi:nil="true"/>

For more complex types, the rules are applied recursively. The name of the element representing a value is always chosen by the rules of its parent scope. The outermost element of an object is always av:item. Its sub-elements might be av:item, av:entry, or the name of a field in a record, depending on the type.

Consider the following Avro complex type:

{"type": "record", "name": "Outer", "fields": [
  {"name": "rec1", "type": {"type": "record", "name": "Inner", "fields": [
    {"name": "f1", "type": "string"},
    {"name": "f2", "type": "int"}]}},
  {"name": "map1", "type": {"type": "map", "values": "string"}},
  {"name": "array1", "type": {"type": "array", "items": "int"}},
  {"name": "union1", "type": ["null", "string"]},
  {"name": "union2", "type": ["null", "string"]},
  {"name": "union3", "type": ["int", "string"]}]}

An instance of this record might look like the following once it is converted to XML (shown indented for display purposes only):

<av:item>
    <rec1>
        <f1>str1</f1>
        <f2>11</f2>
    </rec1>
    <map1>
        <av:entry key="k1">v1</av:entry>
        <av:entry key="k2">v2</av:entry>
    </map1>
    <array1>
        <av:item>10</av:item>
        <av:item>20</av:item>
        <av:item>30</av:item>
    </array1>
    <union1 xsi:nil="true"/>
    <union2>u2</union2>
    <union3 xsi:type="int">33</union3>
</av:item>

This would be one item in the av:avro element representing the Avro container.
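The recursive XML rules can be made concrete with a short Python sketch built on xml.etree.ElementTree. It is an illustration of the rules as documented here, not the service's implementation. The function names are invented, the xsi prefix is assumed to be bound to the standard XML Schema instance namespace, union branches are picked by a rough Python type check, and Python formats floating point numbers slightly differently from the examples above (for instance, a lowercase e in scientific notation).

```python
import xml.etree.ElementTree as ET

AV = "http://iwaysoftware.com/avro"                # namespace used in this document
XSI = "http://www.w3.org/2001/XMLSchema-instance"  # assumed binding of the xsi prefix
ET.register_namespace("av", AV)
ET.register_namespace("xsi", XSI)


def type_name(schema):
    """Return the Avro type name of a schema given as parsed JSON."""
    if isinstance(schema, list):
        return "union"
    return schema if isinstance(schema, str) else schema["type"]


def matches(schema, value):
    """Rough check that a Python value fits a schema (picks a union branch)."""
    kind = type_name(schema)
    if kind == "null":
        return value is None
    if kind == "boolean":
        return isinstance(value, bool)
    if kind in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    expected = {"float": float, "double": float, "string": str, "enum": str,
                "bytes": bytes, "fixed": bytes, "record": dict, "map": dict,
                "array": list}
    return isinstance(value, expected.get(kind, ()))


def fill(elem, schema, value):
    """Populate elem with the XML form of value, following the rules above."""
    kind = type_name(schema)
    if kind == "union":
        branch = next(s for s in schema if matches(s, value))
        if not (len(schema) == 2 and "null" in schema):
            # Assumption: named types are labeled by name, others by type name.
            named = isinstance(branch, dict) and "name" in branch
            elem.set(f"{{{XSI}}}type", branch["name"] if named else type_name(branch))
        fill(elem, branch, value)
    elif kind == "null":
        elem.set(f"{{{XSI}}}nil", "true")
    elif kind == "boolean":
        elem.text = "true" if value else "false"
    elif kind in ("bytes", "fixed"):
        elem.text = value.hex().upper()          # two hex digits per byte
    elif kind == "record":
        for field in schema["fields"]:           # one unqualified child per field
            fill(ET.SubElement(elem, field["name"]), field["type"], value[field["name"]])
    elif kind == "array":
        for item in value:
            fill(ET.SubElement(elem, f"{{{AV}}}item"), schema["items"], item)
    elif kind == "map":
        for key, entry in value.items():
            fill(ET.SubElement(elem, f"{{{AV}}}entry", {"key": key}), schema["values"], entry)
    else:
        elem.text = str(value)                   # numbers, strings, enum symbols


def container_to_xml(schema, objects):
    """Wrap each object in an av:item under a single av:avro root."""
    root = ET.Element(f"{{{AV}}}avro")
    for obj in objects:
        fill(ET.SubElement(root, f"{{{AV}}}item"), schema, obj)
    return ET.tostring(root, encoding="unicode")


# Usage with a small hypothetical schema and one object.
demo_schema = {"type": "record", "name": "Demo", "fields": [
    {"name": "id", "type": "int"},
    {"name": "tags", "type": {"type": "array", "items": "string"}},
    {"name": "note", "type": ["null", "string"]}]}
print(container_to_xml(demo_schema, [{"id": 1, "tags": ["a", "b"], "note": None}]))
```

Passing the Outer schema and a matching dictionary to container_to_xml yields the same element structure as the example above, without indentation.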

JSON Conversion Format:

When converting an Avro container to JSON, the resulting document is a JSON array. Each item in the array represents one Avro object in the container. The type of each item depends on the Avro schema.

The following table describes how the various Avro types are converted to JSON.

Avro Type

JSON Representation

null

null

boolean

The value true or false.

int

The integer value.

For example:

123

long

The long value.

For example:

123

float

The float value in fixed point or scientific notation.

For example:

12.34

double

The double value in fixed point or scientific notation.

For example:

1.23E-12

string

The string. For example:

"abc"

enum

The symbol string. For example:

"SPADES"

bytes

A string where every byte is converted to an ISO-8859-1 character.

For example:

"éö"

fixed

A string where every byte is converted to an ISO-8859-1 character.

For example:

"éö"

record

A JSON Object.

For example:

{"name":"John Smith",
 "address":"123 Main Street",
 "city":"New York",
 "state":"NY"}

The actual document is not indented.

array

A JSON Array. The item values are converted recursively. For example:

[10,42,99]

The actual document is not indented.

map

A JSON Object. The key becomes the field name. For example:

{"k1":"val1",
 "k2":"val2",
 "k3":"val3"}

The actual document is not indented.

union

A JSON Object with a single field. The field name is the full name of the selected type. The field value is the value of the union. As a special case, a null value in a union is converted directly to null as if the union did not exist. For example, an int value of 123 in a union becomes:

{"int":123}
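The bytes and fixed types are the only ones whose text form differs between the two conversion formats: hexadecimal digits in XML, one ISO-8859-1 character per byte in JSON. A consumer that needs the original bytes back could decode them as in the following Python sketch. The function names are illustrative only.

```python
def bytes_from_xml(text):
    """Decode the XML form: two hexadecimal digits per byte."""
    return bytes.fromhex(text)


def bytes_from_json(text):
    """Decode the JSON form: one ISO-8859-1 character per byte."""
    return text.encode("iso-8859-1")


# The examples from the two tables.
xml_bytes = bytes_from_xml("040AFCFF")    # the four bytes 04 0A FC FF
json_bytes = bytes_from_json("\u00e9\u00f6")  # the two bytes E9 F6
```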

For more complex types, the rules are applied recursively. Consider this Avro complex type (shown indented for display purposes only):

{"type": "record", "name": "Outer", "fields": [
  {"name": "rec1", "type": {"type": "record", "name": "Inner", "fields": [
    {"name": "f1", "type": "string"},
    {"name": "f2", "type": "int"}]}},
  {"name": "map1", "type": {"type": "map", "values": "string"}},
  {"name": "array1", "type": {"type": "array", "items": "int"}},
  {"name": "union1", "type": ["null", "string"]},
  {"name": "union2", "type": ["null", "string"]},
  {"name": "union3", "type": ["int", "string"]}]}

An instance of this record might look like the following once it is converted to JSON (shown indented for display purposes only):

{"rec1":
    {"f1":"str1",
     "f2":11},
 "map1":
    {"k1":"v1",
     "k2":"v2"},
 "array1":[10,20,30],
 "union1":null,
 "union2":{"string":"u2"},
 "union3":{"int":33}}

This would be one value within the array representing the whole container.
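The JSON rules can be sketched the same way in Python using only the standard json module. As before, this illustrates the documented rules and is not the service's implementation. The function names are invented, union branches are picked by a rough Python type check, and the full name of a named type is approximated as its namespace attribute plus its name, without handling inherited namespaces.

```python
import json


def type_name(schema):
    """Return the Avro type name of a schema given as parsed JSON."""
    if isinstance(schema, list):
        return "union"
    return schema if isinstance(schema, str) else schema["type"]


def full_name(schema):
    """Approximate the full name of a type (simplified namespace handling)."""
    if isinstance(schema, dict) and "name" in schema:
        namespace = schema.get("namespace")
        return f"{namespace}.{schema['name']}" if namespace else schema["name"]
    return type_name(schema)


def matches(schema, value):
    """Rough check that a Python value fits a schema (picks a union branch)."""
    kind = type_name(schema)
    if kind == "null":
        return value is None
    if kind == "boolean":
        return isinstance(value, bool)
    if kind in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    expected = {"float": float, "double": float, "string": str, "enum": str,
                "bytes": bytes, "fixed": bytes, "record": dict, "map": dict,
                "array": list}
    return isinstance(value, expected.get(kind, ()))


def to_json_value(schema, value):
    """Convert one Avro value to its JSON-ready form, following the table above."""
    kind = type_name(schema)
    if kind == "union":
        if value is None:                        # special case: plain null
            return None
        branch = next(s for s in schema if matches(s, value))
        return {full_name(branch): to_json_value(branch, value)}
    if kind in ("bytes", "fixed"):
        return value.decode("iso-8859-1")        # one character per byte
    if kind == "record":
        return {f["name"]: to_json_value(f["type"], value[f["name"]])
                for f in schema["fields"]}
    if kind == "array":
        return [to_json_value(schema["items"], item) for item in value]
    if kind == "map":
        return {key: to_json_value(schema["values"], entry)
                for key, entry in value.items()}
    return value                                 # null, boolean, numbers, strings


def container_to_json(schema, objects):
    """Serialize the whole container as one JSON array without indentation."""
    converted = [to_json_value(schema, obj) for obj in objects]
    return json.dumps(converted, separators=(",", ":"))


# Usage with a small hypothetical schema and one object.
demo_schema = {"type": "record", "name": "Demo", "fields": [
    {"name": "id", "type": "int"},
    {"name": "note", "type": ["null", "string"]}]}
print(container_to_json(demo_schema, [{"id": 1, "note": "hello"}]))
```

Note that, unlike the XML format, a union of null and one other type still wraps non-null values in a single-field object, as union2 shows in the example above.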