Syntax:
com.ibi.agents.XDAvroFileReadAgent
Description:
This service reads an Avro container in binary format and returns the objects it contains. The objects are converted to XML or JSON depending on the Conversion Format property. The conversion rules are documented below. The Avro data may come from the input document or a file.
Avro requires a schema. The Avro File Read service can use the schema that is always stored in the container, or you can specify a reader schema, in which case Avro will do its best to reconcile the two schemas. The effective schema is stored in the output document, so it can serve as a default for the Avro File Emit service.
The path to the Avro Schema or the Avro Data File can be a regular path in the file system, or a URL starting with hdfs://, which indicates that the file is in the Hadoop file system. When the Hadoop file system is used, the Hadoop Configuration and Default File System parameters can optionally be specified. Otherwise, they are ignored.
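The path rule above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the service's implementation. The function names and the dictionary keys are hypothetical, and the case-sensitive prefix check is an assumption.

```python
def uses_hadoop_file_system(path: str) -> bool:
    """Return True when the path is a URL starting with hdfs://."""
    return path.startswith("hdfs://")


def effective_hadoop_settings(path, hadoop_configuration=None,
                              default_file_system=None):
    """Return the Hadoop settings that apply to a path.

    Both settings are optional for hdfs:// URLs. For a regular file
    system path they are ignored, which is modeled here as None.
    """
    if not uses_hadoop_file_system(path):
        return None
    return {
        "hadoop_configuration": hadoop_configuration,
        "default_file_system": default_file_system,
    }
```

For example, a path such as /data/users.avro yields None even if a configuration file is supplied, while an hdfs:// URL keeps whatever optional settings were provided.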
Parameters:
The following table describes the parameters of the Avro File Read service.
| Parameter | Description |
|---|---|
| Avro Schema | Path to the Avro Schema file. If absent, the schema stored with the data will be used. |
| Conversion Format | Format of the incoming Avro data after conversion. The choices are XML or JSON. |
| Input Source | Whether the Avro data is in the Input Document or in a File. |
| Avro Data File | Path to the Avro data file. This is ignored if the Input Source is the Input Document. |
| Hadoop Configuration | Path to the Hadoop configuration file, normally core-site.xml. |
| Default File System | In some Hadoop environments, this should be specified as the URI of the namenode, for example hdfs://[your namenode]. |
Edges:
The following table describes the edges that are returned by the Avro File Read service.
| Edge | Description |
|---|---|
| success | The Avro data was successfully converted to XML or JSON. |
| fail_parse | An iFL expression could not be evaluated. |
| fail_notfound | A file path was specified but the file does not exist. |
| fail_operation | The operation could not be completed successfully. |
XML Conversion Format:
When converting an Avro container to XML, the resulting document has the following format:
<av:avro xmlns:av="http://iwaysoftware.com/avro">
  <av:item>
    ...
  </av:item>
  <av:item>
    ...
  </av:item>
  ...
</av:avro>
The actual document is not indented. It is pretty-printed here for display purposes only.
The av:avro element represents the Avro container. Each av:item child element represents one Avro object in the container. The content of each av:item element varies depending on its type.
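As a rough illustration of how a consumer might walk this container format, the following Python sketch parses an av:avro document with the standard library and lists its av:item children. The function name list_items and the sample document are hypothetical. Only the element names and the namespace URI come from the format described above.

```python
import xml.etree.ElementTree as ET

AVRO_NS = "http://iwaysoftware.com/avro"


def list_items(xml_text):
    """Return the av:item elements directly under the av:avro root."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{AVRO_NS}}}avro":
        raise ValueError("not an av:avro document")
    return root.findall(f"{{{AVRO_NS}}}item")


# Hypothetical container holding two Avro strings.
sample = (
    '<av:avro xmlns:av="http://iwaysoftware.com/avro">'
    "<av:item>abc</av:item><av:item>def</av:item>"
    "</av:avro>"
)

for item in list_items(sample):
    print(item.text)
```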
The following table describes how the various Avro types are converted to XML:
| Avro Type | XML Representation |
|---|---|
| null | The element has an xsi:nil attribute set to true and no content. For example: <av:item xsi:nil="true"/> |
| boolean | The string true or false. For example: <av:item>true</av:item> |
| int | A numeric string. For example: <av:item>123</av:item> |
| long | A numeric string. For example: <av:item>123</av:item> |
| float | A numeric string in fixed-point or scientific notation. For example: <av:item>12.34</av:item> |
| double | A numeric string in fixed-point or scientific notation. For example: <av:item>1.23E-12</av:item> |
| string | The string. For example: <av:item>abc</av:item> |
| enum | The symbol string. For example: <av:item>SPADES</av:item> |
| bytes | A string of hexadecimal digits, each byte taking exactly two digits. For example: <av:item>040AFCFF</av:item> |
| fixed | A fixed-length string of hexadecimal digits, each byte taking exactly two digits. For example: <av:item>040AFCFF</av:item> |
| record | Each field becomes an unqualified sub-element with the same name as the field and no XML namespace. For example: <av:item> <name>John Smith</name> <address>123 Main Street</address> <city>New York</city> <state>NY</state> </av:item> |
| array | Each item in the array becomes an av:item sub-element. For example: <av:item> <av:item>10</av:item> <av:item>42</av:item> <av:item>99</av:item> </av:item> The actual document is not indented. |
| map | Each entry in the map becomes an av:entry sub-element with the key attribute set to the key, and the content set to the entry value. For example: <av:item> <av:entry key="k1">val1</av:entry> <av:entry key="k2">val2</av:entry> <av:entry key="k3">val3</av:entry> </av:item> The actual document is not indented. |
| union | The element has an xsi:type attribute set to the selected type, and its content is the union value directly, as if the union did not exist. For example: <av:item xsi:type="int">123</av:item> The xsi:type attribute is omitted if the union has only two possible types, one of which is null. For example: <av:item>123</av:item> or else: <av:item xsi:nil="true"/> |
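The hexadecimal encoding used for the bytes and fixed types can be reproduced with the Python standard library. This is a small sketch with hypothetical helper names. The uppercase digits are an assumption based on the example value 040AFCFF.

```python
def bytes_to_hex(data: bytes) -> str:
    """Encode each byte as exactly two hexadecimal digits."""
    return data.hex().upper()


def hex_to_bytes(text: str) -> bytes:
    """Decode a string of hexadecimal digit pairs back into bytes."""
    return bytes.fromhex(text)


print(bytes_to_hex(bytes([0x04, 0x0A, 0xFC, 0xFF])))
```

A consumer of the XML output could use the second helper to recover the original binary value from the element text.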
For more complex types, the rules are applied recursively. The name of the element representing a value is always chosen by the rules of its parent scope. The outermost element of an object is always av:item. Its sub-elements might be av:item, av:entry, or the name of a field in a record, depending on the type.
Consider the following Avro complex type (shown indented for display purposes only):
{"type": "record", "name": "Outer", "fields": [
  {"name": "rec1", "type":
    {"type": "record", "name": "Inner", "fields": [
      {"name": "f1", "type": "string"},
      {"name": "f2", "type": "int"}]}},
  {"name": "map1", "type": {"type": "map", "values": "string"}},
  {"name": "array1", "type": {"type": "array", "items": "int"}},
  {"name": "union1", "type": ["null", "string"]},
  {"name": "union2", "type": ["null", "string"]},
  {"name": "union3", "type": ["int", "string"]}]}
An instance of this record might look like the following once it is converted to XML (shown indented for display purposes only):
<av:item>
  <rec1>
    <f1>str1</f1>
    <f2>11</f2>
  </rec1>
  <map1>
    <av:entry key="k1">v1</av:entry>
    <av:entry key="k2">v2</av:entry>
  </map1>
  <array1>
    <av:item>10</av:item>
    <av:item>20</av:item>
    <av:item>30</av:item>
  </array1>
  <union1 xsi:nil="true"/>
  <union2>u2</union2>
  <union3 xsi:type="int">33</union3>
</av:item>
This would be one item in the av:avro element representing the Avro container.
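The recursive rules can be sketched as a small Python converter. This is an illustration under stated assumptions, not the service's code. The functions type_name, matches, fill, and to_item are hypothetical, the schema is given as a parsed Python structure, the union branch is chosen by a rough check of the Python value type, and the xsi prefix is assumed to be the standard XML Schema instance namespace. Number formatting follows Python (for example 1.23e-12), which may differ from the service output.

```python
import xml.etree.ElementTree as ET

AV = "http://iwaysoftware.com/avro"
XSI = "http://www.w3.org/2001/XMLSchema-instance"  # assumed xsi namespace
ET.register_namespace("av", AV)
ET.register_namespace("xsi", XSI)


def type_name(schema):
    """Name of a schema: the string itself, or the name/type of a dict."""
    if isinstance(schema, str):
        return schema
    return schema.get("name", schema["type"])


def matches(schema, value):
    """Rough check of whether a Python value fits a union branch."""
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return value is None
    if kind == "boolean":
        return isinstance(value, bool)
    if kind in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if kind in ("float", "double"):
        return isinstance(value, float)
    if kind in ("string", "enum"):
        return isinstance(value, str)
    if kind in ("bytes", "fixed"):
        return isinstance(value, bytes)
    if kind == "array":
        return isinstance(value, list)
    return isinstance(value, dict)  # record or map


def fill(elem, schema, value):
    """Fill elem with the XML representation of value under schema."""
    if isinstance(schema, list):  # union
        branch = next(s for s in schema if matches(s, value))
        nullable_pair = len(schema) == 2 and "null" in schema
        if not nullable_pair:
            elem.set(f"{{{XSI}}}type", type_name(branch))
        return fill(elem, branch, value)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        elem.set(f"{{{XSI}}}nil", "true")
    elif kind == "boolean":
        elem.text = "true" if value else "false"
    elif kind in ("bytes", "fixed"):
        elem.text = value.hex().upper()
    elif kind == "record":  # one unqualified sub-element per field
        for field in schema["fields"]:
            child = ET.SubElement(elem, field["name"])
            fill(child, field["type"], value[field["name"]])
    elif kind == "array":  # one av:item per array item
        for item in value:
            fill(ET.SubElement(elem, f"{{{AV}}}item"), schema["items"], item)
    elif kind == "map":  # one av:entry per map entry
        for key, entry in value.items():
            child = ET.SubElement(elem, f"{{{AV}}}entry", {"key": key})
            fill(child, schema["values"], entry)
    else:  # int, long, float, double, string, enum
        elem.text = str(value)
    return elem


def to_item(schema, value):
    """Build the outermost av:item element for one Avro object."""
    return fill(ET.Element(f"{{{AV}}}item"), schema, value)


schema = {"type": "record", "name": "Outer", "fields": [
    {"name": "rec1", "type": {"type": "record", "name": "Inner", "fields": [
        {"name": "f1", "type": "string"}, {"name": "f2", "type": "int"}]}},
    {"name": "map1", "type": {"type": "map", "values": "string"}},
    {"name": "array1", "type": {"type": "array", "items": "int"}},
    {"name": "union1", "type": ["null", "string"]},
    {"name": "union2", "type": ["null", "string"]},
    {"name": "union3", "type": ["int", "string"]}]}
value = {"rec1": {"f1": "str1", "f2": 11}, "map1": {"k1": "v1", "k2": "v2"},
         "array1": [10, 20, 30], "union1": None, "union2": "u2", "union3": 33}

item = to_item(schema, value)
print(ET.tostring(item, encoding="unicode"))
```

The element name is always supplied by the caller (av:item, av:entry, or a field name), which mirrors the rule that a value's element name is chosen by its parent scope.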
JSON Conversion Format:
When converting an Avro container to JSON, the resulting document is a JSON array. Each item in the array represents one Avro object in the container. The type of each item depends on the Avro schema.
The following table describes how the various Avro types are converted to JSON.
| Avro Type | JSON Representation |
|---|---|
| null | null |
| boolean | The value true or false. |
| int | The integer value. For example: 123 |
| long | The long value. For example: 123 |
| float | The float value in fixed-point or scientific notation. For example: 12.34 |
| double | The double value in fixed-point or scientific notation. For example: 1.23E-12 |
| string | The string. For example: "abc" |
| enum | The symbol string. For example: "SPADES" |
| bytes | A string where every byte is converted to an ISO8859-1 character. For example: "éö" |
| fixed | A string where every byte is converted to an ISO8859-1 character. For example: "éö" |
| record | A JSON Object. For example: {"name":"John Smith", "address":"123 Main Street", "city":"New York", "state":"NY"} The actual document is not indented. |
| array | A JSON Array. The item values are converted recursively. For example: [10,42,99] The actual document is not indented. |
| map | A JSON Object. The key becomes the field name. For example: {"k1":"val1", "k2":"val2", "k3":"val3"} The actual document is not indented. |
| union | A JSON Object with a single field. The field name is the full name of the selected type. The field value is the value of the union. For example: {"int":123} As a special case, a null value in a union is converted directly to null, as if the union did not exist. |
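The byte-to-character mapping used for the bytes and fixed types corresponds to ISO8859-1 decoding, where each byte value becomes the character with the same code point. A minimal Python sketch, with hypothetical helper names:

```python
def bytes_to_json_string(data: bytes) -> str:
    """Map every byte to the ISO8859-1 character with the same code point."""
    return data.decode("iso-8859-1")


def json_string_to_bytes(text: str) -> bytes:
    """Recover the original bytes from such a string."""
    return text.encode("iso-8859-1")


# The bytes 0xE9 and 0xF6 are the ISO8859-1 codes for é and ö.
print(bytes_to_json_string(bytes([0xE9, 0xF6])))
```

Because ISO8859-1 covers all 256 byte values, the mapping is reversible, so a consumer can recover the binary value from the JSON string.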
For more complex types, the rules are applied recursively. Consider this Avro complex type (shown indented for display purposes only):
{"type": "record", "name": "Outer", "fields": [
  {"name": "rec1", "type":
    {"type": "record", "name": "Inner", "fields": [
      {"name": "f1", "type": "string"},
      {"name": "f2", "type": "int"}]}},
  {"name": "map1", "type": {"type": "map", "values": "string"}},
  {"name": "array1", "type": {"type": "array", "items": "int"}},
  {"name": "union1", "type": ["null", "string"]},
  {"name": "union2", "type": ["null", "string"]},
  {"name": "union3", "type": ["int", "string"]}]}
An instance of this record might look like the following once it is converted to JSON (shown indented for display purposes only):
{"rec1": {"f1":"str1", "f2":11},
 "map1": {"k1":"v1", "k2":"v2"},
 "array1":[10,20,30],
 "union1":null,
 "union2":{"string":"u2"},
 "union3":{"int":33}}
This would be one value within the array representing the whole container.
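The JSON rules can be sketched the same way in Python. This is an illustration only, not the service's code. The functions full_name, matches, and to_json_value are hypothetical, the schema is given as a parsed Python structure, and the union branch is chosen by a rough check of the Python value type.

```python
import json


def full_name(schema):
    """Full name of a type: primitive name, or namespace-qualified name."""
    if isinstance(schema, str):
        return schema
    name = schema.get("name", schema["type"])
    namespace = schema.get("namespace")
    if namespace and "." not in name:
        return f"{namespace}.{name}"
    return name


def matches(schema, value):
    """Rough check of whether a Python value fits a union branch."""
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return value is None
    if kind == "boolean":
        return isinstance(value, bool)
    if kind in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if kind in ("float", "double"):
        return isinstance(value, float)
    if kind in ("string", "enum"):
        return isinstance(value, str)
    if kind in ("bytes", "fixed"):
        return isinstance(value, bytes)
    if kind == "array":
        return isinstance(value, list)
    return isinstance(value, dict)  # record or map


def to_json_value(schema, value):
    """Convert one Avro value to its JSON representation, recursively."""
    if isinstance(schema, list):  # union
        if value is None:
            return None  # special case: null is emitted directly
        branch = next(s for s in schema if matches(s, value))
        return {full_name(branch): to_json_value(branch, value)}
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind in ("bytes", "fixed"):
        return value.decode("iso-8859-1")
    if kind == "record":
        return {f["name"]: to_json_value(f["type"], value[f["name"]])
                for f in schema["fields"]}
    if kind == "array":
        return [to_json_value(schema["items"], v) for v in value]
    if kind == "map":
        return {k: to_json_value(schema["values"], v)
                for k, v in value.items()}
    return value  # null, boolean, numbers, string, enum


schema = {"type": "record", "name": "Outer", "fields": [
    {"name": "rec1", "type": {"type": "record", "name": "Inner", "fields": [
        {"name": "f1", "type": "string"}, {"name": "f2", "type": "int"}]}},
    {"name": "map1", "type": {"type": "map", "values": "string"}},
    {"name": "array1", "type": {"type": "array", "items": "int"}},
    {"name": "union1", "type": ["null", "string"]},
    {"name": "union2", "type": ["null", "string"]},
    {"name": "union3", "type": ["int", "string"]}]}
value = {"rec1": {"f1": "str1", "f2": 11}, "map1": {"k1": "v1", "k2": "v2"},
         "array1": [10, 20, 30], "union1": None, "union2": "u2", "union3": 33}

result = to_json_value(schema, value)
print(json.dumps([result]))  # the container is a JSON array of objects
```

Note that union2 is wrapped as {"string":"u2"} even though its only alternative is null. Unlike the XML format, the JSON rules above make no exception for two-type unions other than the null value itself.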