S3 Select Object Content Service (com.ibi.agents.XDS3SelectObjectContentAgent)

Syntax:

com.ibi.agents.XDS3SelectObjectContentAgent

Description:

Filters the contents of an Amazon S3 object based on a simple Structured Query Language (SQL) statement.

Using the XDS3SelectObjectContentAgent service, you can read data stored in CSV, JSON, or Parquet format in an Amazon S3 bucket and return the selected data, converted to CSV or JSON, to an iWay Service Manager (iSM) flow for further processing.

This agent calls the selectObjectContent(SelectObjectContentRequest selectRequest) method of com.amazonaws.services.s3.AmazonS3 at version 1.11.714.

Parameters:

Bucket

Bucket Name

Name of the bucket containing the object to be queried.

Key

Key of the object to be queried within the specified bucket.

Input

Select Expression

Expression that is used to query the object.

Scan Range Start

When present, process only records that start at or after that byte index, counting from zero.

Scan Range End

When used with Scan Range Start, process only records with their first byte located within that index range, inclusive counting from zero. When used without Scan Range Start, process only records with their first byte located within that many bytes of the end of the file.
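The three scan range combinations described above can be modeled locally. The following is a minimal sketch, assuming newline-delimited records; the function name records_in_scan_range is hypothetical, and the code only mirrors the documented semantics rather than the way Amazon S3 implements them.

```python
def records_in_scan_range(data, start=None, end=None, delimiter=b"\n"):
    """Return the records whose first byte falls inside the scan range.

    Local model of the documented semantics only (an assumption, not AWS code):
    - start and end: first byte within [start, end], inclusive, from zero
    - start only:    first byte at or after start
    - end only:      first byte within the last `end` bytes of the data
    """
    size = len(data)
    if start is None and end is not None:
        low, high = max(0, size - end), size - 1
    else:
        low = 0 if start is None else start
        high = size - 1 if end is None else end

    selected = []
    offset = 0  # byte index of the first byte of the current record
    for record in data.split(delimiter):
        if record and low <= offset <= high:
            selected.append(record)
        offset += len(record) + len(delimiter)
    return selected
```

For example, in the 12-byte input a,1 / b,2 / c,3 (one record per line), the records start at byte offsets 0, 4, and 8, so a Scan Range Start of 4 with a Scan Range End of 8 selects the second and third records.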

SSE Customer Key

Customer-provided key, in Base64 format, for use with Amazon S3 server-side encryption.
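As an illustration of the Base64 format, the sketch below generates a random 256-bit key (the key size used for AES-256 with customer-provided keys) and encodes it as text. The function name new_sse_customer_key is hypothetical, and it is an assumption that the agent accepts standard Base64 without line breaks.

```python
import base64
import os


def new_sse_customer_key():
    """Return a random 256-bit key, Base64-encoded as ASCII text."""
    raw_key = os.urandom(32)  # 32 bytes = 256 bits
    return base64.b64encode(raw_key).decode("ascii")
```

The same key must be supplied again whenever the object is read, so in practice the value would be stored securely rather than regenerated.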

Input Serialization

Describes the format of the data in the object that is being queried.

  • Parquet. Does not have further parameters.
  • JSON. Has a single input serialization parameter to specify if the input is one JSON object (DOCUMENT) or many JSON records concatenated together (LINES).
  • CSV. Has many parameters in the CSV Input Serialization group to specify the exact syntax.

Compression Type

Specifies the compression format of the object. Possible values are BZIP2, GZIP, NONE.
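To show how the Input Serialization and Compression Type choices fit together, the sketch below builds a dictionary shaped like the InputSerialization structure of the S3 SelectObjectContent request, as exposed by boto3. The function name input_serialization is hypothetical, and the agent's own property names differ from these keys; this is not the agent's implementation.

```python
def input_serialization(fmt, compression="NONE", json_type="LINES", csv_options=None):
    """Build an InputSerialization-style dictionary for one input format."""
    fmt = fmt.upper()
    if fmt == "PARQUET":
        body = {"Parquet": {}}  # Parquet has no further parameters
    elif fmt == "JSON":
        if json_type not in ("DOCUMENT", "LINES"):
            raise ValueError("JSON type must be DOCUMENT or LINES")
        body = {"JSON": {"Type": json_type}}
    elif fmt == "CSV":
        # CSV syntax options, for example {"FileHeaderInfo": "USE"}
        body = {"CSV": dict(csv_options or {})}
    else:
        raise ValueError("format must be CSV, JSON, or PARQUET")

    if compression not in ("NONE", "GZIP", "BZIP2"):
        raise ValueError("compression must be NONE, GZIP, or BZIP2")
    body["CompressionType"] = compression
    return body
```

Only the group for the chosen format is emitted, which matches the rule that parameters of the other input serializations are ignored.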

Output

Output Serialization

Describes the format of the data that you want Amazon S3 to return in response. Possible values are JSON and CSV. The default output serialization for JSON is multiple individual JSON records, separated by a new line. The record delimiter is configurable. Since this format cannot be parsed as a single document, the agent also offers the option to produce a single JSON array containing all the returned JSON records. To create the array, the agent hardcodes the record delimiter to a comma (,) and surrounds all the data with square brackets ([ ]).
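The array construction described above can be sketched as follows. This is a minimal illustration, not the agent's code: the function name records_to_json_array is hypothetical, and removing a trailing comma after the last record is an assumption about how the final delimiter is handled.

```python
import json


def records_to_json_array(payload):
    """Wrap comma-delimited JSON records in [ ] and parse the result."""
    text = payload.strip()
    if text.endswith(","):
        text = text[:-1]  # assumed: drop the delimiter after the last record
    return json.loads("[" + text + "]")
```

By contrast, passing newline-delimited records such as {"id":1} followed by {"id":2} directly to json.loads fails, because the text is not a single JSON document.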

Output Document

Format of the output document, either Parsed JSON or Byte Array. Choosing Parsed JSON overrides all Output Serialization parameters.

  • Parsed JSON. Produces a single JSON array containing all returned JSON records, and stores the parsed tree in the output document. All other output serialization parameters are ignored.
  • Byte Array. Produces a flat document according to the Output Serialization parameter. The options are JSON or CSV.

    Note: Parquet is not available since Amazon S3 does not offer Parquet output for this operation. Extra parameters for JSON or CSV output serialization are available in their own property group.

JSON Input Serialization

JSON Type

Type of JSON when the Input Serialization is JSON. Possible values are DOCUMENT and LINES.

JSON Output Serialization

JSON Record Delimiter

Value used to separate individual records when the Output Serialization is JSON.

JSON Array Output

Whether the output will be a JSON array collecting all records, or separate individual JSON records. When enabled, this property overrides the JSON Record Delimiter parameter.

CSV Input Serialization

CSV output also has many parameters, in the CSV Output Serialization group, to specify the exact syntax. Many of them mirror a similar parameter in the CSV Input Serialization group. Parameters for any input serialization other than the chosen one are ignored. The same applies to the output serialization.

Allow Quoted Record Delimiter

Specifies that CSV field values may contain quoted record delimiters and such records should be allowed.

Comments

Single character used to indicate that a row should be ignored when the character is present at the start of that row. You can specify any character to indicate a comment line.

Field Delimiter

Single character used to separate individual fields in a record. You can specify an arbitrary delimiter.

File Header Info

Describes the first line of input. NONE means the first line is not a header. IGNORE means the first line is a header, but you must use column positions like _1 to indicate the column. USE means the first line is a header and you can use the header value to identify a column in an expression.
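The effect of the three File Header Info values on a Select Expression can be modeled locally. The sketch below uses Python's csv module as a stand-in for the S3 parser; the function name addressable_columns is hypothetical, and for USE it lists only the header names.

```python
import csv
import io


def addressable_columns(csv_text, file_header_info):
    """Return (column names usable in an expression, data rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    first_line = rows[0]
    positional = ["_%d" % (i + 1) for i in range(len(first_line))]

    if file_header_info == "NONE":
        return positional, rows        # first line is data
    if file_header_info == "IGNORE":
        return positional, rows[1:]    # header skipped, positions only
    if file_header_info == "USE":
        return first_line, rows[1:]    # header values name the columns
    raise ValueError("expected NONE, IGNORE, or USE")
```

For a file whose first line is name,city, an expression such as SELECT s._1 FROM S3Object s applies with NONE or IGNORE, while SELECT s.name FROM S3Object s applies with USE.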

Quote Character

Single character used for escaping when the field delimiter is part of the value.

Quote Escape Character

Single character used for escaping the quote character inside an already escaped value.
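The interplay of Field Delimiter, Quote Character, and Quote Escape Character can be tried out locally. The sketch below uses Python's csv module as a stand-in, so edge cases may differ from the S3 parser; the function name parse_csv_line and its defaults are assumptions made for illustration.

```python
import csv
import io


def parse_csv_line(line, field_delimiter=",", quote_char='"', quote_escape_char='"'):
    """Split one CSV record into fields using the given syntax characters."""
    # When the escape character equals the quote character, a quote inside
    # a quoted value is written twice ("").
    doubled = quote_escape_char == quote_char
    reader = csv.reader(
        io.StringIO(line),
        delimiter=field_delimiter,
        quotechar=quote_char,
        doublequote=doubled,
        escapechar=None if doubled else quote_escape_char,
    )
    return next(reader)
```

With these defaults, the record a,"b,c" yields the two fields a and b,c, because the quoted field delimiter is treated as part of the value.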

CSV Record Delimiter

Single character used to separate individual records in the input. You can specify an arbitrary delimiter.

CSV Output Serialization

Output Field Delimiter

Sets the character used to separate individual fields in a record.

Output Quote Character

Sets the character used for escaping, where the field delimiter is part of the value.

Output Quote Escape Character

Sets the character used for escaping the quote character inside an already escaped value.

Output Quote Fields

Indicates whether to use quotation marks around output fields. ALWAYS means always use quotation marks for output fields. ASNEEDED means use quotation marks only for output fields that need them.
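The difference between ALWAYS and ASNEEDED can be illustrated with Python's csv writer, whose QUOTE_ALL and QUOTE_MINIMAL modes behave analogously. This is a local analogy rather than the output of Amazon S3 itself, and the function name write_csv_record is hypothetical.

```python
import csv
import io

# Assumed mapping from the Output Quote Fields values to csv quoting modes.
QUOTE_MODES = {"ALWAYS": csv.QUOTE_ALL, "ASNEEDED": csv.QUOTE_MINIMAL}


def write_csv_record(fields, quote_fields="ASNEEDED",
                     field_delimiter=",", record_delimiter="\n"):
    """Serialize one record using the given output syntax settings."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=field_delimiter,
        quoting=QUOTE_MODES[quote_fields],
        lineterminator=record_delimiter,
    )
    writer.writerow(fields)
    return buffer.getvalue()
```

With ASNEEDED, only the field that contains the field delimiter is quoted. With ALWAYS, every field is quoted.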

Output CSV Record Delimiter

Value used to separate individual records when the Output Serialization is CSV.