S3 Select Object Content Service (com.ibi.agents.XDS3SelectObjectContentAgent)

Syntax:

com.ibi.agents.XDS3SelectObjectContentAgent

Description:

Filters the contents of an Amazon S3 object based on a simple Structured Query Language (SQL) statement.

Using the XDS3SelectObjectContentAgent service, you can read data stored in CSV, JSON, or Parquet format from an Amazon S3 bucket and return the results, converted to CSV or JSON, to an iWay Service Manager (iSM) flow for further processing.

This agent calls the selectObjectContent(SelectObjectContentRequest selectRequest) method of the com.amazonaws.services.s3.AmazonS3 interface in the AWS SDK for Java, version 1.11.714.

The Javadoc for the S3 module of the AWS SDK for Java, version 1.11.714, can be downloaded from the following link:
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.11.714/aws-java-sdk-s3-1.11.714-javadoc.jar
The Amazon S3 REST API documentation for the SelectObjectContent operation is available from the following link:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html

Both resources describe the input and output data types used by this operation.

Note: The REST API documentation describes the latest version of the API, which might differ from the version supported by SDK version 1.11.714. Documentation for earlier REST API releases does not appear to be available online or for download.

The SQL syntax for the Select Expression is documented in the Amazon Simple Storage Service Developer Guide, available from the following link:
https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference.html
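To show how the parameters described below map onto the SelectObjectContent REST request, the following Java sketch assembles a request body for a CSV object queried with JSON output. It is a hypothetical illustration: the class and method names are invented, only a subset of the request elements is covered, and the agent itself builds its request through the AWS SDK for Java rather than with code like this. Element names follow the REST API reference linked above.

```java
/**
 * Hypothetical sketch (names invented for illustration): builds the XML body of an
 * S3 SelectObjectContent REST request for a CSV object returned as JSON. Only a
 * subset of elements is covered; the agent itself goes through the AWS SDK for Java.
 */
final class SelectRequestBodySketch {

    // Escapes XML special characters so that, for example, "WHERE x < 5"
    // stays well-formed inside the Expression element.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    static String element(String name, String value) {
        return "<" + name + ">" + xmlEscape(value) + "</" + name + ">";
    }

    /**
     * @param expression     value of the Select Expression parameter
     * @param fileHeaderInfo value of File Header Info (NONE, IGNORE, or USE)
     * @param compression    value of Compression Type (BZIP2, GZIP, or NONE)
     * @param jsonDelimiter  value of JSON Record Delimiter
     * @param scanStart      Scan Range Start, or null when not set
     * @param scanEnd        Scan Range End, or null when not set
     */
    static String csvToJsonBody(String expression, String fileHeaderInfo, String compression,
                                String jsonDelimiter, Long scanStart, Long scanEnd) {
        StringBuilder body = new StringBuilder()
                .append("<SelectObjectContentRequest xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">")
                .append(element("Expression", expression))
                .append(element("ExpressionType", "SQL"))
                .append("<InputSerialization>")
                .append(element("CompressionType", compression))
                .append("<CSV>").append(element("FileHeaderInfo", fileHeaderInfo)).append("</CSV>")
                .append("</InputSerialization>")
                .append("<OutputSerialization><JSON>")
                .append(element("RecordDelimiter", jsonDelimiter))
                .append("</JSON></OutputSerialization>");
        // ScanRange is optional; each bound is emitted only when it is set.
        if (scanStart != null || scanEnd != null) {
            body.append("<ScanRange>");
            if (scanStart != null) body.append(element("Start", scanStart.toString()));
            if (scanEnd != null) body.append(element("End", scanEnd.toString()));
            body.append("</ScanRange>");
        }
        return body.append("</SelectObjectContentRequest>").toString();
    }
}
```

In the REST API, the SSE Customer Key is not part of this body; it travels in the x-amz-server-side-encryption-customer-algorithm, x-amz-server-side-encryption-customer-key, and x-amz-server-side-encryption-customer-key-MD5 request headers.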

Parameters:

Bucket
Bucket Name	Name of the bucket that contains the object to be queried.
Key	Key of the object to be queried within the specified bucket.
Input
Select Expression	Expression that is used to query the object.
Scan Range Start	When present, process only records that start at or after this byte offset (zero-based, inclusive).
Scan Range End	When used with Scan Range Start, process only records whose first byte lies between the start and end offsets (zero-based, inclusive); a record may extend past the end offset. When used without Scan Range Start, process only records whose first byte lies within this many bytes of the end of the object.
SSE Customer Key	Customer-provided encryption key, in Base64 format, for an object stored with Amazon S3 server-side encryption with customer-provided keys (SSE-C).
Input Serialization	Describes the format of the data in the object being queried. Possible values are: Parquet, which has no further parameters; JSON, which has a single parameter (JSON Type) that specifies whether the input is one JSON object (DOCUMENT) or many JSON records concatenated together (LINES); and CSV, which has many parameters in the CSV Input Serialization group that specify the exact syntax.
Compression Type	Specifies the compression format of the object. Possible values are BZIP2, GZIP, and NONE.
Output
Output Serialization	Describes the format of the data that you want Amazon S3 to return in the response. Possible values are JSON and CSV. By default, JSON output consists of individual JSON records separated by a newline; the record delimiter is configurable. Because this format cannot be parsed as a single document, the agent also offers the option to produce a single JSON array containing all the returned JSON records. To create the array, the agent hardcodes the record delimiter to a comma (,) and surrounds all the data with square brackets ([ ]).
Output Document	Format of the output document. Possible values are: Parsed JSON, which produces a single JSON array containing all returned JSON records and stores the parsed tree in the output document (this choice overrides all Output Serialization parameters, which are ignored); and Byte Array, which produces a flat document in the format selected by the Output Serialization parameter, either JSON or CSV. Parquet output is not available because Amazon S3 does not offer Parquet output for this operation. Additional parameters for JSON and CSV output serialization are available in their own property groups.
JSON Input Serialization
JSON Type	Type of JSON when the Input Serialization is JSON. Possible values are DOCUMENT and LINES.
JSON Output Serialization
JSON Record Delimiter	Value used to separate individual records when the Output Serialization is JSON.
JSON Array Output	Whether the output will be a JSON array collecting all records, or separate individual JSON records. When enabled, this property overrides the JSON Record Delimiter parameter.
CSV Input Serialization
When the Input Serialization is CSV, the parameters in this group specify the exact syntax of the input. Many of them have a counterpart in the CSV Output Serialization group. Parameters for an input serialization other than the chosen one are ignored; the same applies to output serialization parameters.
Allow Quoted Record Delimiter	Specifies that CSV field values can contain record delimiters enclosed in quotation marks and that such records are allowed.
Comments	Single character used to indicate that a row should be ignored when the character is present at the start of that row. You can specify any character to indicate a comment line.
Field Delimiter	Single character used to separate individual fields in a record. You can specify an arbitrary delimiter.
File Header Info	Describes the first line of input. NONE means the first line is not a header. IGNORE means the first line is a header, but you must use column positions like _1 to indicate the column. USE means the first line is a header and you can use the header value to identify a column in an expression.
Quote Character	Single character used for escaping when the field delimiter is part of the value.
Quote Escape Character	Single character used for escaping the quote character inside an already escaped value.
CSV Record Delimiter	Single character used to separate individual records in the input. You can specify an arbitrary delimiter.
CSV Output Serialization
Output Field Delimiter	Sets the character used to separate individual fields in a record.
Output Quote Character	Sets the character used for escaping when the field delimiter is part of the value.
Output Quote Escape Character	Sets the character used for escaping the quote character inside an already escaped value.
Output Quote Fields	Indicates whether to use quotation marks around output fields. ALWAYS means always use quotation marks for output fields. ASNEEDED means use quotation marks for output fields only when needed.
Output CSV Record Delimiter	Value used to separate individual records when the Output Serialization is CSV.
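The Scan Range Start and Scan Range End parameters above decide which records are processed by the position of each record's first byte. The following Java sketch reproduces that rule on an in-memory string so the three cases (start and end, start only, end only) can be compared. It is a hypothetical illustration with invented names; Amazon S3 applies the rule on the server, and the sketch assumes single-byte characters and a one-byte record delimiter.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of the Scan Range rule: a record is processed when its
 * first byte falls inside the effective range, even if the record ends after it.
 * Assumes single-byte characters and a one-byte record delimiter.
 */
final class ScanRangeSketch {

    static List<String> selectRecords(String data, char delimiter, Long start, Long end) {
        long length = data.length();
        long from = 0;
        long to = length - 1;                      // default: the whole object
        if (start != null) {
            from = start;                          // Start, with or without End
            if (end != null) to = end;
        } else if (end != null) {
            from = Math.max(0L, length - end);     // End only: the last 'end' bytes
        }

        List<String> selected = new ArrayList<>();
        int recordStart = 0;
        while (recordStart < data.length()) {
            int stop = data.indexOf(delimiter, recordStart);
            if (stop < 0) stop = data.length();
            // Zero-based, inclusive test on the record's first byte only.
            if (recordStart >= from && recordStart <= to) {
                selected.add(data.substring(recordStart, stop));
            }
            recordStart = stop + 1;
        }
        return selected;
    }

    public static void main(String[] args) {
        String data = "a,1\nb,2\nc,3\n";   // records start at offsets 0, 4, and 8
        System.out.println(selectRecords(data, '\n', 1L, 4L));    // [b,2]
        System.out.println(selectRecords(data, '\n', null, 5L));  // [c,3]
    }
}
```

With a range of 0 to 0, the first record is still returned in full, because only its first byte has to lie inside the range.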
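The SSE Customer Key parameter takes the key in Base64 form. SSE-C uses a 256-bit AES key, and a query must supply the same key that was used when the object was stored. The following hypothetical Java helper (names invented for illustration) shows one way to generate such a key with the standard library and encode it. It also computes the Base64-encoded MD5 digest of the raw key, which is the value a raw REST request sends in the x-amz-server-side-encryption-customer-key-MD5 header; the agent parameter itself takes only the Base64 key.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import javax.crypto.KeyGenerator;

/**
 * Hypothetical helper for preparing an SSE Customer Key value: a random 256-bit AES key
 * in Base64 form, plus the Base64-encoded MD5 digest that a raw REST request carries in
 * the x-amz-server-side-encryption-customer-key-MD5 header.
 */
final class SseCustomerKeySketch {

    static byte[] newAes256Key() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);                       // SSE-C requires a 256-bit key
            return generator.generateKey().getEncoded();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("AES is not available", e);
        }
    }

    static String toBase64(byte[] key) {
        return Base64.getEncoder().encodeToString(key);
    }

    static String md5Base64(byte[] key) {
        try {
            return Base64.getEncoder().encodeToString(MessageDigest.getInstance("MD5").digest(key));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = newAes256Key();
        System.out.println("SSE Customer Key: " + toBase64(key));
        System.out.println("Key MD5:          " + md5Base64(key));
    }
}
```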
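The Output Serialization and JSON Array Output descriptions above say that, to produce a single JSON array, the agent sets the JSON record delimiter to a comma and surrounds the data with square brackets; the Parsed JSON output document starts from the same array. The following hypothetical Java sketch shows that wrapping step on the returned text. It assumes that Amazon S3 writes the delimiter after every record, including the last one (as it does with the default newline), so it removes one trailing comma to keep the array valid JSON. This is an assumption made for illustration, not a description of the agent's code.

```java
/**
 * Hypothetical sketch of the JSON Array Output option: with the JSON record delimiter
 * forced to a comma, the returned records are wrapped in square brackets.
 * Assumption: every record, including the last, is followed by the delimiter,
 * so one trailing comma is removed to keep the array valid JSON.
 */
final class JsonArrayOutputSketch {

    static String toJsonArray(String commaDelimitedRecords) {
        String body = commaDelimitedRecords;
        if (body.endsWith(",")) {
            body = body.substring(0, body.length() - 1);  // drop the delimiter after the last record
        }
        return "[" + body + "]";
    }

    public static void main(String[] args) {
        // Two records as S3 might return them with the record delimiter set to ","
        String payload = "{\"id\":1},{\"id\":2},";
        System.out.println(toJsonArray(payload));  // [{"id":1},{"id":2}]
    }
}
```

An empty result becomes an empty array ([]), which still parses as a single document.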
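The CSV Input Serialization parameters determine how the raw object is split into records and fields before the Select Expression runs. The following hypothetical Java sketch (class and constructor arguments invented for illustration) parses a small CSV string with each parameter passed in explicitly, and resolves column references the way File Header Info describes: positional names such as _1, and header names only with USE. Amazon S3 does this parsing on the server. Two behaviors in the sketch are assumptions: a record delimiter inside quotes ends the record when Allow Quoted Record Delimiter is disabled, and positional references are accepted in every File Header Info mode.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of the CSV Input Serialization parameters. S3 parses the
 * object on the server; this class only shows how each parameter affects the records
 * and column names that a Select Expression can see.
 */
final class CsvInputSketch {

    final String fileHeaderInfo;
    final List<String> header = new ArrayList<>();      // first line when IGNORE or USE
    final List<List<String>> rows = new ArrayList<>();  // data records

    CsvInputSketch(String data, char comments, char fieldDelimiter, char quote, char quoteEscape,
                   char recordDelimiter, boolean allowQuotedRecordDelimiter, String fileHeaderInfo) {
        this.fileHeaderInfo = fileHeaderInfo;
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean atRecordStart = true;
        for (int i = 0; i < data.length(); i++) {
            char ch = data.charAt(i);
            if (atRecordStart && ch == comments) {
                // Comments: skip the whole row up to its record delimiter.
                while (i < data.length() && data.charAt(i) != recordDelimiter) i++;
                continue;
            }
            atRecordStart = false;
            if (inQuotes && ch == quoteEscape && i + 1 < data.length() && data.charAt(i + 1) == quote) {
                field.append(quote);   // Quote Escape Character followed by a quote
                i++;
            } else if (ch == quote) {
                inQuotes = !inQuotes;  // Quote Character opens or closes a quoted value
            } else if (ch == fieldDelimiter && !inQuotes) {
                record.add(field.toString());
                field.setLength(0);
            } else if (ch == recordDelimiter && !(inQuotes && allowQuotedRecordDelimiter)) {
                // Assumed: without Allow Quoted Record Delimiter, this ends the record even inside quotes.
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                inQuotes = false;
                atRecordStart = true;
            } else {
                field.append(ch);
            }
        }
        if (field.length() > 0 || !record.isEmpty()) {  // last record without a trailing delimiter
            record.add(field.toString());
            records.add(record);
        }
        if (!"NONE".equals(fileHeaderInfo) && !records.isEmpty()) {
            header.addAll(records.remove(0));  // IGNORE and USE both treat line 1 as a header
        }
        rows.addAll(records);
    }

    /** Resolves _1, _2, ... by position; header names resolve only with USE. */
    int columnIndex(String ref) {
        if (ref.matches("_[1-9][0-9]*")) {
            return Integer.parseInt(ref.substring(1)) - 1;  // assumed valid in every mode
        }
        if ("USE".equals(fileHeaderInfo) && header.contains(ref)) {
            return header.indexOf(ref);
        }
        throw new IllegalArgumentException("Unknown column reference: " + ref);
    }
}
```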
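On the output side, the CSV Output Serialization parameters control how each returned record is written. The following hypothetical Java sketch (names invented for illustration) writes one record using the output field delimiter, quote character, quote escape character, Output Quote Fields mode, and record delimiter. The rule used for ASNEEDED (quote a value only when it contains the field delimiter, the quote character, or a line break) is an assumption for illustration, not the documented Amazon S3 rule.

```java
import java.util.List;

/**
 * Hypothetical illustration of the CSV Output Serialization parameters; Amazon S3
 * formats the real output. The ASNEEDED rule below is an assumption.
 */
final class CsvOutputSketch {

    static String writeRecord(List<String> fields, char fieldDelimiter, char quote,
                              char quoteEscape, String quoteFields, String recordDelimiter) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) out.append(fieldDelimiter);            // Output Field Delimiter
            String value = fields.get(i);
            // Assumed ASNEEDED rule: quote values containing the delimiter, the quote, or a line break.
            boolean needed = value.indexOf(fieldDelimiter) >= 0 || value.indexOf(quote) >= 0
                    || value.contains(recordDelimiter) || value.indexOf('\n') >= 0;
            if ("ALWAYS".equals(quoteFields) || needed) {
                out.append(quote);                            // Output Quote Character
                for (char ch : value.toCharArray()) {
                    if (ch == quote) out.append(quoteEscape); // Output Quote Escape Character
                    out.append(ch);
                }
                out.append(quote);
            } else {
                out.append(value);
            }
        }
        return out.append(recordDelimiter).toString();        // Output CSV Record Delimiter
    }

    public static void main(String[] args) {
        List<String> fields = List.of("Smith, J", "3", "say \"hi\"");
        System.out.print(writeRecord(fields, ',', '"', '"', "ASNEEDED", "\n")); // "Smith, J",3,"say ""hi"""
        System.out.print(writeRecord(fields, ',', '"', '"', "ALWAYS", "\n"));   // "Smith, J","3","say ""hi"""
    }
}
```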