Encoding

Messages consist of a sequence of characters. A character is itself an abstract notion: within a message it is represented by a group of bits, and on display it is rendered as a glyph, which is a displayable instance of the character. An encoding is the assignment of bit sequences to a related group of characters.

Because a byte has only eight bits (256 possible values) and there is a seemingly unlimited number of characters that these bits might represent, the same sequence of bits is often assigned to multiple characters. The bits themselves do not identify a character uniquely among all characters, but rather identify a specific character within a limited group of characters, such as the letters of a local alphabet (for example, Hebrew, English, or French).

iWay Application System Adapter for PeopleSoft must be able to recognize and react to specific characters; therefore, it is important to identify for any message exactly to which sequence or group of characters the bits of the message belong. Only with this information can iWay Service Manager properly understand and treat the message.

Because XML documents can originate from sources using many languages, the encoding of a document is, as standard, declared in the document itself. iWay Service Manager recognizes this encoding declaration and respects it when analyzing and handling the XML message. Any XML message without an explicit encoding declaration is given a default encoding, which is determined by examining the first few bytes of the message. The usual default assignments are ASCII or EBCDIC. The responsibility for declaring the correct encoding lies with the originator of the document. Encodings for XML documents are expressed by names assigned by the Internet Assigned Numbers Authority (IANA). Because messages are processed using the Java language, the appropriate Java encoding must be used to convert the sequence of bits into usable information. To this end, iWay Service Manager converts the IANA names to their appropriate Java encoding equivalents.
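The conversion table that iWay Service Manager uses is not shown in this document. As an illustration of the same idea only, the following sketch uses the standard java.nio.charset.Charset class, which accepts IANA names and aliases and reports the canonical name the Java runtime uses. The class name EncodingNames and the method toJavaName are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;

public class EncodingNames {
    // Hypothetical helper: resolve an IANA name or alias to the canonical
    // charset name known to the running JVM. Throws an unchecked exception
    // if the name is illegal or the charset is not installed.
    static String toJavaName(String ianaName) {
        return Charset.forName(ianaName).name();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "latin1", "US-ASCII"}) {
            System.out.println(name + " -> " + toJavaName(name));
        }
    }
}
```

Charset.forName is case-insensitive and accepts registered aliases, so several incoming names can resolve to the same Java charset.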

Non-XML documents do not carry their encoding in any manner that iWay Service Manager can recognize. Such documents need to be processed into XML by preparser exits. In this case, iWay Service Manager needs to know what encoding to apply to the message, so the listener configuration must specify the encoding to be used. Each IANA name maps to exactly one Java name; however, there is no mapping in the other direction. In addition, Java names can vary by platform and locale. Therefore, listener encoding configurations are defined directly as Java encoding names, either by selecting one of the offered values in the configuration entry for the listener or by entering the appropriate Java encoding name directly.

The default is the platform system encoding used to read and write characters into the native file system.

iWay Service Manager applies the listener encoding as a default whenever it cannot determine the encoding in any other more appropriate manner.
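The order of preference described here (an encoding determined from the message, then the listener encoding, then the platform default) can be sketched as follows. This is an illustrative assumption about the fallback order, not the product's implementation; the class EncodingFallback and the method choose are hypothetical.

```java
import java.nio.charset.Charset;

public class EncodingFallback {
    // Returns true only for a non-empty, legal, installed charset name.
    private static boolean usable(String name) {
        try {
            return name != null && !name.isEmpty() && Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false; // illegal charset name syntax
        }
    }

    // Hypothetical helper: prefer the encoding declared by the message,
    // then the listener-configured encoding, then the platform default.
    static Charset choose(String declared, String listenerConfigured) {
        if (usable(declared)) {
            return Charset.forName(declared);
        }
        if (usable(listenerConfigured)) {
            return Charset.forName(listenerConfigured);
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(choose("UTF-8", "ISO-8859-1")); // declared wins
        System.out.println(choose(null, "ISO-8859-1"));    // listener value used
        System.out.println(choose(null, null));            // platform default
    }
}
```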

The ISO-8859-1 encoding does not transform characters; it maps each byte of the message to exactly one character in the iWay Service Manager representation of an encoded message, byte for byte. It is therefore a good encoding to use for binary messages that do not directly represent a sequence of glyphs, for example, a JPEG image.
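The byte-for-byte property can be checked with standard Java alone. The sketch below, which uses the hypothetical class name Latin1RoundTrip, decodes every possible byte value to a String and encodes it back, once with ISO-8859-1 and once with UTF-8 for comparison.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Decode the bytes to a String and encode them back with the same charset.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        return new String(raw, cs).getBytes(cs);
    }

    // Build an array holding every byte value from 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] raw = allByteValues();
        // ISO-8859-1 preserves every byte.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1))); // true
        // UTF-8 replaces malformed bytes, so binary content is altered.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8)));      // false
    }
}
```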

Specifying the wrong encoding for the message is a common source of problems. Usually this is evidenced by the inability of iWay Service Manager to parse the message, resulting in an error. For example, it is a common but erroneous practice to assign the encoding UTF-8 to every message under the mistaken assumption that this is the "cover all cases" encoding. In reality, UTF-8 is a very specific variable-length encoding. Byte values 0 through 127 (the ASCII range) map correctly. Byte values above 127, however, are valid only as parts of multibyte sequences, so bytes produced under another encoding form patterns that may or may not be valid UTF-8 and that rarely decode to the intended characters. This results in parsing errors.
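The failure can be reproduced outside the product with a strict decoder from the standard Java library. In this sketch, the class StrictDecode and the method decodesCleanly are hypothetical names; the sample bytes are the word "café" as encoded in ISO-8859-1, where the final letter is the single byte 0xE9.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Returns true if the bytes decode under the charset with no malformed
    // or unmappable input; the decoder is told to report errors instead of
    // silently substituting a replacement character.
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1Cafe = {0x63, 0x61, 0x66, (byte) 0xE9};
        System.out.println(decodesCleanly(latin1Cafe, StandardCharsets.ISO_8859_1)); // true
        System.out.println(decodesCleanly(latin1Cafe, StandardCharsets.UTF_8));      // false

        // Valid UTF-8, but not the two characters an ISO-8859-1 sender meant:
        byte[] twoLatin1Chars = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(new String(twoLatin1Chars, StandardCharsets.UTF_8).length()); // 1
    }
}
```

The last case shows why a message can parse without error and still be wrong: the two bytes are a legal UTF-8 sequence, but they decode to one character rather than the two that the sender encoded.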

The specified encodings must be available for use by iWay Service Manager. Encodings are provided to Java in the I18N.jar file. You must obtain the appropriate I18N.jar for your locale and platform, and load it into the iWay lib directory. This jar file can be obtained from www.javasoft.com.
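Whether a given encoding name is available to a particular Java runtime can be checked before it is entered in a listener configuration. The following sketch uses only the standard Charset class; the class EncodingAvailability and the method isAvailable are hypothetical names, and the check reflects the JVM that runs it, which is assumed here to be the same one that runs the server.

```java
import java.nio.charset.Charset;

public class EncodingAvailability {
    // Returns true if the running JVM can supply a charset for this name.
    // A null or syntactically illegal name is treated as unavailable.
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + ": " + (isAvailable(name) ? "available" : "not available"));
        }
    }
}
```

Running the class with candidate names as command-line arguments prints one line per name.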

Flat Documents

Although the engine is optimized for handling XML documents, including non-XML input that passes through preparsers to create XML, it is possible to pass non-XML documents through the engine stages. Non-XML documents are referred to as flat documents, and flatness can be set by the listener or explicitly by an exit. Flat documents do not pass through the preparser and reviewer exits, but are passed through business exits.

Flat documents store the message as a byte array, a String, or an attachment array, depending upon the message itself.

Incoming messages can be established as flat documents by setting the appropriate listener parameter, so that all documents arriving on that listener are treated as flat. An exit can also store flat information in the document, in which case the document is marked as flat. Another exit can return the document to XML state by storing an XML tree.
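The relationship between the flat and XML states can be pictured with a small data structure. The sketch below is purely illustrative: DocumentSketch and its methods are hypothetical and are not the iWay document API, and a plain Object stands in for a parsed XML tree.

```java
public class DocumentSketch {
    private Object xmlTree;   // placeholder for a parsed XML tree
    private byte[] flatBytes; // flat content held as bytes
    private String flatText;  // flat content held as text

    // Storing flat content marks the document as flat.
    void setFlat(byte[] bytes) {
        flatBytes = bytes;
        flatText = null;
        xmlTree = null;
    }

    void setFlat(String text) {
        flatText = text;
        flatBytes = null;
        xmlTree = null;
    }

    // Storing an XML tree returns the document to the XML state.
    void setXml(Object tree) {
        xmlTree = tree;
        flatBytes = null;
        flatText = null;
    }

    boolean isFlat() {
        return xmlTree == null && (flatBytes != null || flatText != null);
    }

    public static void main(String[] args) {
        DocumentSketch doc = new DocumentSketch();
        doc.setFlat(new byte[] {0x01, 0x02});
        System.out.println(doc.isFlat()); // true
        doc.setXml(new Object());
        System.out.println(doc.isFlat()); // false
    }
}
```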

A common use for flat documents is in protocol conversion adapters, in which a message is retrieved on one protocol and emitted on another. If no transformation or processing is required, a performance benefit can often be obtained by eliminating the XML conversion and parsing.

Protocol emitters can emit messages from both XML and flat documents.