Encoding

A message consists of a sequence of characters. A character itself is an abstract notion; a glyph is an instance of a character that can be displayed. A character is defined by assigning a group of bits to it. Encoding refers to the set of bit sequences assigned to represent a group of related characters.

There are eight bits in a byte, so a single byte can distinguish at most 256 characters. As a result, the same sequence of bits is often assigned to multiple characters. The bits do not identify a character as unique among all possible characters, but rather identify a specific character within a limited group of characters, for example, the characters used to write a particular language such as English, French, or Japanese.
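
As an illustration of the point above, the following minimal Java sketch decodes one byte value under two different encodings and reports a different character each time. The class and method names are hypothetical and are not part of iWay Service Manager. The sketch assumes the ISO-8859-7 (Greek) charset is available in the Java runtime, which is typical but not guaranteed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one byte value, two encodings, two characters.
final class SameByteDifferentCharacter {

    /** Decodes a single byte using the given character set. */
    static String decode(byte value, Charset charset) {
        return new String(new byte[] { value }, charset);
    }

    /** Formats the first code point of the text as U+XXXX. */
    static String codePoint(String text) {
        return String.format("U+%04X", text.codePointAt(0));
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9;
        // ISO-8859-1 (Western European): LATIN SMALL LETTER E WITH ACUTE
        System.out.println(codePoint(decode(b, StandardCharsets.ISO_8859_1)));
        // ISO-8859-7 (Greek): GREEK SMALL LETTER IOTA
        System.out.println(codePoint(decode(b, Charset.forName("ISO-8859-7"))));
    }
}
```

The code points are printed in U+XXXX form rather than as characters so that the output does not itself depend on the encoding of the console.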

iWay Service Manager must recognize specific characters in a message. Therefore, it is important to identify the exact sequence or group of characters to which the bits belong. Only with this information can iWay Service Manager correctly interpret and process the message.

iWay Service Manager supports all commonly used encoding schemes.

Unicode

The document character set for XML and HTML 4.0 is Unicode (also known as ISO/IEC 10646). HTML browsers and XML processors use Unicode internally, but documents are not required to be transmitted in Unicode. Provided that the client and server agree on the encoding scheme, the browser or processor can use any encoding that can be converted to Unicode. The character encoding scheme of any XML or (X)HTML document must be clearly labeled. With this information, clients can easily map these encoding schemes to Unicode.
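
The conversion described above can be sketched in Java, whose strings hold Unicode text internally. This is only an illustration with hypothetical names, not the way any particular browser or processor is implemented: bytes in a labeled source encoding are decoded to Unicode and then re-encoded for transmission in another encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: transcoding by way of Unicode.
final class TranscodeViaUnicode {

    /** Decodes the input from the source encoding to Unicode, then re-encodes it. */
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        String unicodeText = new String(input, source); // Java strings are Unicode (UTF-16)
        return unicodeText.getBytes(target);
    }

    public static void main(String[] args) {
        // The word "café" in ISO-8859-1: the accented letter is the single byte 0xE9.
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // In UTF-8 the accented letter takes two bytes (0xC3 0xA9).
        System.out.println(latin1.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```

The conversion is only correct when the source encoding is labeled correctly, which is why the label matters.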

Working With XML Documents

Because XML documents can originate from sources using many languages, the encoding scheme of a specific document is, by convention, declared in the document itself. Encoding schemes for XML documents are expressed by names assigned by the Internet Assigned Numbers Authority (IANA). iWay Service Manager recognizes this encoding declaration and respects it for analysis and handling of the message.

The responsibility for declaring the correct encoding scheme belongs to the originator of the document. An XML message without a specific encoding declaration is given a default encoding scheme by examining the first few characters of the message. The usual default assignment is ASCII or EBCDIC.
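
The general idea of examining the first few bytes can be sketched as follows. This is a simplified illustration and not the detection logic that iWay Service Manager actually uses; the class name, the returned labels, and the 200-byte scan limit are all assumptions. The sketch relies on the fact that the character "<" is the byte 0x3C in ASCII-compatible encodings and 0x4C in EBCDIC.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: read the declared encoding, or guess a default family.
final class XmlEncodingSniffer {

    // Matches the encoding pseudo-attribute inside an XML declaration.
    private static final Pattern DECLARED = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the declared encoding name, or "ASCII" or "EBCDIC" as a guessed default. */
    static String sniff(byte[] message) {
        if (message.length > 0 && (message[0] & 0xFF) == 0x4C) {
            // '<' in EBCDIC. A fuller version would now read the declaration as EBCDIC.
            return "EBCDIC";
        }
        // Assume an ASCII-compatible encoding, so the declaration is readable as ASCII.
        int limit = Math.min(message.length, 200);
        String head = new String(message, 0, limit, StandardCharsets.US_ASCII);
        Matcher matcher = DECLARED.matcher(head);
        if (matcher.find()) {
            return matcher.group(1); // the originator's declared encoding wins
        }
        return "ASCII"; // no declaration found: fall back to the default
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><order/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(doc));
    }
}
```

A document without a declaration falls through to the guessed default, which is why a missing or wrong declaration can lead to the message being misread.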

Specifying the wrong encoding scheme for a message is a common source of problems and usually results in the inability of iWay Service Manager to parse the message, thus generating an error. For example, it is a common mistake to assign the encoding scheme UTF-8 to every message under the assumption that this is the "cover all cases" scheme.

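In reality, UTF-8 is a very specific variable-length encoding. Bytes with values of 127 and lower represent the same characters as ASCII and map correctly. However, bytes with values above 127 are valid only as part of well-formed multibyte sequences, so text produced in a single-byte encoding (for example, ISO-8859-1) often contains bit patterns that are not valid UTF-8. Erroneous use of UTF-8 often results in parsing errors.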
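
One way to see this is to decode bytes strictly as UTF-8 and check whether the decoder rejects them. The following sketch uses the standard java.nio.charset decoder with error reporting enabled; the class and method names are hypothetical, and this is not a description of how iWay Service Manager parses messages.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: strict UTF-8 validation.
final class Utf8Check {

    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };            // "café" in ISO-8859-1
        byte[] utf8 = { 0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9 }; // "café" in UTF-8
        System.out.println(isValidUtf8(latin1)); // false: 0xE9 starts an incomplete sequence
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

Pure ASCII text passes this check under either label, which is why a wrong UTF-8 label can go unnoticed until the first accented or non-Latin character arrives.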

Working With Non-XML Documents

A non-XML document does not carry its encoding scheme in a manner that iWay Service Manager can recognize. Such a document may be processed into XML by preparser exits. In this case, iWay Service Manager must recognize which encoding scheme to apply to the message. The listener configuration must specify the encoding scheme to use.

From the console, you can set the default encoding that iWay Service Manager uses when it cannot determine the encoding scheme from an incoming message. For instructions, see How to Set the Default Encoding on a Listener.

Although the engine is optimized for handling XML documents, including non-XML that passes through preparsers to create XML, you can pass non-XML through the engine stages.

A non-XML message is handled as a flat document, which stores the message as a byte array, a string, or an attachment array, depending on the message. A flat document does not pass through the preparser and reviewer exits but is passed through to the business exits.

A common use for a flat document is simple protocol conversion, in which a message is retrieved on one protocol and emitted on another. If no transformation or processing is required, a performance benefit can often be obtained through the elimination of the XML conversion and parsing. Protocol emitters can emit messages from both XML and flat documents.

An incoming message can be established as a flat document by setting the appropriate listener property, so that all documents arriving on that listener are treated as flat. An exit can also store flat information in the document, in which case the document is marked as flat. Another exit can return the document to an XML state by storing an XML tree.

Java Encoding Schemes

iWay Service Manager processes messages using the Java language and uses the appropriate Java encoding scheme to convert the sequence of bits into usable information. For this reason, iWay Service Manager converts the IANA names to their appropriate Java encoding equivalents. There is a one-to-one mapping from IANA names to Java names; however, there is no mapping in the other direction. In addition, Java names can vary by platform and locale. Therefore, a listener configuration must include the Java encoding name.
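
The relationship between names can be explored with the standard java.nio.charset.Charset class, which resolves any name or alias it recognizes to a single canonical name. The sketch below is only an illustration with hypothetical class and method names; it does not show the mapping table that iWay Service Manager itself maintains, and the set of aliases reported can differ between Java runtimes.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: resolving encoding names and aliases.
final class CharsetNames {

    /** Resolves a recognized encoding name or alias to Java's canonical charset name. */
    static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        // Historical Java names and IANA names resolve to the same charset.
        System.out.println(canonicalName("ISO8859_1"));
        System.out.println(canonicalName("ISO-8859-1"));
        System.out.println(canonicalName("UTF8"));
        // Several aliases point at one charset, so the reverse lookup is not unique.
        System.out.println(Charset.forName("ISO-8859-1").aliases());
    }
}
```

Because many aliases resolve to one charset, going from a Java name back to a single IANA name is ambiguous, which is consistent with requiring the Java name in the listener configuration.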

Procedure: How to Set the Default Encoding on a Listener

To set the default encoding on a listener:

  1. From the main menu, choose Server, then General Settings.

    The General Settings pane opens, as shown in the following image.

  2. In the Encoding field, select the default encoding by choosing the encoding name from the drop-down list or typing it directly.

    The default value is the platform encoding scheme used to read and write characters in the native file system and depends on the platform on which iWay Service Manager runs.

    The specified encoding scheme must be available for use by iWay Service Manager. Encoding schemes are provided to Java in the I18N.jar file. You must obtain the appropriate I18N.jar for your locale and platform and load it into the iWay lib directory. The JAR file can be obtained from:

    www.javasoft.com

  3. Click Update.
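
Before typing an encoding name in step 2, it can help to check what the Java runtime reports as its platform default and whether a given encoding is available. The following is a minimal sketch using the standard java.nio.charset.Charset class; the class and method names are hypothetical, and the check reflects the Java runtime on which it is run, not the iWay Service Manager configuration itself. Note that on Java 18 and later, the default is UTF-8 unless it is overridden.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper: inspect the default encoding and check availability.
final class PlatformEncoding {

    /** Returns true if this Java runtime can read and write the named encoding. */
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // the name contains characters that are not legal in a charset name
        }
    }

    public static void main(String[] args) {
        System.out.println("Default encoding: " + Charset.defaultCharset().name());
        for (String name : new String[] { "UTF-8", "ISO-8859-1", "x-no-such-charset" }) {
            System.out.println(name + " available: " + isAvailable(name));
        }
    }
}
```

An encoding that is reported as unavailable would need to be supplied to the Java runtime before it can be selected.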