Deidentification Service (com.ibi.agents.XDDeidentifyAgent)

Syntax:

com.ibi.agents.XDDeidentifyAgent

iIT Service Object:

transform: De-identify an XML Document

Description:

The Deidentification service provides algorithms that can be used to de-identify protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Multiple algorithms can be configured, because a combination of algorithms is usually needed to de-identify the data correctly.

The Deidentification service takes an XML document as input. The first configured algorithm takes this document as input and modifies it in place. The result is fed into the next configured algorithm and so on. The result of the last configured algorithm is the XML document returned by the service.
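The chaining described above can be illustrated with a minimal Python sketch. It is not the service's implementation: the function names and the tuple layout are hypothetical, only two of the algorithms are imitated, and the target paths are written relative to the root element because Python's ElementTree supports only a limited XPath subset.

```python
import xml.etree.ElementTree as ET

def text_constant(node, argument):
    # Imitates "Text Constant": drop child elements, keep attributes,
    # and set the constant text.
    for child in list(node):
        node.remove(child)
    node.text = argument

def zip_keep_first3(node, argument=None):
    # Imitates "Zip Keep First 3": first three digits followed by "00".
    node.text = (node.text or "")[:3] + "00"

def deidentify(root, instances):
    # Each instance is (algorithm, argument, target path). Instances run in
    # order, and each one sees the document as left by the previous one.
    for algorithm, argument, target in instances:
        for node in root.findall(target):
            algorithm(node, argument)
    return root

doc = ET.fromstring(
    "<workforce><employee><name>Harry Smith</name>"
    "<zip>10121-2898</zip></employee></workforce>"
)
deidentify(doc, [
    (text_constant, "John Doe", "./employee/name"),
    (zip_keep_first3, None, "./employee/zip"),
])
result = ET.tostring(doc, encoding="unicode")
print(result)
```

Because every step modifies the same tree in place, the order of the configured instances matters: a later instance only sees nodes that earlier instances have left in the document.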

Parameters:

The following table lists and describes the parameters common to all algorithm instances.

Parameter

Description

XML Namespace Provider

Provider for the mapping between XML namespace prefix and namespace URI in XPath expressions. If left blank, XPath expressions cannot contain namespaces.

XPath Syntax

Determines which syntax level of XPath should be used. The default option selects the syntax level as set in the console global settings.

The following table lists and describes the parameters for the first algorithm instance.

Parameter

Description

Name1

Determines the de-identification algorithm to perform.

Argument1

The argument value for de-identification algorithms that take an argument.

Targetnodes1

An XPath expression that returns a nodeset. The de-identification algorithm will be applied to each node in the nodeset. If XML Namespace prefixes are used, they must be declared in the XML Namespace Provider.

For more information about the available algorithms and the meaning of the argument, see the Algorithms table below.

Edges:

The following table lists the available line edges for the Deidentification Service (com.ibi.agents.XDDeidentifyAgent).

Line Edge

Description

OnError

Error

OnSuccess

Success

OnFailure

Failure

OnCustom

  • OnError
  • OnSuccess
  • OnFailure
  • fail_parse
  • fail_operation

The Deidentification service offers explicit parameters for five algorithm instances: Algorithms 1 to 5. If more than five algorithm instances are needed, instances for Algorithm 6 and above can be created with user parameters named:

algorithmNN
argNN
targetNN

Where:

NN

Is the instance number.

For example, the sixth algorithm instance can be created with the user parameters algorithm6, arg6, and target6. There is no limit on the value of NN, but the instance numbers must be consecutive to be recognized. The service stops looking for more algorithm instances as soon as it finds one that is not configured. For example, if Algorithm 3 is not configured, Algorithms 4 and 5 are ignored.
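The scan over consecutively numbered instances can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, all instances (including 1 to 5) are keyed with the algorithmNN, argNN, and targetNN naming for simplicity, and an instance counts as configured when its algorithm name is non-empty.

```python
def collect_instances(params):
    """Return (algorithm, arg, target) tuples for consecutive instances from 1."""
    instances = []
    n = 1
    while True:
        algorithm = params.get(f"algorithm{n}")
        if not algorithm:
            break  # the first unconfigured instance ends the scan
        instances.append(
            (algorithm, params.get(f"arg{n}", ""), params.get(f"target{n}", ""))
        )
        n += 1
    return instances

# Hypothetical configuration with a gap at instance 3.
params = {
    "algorithm1": "Text Constant", "arg1": "John Doe", "target1": "/a/name",
    "algorithm2": "XML Delete", "target2": "/a/title",
    "algorithm4": "Zip Keep First 3", "target4": "/a/zip",
}
found = collect_instances(params)
print([name for name, _, _ in found])
```

In this sketch, instance 4 is never reached because instance 3 is missing, which mirrors the rule that numbering must be consecutive.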

Algorithms:

The following table describes the available algorithms in the Deidentification service.

Algorithm

Argument

Description

Encrypt Formatted Digits

encryption key

Maps the digits in a formatted string using a 4x4 Playfair algorithm. The encryption key should contain only digits and is padded, if necessary, to 16 digits. Playfair works on pairs of characters, so when the input has an odd number of digits, it is padded by one character and one character of the output is discarded. The resulting number is not validated. The same input always produces the same output under the same key. The mapping is not reversible: different inputs might map to the same output, though this is uncommon. Only the digits are mapped; all other characters are preserved at the same position. For example, (212)223-3333 might map to (655)887-2424 with the format characters preserved.

Encrypt SSN

encryption key

Maps Social Security Numbers by encrypting them using a 4x4 Playfair algorithm. The encryption key should contain only digits and is padded, if necessary, to 16 digits. Playfair works on pairs of characters, so the input is padded by one character; the resulting 10-digit output is reduced to 9 digits by dropping its last character. The algorithm never produces a Social Security Number that starts with 9, starts with 666, or has all zeroes in any of the three groups: 000-xx-xxxx, xxx-00-xxxx, and xxx-xx-0000. If the resulting Social Security Number is invalid, the output is re-encrypted until it becomes valid. The same input SSN always produces the same encrypted SSN under the same key. The mapping is not reversible: different input SSNs might map to the same encrypted SSN. Only the digits are mapped; all other characters are preserved at the same position. For example, 111-22-3333 might map to 655-88-2233 with the hyphens preserved.

Sequential SSN

unused

Maps Social Security Numbers to numbers in consecutive increasing order starting with 001010001. The numbers produced follow the rules used by the Social Security Administration since June 25, 2011. The algorithm never produces a Social Security Number with all zeroes in any of the three groups: 000-xx-xxxx, xxx-00-xxxx, and xxx-xx-0000. Only the digits are mapped; all other characters are preserved at the same position. For example, 111-22-3333 might map to 001-01-0035 with the hyphens preserved. Within a single run, a previously mapped SSN maps to the same number it was assigned earlier. The mapping is not preserved across runs.

Text Constant

new text

Replaces the contents of an element by a string constant. The element keeps its attributes, but all existing mixed content children are removed. The argument is evaluated once to obtain the string constant.

Random Day

date pattern

Parses a date using the pattern in the argument, replaces the day and month with random values, and clips the year so that it is no more than 90 years in the past. The argument is evaluated once to obtain the date pattern in SimpleDateFormat syntax.

Text Eval

iFL expression

Replaces the contents of an element with a string obtained by evaluating an iFL expression. The element keeps its attributes, but all existing mixed content children are removed. The argument is evaluated for each target node processed. The existing text value of the element is stored in the special register iway.value. The new value can depend on the existing value by calling _sreg(iway.value) within the iFL expression.

XML Constant

XML String

Replaces the contents of an element by a constant XML fragment. The element keeps its attributes, but all existing mixed content children are removed. The argument is evaluated once to obtain an XML string. The XML string is parsed for each target node, using the XML Namespace context of that node.

XML Delete

unused

Deletes the XML target nodes. The node itself, its attributes, and all mixed content children are removed.

XML Eval

iFL expression

Replaces an XML element by the result of an iFL expression parsed as XML. The node itself, its attributes, and all mixed content children are removed. The argument is evaluated for each target node processed. The resulting XML String is parsed using the XML Namespace context of the parent node. The first element of the resulting XML fragment is the new element replacing the target node. The new value can depend on the old value because the old element is passed as the root of the document during evaluation. For example, the _XPATH() function can be called to extract values from the old node.

XML Replace

XML String

Replaces an element by a constant XML element. The node itself, its attributes, and all mixed content children are removed. The argument is evaluated once to obtain an XML string. The XML string is parsed for each target node, using the XML Namespace context of the parent of that node. The first element of the resulting XML fragment is the new element replacing the target node.

Zip Keep First 3

unused

Keeps the first three digits of the five digit zip code followed by 00. The four digit code after the hyphen is ignored when present. For example, 10121-2898 becomes 10100.

Zip Keep Last 2

unused

Keeps the last two digits of the five digit zip code preceded by 000. The four digit code after the hyphen is ignored when present. For example, 10121-2898 becomes 00021.
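Some of the string-level rules in the table can be sketched in Python. The sketch below imitates Sequential SSN and the two ZIP algorithms only; it is not the service's code. All names are hypothetical, and the exclusion of area numbers 666 and 9xx in the sequential mapping is an assumption based on the post-2011 Social Security Administration rules rather than something the table states for that algorithm.

```python
def is_valid_ssn(digits):
    # Assumed validity rule: no all-zero group, no area 666 or 9xx.
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area[0] == "9":
        return False
    return group != "00" and serial != "0000"

def refill_digits(template, digits):
    # Put the new digits into the digit positions of the template and
    # keep every other character (for example, hyphens) where it was.
    it = iter(digits)
    return "".join(next(it, ch) if ch.isdigit() else ch for ch in template)

class SequentialSsn:
    def __init__(self):
        self._next = 1010001   # 001-01-0001
        self._seen = {}        # mapping kept for a single run only

    def map(self, ssn):
        key = "".join(ch for ch in ssn if ch.isdigit())
        if key not in self._seen:
            candidate = f"{self._next:09d}"
            while not is_valid_ssn(candidate):
                self._next += 1
                candidate = f"{self._next:09d}"
            self._seen[key] = candidate
            self._next += 1
        return refill_digits(ssn, self._seen[key])

def zip_keep_first3(zip_code):
    # Ignore the four-digit extension, keep three digits, append "00".
    return zip_code.split("-")[0][:3] + "00"

def zip_keep_last2(zip_code):
    # Ignore the four-digit extension, keep the last two digits after "000".
    return "000" + zip_code.split("-")[0][3:5]

mapper = SequentialSsn()
print(mapper.map("078-05-1120"), mapper.map("165-16-7999"))
print(zip_keep_first3("10121-2898"), zip_keep_last2("10121-2898"))
```

Keeping the lookup table inside the mapper object reflects the rule that a repeated SSN maps to the same value within one run, while a new run starts again from 001010001.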

Example:

The following example shows the effect of several algorithms applied in sequence. The table below lists the parameter values.

Parameter

Value

Name 1

Sequential SSN

Target Nodes 1

/workforce/employee/ssn

Name 2

Text Constant

Argument 2

John Doe

Target Nodes 2

/workforce/employee/name

Name 3

Text Eval

Argument 3

_sreg(iway.value) - _imod(_sreg(iway.value),10)'-'_sreg(iway.value) - _imod(_sreg(iway.value),10) + 9

Target Nodes 3

/workforce/employee/age

Name 4

XML Constant

Argument 4

<street>1731 Technology Drive</street><city>San Jose</city><state>CA</state><zip>95110</zip>

Target Nodes 4

/workforce/employee/address

Name 5

XML Delete

Target Nodes 5

/workforce/employee/title
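The arithmetic behind the Text Eval expression in Argument 3 can be checked with a short Python analogue. This is not iFL: the function name is hypothetical, and the reading of the expression (lower bound, a literal hyphen, then lower bound plus 9) is inferred from the expression itself.

```python
def age_bucket(value):
    # value plays the role of the iway.value register (the element's text).
    age = int(value)
    low = age - age % 10        # _sreg(iway.value) - _imod(_sreg(iway.value),10)
    return f"{low}-{low + 9}"   # lower bound, '-', lower bound + 9

print(age_bucket("48"))
```

Under this reading, an age of 48 falls into the range 40-49, so the exact age is replaced by a ten-year band.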

The following is a sample input document.

<workforce>
    <employee>
        <name>Harry Smith</name>
        <address>
            <street>2 Penn Plaza</street>
            <city>New York</city>
            <state>NY</state>
            <zip>10121</zip>
        </address>
        <title>Marketing Director</title>
        <age>48</age>
        <ssn>078-05-1120</ssn>
    </employee>
    <employee>
        <name>Mary Dickens</name>
        <address>
            <street>10375 Richmond</street>
            <city>Houston</city>
            <state>TX</state>
            <zip>77042</zip>
        </address>
        <title>Sales Engineer</title>
        <age>48</age>
        <ssn>165-16-7999</ssn>
    </employee>
</workforce>

The resulting service output is shown below.

<workforce>
    <employee>
        <name>John Doe</name>
        <address>
            <street>1731 Technology Drive</street>
            <city>San Jose</city>
            <state>CA</state>
            <zip>95110</zip>
        </address>
        <age>40-49</age>
        <ssn>001-01-0001</ssn>
    </employee>
    <employee>
        <name>John Doe</name>
        <address>
            <street>1731 Technology Drive</street>
            <city>San Jose</city>
            <state>CA</state>
            <zip>95110</zip>
        </address>
        <age>40-49</age>
        <ssn>001-01-0002</ssn>
    </employee>
</workforce>