Data Quality Rules

Topics:

This section provides a reference for the applicable Data Quality (DQ) rules (Cleansing, Matching, Merging, and Remediation).

Cleansing

  • Cleanse Address:

    Verify and standardize address subject.

    Prerequisites:

    • Loqate for address cleansing and verification.

    Tags:

    • ERR_NO_ZIP_ON_USA_ADDRESS – Address does not have a ZIP code.
    • ERR_ADDRESS_INVALID – Address is invalid based on Loqate address verification and standardization.
  • Cleanse Name:

    If name parts are populated, populate full name. If full name is populated, parse full name to populate name parts.

    Prerequisites:

    • None

    Tags:

    • ERR_NAME_BLANK – Full name attribute is blank.
    • ERR_LAST_NAME_BLANK – Last name attribute is blank.
  • Cleanse SSN:

    Verify SSN conforms to US SSN standards based on Social Security Administration standards.

    Prerequisites:

    • None

    Tags:

    • INF_SSN_NULL – SSN is blank.
    • ERR_SSN_ALL_ZEROS_IN_DIGIT_GROUP – One or more of the digit groups contains all zeros (0), where a digit group is defined as each of the 3 unique segments of a SSN.
    • ERR_SSN_UNACCEPTED_NUMBER
    • WRN_SSN_ZEROS_ADDED – Zeros were added to be beginning of the SSN to make it a valid length.
    • ERR_SSN_USED_FOR_ADVERTISMENT – SSN was used during advertising campaigns and is not valid.
    • ERR_SSN_NOT_A_NUMBER – SSN contains nonnumeric values.
    • ERR_SSN_NO_9_DIGIT – SSN is not 9 digits in length, after removing dashes.
  • Cleanse Email:

    Verify email conforms to minimum standards and is not blacklisted.

    Prerequisites:

    • None

    Tags:

    • WRN_EMAIL_BLACKLISTED – Email included in email black list.
    • ERR_INVALID_EMAIL_ADDR – Email address does not conform to a minimum of <value>@<value>.<value>
  • Cleanse Phone:

    Verify phone numbers follow US phone number length and area code is valid.

    Prerequisites:

    • None

    Tags:

    • WRN_UNKNOWN_AREA_CODE – Phone number contains an unknown area code.
    • WRN_SHORT_PHONE – Phone number is 10 digits in length.
  • Cleanse Purchase Order (PO) Addresses:

    Verify PO addresses are valid.

    Prerequisites:

    • Loqate for address cleansing and verification.

    Tags:

    • ERR_SHIP_TO_NO_ZIP_ON_USA_ADDRESS – Ship To address does not have Zip Code
    • ERR_SHIP_TO_ADDRESS_INVALID – Ship To address invalid based on Loqate address verification and standardization
    • ERR_REQUESTED_NO_ZIP_ON_USA_ADDRESS – Requested address does not have Zip Code
    • ERR_REQUESTED_ADDRESS_INVALID – Requested address invalid based on Loqate address verification and standardization
    • ERR_FULFILLING_NO_ZIP_ON_USA_ADDRESS – Fulfilling address does not have Zip Code
    • ERR_FULFILLING_ADDRESS_INVALID – Fulfilling address invalid based on Loqate address verification and standardization

Matching

Matching is performed based on the following attributes:

  • SSN
  • Full Name and Name Parts
  • email
  • Phone Number

Each attribute has a weight assigned, based on the uniqueness of the attribute. Attributes may have reduced weighting where values do not have exact matches or contain transpositions. Attributes unique to the subject may have negative weighting when the values are completely or somewhat different.

It is considered a Strong match when the total combined score of the match is greater to or equal 200 and a Potential match when greater to or equal to 150, but less than 200.

Records that are considered a potential match have a matching ticket created. This allows a data steward to manually review the low-quality match for accuracy.

Merging

Merging is performed differently based on the subject, depending on the content of each subject.

  • Supplier:
    • Most recent supplier record selected.
  • Phone:
    • All unique phone numbers for each phone number type.
  • Address:
    • All unique addresses for each address type.
  • Email:
    • All unique email addresses for each email address type.
  • Purchase Order:
    • All purchase orders. No merge is performed.

Remediation

Remediation creates two types of tickets (Cleansing and Matching).

Cleansing tickets are created whenever the tag begins with "ERR_". For a complete list of potential tags generated, see Cleansing.

Matching tickets are created when the match quality is only considered to be a 'Potential' match. For more information on match quality, see Matching.