This section describes the related Data Quality (DQ) components.
The DQ processes for Cleansing, Matching, Merging, and Remediation can be started and stopped using the Omni Console. Please note that Merging is available only for the MDM Edition and not the DQ Edition.
The services shown in the following image can be managed using the Omni Console.
There is a link to access the Data Quality Console from the Omni Console for further details. The Omni Console only shows whether a DQ process has been launched successfully. It does not check whether the services defined with the process have been loaded. If a deployment bundle contains an erroneous or incorrectly configured plan, the process may start, but the service may be unable to load.
The status of the services within a DQ process can be seen in the console of the process. The console is available under the HTTP port defined for the process, for example:
The list of loaded services can be found in the Applications section.
If an expected service is not listed, it generally indicates an error in the plan implementing the service. In this case, the DQ logs should indicate an error.
Logs for each of the processes are in OmniGenData/logs/dq. Each DQ process writes four logs, as described in the following table.
Log file suffix | Contents |
---|---|
_access | HTTP requests that are received by the server. |
_perf | Execution times for service invocations. |
_err | Messages that occur during execution of a plan. |
_online | Messages that occur during execution of a service. Generally this duplicates the _err file. |
In addition, it is possible to log the data exchanged between Server and each DQ process by enabling the DQ Trace option in the Omni Console.
After enabling the option, Server must be restarted. Enable the option only for temporary debugging on small loads. When it is enabled, Server writes a set of CSV files into OmniGenData/logs. Each file is named according to the DQ process, the transaction ID of the work order being executed, and "send" or "receive" to indicate whether the file contains data sent from Server to DQ or received by Server from DQ.
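The four per-process log suffixes described above lend themselves to a small triage helper. The following is a minimal sketch, not part of the product: it assumes each log file name ends with one of the documented suffixes before a single extension (for example, a hypothetical `cleansing_err.log`), which the text does not confirm. The function names are illustrative.

```python
from pathlib import Path

# Suffixes taken from the table above. The exact file names and
# extensions used by a real installation are assumptions.
DQ_LOG_SUFFIXES = ("_access", "_perf", "_err", "_online")


def group_dq_logs(file_names):
    """Group log file names by the DQ log suffix at the end of their stem."""
    groups = {suffix: [] for suffix in DQ_LOG_SUFFIXES}
    for name in file_names:
        stem = Path(name).stem  # file name without its last extension
        for suffix in DQ_LOG_SUFFIXES:
            if stem.endswith(suffix):
                groups[suffix].append(name)
                break
    return groups


def list_dq_logs(log_dir="OmniGenData/logs/dq"):
    """Group the files found in the DQ log directory, if it exists."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return group_dq_logs([])
    names = sorted(p.name for p in directory.iterdir() if p.is_file())
    return group_dq_logs(names)
```

Calling `list_dq_logs()` from the directory that contains OmniGenData returns a dictionary keyed by suffix, so the `_err` entries can be checked first when a service fails to load.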
Configuration Options
The HTTP listener port and JVM properties for each process can be modified in the console in the appropriate tab under Managed Services.
The TCP port used by Server to send data to and receive data from executing DQ plans, the DQ Listener Port, is defined under Server Settings in the console. This is not an HTTP port and should never be opened in a browser or by any program except the plugin components embedded within a DQ plan.
Modifying any of these settings is not recommended. If they are modified, the DQ process and the Server process should both be restarted.
The following diagram illustrates the process flow within a DQ plan. This example describes a cleansing process. Matching, merging, and remediation process flows are similar.
The general flow is:
The following table describes the data sent and received in each of the DQ processes.
Process | Sent | Received |
---|---|---|
cleansing | Cleansing overrides and source records associated with the work order | Cleaned values. Instance records are updated. |
matching | Instance records associated with the work order | IDs, master IDs, and match quality values for all root subject instances affected by the plan execution. This may be a superset of the instance records sent into the plan. |
merging | Master IDs and match quality values for all root subject records affected by the matching results | A set of master root subject records and all the subcollection records associated with them. Master root subject records are inserted or updated. Any existing subcollection records associated with the root subject masters are deleted and the new subcollection records are inserted. |
remediation | Instance records associated with the work order | Cleansing and matching tickets. Inserted into omni_remediation_ticket. |
This section describes warnings and errors, along with workarounds to resolve them.
Server fails to connect
If Server is unable to connect to a DQ process, OmniGenData/logs/server/server.log will contain a "Not Found for URL" message such as:
com.ibi.omni.server.services.ServiceException: Not Found for URL http://localhost:9502/Person/cleanse
The most likely cause is that the plan failed to load due to an error in the plan definition. This can be verified by loading the DQ console for the process and checking that the referenced service is available. A new deployment bundle will have to be generated with a corrected plan.
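To find out which DQ console to open, the host, port, and service path can be pulled out of the "Not Found for URL" message. The following is a minimal sketch that relies only on the message format shown above; the function name is illustrative and the layout of the rest of the log line is an assumption.

```python
import re
from urllib.parse import urlparse

# Matches the message shown above and captures the URL that follows it.
NOT_FOUND = re.compile(r"ServiceException: Not Found for URL (\S+)")


def parse_not_found(line):
    """Return host, port, and service path from a 'Not Found for URL' line."""
    match = NOT_FOUND.search(line)
    if match is None:
        return None
    url = urlparse(match.group(1))
    return {
        "host": url.hostname,
        "port": url.port,  # HTTP port of the DQ process
        "service": url.path.lstrip("/"),  # service that failed to load
    }
```

For the sample message above, the result identifies port 9502 and the service `Person/cleanse`, which is the entry to look for in the DQ console of that process.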
Invalid name warnings
When a reader component in a plan requests a column that is undefined, a warning message is generated in OmniGenData/logs/server/server.log, for example:
WARN com.ibi.omni.cleanse.CleansingSender:64 [] [] Requested column ssn not available in entity Person
Server will allow the plan to continue execution; however, the requested value will not be transmitted to the DQ process, and the DQ process log will also include a warning, for example:
<message>[306] ssn not sent by omni</message>
Similarly, if a writer component in a plan indicates to Server that it will write a column whose name is not recognized by Server, a "No field found" warning message is generated:
WARN com.ibi.omni.cleanse.CleansingReceiver:59 [] [] No field found for Person.firstName
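Both warnings name the entity and column involved, so they can be collected from server.log and compared against the plan definition. The following is a minimal sketch based only on the two sample messages above; the function name and the "reader"/"writer" labels are illustrative, and other message variants may exist.

```python
import re

# Patterns derived from the two sample warnings shown above.
MISSING_COLUMN = re.compile(r"Requested column (\S+) not available in entity (\S+)")
UNKNOWN_FIELD = re.compile(r"No field found for ([^.\s]+)\.(\S+)")


def classify_warning(line):
    """Return (component, entity, column) for an invalid name warning, else None."""
    match = MISSING_COLUMN.search(line)
    if match:
        # A reader component requested a column that Server does not define.
        return ("reader", match.group(2), match.group(1))
    match = UNKNOWN_FIELD.search(line)
    if match:
        # A writer component announced a column that Server does not recognize.
        return ("writer", match.group(1), match.group(2))
    return None
```

Running this over each line of the server log yields a list of entity and column pairs to correct in the reader or writer components of the plan.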
Process Failure
Server errors that occur while processing DQ streams are logged in the server log. If the DQ plan is still receiving data, an error message is also sent to the executing DQ plan, causing it to abort. The DQ plan will log the error message it receives from Server and also log its own failure message, which is typically just:
com.ataccama.dqc.online.core.RuntimeErrorReporterException: Configuration execution failed.
If an error occurs within the DQ plan itself, it will attempt to send an error message to Server. Server will log this as a com.ibi.omni.dq.ReceivedErrorException and stop all active senders and receivers.
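The two failure paths leave different traces, which gives a rough way to tell where a failure started. The following is a minimal sketch under stated assumptions: it treats a ReceivedErrorException in the server log as a sign that the DQ plan failed first, and a RuntimeErrorReporterException in the DQ log without one as a sign that Server aborted the plan. Whether a failing plan also logs RuntimeErrorReporterException is not stated in the text, so the result is only a hint. The function name is illustrative.

```python
# Exception class names quoted in the text above.
RECEIVED_ERROR = "com.ibi.omni.dq.ReceivedErrorException"
PLAN_ABORTED = "com.ataccama.dqc.online.core.RuntimeErrorReporterException"


def likely_failure_origin(server_log_lines, dq_log_lines):
    """Guess where a process failure started from server and DQ log lines."""
    if any(RECEIVED_ERROR in line for line in server_log_lines):
        # Server logged an error message that it received from the DQ plan.
        return "dq_plan"
    if any(PLAN_ABORTED in line for line in dq_log_lines):
        # The plan aborted and Server did not report a received error.
        return "server"
    return "unknown"
```

In either case, the surrounding entries in the server log and in the `_err` log of the DQ process remain the authoritative source for the actual cause.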