Overlap Discovery

Overlap discovery provides information about overlapping data in pairs of columns within a data source or multiple data sources. You can find overlapping data from an enterprise discovery profile. You can validate the profile results and view the results in a Venn diagram.
Overlap discovery identifies overlapping data based on either the default settings or the settings you specify. You can override the default settings and specify inference options, including the maximum number of top pairs that overlap discovery returns based on the percentage of overlap. You can also specify a confidence level that determines which pairs are eligible for overlap discovery.
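
As a rough illustration of how such inference options might act on a set of candidate pairs, consider the following Python sketch. The function name, the parameter names, and the reading of the confidence level as a minimum overlap percentage are assumptions made for illustration only. They are not Informatica settings or APIs, and the percentages are made-up sample values.

```python
from typing import List, Tuple

# Hypothetical result shape: (left column, right column, overlap percentage)
Pair = Tuple[str, str, float]

def top_overlap_pairs(pairs: List[Pair], max_pairs: int, min_percent: float) -> List[Pair]:
    """Keep pairs at or above min_percent, ranked by overlap, limited to max_pairs."""
    # Assumption: the confidence level acts as a minimum overlap percentage.
    eligible = [p for p in pairs if p[2] >= min_percent]
    # Rank the eligible pairs from highest to lowest overlap.
    eligible.sort(key=lambda p: p[2], reverse=True)
    return eligible[:max_pairs]

# Made-up sample percentages, for illustration only.
candidates = [
    ("Items.m", "Orders.p", 80.0),
    ("Items.m", "Orders.q", 10.0),
    ("Items.n", "Orders.p", 55.0),
    ("Items.n", "Orders.q", 95.0),
]
result = top_overlap_pairs(candidates, max_pairs=2, min_percent=50.0)
```

In this sketch, three pairs pass the threshold and the two with the highest overlap are kept.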

Overlap Discovery Results

The Overlap Discovery tab displays information on the participating columns and the overlapping percentage value. The overlap discovery results include Venn diagrams that represent the overlapping data in pairs of columns and the date and time when you last performed overlap discovery.
You can click a column and select Verify to view the results as a Venn diagram.
The following table describes the overlap discovery properties:
Property         Description
Left Column      The primary column against which the remaining columns are compared for overlap analysis.
Right Column     The column that is compared to the primary column.
% Overlap        The percentage of overlap between two columns.
Verified         Indicates that you validated the overlap results row.
Last Run Time    The date and time that the overlap discovery last ran.
Informatica Developer displays each overlapping pair two times in the overlap discovery results. Consider data sources Items and Orders. Items has columns "m" and "n." Orders has columns "p" and "q."
The following table shows the overlap discovery results for Items and Orders:
Left Column    Right Column
Items          -
m              Orders.p
m              Orders.q
n              Orders.p
n              Orders.q
Orders         -
p              Items.m
p              Items.n
q              Items.m
q              Items.n
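
The pairing in the Items and Orders example can be sketched in Python as follows. This is only an illustration of why each pair shows up two times, once from each side. The `sources` dictionary and the `directed_pairs` function are hypothetical names, not part of Informatica Developer, and the sketch covers only pairs across the two data sources, as in the example.

```python
from itertools import product

# Data sources and columns from the example above.
sources = {"Items": ["m", "n"], "Orders": ["p", "q"]}

def directed_pairs(sources):
    """List every cross-source column pair once per direction."""
    rows = []
    for left_src, left_cols in sources.items():
        for right_src, right_cols in sources.items():
            if left_src == right_src:
                continue  # this sketch only pairs columns across sources
            for left, right in product(left_cols, right_cols):
                rows.append((f"{left_src}.{left}", f"{right_src}.{right}"))
    return rows

rows = directed_pairs(sources)
for left, right in rows:
    print(left, right)
```

The sketch yields eight rows for the four distinct pairs, because each pair is listed with each of its columns as the left column.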

Discovering Overlapping Data

You can determine overlapping data between pairs of columns in an enterprise discovery profile. The overlap analysis is based on unique values in the columns and does not consider null values.
    1. Create or open an enterprise discovery profile that contains the data objects.
    2. Select the data objects on which you want to find overlap data.
    You can select a single data object to find overlap between pairs of columns within that object, or select multiple data objects to find overlap between columns across the objects.
    3. Right-click the objects and select Overlap Discovery.
    The New Overlap Discovery dialog box appears.
    4. Enter a name.
    5. Optionally, enter a text description for the overlap analysis.
    6. Verify that the names of the data objects appear under Data Objects in the wizard.
    7. Optionally, select Run Profile on finish to run the profile when you complete configuring the settings.
    8. Click Next.
    9. Select the columns for overlap discovery.
    10. Click Next.
    The default inference options appear in the dialog box.
    11. Optionally, specify the inference options for overlap discovery to override the default settings.
    12. Click Finish.
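
The statement that overlap analysis uses unique values and ignores null values can be illustrated with a minimal Python sketch. The source does not state how the overlap percentage is calculated, so the choice of the left column's distinct values as the denominator is an assumption, and `overlap_percent` is a hypothetical helper rather than an Informatica function. The sample values are invented.

```python
from typing import Iterable, Optional

def overlap_percent(left: Iterable[Optional[str]], right: Iterable[Optional[str]]) -> float:
    """Percentage of the left column's distinct non-null values also found in the right column."""
    # Reduce each column to its unique values and drop nulls (None).
    left_values = {v for v in left if v is not None}
    right_values = {v for v in right if v is not None}
    if not left_values:
        return 0.0
    # Assumption: the percentage is relative to the left column's distinct values.
    return 100.0 * len(left_values & right_values) / len(left_values)

# Invented sample data with a duplicate and nulls.
items_m = ["A1", "A2", "A2", None, "A3", "A4"]
orders_p = ["A2", "A3", None, "B9"]

pct_left = overlap_percent(items_m, orders_p)   # Items.m as the left column
pct_right = overlap_percent(orders_p, items_m)  # Orders.p as the left column
```

Under this assumed definition the two directions can give different percentages, because the two columns hold different numbers of distinct values.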