Data Similarity Overview

Scenario	Resolution
Different database systems used by the financial institution and the acquired institution.	Identify the data sources that need to be scanned to find the required customers that match the eligibility criteria. Add these data sources as resources in Live Data Map to extract metadata from these resources. Alex identifies the databases in the enterprise that include the customer details.
Lack of consistency and context in the column names, which makes it difficult to find and analyze source columns with similar data.	Enable profiling with similarity profiling for the selected resources. Live Data Map runs a profile on the data sources and verifies the profile results for data similarity. Alex uses the profile results to identify details about the source data, such as the values, uniqueness, and consistency of the data. These attributes help Alex filter out unwanted data. Alex uses similarity discovery to identify columns that contain similar data across all the data sources. From an existing bank report that covers both institutions, Alex finds out that both organizations store the Social Security Number (SSN) on all records that have customer information. Alex therefore infers that a table that includes an SSN column is likely to contain customer details. When Alex searches for an SSN column in the Catalog, Enterprise Information Catalog lists the searched column along with the columns from all the data sources that are similar to the searched column. After finding columns that contain similar data, Alex and his team can identify data that can be joined and duplicate data that can be removed.
Identify the lineage for each data asset, the other assets that are related to a particular asset, and the impact that joining or deleting a specific data asset might have on the related data assets.	Alex and his team can view the lineage, impact summary, and relationship view for the identified assets in Enterprise Information Catalog. Viewing the lineage, impact summary, and related asset details helps Alex and his team identify the impact before they update or delete a specific asset.
Classify customers based on the regions and make searches faster.	Alex defines data domains and data domain groups in Live Data Map Administrator. To classify customers based on the regions, Alex performs the following steps:
1. Alex creates a data domain called customer_details in Live Data Map Administrator.
2. Alex assigns the data domain to one of the columns that contain the SSN in Enterprise Information Catalog.
3. Alex defines data domains called ZIP_code_<area> in Live Data Map Administrator. Alex creates data domains for all the ZIP Codes where the financial institutions have branches, and replaces <area> with the branch location when defining each data domain. Alex configures each data domain by performing the following steps:
   a. Specifies the proximity rule for the data domain when creating the data domain. A proximity rule specifies that if a specified data domain is not found in a table, Live Data Map can reduce the inference percentage for the new data domain by a specified percentage value. In this case, Alex specifies that if the data domain customer_details is not found in a table, Live Data Map can reduce the inference percentage for the data domain ZIP_code_<area> by 100 percent. This rule specifies that if the column SSN is not found in a table, Live Data Map does not search for the ZIP Code in that table.
   b. Specifies a rule for each data domain ZIP_code_<area> in the Analyst Tool or the Developer Tool. Live Data Map uses the rule to match a column data pattern with the ZIP Code for a specific branch.
   Note: A rule is business logic that defines conditions applied to data when you run a profile. You can add a rule to the profile to cleanse, modify, or validate the data in the profile.
4. Alex then creates four data domain groups based on the regions, called Northeast, South, Midwest, and West, and includes each data domain in the respective data domain group. For example, the data domain ZIP_code_LosAngeles is included in the West data domain group.
5. Alex performs a search in Enterprise Information Catalog for customer_details. Enterprise Information Catalog lists all the columns that include SSN details of the customers and also shows the data domains (ZIP_code_<area>) and the data domain groups associated with each column. Alex can also search based on the defined data domain groups to find a list of columns with customer details specific to a region.
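
The interaction between the ZIP Code rule and the proximity rule in step 3 may be easier to follow with a small illustration. The Python sketch below is not Live Data Map code and does not use any Informatica API. The names BRANCH_ZIPS, conformance, and apply_proximity_rule are hypothetical, the ZIP Code values are made-up sample data, and the sketch assumes that the reduction is applied as a relative percentage of the score.

```python
import re

# Hypothetical sample data: ZIP Codes served by each branch data domain.
BRANCH_ZIPS = {
    "ZIP_code_LosAngeles": {"90001", "90012", "90210"},
}

# Five-digit ZIP Code with an optional four-digit extension.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def conformance(values, branch_zips):
    """Return the percentage of non-empty values that are ZIP Codes of the branch."""
    values = [v for v in values if v]
    if not values:
        return 0.0
    hits = sum(
        1 for v in values
        if ZIP_PATTERN.fullmatch(v) and v[:5] in branch_zips
    )
    return 100.0 * hits / len(values)


def apply_proximity_rule(score, table_domains, required_domain, reduction_percent):
    """Reduce the score when the required data domain is absent from the table.

    Assumption: the reduction is relative, so 100 percent reduces the score to 0.
    """
    if required_domain in table_domains:
        return score
    return score * (1 - reduction_percent / 100.0)


column_values = ["90001", "90210", "90012", "10001"]
score = conformance(column_values, BRANCH_ZIPS["ZIP_code_LosAngeles"])

# Table that also contains a customer_details (SSN) column: score is kept.
kept = apply_proximity_rule(score, {"customer_details"}, "customer_details", 100)

# Table without a customer_details column: score is reduced by 100 percent.
dropped = apply_proximity_rule(score, set(), "customer_details", 100)
```

In this sketch, three of the four sample values match the Los Angeles branch, so the column keeps its score when the table also contains customer_details. When customer_details is absent, the 100 percent reduction brings the score to zero, which mirrors the behavior described above where the ZIP Code data domain is not inferred for tables without an SSN column.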

Business Example