Column Data Similarity Workflow

The column data similarity workflow consists of multiple stages: data preparation, staging, ingestion, and inference.
To discover similar columns of data in your resources, you must enable data discovery when you configure resources that support column data similarity. After you configure column data similarity for the required resources, similar columns are flagged with an asset when you search for that asset in the Enterprise Information Catalog.
Identifying column data similarity involves the following steps:
  1. Similarity data preparation. Live Data Map runs the similarity profile, which prepares the data for column data similarity and stages the prepared data in a temporary staging location.
  2. Pushing the staged data from the temporary staging location to HBase.
  3. Inference. Live Data Map ingests the data from HBase into the catalog, and the Similarity Discovery system resource compares the data in the catalog for column data similarity.
Note: Step 3, where you must schedule the Similarity Discovery system resource to run, is the only step that you need to perform manually in this workflow.
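
The order of the steps above can be pictured with a small in-memory model. This is only a conceptual sketch: the class, method names, and the lowercase normalization are hypothetical, plain dictionaries stand in for the staging location, HBase, and the catalog, and none of it reflects how Live Data Map is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class SimilarityPipeline:
    """Toy model of the workflow stages (hypothetical, not product code)."""
    staging: dict = field(default_factory=dict)  # temporary staging location
    hbase: dict = field(default_factory=dict)    # stand-in for HBase
    catalog: dict = field(default_factory=dict)  # stand-in for the catalog
    log: list = field(default_factory=list)      # records the stage order

    def prepare_and_stage(self, columns):
        # Step 1: prepare the column data (assumed here to be a simple
        # trim-and-lowercase normalization) and write it to staging.
        for name, values in columns.items():
            self.staging[name] = {str(v).strip().lower() for v in values}
        self.log.append("prepared_and_staged")

    def push_to_hbase(self):
        # Step 2: move the staged data out of the temporary location.
        self.hbase.update(self.staging)
        self.staging.clear()
        self.log.append("pushed_to_hbase")

    def ingest_to_catalog(self):
        # First half of step 3: ingest the data from HBase to the catalog.
        # The comparison itself runs later, when the scheduled resource runs.
        self.catalog.update(self.hbase)
        self.log.append("ingested_to_catalog")
```

Calling the three methods in sequence leaves the staging dictionary empty and the prepared column values in the catalog dictionary, ready for the comparison stage.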

Similarity Data Preparation

Similarity data preparation involves preparing the data from various sources for profiling and then staging the prepared data.
Similarity data preparation involves the following stage:
Similarity profile execution
At this stage, Live Data Map uses internal algorithms to prepare data for identifying similar column data in the data sources. Live Data Map stores the prepared data in a staging location and then pushes the staged data to HBase.

Similarity Data Inference

Similarity data inference is the process of comparing data in the catalog for similarity.
The Similarity Discovery system scanner compares the data ingested in the catalog for column data similarity using internal machine intelligence and stores the comparison details in the catalog.
Note: You must specify a schedule for the scanner or run the scanner when required.
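
The comparison logic that the scanner uses is internal and not documented here. As a purely illustrative stand-in, the sketch below scores pairs of columns with Jaccard similarity over their distinct values. The function names, the column names, and the 0.5 threshold are all assumptions made for the example, not product behavior.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two value collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # treat two empty columns as not similar
    return len(a & b) / len(a | b)


def similar_columns(columns, threshold=0.5):
    """Return (column_1, column_2, score) for pairs at or above threshold.

    `columns` maps a column name to its values. The threshold is an
    arbitrary illustrative cut-off.
    """
    pairs = []
    for n1, n2 in combinations(sorted(columns), 2):
        score = jaccard(columns[n1], columns[n2])
        if score >= threshold:
            pairs.append((n1, n2, score))
    return pairs


# Hypothetical sample data: two email columns share two of four distinct values.
sample = {
    "customers.email": ["a@x.com", "b@x.com", "c@x.com"],
    "orders.contact_email": ["a@x.com", "b@x.com", "d@x.com"],
    "orders.amount": [10, 20],
}
print(similar_columns(sample))
```

With this sample data, only the two email columns are reported as a similar pair, which mirrors how similar columns are flagged with an asset in the catalog.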