Enable Data Discovery

You can enable data discovery for a resource to profile the source data and to prepare data for identifying similar columns based on the source data.
You can enable data discovery for the following types of resources:
Select Enable Data Discovery under Data Discovery in the Metadata Load Settings tab. Configure the following attributes for the resource:
Domain Connection Settings
Configure the properties for the Data Integration Service.
Basic Profile Settings
Configure the properties to enable profiling for the resource.
Similarity Profile Settings
Configure the properties to prepare data to identify similar columns based on source data.
The following table describes the properties that you can configure in the Domain Connection Settings section of the Metadata Load Settings tab:
Property
Description
Specify the configuration settings for Data Integration Service.
  • - Custom. Use custom configuration when you want to configure the Data Integration Service options manually.
  • - Global. Use global configuration when you want to use the existing Data Integration Service options created by the administrator.
Domain Name
Name of the Data Integration Service Domain.
Data Integration Service
Name of the Data Integration Service.
Username
Username to log in to the Data Integration Service.
Password
Password to log in to the Data Integration Service.
Security Domain
Name of the security domain.
Host
Host name for the Data Integration Service.
Port
Port number for the Data Integration Service.
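As an illustration of how the Custom and Global options differ, the following Python sketch models the Domain Connection Settings as a simple record and reports which values are still missing. The class name, the field names, and the assumption that every property is required under Custom are hypothetical. They mirror the property names in the table and are not part of any Live Data Map API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DomainConnectionSettings:
    """Hypothetical record that mirrors the Domain Connection Settings table."""

    configuration: str  # "Custom" or "Global"
    domain_name: Optional[str] = None
    data_integration_service: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None
    security_domain: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None

    def missing_fields(self) -> List[str]:
        """Return the properties that still need a value."""
        if self.configuration == "Global":
            # Global reuses the options that the administrator created,
            # so nothing has to be entered for the resource.
            return []
        if self.configuration != "Custom":
            raise ValueError(f"Unknown configuration: {self.configuration!r}")
        # Assumption: under Custom, every property in the table is required.
        required = [
            "domain_name",
            "data_integration_service",
            "username",
            "password",
            "security_domain",
            "host",
            "port",
        ]
        return [name for name in required if getattr(self, name) in (None, "")]


if __name__ == "__main__":
    custom = DomainConnectionSettings(
        configuration="Custom",
        domain_name="Domain_A",        # placeholder value
        host="dis.example.com",        # placeholder value
        port=6005,                     # placeholder value
    )
    print(custom.missing_fields())
```

A check of this kind can be useful before you save a resource, because it lists every empty property at once instead of stopping at the first one.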
The following table describes the properties that you can configure in the Basic Profile Settings section of the Metadata Load Settings tab:
Property
Description
Profiling Run Options
Choose whether Live Data Map performs column profiling, data domain discovery, or both.
Priority
Specify one of the following profile execution priority values:
  • - High
  • - Low
Sampling Option
Sampling options determine the number of rows that Live Data Map chooses to run a profile on. You can configure sampling options when you define a profile or when you run a profile.
Number of First N Sampling Rows
The number of rows that you want to run the profile against. Live Data Map chooses the rows from the first rows in the source.
Exclude Views
Select this option to exclude views from profiling.
Incremental Profiling
Specifies whether profiling runs only on the changes made to the source object. If you do not select this option, profiling runs on the whole source every time. Make sure that you enable and update database statistics for the following resources:
  • - Oracle
  • - SQL Server
  • - IBM DB2 for z/OS
  • - IBM DB2
Source Connection Name
Event Date Records name for the source connection.
Run On
Specify where the resource must run by selecting one of the following options:
  • - Hadoop: Select this option to specify that the resource must be run on Hadoop using Informatica Blaze. If you select this option, you must specify the Hadoop Connection Name. Click Select... and select the Hadoop connection name from the Select Hadoop Connection Name dialog box.
  • - Native: Select this option to specify that the resource must be run on the Hive engine.
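Two of the rules in the Basic Profile Settings table can be sketched in Python: First N sampling takes rows from the start of the source, and the Hadoop option requires a Hadoop connection name. The function names and the row representation are hypothetical and only illustrate the behavior described in the table. They do not show how Live Data Map implements it.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, Tuple


def first_n_rows(rows: Iterable[Tuple], n: int) -> Iterator[Tuple]:
    """Yield at most the first n rows of the source, in source order."""
    if n <= 0:
        raise ValueError("Number of First N Sampling Rows must be positive")
    return islice(rows, n)


def validate_run_on(run_on: str, hadoop_connection_name: Optional[str] = None) -> None:
    """Check the Run On choice against the rule stated in the table."""
    if run_on == "Hadoop":
        # The table states that Hadoop requires a Hadoop Connection Name.
        if not hadoop_connection_name:
            raise ValueError("Hadoop requires a Hadoop Connection Name")
    elif run_on != "Native":
        raise ValueError(f"Unknown Run On option: {run_on!r}")


if __name__ == "__main__":
    source = [(i, f"name_{i}") for i in range(10)]  # placeholder rows
    print(list(first_n_rows(source, 3)))
    validate_run_on("Hadoop", hadoop_connection_name="hadoop_conn")  # placeholder name
```

Because First N sampling reads rows in source order, the sample reflects only the beginning of the source. Keep this in mind when the source is sorted or loaded in batches.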
The following table describes the properties that you can configure in the Similarity Profile Settings section of the Metadata Load Settings tab:
Property
Description
Profiling Run Options
Select Enable Similarity Profile to prepare data for identifying similar columns in the data sources.
Sampling Options
Sampling options determine the number of rows that Live Data Map chooses to run a profile on. Select one of the following options from the drop-down list:
  • - Reuse Basic Profile Settings. Select this option to specify that you want to use the sampling option specified in the Basic Profile Settings section.
  • - All Rows. Select this option to specify that all rows must be selected for preparing data for identifying similar column data.
  • - Auto Random Rows. Select this option to specify that Live Data Map must select random rows automatically for identifying similar column data.
  • - Random N Rows. Select this option to specify that Live Data Map must select N random rows automatically for identifying similar column data. You must specify the required number of rows for N in the Random Sampling Rows text box.
  • - First N Rows. Select this option to specify that Live Data Map must select the first N rows for identifying similar column data. You must specify the required number of rows for N in the Number of First N Sampling Rows text box.
Domain Connection Settings
  • - Use Profile Configuration Settings. Select this option to specify that Live Data Map must use the Data Integration Service specified in the Domain Connection Settings section to identify similar columns in the data sources.
  • - Specify Domain Connection Settings. Specify separate domain connection settings for a Data Integration Service that you want to use to prepare data for identifying similar columns in the data sources. For information about the domain connection settings properties, see the table that describes the Domain Connection Settings section.
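The difference between the All Rows, First N Rows, and Random N Rows sampling options can be illustrated with a short Python sketch. The function name, its parameters, and the use of a seed are hypothetical. Auto Random Rows and Reuse Basic Profile Settings are not modeled, because this section does not state how Live Data Map chooses the sample size in those cases.

```python
import random
from typing import List, Optional, Sequence


def sample_rows(
    rows: Sequence,
    option: str,
    n: Optional[int] = None,
    seed: Optional[int] = None,
) -> List:
    """Return the rows selected by a sampling option (illustrative only)."""
    if option == "All Rows":
        return list(rows)
    if option in ("First N Rows", "Random N Rows"):
        if n is None or n <= 0:
            raise ValueError(f"{option} requires a positive value for N")
        if option == "First N Rows":
            # Rows come from the start of the source, in source order.
            return list(rows[:n])
        # Random N Rows: draw without replacement, capped at the source size.
        rng = random.Random(seed)
        return rng.sample(list(rows), min(n, len(rows)))
    raise ValueError(f"Sampling option not modeled in this sketch: {option!r}")


if __name__ == "__main__":
    source = list(range(100))  # placeholder rows
    print(sample_rows(source, "First N Rows", n=5))
    print(len(sample_rows(source, "Random N Rows", n=10, seed=1)))
```

First N Rows is deterministic but can be biased toward the start of the source, while Random N Rows spreads the sample across the whole source at the cost of reading more of it.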