Enable Data Discovery
You can enable data discovery for a resource to profile the source data and to prepare that data for identifying similar columns.
You can enable data discovery for the following types of resources:
- Amazon Redshift
- Amazon S3
- Salesforce
- HDFS
- Hive
- IBM DB2
- IBM DB2 for z/OS
- IBM Netezza
- JDBC
- Microsoft SQL Server
- Oracle
- Sybase
- Teradata
- SAP R/3
Select Enable Data Discovery under Data Discovery in the Metadata Load Settings tab. Configure the following attributes for the resource:
- Domain Connection Settings: Configure the properties for the Data Integration Service.
- Basic Profile Settings: Configure the properties to enable profiling for the resource.
- Similarity Profile Settings: Configure the properties to prepare data to identify similar columns based on source data.
The following table describes the properties that you can configure in the Domain Connection Settings section of the Metadata Load Settings tab:
Property | Description |
---|---|
Specify the configuration settings for the Data Integration Service. | Custom: Use custom configuration when you want to configure the Data Integration Service options manually.<br>Global: Use global configuration when you want to use the existing Data Integration Service options created by the administrator. |
Domain Name | Name of the Data Integration Service Domain. |
Data Integration Service | Name of the Data Integration Service. |
Username | Username to log in to the Data Integration Service. |
Password | Password to log in to the Data Integration Service. |
Security Domain | Name of the security domain. |
Host | Host name for the Data Integration Service. |
Port | Port number for the Data Integration Service. |
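
The Custom and Global choices in the table above can be sketched as a small settings object. This is only an illustration, not part of Live Data Map: the class name, field names, and the assumption that every property is required for a Custom configuration are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DomainConnectionSettings:
    """Hypothetical container that mirrors the properties in the table above."""

    configuration: str  # "Custom" or "Global"
    domain_name: Optional[str] = None
    data_integration_service: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None
    security_domain: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None

    def missing_fields(self) -> List[str]:
        """Return the names of properties that are still unset."""
        if self.configuration == "Global":
            # Global reuses the options created by the administrator,
            # so this sketch requires nothing further.
            return []
        if self.configuration != "Custom":
            raise ValueError(f"unknown configuration: {self.configuration!r}")
        # Assumption: a Custom configuration needs every property filled in.
        required = [
            "domain_name",
            "data_integration_service",
            "username",
            "password",
            "security_domain",
            "host",
            "port",
        ]
        return [name for name in required if getattr(self, name) is None]
```

For example, `DomainConnectionSettings(configuration="Global").missing_fields()` returns an empty list, while a partially filled Custom configuration reports the properties that still need values.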
The following table describes the properties that you can configure in the Basic Profile Settings section of the Metadata Load Settings tab:
Property | Description |
---|---|
Profiling Run Options | Choose whether Live Data Map performs column profiling, data domain discovery, or both. |
Priority | Specify the profile execution priority value. Specify one of the following profile execution priority values: |
Sampling Option | Sampling options determine the number of rows that Live Data Map chooses to run a profile on. You can configure sampling options when you define a profile or when you run a profile. |
Number of First N Sampling Rows | The number of rows that you want to run the profile against. Live Data Map chooses the rows from the first rows in the source. |
Exclude Views | Select this option to specify that views must be excluded from profiling. |
Incremental Profiling | Specifies whether profiling runs only on the changes made to the source object. If you do not select this option, profiling runs on the whole source every time. Make sure that you enable and update database statistics for the following resources: Oracle, Microsoft SQL Server, IBM DB2 for z/OS, and IBM DB2. |
Source Connection Name | Name of the source connection. |
Run On | Specify where the resource must run by selecting one of the following options:<br>Hadoop: Select this option to specify that the resource must be run on Hadoop using Informatica Blaze. If you select this option, you must specify the Hadoop Connection Name. Click Select... and select the Hadoop connection name from the Select Hadoop Connection Name dialog box.<br>Native: Select this option to specify that the resource must be run on the Hive engine. |
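
To make the idea of column profiling over the first N rows concrete, here is a minimal sketch in Python. It is not how Live Data Map implements profiling: the function name, the row format (dictionaries keyed by column name), and the choice of statistics (row count, null count, distinct count) are assumptions made for illustration.

```python
from itertools import islice


def profile_first_n(rows, n):
    """Compute simple per-column statistics over the first n rows.

    rows is any iterable of dicts keyed by column name (an assumed format).
    """
    stats = {}
    # islice stops after n rows, so later rows in the source are never read.
    for row in islice(rows, n):
        for column, value in row.items():
            col = stats.setdefault(
                column, {"rows": 0, "nulls": 0, "distinct": set()}
            )
            col["rows"] += 1
            if value is None:
                col["nulls"] += 1
            else:
                col["distinct"].add(value)
    # Report the distinct count instead of the raw set of values.
    return {
        column: {
            "rows": c["rows"],
            "nulls": c["nulls"],
            "distinct": len(c["distinct"]),
        }
        for column, c in stats.items()
    }
```

Because only the first N rows are read, the statistics describe the start of the source and may not represent the whole data set, which is the trade-off that a first-N sampling option accepts in exchange for a faster run.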
The following table describes the properties that you can configure in the Similarity Profile Settings section of the Metadata Load Settings tab:
Property | Description |
---|---|
Profiling Run Options | Select Enable Similarity Profile to prepare data for identifying similar columns in the data sources. |
Sampling Options | Sampling options determine the number of rows that Live Data Map chooses to run a profile on. Select one of the following options from the drop-down list:<br>Reuse Basic Profile Settings: Select this option to use the sampling option specified in the Basic Profile Settings section.<br>All Rows: Select this option to specify that all rows must be selected for preparing data for identifying similar column data.<br>Auto Random Rows: Select this option to specify that Live Data Map must select random rows automatically for identifying similar column data.<br>Random N Rows: Select this option to specify that Live Data Map must select N random rows automatically for identifying similar column data. You must specify the required number of rows for N in the Random Sampling Rows text box.<br>First N Rows: Select this option to specify that Live Data Map must select the first N rows for identifying similar column data. You must specify the required number of rows for N in the Number of First N Sampling Rows text box. |
Domain Connection Settings | Use Profile Configuration Settings: Select this option to specify that Live Data Map must use the Data Integration Service specified in the Domain Connection Settings section to identify similar columns in the data sources.<br>Specify Domain Connection Settings: Specify separate domain connection settings for a Data Integration Service that you want to use to prepare data for identifying similar columns in the data sources. For information about these properties, see the table that describes the Domain Connection Settings section. |
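
The five sampling options in the table above can be illustrated with a short Python sketch. The function and parameter names are hypothetical, and the sample size used for Auto Random Rows (10 percent of the rows, at least one) is a placeholder assumption, because the source does not say how Live Data Map chooses that size.

```python
import random


def _require_n(n):
    """Validate the row count that the N-based options need."""
    if not isinstance(n, int) or n <= 0:
        raise ValueError("n must be a positive integer")
    return n


def select_sample(rows, option, n=None, basic_option=None, basic_n=None, seed=None):
    """Return the rows selected by one of the sampling options."""
    rows = list(rows)
    if option == "Reuse Basic Profile Settings":
        # Delegate to whatever the Basic Profile Settings section specified.
        if basic_option is None or basic_option == option:
            raise ValueError("a concrete basic sampling option is required")
        return select_sample(rows, basic_option, n=basic_n, seed=seed)
    if option == "All Rows":
        return rows
    if option == "First N Rows":
        return rows[:_require_n(n)]
    rng = random.Random(seed)  # a seed makes the sample reproducible
    if option == "Random N Rows":
        # Never ask for more rows than the source contains.
        return rng.sample(rows, min(_require_n(n), len(rows)))
    if option == "Auto Random Rows":
        # Placeholder heuristic (assumption): 10% of the rows, at least one.
        size = max(1, len(rows) // 10) if rows else 0
        return rng.sample(rows, size)
    raise ValueError(f"unknown sampling option: {option!r}")
```

For instance, `select_sample(range(100), "First N Rows", n=5)` returns the first five values, while `"Random N Rows"` with the same `n` returns five distinct values drawn from anywhere in the source.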