Column Profiles in Informatica Developer

Use a column profile to analyze the characteristics of columns in a data source, such as value percentages and value patterns. You can add filters to determine the rows that the profile reads at run time. The profile does not process rows that do not meet the filter criteria.

You can discover the following types of information about the columns that you run a profile on:

•The number of times a value appears in a column.

•Frequency of occurrence of each value in a column, expressed as a percentage or number of rows.

•Character patterns of the values in a column.

•Statistics, such as the maximum and minimum lengths of the values in a column, and the first and last values.

•Inferred data types, frequency, conformance criteria for data domain discovery, and data type inference status.

You can define a column profile for a data object in a mapping or mapplet or an object in the Model repository. The object in the repository can be in a single data object profile, multiple data object profile, or enterprise discovery profile.

You can choose sampling options, drill-down options, and run-time environment for a column profile. You can add rules and filters to a column profile.

Filtering Options

You can add advanced filters or SQL filters to determine the rows that a column profile uses when you run the profile. The profile does not process rows that do not meet the filter criteria.

Creating an Advanced Filter

You can create an advanced filter with expressions, such as AND, OR, and NOT to make a subset of the original data source.

1. Create or open a single data object profile.

2. Select the Filter view.

3. Click Add.

The Select Wizard dialog box appears.

4. In the Select Wizard dialog box, click Advanced filter.

The Filter dialog box appears.

5. Enter a name and an optional description for the advanced filter.

6. Select Set as Active to apply the filter to the profile. Click Next.

7. Select Filter Definition to define a filter.

8. You can create an advanced filter with the Functions panel or Columns panel.

- In the Functions panel, select a function category, and click the right arrow (>>) button.

In the dialog box, specify the parameters and click OK. The function along with the columns and values appears in the Expression panel.

- In the Columns panel, select a column, and click the right arrow (>>) button. The column appears in the Expression panel.

Add functions, expressions, and values to create an advanced filter.

9. To verify the advanced filter, click Validate.

10. After you create or edit the filter, select Data Preview to view the filtered data. You can set the Maximum rows to preview option.

11. Click Finish.

The New Profile wizard appears with the filter in the Filters view.

Creating an SQL Filter

You can create an SQL filter with SQL queries. You can create an SQL filter for relational data sources.

1. Create or open a single data object profile.

2. Select the Filter view.

3. Click Add.

The Select Wizard dialog box appears.

4. In the Select Wizard dialog box, click Sql filter.

The Filter dialog box appears.

5. Enter a name and an optional description for the advanced filter.

6. Select Set as Active to apply the filter to the profile. Click Next.

7. Select Filter Definition to define a filter.

8. Use the columns in the Columns panel to create an SQL filter.

9. To verify the filter, click Validate.

10. After you create or edit the filter, select Data Preview to view the filtered data. You can set the Maximum rows to preview option.

11. Click Finish.

The New Profile wizard appears with the filter in the Filters view.

Sampling Options

Sampling options determines the number of rows on which the Developer tool runs a profile. You can configure sampling options when you define a profile or run a profile.

The following table describes the sampling options for a profile:

Property	Description
All Rows	Runs a profile on all the rows in the data object. Supported on Native, Blaze, and Spark run-time environment.
Sample First <number> rows	Runs a profile on the sample rows from the beginning of the rows in the data object. You can choose a maximum of 2,147,483,647 rows. Supported on Native and Blaze run-time environment.
Random Sample <number> rows	Runs a profile on a randomly picked number of the rows in the data object. You can choose a maximum of 2,147,483,647 rows. Supported on Native and Blaze run-time environment.
Random Sample (Auto)	Runs a profile on the sample rows computed on the basis of the number of rows in the data object. Supported on Native and Blaze run-time environment.
Limit n <number> rows	Runs a profile based on the number of rows in the data object. When you choose to run a profile in the Hadoop validation environment, Spark engine collects samples from multiple partitions of the data object and pushes the samples to a single node to compute sample size. The Limit n sampling option supports Oracle, SQL Server, and DB2 databases. You cannot apply the Advanced filter with the Limit n sampling option. Supported on Spark run-time environment.
Random Percentage	Runs a profile on a percentage of rows in the data object. Supported on Spark run-time environment.
Exclude approved data types and data domains from the data type and data domain inference in the subsequent profile runs	Excludes the approved data type or data domain from data type and data domain inference from the next profile run.

After you choose to run the profile on a random sample of rows, the random sample algorithm chooses the rows at random in the data object to run the profile on. When you choose a random sampling option for column profiles, the Developer tool performs drilldown on the staged data. This can impact the drill-down performance. When you choose a random sampling option for data domain discovery profiles, the Developer tool performs drill down on live data.