Column Profiles in Informatica Developer
Use a column profile to analyze the characteristics of columns in a data set, such as value percentages and value patterns. You can add filters to determine the rows that the profile reads at run time. The profile does not process rows that do not meet the filter criteria.
You can discover the following types of information about the columns that you run a profile on:
- The number of times a value appears in a column.
- Frequency of occurrence of each value in a column, expressed as a percentage.
- Character patterns of the values in a column.
- Statistics, such as the maximum and minimum lengths of the values in a column, and the first and last values.
- Inferred datatypes, frequency, percentage of conformance, and datatype inference status.
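As an illustration only, the following Python sketch computes a few of the statistics listed above for a single column: value counts, value percentages, character patterns, and minimum and maximum lengths. It is not how Informatica Developer implements profiling. The function names `char_pattern` and `profile_column` are hypothetical, and the pattern notation (`X` for a letter, `9` for a digit) is an assumption made for this example.

```python
from collections import Counter


def char_pattern(value: str) -> str:
    """Map each letter to X and each digit to 9; keep other characters as-is.

    The X/9 notation is an assumption for this sketch.
    """
    return "".join(
        "9" if ch.isdigit() else "X" if ch.isalpha() else ch
        for ch in value
    )


def profile_column(values: list[str]) -> dict:
    """Compute simple profile statistics for a non-empty list of string values."""
    total = len(values)
    counts = Counter(values)
    return {
        # Number of times each value appears in the column.
        "value_counts": dict(counts),
        # Frequency of each value, expressed as a percentage of all rows.
        "value_percent": {v: 100.0 * n / total for v, n in counts.items()},
        # Number of values that share each character pattern.
        "patterns": dict(Counter(char_pattern(v) for v in values)),
        # Minimum and maximum value lengths.
        "min_length": min(len(v) for v in values),
        "max_length": max(len(v) for v in values),
    }


if __name__ == "__main__":
    print(profile_column(["AB12", "AB12", "C-3", "AB12"]))
```

The sketch leaves out first and last values and datatype inference, because the text does not say how the profile determines them.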
You can define a column profile for a data object in a mapping or mapplet, or for an object in the Model repository. The object in the repository can be in a single data object profile, a multiple data object profile, or an enterprise discovery profile.
You can add rules to a column profile. Use rules to define business logic that you can apply to the source data. You can also change the drill-down options for column profiles to determine whether the drill-down task reads from staged data or live data.
Filtering Options
You can add filters to determine the rows that a column profile uses when performing profiling operations. The profile does not process rows that do not meet the filter criteria.
To add a filter to a column profile, complete the following steps:
1. Create or open a column profile.
2. Select the Filter view.
3. Click Add.
4. Select a filter type and click Next.
5. Enter a name for the filter. Optionally, enter a text description of the filter.
6. Select Set as Active to apply the filter to the profile. Click Next.
7. Define the filter criteria.
8. Click Finish.
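The effect of an active filter can be sketched in Python as a predicate applied to each row before profiling. This is a conceptual illustration, not the tool's implementation. The names `apply_filter`, `criteria`, and `active` are hypothetical, and the sketch assumes that an inactive filter leaves all rows available to the profile.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, str]


def apply_filter(
    rows: Iterable[Row],
    criteria: Callable[[Row], bool],
    active: bool = True,
) -> Iterator[Row]:
    """Yield only the rows that the profile would process.

    Assumption: when the filter is not active, every row passes through.
    """
    for row in rows:
        if not active or criteria(row):
            yield row


if __name__ == "__main__":
    rows = [
        {"country": "US", "city": "Austin"},
        {"country": "DE", "city": "Berlin"},
        {"country": "US", "city": "Boston"},
    ]
    # Hypothetical filter criteria: keep rows where country equals "US".
    kept = list(apply_filter(rows, lambda r: r["country"] == "US"))
    print(kept)
```

Rows that fail the criteria never reach the profiling step, so they do not contribute to value counts, patterns, or statistics.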
Sampling Properties
Configure the sampling properties to determine the number of rows that the profile reads during a profiling operation.
The following table describes the sampling properties:
| Property | Description |
| --- | --- |
| All Rows | Reads all rows from the source. Default is enabled. |
| First | Reads from the first row up to the row you specify. |
| Random Sample of | Reads a random sample from the number of rows that you specify. |
| Random Sample (Auto) | Reads from a random sample of rows. |
| Exclude datatype inference for columns with an approved datatype | Excludes columns with an approved datatype from the datatype inference of the profile run. |
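To make the difference between the row-selection properties concrete, here is a minimal Python sketch of three of them. It is an approximation under stated assumptions, not the product's sampling logic. The function `sample_rows` and its `mode`, `n`, and `seed` parameters are hypothetical, and "Random Sample of" is interpreted here as drawing `n` rows at random. Random Sample (Auto) is not modeled, because the table does not say how the sample size is chosen.

```python
import random
from typing import Optional, Sequence


def sample_rows(
    rows: Sequence[dict],
    mode: str = "all",
    n: Optional[int] = None,
    seed: Optional[int] = None,
) -> list[dict]:
    """Select the rows a profile run would read, by sampling mode.

    mode="all"    -> every row (mirrors All Rows)
    mode="first"  -> the first n rows (mirrors First)
    mode="random" -> n rows drawn at random (assumed reading of Random Sample of)
    """
    if mode == "all":
        return list(rows)
    if n is None or n < 0:
        raise ValueError("n must be a non-negative integer for this mode")
    if mode == "first":
        return list(rows[:n])
    if mode == "random":
        rng = random.Random(seed)
        # Never request more rows than the source contains.
        return rng.sample(list(rows), min(n, len(rows)))
    raise ValueError(f"unknown sampling mode: {mode}")


if __name__ == "__main__":
    data = [{"id": i} for i in range(10)]
    print(sample_rows(data, mode="first", n=3))
    print(sample_rows(data, mode="random", n=3, seed=42))
```

Reading fewer rows shortens a profile run, but statistics computed from a sample only estimate the values for the full source.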