Column Profile Concepts Overview
A column profile determines the characteristics of columns in a data source, such as value frequency, percentages, and patterns.
Column profiling discovers the following facts about data:
- The number of null, distinct, and non-distinct values in each column, expressed as a number and a percentage.
- The patterns of data in each column and the frequencies with which these patterns occur.
- Statistics about the column values, such as the maximum and minimum lengths of values and the first and last values in each column.
- Documented data types, inferred data types, and possible conflicts between the documented and inferred data types.
- Pattern and value frequency outliers.
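As a rough illustration of the facts listed above, the following Python sketch computes them for a single column held in memory. It is not how the product computes a profile. The function names (`value_pattern`, `infer_type`, `profile_column`), the X/9 pattern notation, the type-inference rules, the reading of "non-distinct" as values that occur more than once, and the reading of "first and last values" as the first and last values in sorted order are all assumptions made for this example.

```python
from collections import Counter
import re


def value_pattern(value):
    """Map a value to a character pattern: letters become X, digits become 9."""
    pattern = re.sub(r"[A-Za-z]", "X", value)
    return re.sub(r"[0-9]", "9", pattern)


def infer_type(values):
    """Guess a data type from non-null string values (hypothetical rules)."""
    if values and all(re.fullmatch(r"-?\d+", v) for v in values):
        return "integer"
    if values and all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in values):
        return "decimal"
    return "string"


def profile_column(values, documented_type="string"):
    """Profile one column given as a list of strings, with None for nulls."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)

    def pct(n):
        # Percentage of all rows; 0.0 for an empty column
        return 100.0 * n / total if total else 0.0

    null_count = total - len(non_null)
    distinct_count = len(counts)
    # Assumption: "non-distinct" counts values that occur more than once
    non_distinct_count = sum(1 for c in counts.values() if c > 1)
    ordered = sorted(non_null)
    inferred = infer_type(non_null)

    return {
        "null_count": null_count,
        "null_pct": pct(null_count),
        "distinct_count": distinct_count,
        "distinct_pct": pct(distinct_count),
        "non_distinct_count": non_distinct_count,
        "non_distinct_pct": pct(non_distinct_count),
        "value_frequencies": counts,
        "pattern_frequencies": Counter(value_pattern(v) for v in non_null),
        "min_length": min((len(v) for v in non_null), default=0),
        "max_length": max((len(v) for v in non_null), default=0),
        # Assumption: first and last values in sorted order
        "first_value": ordered[0] if ordered else None,
        "last_value": ordered[-1] if ordered else None,
        "documented_type": documented_type,
        "inferred_type": inferred,
        "type_conflict": inferred != documented_type,
    }


if __name__ == "__main__":
    sample = ["AB-123", "CD-456", None, "AB-123", "42"]
    for name, fact in profile_column(sample).items():
        print(name, fact)
```

A pattern or value that occurs far less often than the others in `pattern_frequencies` or `value_frequencies` is a candidate frequency outlier. The threshold for "far less often" is left open here.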
You can configure the following options when you create or edit a profile:
- Column profile options. You can select the columns on which you want to run a profile, choose a sampling option, and choose a drill-down option.
- Filters and rules. You can add, edit, or delete filters and rules.
In the profile results, you can add comments and tags to a profile and to the columns in a profile. You can assign business terms to columns.
The Model repository locks profiles so that users do not overwrite one another's work. The version control system saves multiple versions of a profile and assigns a version number to each version. You can check out a profile and then check the profile in after you make changes. You can also undo a checkout before you check the profile back in.
Create scorecards to periodically review data quality. You can create scorecards before and after you apply rules to profiles so that you can view a graphical representation of the valid values for columns and compare the results.
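The before-and-after comparison can be pictured with a small Python sketch. It only illustrates the idea of scoring a column as the percentage of valid values. The function names (`score_column`, `text_bar`), the sample data, the set of valid values, and the uppercase rule are hypothetical and do not come from the product.

```python
def score_column(values, is_valid):
    """Return the percentage of values for which is_valid(value) is true."""
    if not values:
        return 0.0
    valid = sum(1 for v in values if is_valid(v))
    return 100.0 * valid / len(values)


def text_bar(score, width=20):
    """Render a score from 0 to 100 as a fixed-width text bar."""
    filled = round(score / 100 * width)
    return "#" * filled + "." * (width - filled)


if __name__ == "__main__":
    # Hypothetical column of state codes and the values considered valid
    states = ["CA", "ca", "NY", "N/A"]
    valid_states = {"CA", "NY"}

    before = score_column(states, lambda v: v in valid_states)

    # Hypothetical rule: convert values to uppercase before scoring
    cleaned = [v.upper() for v in states]
    after = score_column(cleaned, lambda v: v in valid_states)

    print("before", text_bar(before), before)
    print("after ", text_bar(after), after)
```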
Use the Scheduler Service to schedule profile runs and scorecard runs at a specific time or at regular intervals. The Scheduler Service manages schedules for profiles, scorecards, deployed mappings, and deployed workflows. You can create, manage, and run schedules in Informatica Administrator.