Data profiling statistics

When you enable the data profiling task for a catalog source in Metadata Command Center, the system runs a profile to evaluate the quality of the metadata extracted from the source system. The profile extracts various types of statistics to discover content and structure, such as value distribution, patterns, and data types. The profiling statistics appear in Data Governance and Catalog when you open the technical assets.
To view profiling statistics and sensitive data in Data Governance and Catalog, the organization administrator must grant the View Profiled Statistics and View Sensitive Data feature privileges to the user in Informatica Intelligent Cloud Services Administrator. For more information about the roles, users, and feature privileges, see the Introduction and Getting Started help.
The scope of profiling statistics that Data Governance and Catalog displays depends on the data profiling configuration parameters that you set while configuring a catalog source in Metadata Command Center. If you do not wish to run the profile on all the metadata that is extracted from the source system, you can limit the scope of profiling to a subset of metadata by configuring filters for data profiling in Metadata Command Center. For example, you can choose to run data profiling on only three schemas out of the six schemas that the system is configured to extract.
The following sample image shows the data profiling statistics that appear on a column asset page in Data Governance and Catalog after you configure and run the data profiling task for a catalog source in Metadata Command Center:
Image highlighting the data profiling statistics on the Overview tab of a column asset page.

Properties

This section displays key properties of the data element, such as its data type, the data type length and scale, and whether the column is nullable. You can view more properties of the column on the Properties tab.

Inferred data types

This section displays the data types inferred for each data element after you run a profile. It also displays the count of rows that were sampled, along with the percentage of rows that contain each data type. In the sample image above, the data types inferred for the column include Fixed Length String and String.
Note: If the precision or scale of a number type value in a column exceeds 28 digits, the precision and scale of the inferred data type might appear truncated.
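As a rough illustration of how a per-type row count and percentage can be derived from a sample, the following Python sketch classifies each sampled value and summarizes the result. It is not the profiling logic that the product uses: the function names `infer_type` and `type_distribution`, the type labels, and the classification rules are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical classification rules, chosen only for illustration.
INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?\d+\.\d+")


def infer_type(value: str) -> str:
    """Return an illustrative type label for one sampled value."""
    if INTEGER.fullmatch(value):
        return "Integer"
    if DECIMAL.fullmatch(value):
        return "Decimal"
    return "String"


def type_distribution(sample: list[str]) -> dict[str, tuple[int, float]]:
    """Map each inferred type to (row count, percentage of sampled rows)."""
    counts = Counter(infer_type(value) for value in sample)
    total = len(sample)
    return {
        label: (count, round(100.0 * count / total, 2))
        for label, count in counts.items()
    }


if __name__ == "__main__":
    print(type_distribution(["42", "3.14", "abc", "7"]))
```

A single column can match more than one type in this sketch, which is consistent with the way the section lists several inferred data types for one column.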

Patterns

This section displays the inferred patterns for data elements after you run a profile. Column patterns can include special characters, such as ~, [, ], =, -, ?, {, *, >, <, and $.
Note: If the precision or scale of a number type value in a column exceeds 28 digits, the precision and scale of the inferred pattern might appear truncated.
The following table describes the pattern characters and what they represent:
| Character | Description |
|---|---|
| 'B' or 'b' or ' ' | Represents a blank space. |
| 'C' or 'c' | Represents any character. |
| 'L' or 'l' | Represents any lowercase alphabetic character. |
| 'T' or 't' | Represents a tab. |
| 'U' or 'u' | Represents any uppercase alphabetic character. |
| 9 | Represents any numeric character. Data Governance and Catalog displays up to three characters separately in the "9" format. The tool displays more than three characters as a value within parentheses. For example, the format "9(8)" represents a numeric value with eight digits. |
| 'X' or 'x' | Represents any alphabet character. Data Governance and Catalog displays up to three characters separately in the "X" format. The tool displays more than three characters as a value within parentheses. For example, the format "X(6)" might represent the value "Boston". Note: The pattern character X is not case-sensitive and might represent uppercase characters or lowercase characters from the source data. |
| 'P' or 'p' | Represents "(", the opening parenthesis. |
| 'Q' or 'q' | Represents ")", the closing parenthesis. |
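To make the pattern notation concrete, the following Python sketch builds a pattern string from a single value using the characters described in the table. It is an illustration under stated assumptions, not the algorithm that the profile runs: the function names `char_class` and `infer_pattern` are hypothetical, every letter is mapped to X rather than L or U, and any character the table does not cover is copied into the pattern unchanged.

```python
from itertools import groupby


def char_class(ch: str) -> str:
    """Map one source character to a pattern character (illustrative rules)."""
    if ch.isdigit():
        return "9"
    if ch.isalpha():
        return "X"  # assumption: always X, never L or U
    if ch == " ":
        return "b"
    if ch == "\t":
        return "t"
    if ch == "(":
        return "p"
    if ch == ")":
        return "q"
    return ch  # assumption: other characters are kept as literals


def infer_pattern(value: str) -> str:
    """Collapse runs of 9 or X longer than three into the 9(n) / X(n) form."""
    parts = []
    for symbol, run in groupby(char_class(ch) for ch in value):
        length = len(list(run))
        if symbol in ("9", "X") and length > 3:
            parts.append(f"{symbol}({length})")
        else:
            parts.append(symbol * length)
    return "".join(parts)


if __name__ == "__main__":
    print(infer_pattern("Boston"))    # X(6)
    print(infer_pattern("12345678"))  # 9(8)
    print(infer_pattern("AB-123"))    # XX-999
```

The first two calls reproduce the "X(6)" and "9(8)" examples from the table, and the third shows runs of three or fewer characters written out separately.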

Value frequency

This section displays the count of Null, Distinct, and Non-distinct values for the data element or the rule after the profile is run. It also displays the frequent values along with the frequency and frequency percentage of each value. Data Governance and Catalog displays the top 20 value frequencies.
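The following Python sketch shows one way such statistics could be computed for a list of column values. It is illustrative only: the function name `value_frequency` and the result keys are hypothetical, "distinct" is assumed to mean the number of different non-null values, percentages are assumed to be relative to all rows including nulls, and the non-distinct count is left out because this section does not define it.

```python
from collections import Counter
from typing import Any, Optional


def value_frequency(values: list[Optional[Any]], top_n: int = 20) -> dict[str, Any]:
    """Summarize null count, distinct count, and the most frequent values."""
    total = len(values)
    non_null = [value for value in values if value is not None]
    counts = Counter(non_null)

    # Each entry is (value, frequency, frequency percentage of all rows).
    top_values = [
        (value, count, round(100.0 * count / total, 2))
        for value, count in counts.most_common(top_n)
    ]
    return {
        "null_count": total - len(non_null),
        "distinct_count": len(counts),  # assumption: different non-null values
        "top_values": top_values,
    }


if __name__ == "__main__":
    stats = value_frequency(["a", "b", "a", None, "c", "a"])
    print(stats["null_count"], stats["distinct_count"])
    print(stats["top_values"])
```

The default of `top_n=20` mirrors the limit of 20 value frequencies mentioned above.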