Profiling Overview
Use profiling to discover the content, quality, and structure of the data sources in an application, schema, or enterprise. The data source content includes value frequencies and data types. The data source structure includes keys and functional dependencies.
As part of the discovery process, you can create and run profiles. A profile is a repository object that finds and analyzes all data irregularities across data sources in the enterprise and hidden data problems that put data projects at risk. Running a profile on any data source in the enterprise gives you a good understanding of the strengths and weaknesses of its data and metadata.
You can use Informatica Analyst and Informatica Developer to analyze the source data and metadata. Analysts and developers can use these tools to collaborate, identify data quality issues, and analyze data relationships. Based on your job role, you can use the capabilities of either the Analyst tool or Developer tool. The degree of profiling that you can perform differs based on which tool you use.
You can perform the following tasks in both the Developer tool and Analyst tool:
- Perform column profiling. The process includes discovering the number of unique values, null values, and data patterns in a column.
- Perform data domain discovery. You can discover critical data characteristics within an enterprise.
- Curate profile results, including data types, data domains, primary keys, and foreign keys.
- Create scorecards to monitor data quality.
- Choose an operating system profile to create and run column profiles, enterprise discovery profiles, and scorecards based on the permissions of the operating system user that you define in the operating system profile.
- Use repository asset locks to prevent other users from overwriting your work.
- Use a version control system to save multiple versions of a profile.
- Create and assign tags to data objects.
- Look up the meaning of an object name as a business term in the Business Glossary Desktop. For example, you can look up the meaning of a column name or profile name to understand its business requirement and current implementation.
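To make the column profiling task above more concrete, the following Python sketch computes the kinds of statistics that a column profile reports: unique values, null values, value frequencies, and data patterns. It is an illustration only and does not use or represent the Informatica tools. The function names `value_pattern` and `profile_column` are hypothetical, as is the pattern notation, which assumes that `9` stands for a digit and `X` stands for a letter.

```python
import re
from collections import Counter


def value_pattern(value):
    """Reduce a value to a coarse pattern (assumed notation: 9 = digit, X = letter)."""
    text = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "X", text)


def profile_column(values):
    """Return basic column statistics. Only None is treated as null (an assumption)."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "unique_count": len(set(non_null)),
        "value_frequencies": Counter(non_null),
        "patterns": Counter(value_pattern(v) for v in non_null),
    }


if __name__ == "__main__":
    # Hypothetical sample column with one null and one repeated value.
    sample = ["AB-12", "CD-34", None, "AB-12", "777"]
    for name, result in profile_column(sample).items():
        print(name, result)
```

In this sketch, a pattern that occurs rarely compared with the dominant pattern in a column can point to a potential data quality issue.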
You can perform the following tasks in the Developer tool:
- Discover the degree of potential joins between two data columns in a data source.
- Determine the percentage of overlapping data in pairs of columns within a data source or multiple data sources.
- Compare the results of column profiling.
- Generate a mapping object from a profile.
- Discover primary keys in a data source.
- Discover foreign keys in a set of one or more data sources.
- Discover functional dependency between columns in a data source.
- Run data discovery tasks on a large number of data sources across multiple connections. The data discovery tasks include column profile, inference of primary key and foreign key relationships, data domain discovery, and generating a consolidated graphical summary of the data relationships.
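The structure discovery tasks in the preceding list can be pictured with a short Python sketch. The sketch is illustrative only and is not how the Developer tool implements these tasks. The function names are hypothetical, the checks require exact conformance across all rows, and the overlap measure assumes that the percentage is calculated on distinct non-null values of the first column.

```python
def is_candidate_key(rows, columns):
    """True if the column combination is non-null and unique in every row."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if any(part is None for part in key) or key in seen:
            return False
        seen.add(key)
    return True


def holds_functional_dependency(rows, determinant, dependent):
    """True if each determinant value maps to exactly one dependent value."""
    mapping = {}
    for row in rows:
        lhs, rhs = row[determinant], row[dependent]
        # setdefault stores the first dependent value seen for this determinant.
        if mapping.setdefault(lhs, rhs) != rhs:
            return False
    return True


def overlap_percentage(left_values, right_values):
    """Percentage of distinct non-null left values that also occur on the right."""
    left = {v for v in left_values if v is not None}
    right = {v for v in right_values if v is not None}
    if not left:
        return 0.0
    return 100.0 * len(left & right) / len(left)


if __name__ == "__main__":
    # Hypothetical sample data.
    orders = [
        {"order_id": 1, "zip": "10001", "city": "New York"},
        {"order_id": 2, "zip": "94105", "city": "San Francisco"},
        {"order_id": 3, "zip": "10001", "city": "New York"},
    ]
    print(is_candidate_key(orders, ["order_id"]))
    print(holds_functional_dependency(orders, "zip", "city"))
    print(overlap_percentage([1, 2, 3, 4], [3, 4, 5]))
```

A column with a high overlap percentage against a candidate key in another data source is a reasonable starting point when you look for potential joins or foreign key relationships.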
You can perform the following tasks in the Analyst tool:
- Perform enterprise discovery on a large number of data sources across multiple connections. You can view a consolidated discovery results summary of column metadata and data domains.
- Perform discovery search to find where the data and metadata exist in the enterprise. You can search for specific assets, such as data objects, rules, and profiles. Discovery search finds assets and identifies relationships to other assets in the databases and schemas of the enterprise.
- View the profile results for a historical profile run.
- Compare the profile results for two profile runs in a column profile.
- View scorecard lineage for each scorecard metric and metric group.
- View the scorecard dashboard.
- Add comments to a profile or columns in a profile.
- Assign tags to a profile or columns in a profile.
- Assign business terms to columns in a profile.