Reference Data Overview
A reference data object identifies a set of data values that you can select when you configure transformations for data quality operations. You can create reference data objects in the Developer tool and the Analyst tool. You can also import reference data objects to the Model repository and to the file system. You can use the Data Quality Content installer to import reference data objects and to install reference data to the file system.
You can create and edit the following types of reference data:
- Reference tables
A reference table contains the standard version and alternative versions of a set of data values. You add a reference table to a transformation in the Developer tool to verify that source data values are accurate and correctly formatted.
A reference table contains at least two columns. One column contains the standard or preferred version of a value, and the other columns contain alternative versions. When you add a reference table to a transformation, the transformation searches the input port data for values that also appear in the table. You can create reference tables with any data that is useful to your data project.
- Content sets
A content set is a Model repository object that specifies reference data values in the repository or in a file. When you add a content set to a transformation, the transformation searches the input data for values that match the data patterns in the content set.
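The reference table behavior described above can be illustrated outside the product. The following Python sketch is not Informatica code and does not show how a transformation is implemented. It only models the idea of a table whose first column holds the standard value and whose remaining columns hold alternative versions. The table contents, the function names `build_lookup` and `standardize`, and the case-insensitive comparison are all assumptions made for illustration.

```python
# Hypothetical in-memory reference table (illustration only).
# Column 1 holds the standard value; the other columns hold alternatives.
REFERENCE_TABLE = [
    ("Street", "St", "St.", "Str"),
    ("Avenue", "Ave", "Ave.", "Av"),
]


def build_lookup(rows):
    """Map every value in a row (standard or alternative) to the standard value."""
    lookup = {}
    for standard, *alternatives in rows:
        for value in (standard, *alternatives):
            # Assumption: values are compared without regard to case.
            lookup[value.casefold()] = standard
    return lookup


def standardize(value, lookup):
    """Return the standard version of an input value, or None if it is not in the table."""
    return lookup.get(value.strip().casefold())


lookup = build_lookup(REFERENCE_TABLE)

# Search a few input values for matches in the reference table.
results = [standardize(v, lookup) for v in ["Ave.", "street", "Blvd"]]
print(results)
```

In this sketch, an input value that appears in any column of a row resolves to that row's standard value, and a value that appears nowhere in the table returns `None` so that a caller can flag it for review.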
The Data Quality Content installer can install the following types of reference data:
- Informatica reference tables
Repository objects and data files that Informatica develops. You import Informatica reference tables when you import accelerator objects to the Model repository. The types of reference information include telephone area codes, postcode formats, first names, Social Security number formats, occupations, and acronyms. You can edit Informatica reference tables.
- Informatica content sets
Repository objects and data files that Informatica develops. You import content sets when you import accelerator objects to the Model repository. A content set contains different types of reference data that you can use to perform search operations with data quality transformations.
- Address reference data files
Reference data files that contain data for the deliverable addresses in a country. The Address Validator transformation reads the reference data. You cannot create, view, or edit address reference data files.
Address reference data is current for a defined period, so you must refresh the data regularly, for example every quarter.
- Identity population files
Reference data files that contain information on personal, household, and corporate identities. The Match transformation and the Comparison transformation use population files to find potential identities in input data. You cannot create or edit identity population files.