You can apply prebuilt rules to evaluate the accuracy and validity of your data.
For example, you can determine the accuracy of the input addresses, fix errors in the addresses, and enhance the addresses where possible with additional information. You can also measure and report on the quality of each address, retrieve country-specific enrichments for addresses, and verify addresses to the certification standards that a country defines.
The prebuilt rules evaluate and update the input data and return the results.
Select the required prebuilt function from the Apply Functions section in INFACore. You need to specify the data object variable and the required column name for which you want to apply the function. For information about how to create the data object variable, see .
Get country ISO values
You can apply this function to your data to standardize country names to their respective country codes.
To perform this operation, select the Get Country ISO Values function, and specify the data object variable. Then, enter the name of the country column for which you want the ISO values.
The function reads the input country name and returns the full ISO country name and the corresponding ISO two-letter or three-letter country codes.
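The lookup described above can be pictured with a small standalone sketch. It does not use INFACore and is not how the prebuilt rule is implemented. The table `ISO_COUNTRIES` and the helper `lookup_country_iso` are hypothetical names, and the table holds only three ISO 3166-1 entries for illustration.

```python
# Hypothetical lookup table: a tiny subset of ISO 3166-1, for illustration only.
ISO_COUNTRIES = {
    "india": ("India", "IN", "IND"),
    "germany": ("Germany", "DE", "DEU"),
    "japan": ("Japan", "JP", "JPN"),
}


def lookup_country_iso(name):
    """Return (iso_name, alpha2, alpha3) for a country name, or None if unknown."""
    key = name.strip().lower()
    return ISO_COUNTRIES.get(key)


for value in ["India", "XYZ", "123!"]:
    print(value, "->", lookup_country_iso(value))
```

In this sketch, unrecognized input such as XYZ or 123! simply yields None. How the prebuilt function reports unmatched values may differ.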
In the following example, the input flat file includes a column named country that contains the following data: India, XYZ, and 123!.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.get_country_iso_values(FF_DV, "country")
Convert diacritic English characters
You can apply this function to translate diacritic characters to regular ASCII text characters.
To perform this operation, select the Convert Diacritic English Chars function, and specify the data object variable. Then, enter the column name that contains the diacritic characters.
The function changes the diacritic characters to regular ASCII text. For example, when you apply this function to a column that contains the value tëst, the function changes the ë to an e and returns test.
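To see the same kind of transformation outside INFACore, the following minimal sketch uses only the Python standard library. It is an approximation of the idea, not the prebuilt rule itself, and the helper name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose each character (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("tëst"))  # prints: test
```

This approach only handles letters that decompose into a base letter plus an accent. Characters such as ø or ß have no such decomposition and pass through unchanged, so the prebuilt function may cover cases that this sketch does not.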
In the following example, the input flat file includes a column named diacritic_chars that contains diacritic English characters:
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.convert_diacritic_english_chars(FF_DV, "diacritic_chars")
The following snippet shows the output, where the diacritic characters are replaced with ASCII text characters:
Remove control characters
You can apply this function to identify and remove non-printable characters in the source data.
To perform this operation, select the Remove Control Characters function, and specify the data object variable. Then, enter the column name that contains the control characters.
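As a rough picture of what counts as a non-printable character, the following sketch drops every character in the Unicode control category (Cc) using the standard library. It is an illustration under that assumption, not the prebuilt rule, and the helper name `strip_control_characters` is hypothetical.

```python
import unicodedata


def strip_control_characters(text):
    """Drop every character whose Unicode category is Cc (control)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


print(repr(strip_control_characters("Hello\x00 Wor\x07ld\t!")))  # prints: 'Hello World!'
```

Note that tabs and line breaks are also control characters under this definition, so the sketch removes them as well. Whether the prebuilt function treats them the same way is not stated here.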
The function removes the control characters. For example, the input flat file includes a column named Characters that contains control characters.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.remove_control_characters(FF_DV, "Characters")
The following snippet shows the output, where the control characters are removed:
Parse names
You can parse the first name, surname, and gender to determine the gender score and gender status for a person's name.
If you know the gender for a name, the function uses the gender-specific score to determine the gender. Acceptable input for male and female genders is M and F.
If you do not know the gender, the function uses the higher of the male and female scores to determine the status. The rule also calculates the probable gender based on the first name input and provides a confidence score based on how frequently a name occurs as male or female.
Genders are only assigned a score if the probability of the gender being male or female is 70% or more. Unknown genders always have a confidence score of zero.
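The scoring rules above can be sketched as follows. This is one reading of the description, not the prebuilt rule. The frequency table `NAME_COUNTS`, the helper `gender_status`, and the status label "Unknown" are hypothetical, and the counts are made-up illustration values.

```python
# Hypothetical counts of how often each first name was recorded as (male, female).
NAME_COUNTS = {
    "john": (990, 10),
    "mary": (5, 995),
    "jordan": (600, 400),
}

THRESHOLD = 0.70  # the 70% cut-off described above


def gender_status(first_name, known_gender=None):
    """Return (status, score); known_gender is "M", "F", or None if not known."""
    counts = NAME_COUNTS.get(first_name.strip().lower())
    if counts is None:
        return ("Unknown", 0.0)
    male, female = counts
    total = male + female
    scores = {"M": male / total, "F": female / total}
    if known_gender in scores:
        # Known gender: use the gender-specific score.
        gender = known_gender
    else:
        # Unknown gender: use the higher of the two scores.
        gender = max(scores, key=scores.get)
    score = scores[gender]
    if score < THRESHOLD:
        # Below the cut-off the gender stays unknown with a zero score.
        return ("Unknown", 0.0)
    return (gender, score)


print(gender_status("John"))
print(gender_status("Jordan"))
print(gender_status("Mary", "F"))
```

With these sample counts, Jordan falls below the 70% cut-off for both genders and is therefore reported as unknown with a score of zero.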
To perform this operation, select the Parse Name function, and specify the data object variable. Then, enter the first name, surname, and gender column names that you want to parse.
For example, the input flat file includes columns for first names, surnames, and gender.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
result = dqf.parse_name(FF_DV, "FirstName", "Surname", "Gender")
df_reader = ic.DataFrameReader(result)
p_df = df_reader.to_pandas()
p_df.head()
The function parses the data and returns the following gender score and gender status:
Standardize United States company names
You can apply rules to standardize United States company names and provide acronyms for the company names where possible.
To perform this operation, select the Standardize US Companyname function, and specify the data object variable. Then, enter the column name that contains the United States company names that you want to standardize.
For example, the input flat file includes a column named company that contains a few United States company names.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.standardize_us_companyname(FF_DV, "company")
The following snippet shows the output with the standardized company names:
Find invalid names
You can apply this function to flag suspicious or fake names in the source data.
To perform this operation, select the Find Invalid Names function, and specify the data object variable. Then, enter the column name that contains names that you want to validate.
When you run the code, the function validates the names.
For example, the input flat file includes a column named Name that contains the names that you want to validate.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.find_invalid_names(FF_DV, "Name")
The following snippet shows the output, where the function flags each name as valid or suspect:
Validate the United States Social Security Number
You can apply this function to parse a United States Social Security Number (SSN) pattern from a large string of text.
The function parses SSNs with or without dashes, and then formats and validates them. Format options include no punctuation, dashes, and spaces.
To perform this operation, specify the data object variable, and then enter the column name that contains the SSNs that you want to validate.
By default, the rule writes Social Security Numbers without any punctuation.
The rule can output the following formats:
• No Punctuation - nnnnnnnnn
• Space - nnn nn nnnn
• Dash - nnn-nn-nnnn
To change the standardization format, open the dq_SSN_Format transformation in the rule and update the expression on the SSN_Format port.
When you run the code to validate the SSN, the function returns values as valid or not valid.
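The parse, format, and validate steps can be approximated with a regular expression. The sketch below is a standalone illustration and does not reproduce the prebuilt rule or its output columns. The names `SSN_PATTERN`, `FORMATS`, and `parse_ssn` are hypothetical. The validity check applies only the well-known structural rules: the first group cannot be 000, 666, or start with 9, the middle group cannot be 00, and the last group cannot be 0000.

```python
import re

# Nine digits, optionally separated by a dash or space after the 3rd and 5th digit.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})[- ]?(\d{2})[- ]?(\d{4})(?!\d)")

# Output styles that mirror the three formats listed above.
FORMATS = {
    "none": "{0}{1}{2}",
    "space": "{0} {1} {2}",
    "dash": "{0}-{1}-{2}",
}


def parse_ssn(text, style="none"):
    """Find the first SSN-shaped token; return (formatted, is_valid) or None."""
    match = SSN_PATTERN.search(text)
    if match is None:
        return None
    area, group, serial = match.groups()
    is_valid = (
        area not in ("000", "666")
        and not area.startswith("9")
        and group != "00"
        and serial != "0000"
    )
    return FORMATS[style].format(area, group, serial), is_valid


print(parse_ssn("SSN on file: 680-11-2943."))
print(parse_ssn("532459641", style="dash"))
print(parse_ssn("41735abc55555"))
```

The default style writes the SSN without punctuation, in line with the default described above. Text that contains no nine-digit candidate returns None in this sketch, whereas the prebuilt function returns a status and message for every row.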
For example, the input flat file includes a column named ssn that contains United States SSNs.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.validate_ssn(FF_DV, "ssn")
The following snippet shows the output returned with the validation values for the SSN:
Input_SSN: string
Out_SSN: string
Out_SSN_Status: string
Out_SSN_Status_Message: string
Out_SSN_Score: double
Out_Remaining_Text: string
----
Input_SSN: [["532459641","680-11-2943","41735abc55555"]]
Out_SSN: [["532459641","680112943","41735abc5"]]
Out_SSN_Status: [["Valid","Valid","Invalid"]]
Out_SSN_Status_Message: [["Valid structure, valid group and area values.","Valid structure, valid group and area values.","Invalid SSN length"]]
Out_SSN_Score: [[0.9,0.9,0.25]]
Out_Remaining_Text: [["","","41735abc55555"]]
Validate the United States zip code
You can validate whether the input data is a five-digit United States zip code.
To perform this operation, select the Validate USA Zipcode function, and specify the data object variable. Then, enter the column name that contains the United States zip codes that you want to validate.
When you run the code, the function checks whether the zip code is valid.
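A five-digit check of this kind can be sketched with a regular expression. The sketch only tests the shape of the value, not whether the zip code is actually in use, and the helper name `is_five_digit_zip` is hypothetical.

```python
import re


def is_five_digit_zip(value):
    """True when value is exactly five ASCII digits, ignoring surrounding spaces."""
    return re.fullmatch(r"[0-9]{5}", value.strip()) is not None


for candidate in ["94105", "02134", "9410", "94105-1234", "abcde"]:
    print(candidate, is_five_digit_zip(candidate))
```

Keep zip codes as text when you prepare the input file. A value such as 02134 loses its leading zero if it is read as a number, and then no longer has five digits.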
For example, the input flat file includes a column named zipcode that contains United States zip codes.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.validate_usa_zipcode(FF_DV, "zipcode")
The following snippet shows the validation values for the zip codes:
Validate the United States state
You can apply this function to verify whether the entry is a valid state in the United States.
To perform this operation, select the Validate USA State function, and specify the data object variable. Then, enter the column name that contains the state names that you want to validate.
When you run the code, the function validates the state name.
For example, the input flat file includes a column named state that contains the names of states in the United States.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.validate_usa_state(FF_DV, "state")
The following snippet shows the output returned with the validation values for the states:
Validate the United States county
You can apply this function to validate whether the input string is a valid county name in the United States.
To perform this operation, select the Validate US County function, and specify the data object variable. Then, enter the column name that contains the county names that you want to validate.
When you run the code, the function checks whether the county name is valid.
For example, the input flat file includes a column named County that contains the names of United States counties.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.validate_us_county(FF_DV, "County")
The following snippet shows the output returned with the validation values for the counties:
Validate the United States phone number
You can apply this function to validate and enrich telephone numbers in the United States.
To perform this operation, select the Validate USA Phone Number function, and specify the data object variable. Then, enter the column name that contains the phone numbers that you want to validate.
When you run the code, the function checks whether the phone number is a valid United States number and enriches the phone number.
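For a sense of what a structural phone number check involves, the following sketch applies the North American Numbering Plan shape: ten digits, with the area code and the exchange code each starting with a digit from 2 to 9, and an optional leading country code 1. It is an assumption-based illustration, not the prebuilt rule, it performs no enrichment, and the helper name `normalize_us_phone` is hypothetical.

```python
import re


def normalize_us_phone(value):
    """Return the number as +1XXXXXXXXXX, or None if it is not NANP-shaped."""
    digits = re.sub(r"\D", "", value)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return None
    # Area code and exchange code must both start with 2-9.
    if digits[0] in "01" or digits[3] in "01":
        return None
    return "+1" + digits


for candidate in ["(415) 555-0132", "1-212-555-0198", "055-555-0100", "12345"]:
    print(candidate, "->", normalize_us_phone(candidate))
```

A check like this confirms only that the number has a plausible shape. It cannot tell whether the number is assigned or reachable.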
In the following example, the input flat file includes a column named phnNumber that contains United States phone numbers.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
dqf.validate_usa_phone_number(FF_DV, "phnNumber")
The following snippet shows the output returned with the validation values for the phone number: