informatica.infacore.functions.dq_functions.DataQualityFunctions.parse_name

DataQualityFunctions.parse_name(src_data_object: DataObject, firstname_col_name: str, surname_col_name: str, gender_col_name: str)

Determines the gender status. If you know the gender for a specified name, the rule determines the status based on the gender-specific score. Acceptable inputs for male and female genders are M and F. If the gender is unknown, the rule determines the status based on the higher of the male and female scores.

The rule also calculates the probable gender from the first name input and provides a confidence score based on how frequently the name occurs as male or female. A gender is assigned a score only if the probability of the name being male or female is 70% or more. Unknown genders always have a confidence score of zero.
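The scoring behavior described above can be illustrated with a minimal sketch. This is not Informatica's implementation: the function names, the frequency-count inputs, and the "U" label for an unknown gender are all hypothetical, chosen only to show the 70% threshold and the score selection in code.

```python
from typing import Optional, Tuple

# From the description: a gender is scored only at 70% probability or more.
THRESHOLD = 0.70


def probable_gender(male_count: int, female_count: int) -> Tuple[str, float]:
    """Return (gender, confidence) from hypothetical name-frequency counts.

    "U" (unknown) is an assumed label; unknown genders get a confidence of zero.
    """
    total = male_count + female_count
    if total == 0:
        return ("U", 0.0)
    p_male = male_count / total
    p_female = female_count / total
    if p_male >= THRESHOLD:
        return ("M", p_male)
    if p_female >= THRESHOLD:
        return ("F", p_female)
    return ("U", 0.0)


def score_for_status(male_score: float, female_score: float,
                     gender: Optional[str]) -> float:
    """Pick the score that the status is based on.

    A known gender ("M" or "F") selects its own score; otherwise the
    higher of the two scores is used.
    """
    if gender == "M":
        return male_score
    if gender == "F":
        return female_score
    return max(male_score, female_score)
```

For instance, under these assumptions a name seen 90 times as male and 10 times as female would be classed as M, while a 60/40 split would fall below the threshold and be reported as unknown with a confidence of zero.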

Parameters:
  • src_data_object (DataObject) – The object name for the source DataObject class.

  • firstname_col_name (str) – The name of the column that contains the first names.

  • surname_col_name (str) – The name of the column that contains the surnames.

  • gender_col_name (str) – The name of the column that contains the genders.

Returns:

The gender probability data in columnar representation.

Return type:

table

Example

>>> import informatica.infacore as ic
>>> dq_obj = ic.DataQualityFunctions()
>>> dq_obj.parse_name(src_data_object, firstname_col_name, surname_col_name, gender_col_name)