Example: Use partitions to find the highest salary

You are an HR staff member at your organization. You are working on a project to model how employee salaries are associated with aspects of life that employees find important. The project is part of the wellness program at your organization. You want to use the information to better personalize the wellness program.

You can use the Python transformation to determine which employee earns the highest salary in their department.

The following table shows the data that your organization might collect:

DepartmentName	DepartmentID	EmployeeName	SalaryIndex	EmployeeSince
HR	1	Jane Smith	500	2/16/2010
R&D	2	Ellioth Consar	150	3/29/2018
Finance	3	Concor Valashe	230	11/22/2007
Marketing	4	Manchini Voliore	800	5/17/2009
HR	1	Blaze Concave	501	8/25/2016
R&D	2	Janet Encarr	890	1/26/2019
HR	1	Chelsea Blanch	389	9/3/2018
R&D	1	Samuel Coin	10	1/26/2005

To use the Python transformation to determine which employee earns the highest salary in their department, perform the following tasks:

Step 1. Add a Python transformation to the mapping.

Create a Python transformation. On the Advanced tab, set the behavior to Active.

Step 2. Pass data to the Python transformation.

Pass the following fields from upstream transformations in the mapping to the Python transformation:

- DepartmentName
- DepartmentID
- EmployeeName
- SalaryIndex
- EmployeeSince

Step 3. Partition the data by department.

Partition the data by department to track the highest salary within each department. To partition the data by department, add the incoming field DepartmentID as a partition key on the Partition Keys tab.

Step 4. Create output fields.

Create the following output fields on the Output Fields tab to pass data to downstream transformations:

- DepartmentName_out
- DepartmentID_out
- EmployeeName_out
- SalaryIndex_out
- EmployeeSince_out

Step 5. Initialize a map.

Declare a map variable outputmap to associate each department ID with the employee in the department who has the highest salary.

Add the following code in the Pre-Partition Python Code section:

print("Using partitions to find the employee with the highest salary")
outputmap = {}

Step 6. Define code to process the data.

For each input row that passes through the Python transformation, define code that checks if the salary of the employee is higher than the maximum salary of the previous rows that have been processed. If the salary of the employee is higher, update the employee who has the maximum salary in the department.

Add the following code in the Main Python Code section:

DepartmentID_out = DepartmentID
print("Processing rows for department ID " + str(DepartmentID_out))

outputmap.setdefault(DepartmentID, None)
updateMax = False

if outputmap.get(DepartmentID, None) is None:
updateMax = True
else:
max_salary = outputmap[DepartmentID]['SalaryIndex']
if max_salary is None:
updateMax = True
if SalaryIndex > max_salary:
updateMax = True

if updateMax == True:
employee_data = {'SalaryIndex':SalaryIndex,'EmployeeName':EmployeeName,
'EmployeeSince':EmployeeSince,'DepartmentName':DepartmentName}

outputmap[DepartmentID] = employee_data

Step 7. Write the data to the output files.

In the Post-Partition Python Code section of the Python tab, use the data in the map variable outputmap to generate a row for the employee that has the highest salary in each department.

Add the following code in the Post-Partition Python Code section:

for x in outputmap:
DepartmentID_out = x
smap = outputmap[x]
SalaryIndex_out = smap["SalaryIndex"]
EmployeeName_out = smap["EmployeeName"]
DepartmentName_out = smap["DepartmentName"]
EmployeeSince_out = smap["EmployeeSince"]

## Generate the output row
generateRow()

Step 8. Run the mapping.

If the output fields in the Python transformation are linked directly to a Target transformation, the target contains the following data after you run the mapping:

DepartmentName	DepartmentID	EmployeeName	SalaryIndex	EmployeeSince
Finance	3	Concor Valashe	230	11/22/2007
Marketing	4	Manchini Voliore	800	5/17/2009
HR	1	Blaze Concave	501	8/25/2016
R&D	2	Janet Encarr	890	1/26/2019