Example: Use partitions to find the highest salary
You are an HR staff member at your organization. You are working on a project to model how employee salaries are associated with aspects of life that employees find important. The project is part of the wellness program at your organization. You want to use the information to better personalize the wellness program.
You can use the Python transformation to determine which employee earns the highest salary in their department.
The following table shows the data that your organization might collect:
DepartmentName
DepartmentID
EmployeeName
SalaryIndex
EmployeeSince
HR
1
Jane Smith
500
2/16/2010
R&D
2
Ellioth Consar
150
3/29/2018
Finance
3
Concor Valashe
230
11/22/2007
Marketing
4
Manchini Voliore
800
5/17/2009
HR
1
Blaze Concave
501
8/25/2016
R&D
2
Janet Encarr
890
1/26/2019
HR
1
Chelsea Blanch
389
9/3/2018
R&D
1
Samuel Coin
10
1/26/2005
To use the Python transformation to determine which employee earns the highest salary in their department, perform the following tasks:
Step 1. Add a Python transformation to the mapping.
Create a Python transformation. On the Advanced tab, set the behavior to Active.
Step 2. Pass data to the Python transformation.
Pass the following fields from upstream transformations in the mapping to the Python transformation:
- DepartmentName
- DepartmentID
- EmployeeName
- SalaryIndex
- EmployeeSince
Step 3. Partition the data by department.
Partition the data by department to track the highest salary within each department. To partition the data by department, add the incoming field DepartmentID as a partition key on the Partition Keys tab.
Step 4. Create output fields.
Create the following output fields on the Output Fields tab to pass data to downstream transformations:
- DepartmentName_out
- DepartmentID_out
- EmployeeName_out
- SalaryIndex_out
- EmployeeSince_out
Step 5. Initialize a map.
Declare a map variable outputmap to associate each department ID with the employee in the department who has the highest salary.
Add the following code in the Pre-Partition Python Code section:
print("Using partitions to find the employee with the highest salary") outputmap = {}
Step 6. Define code to process the data.
For each input row that passes through the Python transformation, define code that checks if the salary of the employee is higher than the maximum salary of the previous rows that have been processed. If the salary of the employee is higher, update the employee who has the maximum salary in the department.
Add the following code in the Main Python Code section:
DepartmentID_out = DepartmentID print("Processing rows for department ID " + str(DepartmentID_out))
if outputmap.get(DepartmentID, None) is None: updateMax = True else: max_salary = outputmap[DepartmentID]['SalaryIndex'] if max_salary is None: updateMax = True if SalaryIndex > max_salary: updateMax = True
if updateMax == True: employee_data = {'SalaryIndex':SalaryIndex,'EmployeeName':EmployeeName, 'EmployeeSince':EmployeeSince,'DepartmentName':DepartmentName}
outputmap[DepartmentID] = employee_data
Step 7. Write the data to the output files.
In the Post-Partition Python Code section of the Python tab, use the data in the map variable outputmap to generate a row for the employee that has the highest salary in each department.
Add the following code in the Post-Partition Python Code section:
for x in outputmap: DepartmentID_out = x smap = outputmap[x] SalaryIndex_out = smap["SalaryIndex"] EmployeeName_out = smap["EmployeeName"] DepartmentName_out = smap["DepartmentName"] EmployeeSince_out = smap["EmployeeSince"]
## Generate the output row generateRow()
Step 8. Run the mapping.
If the output fields in the Python transformation are linked directly to a Target transformation, the target contains the following data after you run the mapping: