Group By Ports
You can define groups of rows to aggregate instead of running an aggregation across all the input data. For example, you can calculate the total company sales or you can find the total sales grouped by region.
To define a group for the aggregate expression, select the appropriate input, input/output, output, and variable ports in the Aggregator transformation. You can select multiple group by ports to create a new group for each unique combination. The Data Integration Service then performs the defined aggregation for each group.
When you group values, the Data Integration Service produces one row for each group. If you do not group values, the Data Integration Service returns one row for all input rows. By default, the Data Integration Service returns the last row of each group together with the result of the aggregation. You can return a different row instead. For example, if you use the FIRST aggregate function, the Data Integration Service returns the first row.
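The row-per-group behavior can be sketched in Python. This is a minimal illustration, not how the Data Integration Service is implemented: the function `aggregate_sales`, its `keep` argument, and the sample rows are all hypothetical, and `keep="first"` only approximates the effect of the FIRST function.

```python
def aggregate_sales(rows, group_by=(), keep="last"):
    """Sum the 'sales' field per group and return one row per group."""
    groups = {}
    for row in rows:
        # With no group by ports the key is (), so every row joins one group.
        key = tuple(row[port] for port in group_by)
        if key not in groups:
            groups[key] = {"row": row, "total": 0}
        elif keep == "last":
            groups[key]["row"] = row  # later rows replace the returned row
        groups[key]["total"] += row["sales"]
    return [dict(g["row"], total_sales=g["total"]) for g in groups.values()]


# Hypothetical input rows, invented for this illustration.
rows = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 250},
    {"region": "East", "sales": 50},
]

no_groups = aggregate_sales(rows)                           # one row for all input
by_region = aggregate_sales(rows, group_by=("region",))     # one row per region
by_region_first = aggregate_sales(rows, group_by=("region",), keep="first")
```

In this sketch, `no_groups` holds a single row built from the last input row, while `by_region` holds one row for each distinct region.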
When you select multiple group by ports in the Aggregator transformation, the Data Integration Service uses the port order to determine the order in which it groups rows. Because the group order can affect the results, order the group by ports to ensure the appropriate grouping. You can change the port order after you select the ports in the group.
For example, you might create an output port called Price_Out. The expression for Price_Out is SUM (Qty * Price). You define Store_ID and Item as the group by ports. The transformation returns the total price for each item by store.
The input rows might contain the following data:
Store_ID | Item | Qty | Price |
---|---|---|---|
101 | battery | 3 | 2.99 |
101 | battery | 1 | 3.19 |
101 | battery | 2 | 2.59 |
101 | AAA | 2 | 2.45 |
201 | battery | 1 | 1.99 |
201 | battery | 4 | 1.59 |
301 | battery | 1 | 2.45 |
The Data Integration Service performs the aggregate calculation on the following unique groups:
Store_ID | Item |
---|---|
101 | battery |
101 | AAA |
201 | battery |
301 | battery |
The Data Integration Service returns the Store_ID, Item, Qty, and Price from the last row of each group, along with the sum of Qty * Price for each item by store:
Store_ID | Item | Qty | Price | Price_Out |
---|---|---|---|---|
101 | battery | 2 | 2.59 | 17.34 |
101 | AAA | 2 | 2.45 | 4.90 |
201 | battery | 4 | 1.59 | 8.35 |
301 | battery | 1 | 2.45 | 2.45 |
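The calculation in this example can be reproduced with a short Python sketch. It is only an illustration of the grouping logic, not the Data Integration Service implementation. The function name `aggregate_price_out` is hypothetical, and the output order simply follows the order in which each group first appears in the input.

```python
from decimal import Decimal

# Input rows from the example: (Store_ID, Item, Qty, Price).
rows = [
    (101, "battery", 3, Decimal("2.99")),
    (101, "battery", 1, Decimal("3.19")),
    (101, "battery", 2, Decimal("2.59")),
    (101, "AAA", 2, Decimal("2.45")),
    (201, "battery", 1, Decimal("1.99")),
    (201, "battery", 4, Decimal("1.59")),
    (301, "battery", 1, Decimal("2.45")),
]


def aggregate_price_out(rows):
    """Group by (Store_ID, Item); keep the last row and SUM(Qty * Price)."""
    groups = {}
    for store_id, item, qty, price in rows:
        key = (store_id, item)
        running_total = groups[key][4] if key in groups else Decimal("0")
        # Overwrite the stored row so the last row of each group is returned.
        groups[key] = (store_id, item, qty, price, running_total + qty * price)
    return list(groups.values())


result = aggregate_price_out(rows)
for row in result:
    print(*row, sep=" | ")
```

Decimal values are used so that the sums match the table exactly instead of picking up binary floating-point rounding.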
Configure Group By Ports
Define the group by ports on the Group By tab of the transformation Properties view.
The following image shows the Group By tab:
The Group By tab contains the following options:
- Specify by
  Select Value or Parameter. Select Value to use port names. Select Parameter to use a port list parameter.
- Add
  Accepts a port name that you type in manually. You must type a valid port name before you click Add.
- Choose
  Click Choose to select ports to add to the group. The Developer tool provides a list of ports from the transformation to choose from.
- Move Up and Move Down
  Change the order of the ports in the group. Select the port name and then click one of the move buttons to move it up or down in the group order.
Group By Parameters
You can configure a port list parameter that contains one or more ports to include in the group. Create a port list parameter by selecting ports from a list of the ports in the transformation.
The following image shows the Group By tab when you use a parameter to identify the ports in the group:
You can browse for a port list parameter or click New to create a port list parameter. If you choose to create a port list parameter, you can select the ports from a list of the ports in the transformation.
Default Values of Group By Ports
The Data Integration Service does not create a group when a group by port contains null values. You can define a default value for each port in the group to replace any null input value. Then, the Data Integration Service can include the rows in the aggregation totals.
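The effect of a default value can be sketched in Python. This is a simplified model that assumes the behavior described above, where rows with a null group by value are left out of the groups. The function `total_qty_by_group`, the default value 'UNKNOWN', and the sample rows are hypothetical.

```python
def total_qty_by_group(rows, default_item=None):
    """Sum Qty per (Store_ID, Item), optionally replacing a null Item."""
    totals = {}
    for store_id, item, qty in rows:
        if item is None:
            if default_item is None:
                continue  # null group by value: no group is created for this row
            item = default_item  # the default value replaces the null input
        key = (store_id, item)
        totals[key] = totals.get(key, 0) + qty
    return totals


# Hypothetical rows: (Store_ID, Item, Qty). Two rows have a null Item.
rows = [(101, "battery", 3), (101, None, 2), (101, None, 1)]

without_default = total_qty_by_group(rows)
with_default = total_qty_by_group(rows, default_item="UNKNOWN")
```

With the default value in place, the two rows that had a null Item are counted in their own group instead of being left out of the totals.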
Non-Aggregate Expressions
Use non-aggregate expressions in group by ports to modify or replace groups.
For example, if you want to replace 'AAA battery' with 'battery' before grouping, you can create a group by output port, named CORRECTED_ITEM, using the following expression:
IIF( ITEM = 'AAA battery', 'battery', ITEM )
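For readers more familiar with general-purpose languages, the same correction can be sketched in Python. The sketch treats the replacement value as the string literal 'battery'. The function name `corrected_item` and the sample item values are hypothetical.

```python
from collections import Counter


def corrected_item(item):
    """Python equivalent of the IIF expression: replace 'AAA battery'."""
    return "battery" if item == "AAA battery" else item


# Hypothetical ITEM values, invented for this illustration.
items = ["battery", "AAA battery", "battery", "AAA"]

# Group on the corrected value instead of the raw ITEM value.
rows_per_group = Counter(corrected_item(item) for item in items)
```

Because grouping uses the corrected value, rows with 'AAA battery' fall into the same group as rows with 'battery'.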