Windowing
When a transformation contains a window function, you need to configure the windowing properties. Windowing is available for transformations on the Spark engine only.
Window functions operate on a group of rows and calculate a return value for every input row.
Before you define a window function in an Expression transformation, you need to describe the window by configuring the windowing properties. Windowing properties include a frame specification, partition keys, and order keys. The frame specification states which rows are included in the overall calculation for the current row. The partition keys determine which rows are in the same partition. The order keys determine how rows in a partition are ordered.
After you configure windowing properties, you define a window function in the Expression transformation. Informatica supports the window functions LEAD and LAG. You can also use aggregate functions as window functions in an Expression transformation.
Windowing Configuration
When you include a window function in an Expression transformation, you configure the windowing properties associated with the function. Windowing properties define the partitioning, ordering, and frame boundaries associated with a particular input row.
Configure a transformation for windowing on the Windowing tab.
You configure the following groups of properties on the Windowing tab:
- Frame
- Defines the rows that are included in the frame for the current input row, based on physical offsets from the position of the current input row.
You configure a frame if you use an aggregate function as a window function. The window functions LEAD and LAG reference individual rows and ignore the frame specification.
- Partition Keys
- Separate the input rows into different partitions. If you do not define partition keys, all rows belong to a single partition.
- Order Keys
- Define how rows in a partition are ordered. The ports you choose determine the position of a row within a partition. The order key can be ascending or descending. If you do not define order keys, the rows have no particular order.
Frame
The frame determines which rows are included in the calculation for the current input row, based on their relative position to the current row.
If you use an aggregate function instead of LEAD or LAG, you must specify a window frame. LEAD and LAG reference individual row sand ignore the frame specification.
The start offset and end offset describe the number of rows that appear before and after the current input row. An offset of "0" represents the current input row. For example, a start offset of -3 and an end offset of 0 describes a frame including the current input row and the three rows before the current row.
The following image shows a frame with a start offset of -1 and an end offset of 1:
For every input row, the function performs an aggregate operation on the rows inside the frame. If you configure an aggregate expression like SUM with the preceding frame, the expression calculates the sum of the values within the frame and returns a value of 6000 for the input row.
You can also specify a frame that does not include the current input row. For example, a start offset of 10 and an end offset of 15 describes a frame that includes six total rows, from the tenth to the fifteenth row after the current row.
Note: The start offset must be less than or equal to the end offset.
Offsets of All Rows Preceding and All Rows Following represent the first row of the partition and the last row of the partition. For example, if the start offset is All Rows Preceding and the end offset is -1, the frame includes one row before the current row and all rows before that.
Partition and Order Keys
Configure partition and order keys to form groups of rows and define the order or sequence of rows within each partition.
Use the following keys to specify how to group and order the rows in a window:
- Partition keys
- Configure partition keys to define partition boundaries, rather than performing the calculation across all inputs. The window function operates across the rows that fall into the same partition as the current row.
You can specify the partition keys by value or parameter. Select Value to use port names. Choose Parameter to use a sort key list parameter. A sort key list parameter contains a list of ports to sort by. If you do not specify partition keys, all the data is included in the same partition.
- Order keys
- Use order keys to determine how rows in a partition are ordered. Order keys define the position of a particular row in a partition.
You can specify the order keys by value or parameter. Select Value to use port names. Choose Parameter to use a sort key list parameter. A sort key list parameter contains a list of ports to sort by. You must also choose to arrange the data in ascending or descending order. If you do not specify order keys, the rows in a partition are not arranged in any particular order.
Example
You are the owner of a coffee and tea shop. You want to calculate the best-selling and second best-selling coffee and tea products.
The following table lists the products, the corresponding product categories, and the revenue from each product:
Product | Category | Revenue |
---|
Espresso | Coffee | 600 |
Black | Tea | 550 |
Cappuccino | Coffee | 500 |
Americano | Coffee | 600 |
Oolong | Tea | 250 |
Macchiato | Coffee | 300 |
Green | Tea | 450 |
White | Tea | 650 |
|
You partition the data by category and order the data by descending revenue.
The following table shows the data grouped into two partitions according to category. Within each partition, the revenue is organized in descending order:
Product | Category | Revenue |
---|
Espresso | Coffee | 600 |
Americano | Coffee | 600 |
Cappuccino | Coffee | 500 |
Macchiato | Coffee | 300 |
White | Tea | 650 |
Black | Tea | 550 |
Green | Tea | 450 |
Oolong | Tea | 250 |
|
Based on the partitioning and ordering specifications, you determine that the two best-selling coffees are espresso and Americano, and the two best-selling teas are white and black.
Rules and Guidelines for Windowing Configuration
Certain guidelines apply when you configure a transformation for windowing.
Consider the following rules and guidelines when you define windowing properties for a window function:
- •When you configure a frame, the start offset must be less than or equal to the end offset. Otherwise, the frame is not valid.
- •Configure a frame specification if you use an aggregate function as a window function. LEAD and LAG operate based on the offset value and ignore the frame specification.
- •You cannot use complex ports as partition or order keys.
- •You cannot preview the data in a transformation configured for windowing.
- •Assign unique port names to partition and order keys to avoid run-time errors.
- •The partition and order keys cannot use both a dynamic port and one or more generated ports of the same dynamic port. You must select either the dynamic port or the generated ports.