Developer Transformation Guide > Expression Transformation > Windowing
  

Windowing

When a transformation contains a window function, you need to configure the windowing properties. Windowing is available for transformations on the Spark engine only.
Window functions operate on a group of rows and calculate a return value for every input row.
Before you define a window function in an Expression transformation, you need to describe the window by configuring the windowing properties. Windowing properties include a frame specification, partition keys, and order keys. The frame specification states which rows are included in the overall calculation for the current row. The partition keys determine which rows are in the same partition. The order keys determine how rows in a partition are ordered.
After you configure windowing properties, you define a window function in the Expression transformation. Informatica supports the window functions LEAD and LAG. You can also use aggregate functions as window functions in an Expression transformation.

Windowing Configuration

When you include a window function in an Expression transformation, you configure the windowing properties associated with the function. Windowing properties define the partitioning, ordering, and frame boundaries associated with a particular input row.
Configure a transformation for windowing on the Windowing tab.
The Windowing tab shows three configuration sections. The frame section has a start offset and end offset selection box, with "All Rows Preceding" and "All Rows Following" checkboxes. You can specify the order keys and partition keys by value or parameter, and choose ports from the transformation. You can choose Ascending or Descending for the order key.
You configure the following groups of properties on the Windowing tab:
Frame
Defines the rows that are included in the frame for the current input row, based on physical offsets from the position of the current input row.
You configure a frame if you use an aggregate function as a window function. The window functions LEAD and LAG reference individual rows and ignore the frame specification.
Partition Keys
Separate the input rows into different partitions. If you do not define partition keys, all rows belong to a single partition.
Order Keys
Define how rows in a partition are ordered. The ports you choose determine the position of a row within a partition. The order key can be ascending or descending. If you do not define order keys, the rows have no particular order.

Frame

The frame determines which rows are included in the calculation for the current input row, based on their relative position to the current row.
If you use an aggregate function instead of LEAD or LAG, you must specify a window frame. LEAD and LAG reference individual row sand ignore the frame specification.
The start offset and end offset describe the number of rows that appear before and after the current input row. An offset of "0" represents the current input row. For example, a start offset of -3 and an end offset of 0 describes a frame including the current input row and the three rows before the current row.
The following image shows a frame with a start offset of -1 and an end offset of 1:
The table has five rows. The middle row is the current input row, and the frame includes one row before the current row and one row after the current row.
For every input row, the function performs an aggregate operation on the rows inside the frame. If you configure an aggregate expression like SUM with the preceding frame, the expression calculates the sum of the values within the frame and returns a value of 6000 for the input row.
You can also specify a frame that does not include the current input row. For example, a start offset of 10 and an end offset of 15 describes a frame that includes six total rows, from the tenth to the fifteenth row after the current row.
Note: The start offset must be less than or equal to the end offset.
Offsets of All Rows Preceding and All Rows Following represent the first row of the partition and the last row of the partition. For example, if the start offset is All Rows Preceding and the end offset is -1, the frame includes one row before the current row and all rows before that.
The frame includes the current input row and all rows below the current input row.

Partition and Order Keys

Configure partition and order keys to form groups of rows and define the order or sequence of rows within each partition.
Use the following keys to specify how to group and order the rows in a window:
Partition keys
Configure partition keys to define partition boundaries, rather than performing the calculation across all inputs. The window function operates across the rows that fall into the same partition as the current row.
You can specify the partition keys by value or parameter. Select Value to use port names. Choose Parameter to use a sort key list parameter. A sort key list parameter contains a list of ports to sort by. If you do not specify partition keys, all the data is included in the same partition.
Order keys
Use order keys to determine how rows in a partition are ordered. Order keys define the position of a particular row in a partition.
You can specify the order keys by value or parameter. Select Value to use port names. Choose Parameter to use a sort key list parameter. A sort key list parameter contains a list of ports to sort by. You must also choose to arrange the data in ascending or descending order. If you do not specify order keys, the rows in a partition are not arranged in any particular order.

Example

You are the owner of a coffee and tea shop. You want to calculate the best-selling and second best-selling coffee and tea products.
The following table lists the products, the corresponding product categories, and the revenue from each product:
Product
Category
Revenue
Espresso
Coffee
600
Black
Tea
550
Cappuccino
Coffee
500
Americano
Coffee
600
Oolong
Tea
250
Macchiato
Coffee
300
Green
Tea
450
White
Tea
650
You partition the data by category and order the data by descending revenue.
The order key is ascending revenue. The partition key is category.
The following table shows the data grouped into two partitions according to category. Within each partition, the revenue is organized in descending order:
Product
Category
Revenue
Espresso
Coffee
600
Americano
Coffee
600
Cappuccino
Coffee
500
Macchiato
Coffee
300
White
Tea
650
Black
Tea
550
Green
Tea
450
Oolong
Tea
250
Based on the partitioning and ordering specifications, you determine that the two best-selling coffees are espresso and Americano, and the two best-selling teas are white and black.

Rules and Guidelines for Windowing Configuration

Certain guidelines apply when you configure a transformation for windowing.
Consider the following rules and guidelines when you define windowing properties for a window function: