Transformations > Expression transformation > Window functions
  

Window functions

In advanced mode, you can use a window function to concisely express stateful computations. A window function takes a small subset of a larger data set for processing and analysis.
Window functions operate on a group of rows and calculate a return value for every input row.
Use window functions to perform the following tasks:
Before you define a window function, configure the following window properties on the Window tab:
Frame
Defines the rows that are included in the frame for the current input row, based on physical offsets from the position of the current input row.
You configure a frame if you use an aggregate function as a window function. The window functions LEAD and LAG reference individual rows and ignore the frame.
Partition Keys
Separates the input rows into different partitions.
If you do not define partition keys, all rows belong to a single partition.
Order Keys
Defines how rows in a partition are ordered.
The fields you choose determine the position of a row within a partition. The order key can be ascending or descending. If you do not define order keys, the rows have no particular order.
You cannot parameterize an expression that contains a window function. If the expression is parameterized, you cannot specify a window function in the mapping task.

Frame

The frame determines which rows are included in the calculation for the current input row based on the rows' relative position to the current row. Configure a frame if you use an aggregate function as a window function.
The start offset and end offset describe the number of rows that appear before and after the current input row. An offset of "0" represents the current input row. For example, a start offset of -3 and an end offset of 0 describe a frame including the current input row and the three rows before the current row.
The following image shows a frame with a start offset of -1 and an end offset of 1:
The Sports row is the current input row. The Arcade, Sports, and Adventure rows are part of the window and have corresponding revenues of 1000, 2000, and 3000.
For every input row, the function performs an aggregate operation on the rows inside the frame. If you configure an aggregate expression like SUM on the frame in the previous image, the expression calculates the sum of the values within the frame and returns a value of 6000 for the input row.
You can also specify a frame that does not include the current input row. For example, a start offset of 10 and an end offset of 15 describe a frame that includes six total rows, from the tenth to the fifteenth row after the current row.
Offsets of All Preceding Rows and All Following Rows represent the first row of the partition and the last row of the partition. For example, if the start offset is All Preceding Rows and the end offset is -1, the frame includes one row before the current row and all rows before that.
The following image shows a frame with a start offset of 0 and an end offset of All Following Rows:
The current input row is Country. All Following Rows include the following rows: Country, Ethnic, Pop, Rock, Classical, EDM, Hip Hop, and Punk.
If the frame offsets are outside the partition, the aggregate function ignores the frame. If the offsets of a frame are not within the partition or table, the aggregate function processes only the rows within the partition. The aggregate function lists the skipped rows in the log file. The function returns one of the following responses for the skipped rows: a default value of ERROR('transformation error'), NULL, or a predefined constant.
For example, you partition a table by seller ID and you order by quantity. You set the start offset to -3 and the end offset to 4.
The following image shows the partition and frame for the current input row:
The row with seller ID of 3 and quanity of 15 is the current input row. The rows with seller IDS of 3 and quantity of 10, 15, 20, and 30 are part of the window. Rows with seller IDs of 2,3, and 4 are part of the frame.
The frame includes eight total rows, but the calculation remains within the partition. If you define an AVG function with this frame, the function calculates the average of the quantities inside the partition and returns 18.75.
Consider the following rules and guidelines when you define a frame:

Partition and order keys

Configure partition and order keys to form groups of rows and define the order or sequence of rows within each partition.
Use the following keys to group and order the rows in a window:
Partition keys
Configure partition keys to define partition boundaries rather than performing the calculation across all rows.
If you do not specify partition keys, all the data is included in the same partition.
Order keys
Use order keys to determine how rows in a partition are ordered. Order keys define the position of a particular row in a partition.
You must also arrange the data in ascending or descending order. If you do not specify order keys, the rows in a partition are arranged randomly.
Consider the following rules and guidelines when you define window properties for partition and order keys:

Example

You are the owner of a coffee and tea shop. You want to calculate the best-selling and second best-selling coffee and tea products.
The following table lists the products, the corresponding product categories, and the revenue from each product:
Product
Category
Revenue
Espresso
Coffee
600
Black
Tea
550
Cappuccino
Coffee
500
Americano
Coffee
600
Oolong
Tea
250
Macchiato
Coffee
300
Green
Tea
450
White
Tea
650
You partition the data by category and order the data by descending revenue.
The following table shows the data grouped into two partitions according to category. Within each partition, the revenue is organized in descending order:
Product
Category
Revenue
Espresso
Coffee
600
Americano
Coffee
600
Cappuccino
Coffee
500
Macchiato
Coffee
300
White
Tea
650
Black
Tea
550
Green
Tea
450
Oolong
Tea
250
You can run the MAX function within each partition to determine that the two best-selling coffees are espresso and Americano, and the two best-selling teas are white and black.