Big Data Management User Guide > Stateful Computing on the Spark Engine > Window Functions
  

Window Functions

Window functions calculate a return value for every input row of a table, based on a group of rows.
A window function performs a calculation across a set of table rows that are related to the current row. You can also perform this type of calculation with an aggregate function. But unlike regular aggregate functions, a window function does not group rows into a single output row. The rows retain unique identities.
You can define the LEAD and LAG analytic window functions in an Expression transformation. LEAD and LAG give access to multiple rows within a table, without the need for a self-join.

LEAD

The LEAD function returns data from future rows.
LEAD uses the following syntax:
LEAD ( Column name, Offset, Default )
LEAD returns the value at an offset number of rows after the current row. Use the LEAD function to compare values in the current row with values in a following row. Use the following arguments with the LEAD function:
For more information about the LEAD function, see the Informatica Transformation Language Reference.

LAG

The LAG function returns data from preceding rows.
LAG uses the following syntax:
LAG ( Column name, Offset, Default )
LAG returns the value at an offset number of rows before the current row. Use the LAG function to compare values in the current row with values in a previous row. Use the following arguments with the LAG function:
For more information about the LAG function, see the Informatica Transformation Language Reference.

Aggregate Functions as Window Functions

In addition to LEAD and LAG, you can also use aggregate functions as window functions. When you use aggregate functions like SUM and AVG as window functions, you can perform running calculations that are similar to the stateful functions MOVINGSUM, MOVINGAVG, and CUME. Window functions are more flexible than stateful functions because you can set a specific end offset.
To use an aggregate function as a window function, you must define a frame in the windowing properties. You define a frame to limit the scope of the calculation. The aggregate function performs a calculation across the frame and produces a single value for each row.

Example

You are a lumber salesperson who sold different quantities of wood over the past two years. You want to calculate a running total of sales quantities.
The following table lists each sale ID, the date, and the quantity sold:
Sale_ID
Date
Quantity
30001
2016-08-02
10
10001
2016-12-24
10
10005
2016-12-24
30
40001
2017-01-09
40
10006
2017-01-18
10
20001
2017-02-12
20
A SUM function adds all the values and returns one output value. To get a running total for each row, you can define a frame for the function boundaries.
The start offset is all rows preceding and the end offset is 0. The order key is ascending date. The partition key is not specified.
You configure the following windowing properties:
You define the following aggregate function:
SUM (Quantity)
SUM adds the quantity in the current row to the quantities in all the rows preceding the current row. The function returns a running total for each row.
The following table lists a running sum for each date:
Sale_ID
Date
Quantity
Total
30001
2016-08-02
10
10
10001
2016-12-24
10
20
10005
2016-12-24
30
50
40001
2017-01-09
40
90
10006
2017-01-18
10
100
20001
2017-02-12
20
120

Aggregate Offsets

An aggregate function performs a calculation on a set of values inside a partition. If the frame offsets are outside the partition, the aggregate function ignores the frame.
If the offsets of a frame are not within the partition or table, the aggregate function calculates within the partition. The function does not return NULL or a default value.
For example, you partition a table by seller ID and you order by quantity. You set the start offset to -3 and the end offset to 4.
The following image shows the partition and frame for the current input row:
The partition includes the current input row plus one row before the current row and two rows after the current row. The frame extends to three rows before the current row and four rows after the current row.
The frame includes eight total rows, but the calculation remains within the partition. If you define an AVG function with this frame, the function calculates the average of the quantities inside the partition and returns 18.75.

Nested Aggregate Functions

A nested aggregate function in a window function performs a calculation separately for each partition. A nested aggregate function behaves differently in a window function and an Aggregator transformation.
A nested aggregate function in an Aggregator transformation performs a calculation globally across all rows. A nested aggregate function in an Expression transformation performs a separate calculation for each partition.
For example, you configure the following nested aggregate function in an Aggregator transformation and an Expression transformation:
MEDIAN ( COUNT ( P1 ) )
You define P2 as the group by port in the Aggregator transformation.
The function takes the median of the count of the following set of data:
P1
P2
10
1
7
1
12
1
11
2
13
2
8
2
10
2
RETURN VALUE: 3.5
When you include nested aggregate functions in an Expression transformation and configure the transformation for windowing, the function performs the calculation separately for each partition.
You partition the data by P2 and specify a frame of all rows preceding and all rows following. The window function performs the following calculations:
  1. 1. COUNT (P1) produces one value for every row. COUNT returns the number of rows in the partition that have non-null values.
  2. 2. MEDIAN of that value produces the median of a window of values generated by COUNT.
The window function produces the following outputs:
P1
P2
Output
10
1
3
7
1
3
12
1
3
11
2
4
13
2
4
8
2
4
10
2
4
You can nest aggregate functions with multiple window functions. For example:
LAG ( LEAD( MAX( FIRST ( p1 )))
Note: You can nest any number of the window functions LEAD and LAG, but you cannot nest more than one aggregate function within another aggregate function.

Rules and Guidelines for Window Functions

Certain guidelines apply when you use window functions on the Spark engine.
Consider the following rules and guidelines when you define window functions in a transformation: