Transformations > Hierarchy Processor transformation > Processing hierarchical output
  

Processing hierarchical output

The Hierarchy Processor transformation can process the following types of hierarchical output:

Defining hierarchical output with the Hierarchy Processor transformation

You can map the incoming fields to output fields to configure the output data structure for hierarchical output.
When the output data format is hierarchical, consider the following guidelines:
The following image shows the Hierarchy Processor tab with relational input and hierarchical output:
The Hierarchy Processor tab appears with Output Format Hierarchical selected. The transformation contains the incoming groups and fields in the left panel and an output group and fields in the right panel. The Data Configuration icons, links to output fields, and links to expressions provide ways to define the data processing strategy.
  1. 1Output format. Select Hierarchical to construct a hierarchical schema for incoming fields.
  2. 2Input groups, incoming fields. Use these fields to map to the output fields.
  3. 3Output fields. Use these fields to create the complex output file.
  4. 4Add incoming field to output group. Use to add fields to the output group.
  5. 5Output field names. Click on a field name to modify the field name or data type.
  6. 6Data Configuration icons. Use to configure the output groups and fields.
  7. 7Expression. Click on an expression to view or customize the output field expression.
  8. 8Add or delete output field. Use to create or delete output fields.
Tip: You can resize the Incoming Fields or Output Fields panels to better see the information.

Define hierarchical output

To define a Hierarchy Processor transformation with hierarchical output, perform the following tasks:
    1Select the Hierarchical output format.
    2 Configure the transformation output by adding incoming fields to the output or by manually adding fields to the output.
    3Configure the output groups and fields.
    1. aConfigure data sources.
    2. bOptionally, join data sources.
    3. cOptionally, customize the output field expressions.
    4. dOptionally, add filters to the transformation.
    5. eOptionally, define output order using order by fields.
    6. fOptional, aggregate input data using group by fields.
    7. gOptionally, aggregate output data.

Order of operations

You should perform operations in a certain order when working with hierarchical output.
When the output data format is hierarchical, perform the operations in the following order:
  1. 1Join. Specify join conditions in the data sources.
  2. 2Filter. Optionally, specify filter conditions to include only a subset of the source data.
  3. 3Group By. Optionally, specify the incoming fields for aggregate expressions.
  4. 4Order By. Optionally, specify the incoming fields to create sorted output.

Renaming and deleting output groups

When the output data format is hierarchical, you can rename or delete input groups.
When you modify the input group, the following configurations are also updated if they refer to the input group:
However, you must modify the following configurations manually:
Note: When the output data format is hierarchical, you cannot change the input group name.

Adding incoming fields to hierarchical output groups

You can add individual incoming fields or all fields in an input group to the output group, You can add incoming fields to an output group or to an array or struct field. You can rename or delete output fields as needed.
Click the Add link on the Hierarchy Processor tab next to the incoming field or input group that you want to add.
You can add fields in the following ways based on the incoming field's data type and the output type:
Add incoming field
Adds the selected incoming field to the selected output group or field.
The "Add incoming field" option is available when you add incoming fields with primitive data types to the output.
Add input group
Adds all incoming fields in the group to the selected output group or field.
The "Add input group" option is available when you add incoming fields with primitive data types to the output.
Add primitive single occurring children
Adds all primitive single occurring children under the field to the output, including the primitive, single occurring child fields that are nested under a struct. Adding primitive single occurring children doesn't add children that are nested under arrays.
You can add primitive single occurring children when you select an incoming struct or array field, and the output is hierarchical. For more information, see Add primitive single occurring children.
Preserve incoming field
Preserves the hierarchical structure of the selected field in the output. For example, if you add an array of structs to the output group, then the output group contains an array of structs with the same structure.
The "Preserve incoming field" option is available when you select an incoming struct or array field, and the output is hierarchical. For more information, see Preserve incoming fields.
Flatten selected array
Flattens an array of primitives into a primitive field. Creates one output record for each element in the array.
The "Flatten selected array" option is available when you select an incoming struct or array field, and the output is hierarchical. For more information, see Flatten selected array.
Add selected array as struct
Flattens an array of structs into a struct field. Creates one output record for each element the array.
The "Add selected array as struct" option is available when you select an incoming struct or array field, and the output is hierarchical. For more information, see Add selected array as struct.

Add primitive single occurring children

You can add primitive single occurring children when you are adding incoming struct and array fields to the output. The "Add primitive single occurring children" option is available when you add incoming struct or array fields to hierarchical output. It adds all primitive single occurring children under the field to the output group, including the primitive, single occurring child fields that are nested under a struct.
You can add primitive single occurring children only for incoming fields with child objects.
If the field you select contains an array, the array and its children are not added because an array can contain multiple elements.

Example

You want to extract the make, model, company, and policy number from an array of vehicle records. Each output record should contain information about one vehicle.
Add the vehicle array to the output group and choose Add Primitive Single Occurring Children.
The following image shows the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following field: vehicle (array). The vehicle array contains the following fields: make (string), model (string) insurance (struct), and maintenance (array). The insurance struct contains the following fields: company (string) and policy_num (string). The maintenance array contains the following fields: date (string) and description (array of strings). The Output Fields panel contains the following primitive fields: make (string), model (string), company (string), and policy_num (string).
Note that the date field is not added to the output group because it is under a child array of the selected field and is not single occurring.

Preserve incoming fields

When you add incoming struct and array fields to the output, you can preserve the incoming field. When you preserve the incoming field, it's copied to hierarchical output without changing the structure.
When you add a nested array and select "Preserve incoming fields", the data source configuration for the output group determines how the records are created.

Example of preserving incoming fields

You want to extract the description information from a nested array of maintenance records. The description information is in an array of strings. You want the output data to also be in an array of strings.
Add the description array to the output group and choose Preserve Incoming Field.
The following image shows the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following field: vehicle (array). The vehicle array contains the following fields: make (string), model (string) insurance (struct), and maintenance (array). The insurance struct contains the following fields: company (string) and policy_num (string). The maintenance array contains the following fields: date (string) and description (array of strings). The Output Fields panel contains the following field: description (array of strings).
When Data Integration creates the output array, it sets the data source for the description array to Input.vehicle.vehicle.maintenance.maintenance.desc.elem, indicating that the information for the output array comes from the elements in the desc array in the Input group. The data source for the Output group determines the structure of the output records.
By default, Data Integration sets the data source for the Output group to Input. When you run the mapping, Data Integration collates all descriptions into one output record. To write the descriptions to separate records for each vehicle, set the data source to Input.vehicle.vehicle. To write the descriptions to separate records for each maintenance record, set the data source to Input.vehicle.vehicle.maintenance.maintenance. For more information about data source configuration, see Configuring data sources.

Flatten selected array

When you add an incoming array of primitives to the output, you can flatten the selected array into a field of the same data type.
When you flatten a selected array, it creates one record for each element in the array.

Example of flattening an array

You want to extract the description information from a nested array of maintenance records. The description information is in an array of strings. You want to flatten the output into a string field.
Add the description array to the output group and choose Flatten Selected Array.
The following image shows the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one group with the following field: vehicle (array). The vehicle array contains the following fields: make (string), model (string) insurance (struct), and maintenance (array). The insurance struct contains the following fields: company (string) and policy_num (string). The maintenance array contains the following fields: date (string) and description (array of strings). The Output Fields panel contains the following field: description (string).
The output contains one record for each occurrence of description in the incoming data.
For example, the incoming data contains the following record:
[
{
"vehicle": [
{
"make": "Toyota",
"model": "Corolla",
"insurance": {
"company": "Allstate",
"policy_num": "AS12876"
},
"maintenance": [
{
"date": "01/01/2020",
"description": ["oil filter1", "oil filter2"]
},
{
"date": "01/08/2020",
"description": ["tire rotation1", "tire rotation2"]
}
]
},
{
"make": "Toyota",
"model": "RAV4",
"insurance": {
"company": "Allstate",
"policy_num": "AS2033"
},
"maintenance": [
{
"date": "01/02/2020",
"description": ["air filter replacement1", "air filter replacement2"]
},
{
"date": "01/08/2020",
"description": ["battery replacement1", "battery replacement2"]
}
]
}
]
}
]
Data Integration creates the following output records, one for each occurrence of description in the incoming data:
{"description":"oil filter1"}
{"description":"oil filter2"}
{"description":"tire rotation1"}
{"description":"tire rotation2"}
{"description":"air filter replacement1"}
{"description":"air filter replacement2"}
{"description":"battery replacement1"}
{"description":"battery replacement2"}

Add selected array as struct

When you add an incoming array of primitives as a struct, the array is flattened into a struct field, without flattening any descendant arrays.
The "Add selected array as struct" option creates one record for each element in the selected array.

Example of adding an array as struct

You want to flatten an array of vehicle records into a struct without flattening the child array of maintenance records.
Add the vehicle array to the output group and choose Add Selected Array as Struct.
The following image shows the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following field: vehicle (array). The vehicle array contains the following fields: make (string), model (string) insurance (struct), and maintenance (array). The insurance struct contains the following fields: company (string) and policy_num (string). The maintenance array contains the following fields: date (string) and description (array of strings). The Output Fields panel contains the following field: vehicle (struct). The vehicle struct contains the following fields: make (string), model (string) insurance (struct), and maintenance (array). The insurance struct contains the following fields: company (string) and policy_num (string). The maintenance array contains the following fields: date (string) and description (array of strings).
The output contains one record for each element in the vehicle array.
For example, the incoming data contains the following record which contains data about two vehicles:
[
{
"vehicle": [
{
"make": "Toyota",
"model": "Corolla",
"insurance": {
"company": "Allstate",
"policy_num": "AS12876"
},
"maintenance": [
{
"date": "01/01/2020",
"description": ["oil filter1", "oil filter2"]
},
{
"date": "01/08/2020",
"description": ["tire rotation1", "tire rotation2"]
}
]
},
{
"make": "Toyota",
"model": "RAV4",
"insurance": {
"company": "Allstate",
"policy_num": "AS2033"
},
"maintenance": [
{
"date": "01/02/2020",
"description": ["air filter replacement1", "air filter replacement2"]
},
{
"date": "01/08/2020",
"description": ["battery replacement1", "battery replacement2"]
}
]
}
]
}
]
The Hierarchy Processor transformation creates the following output records, one for each vehicle:
{
"vehicle":{
"make":"Toyota",
"model":"Corolla",
"insurance":{
"company":"Allstate",
"policy_num":"AS12876"
},
"maintenance":[
{
"date":"01/08/2020",
"description":["tire rotation2","tire rotation1"]
},
{
"date":"01/01/2020",
"description":["oil filter2","oil filter1"]
}
]
}
}
{
"vehicle":{
"make":"Toyota",
"model":"RAV4",
"insurance":{
"company":"Allstate",
"policy_num":"AS2033"
},
"maintenance":[
{
"date":"01/08/2020",
"description":["battery replacement2","battery replacement1"]
},
{
"date":"01/02/2020",
"description":["air filter replacement2","air filter replacement1"]
}
]
}
}

Configuring hierarchical output group and fields

Uou can create and modify the output group and output fields for hierarchical output.
After you add incoming fields to an output group, you can select an output field name to modify the fields. You can also add and define output fields manually.

Name and Type

The following table describes the properties for hierarchical output fields:
Property
Description
Child Of
The parent field or group that this field belongs to. The name structure describes the group, parent fields, and struct name. For example: OUTGROUP.Grandparent.Parent.struct_name
Name
The name of the current output group or field.
Type
The data type of the current field. For hierarchical output, you can choose either a primitive or complex data type.
Precision
The total number of significant digits in the field.
Scale
The number of digits to the right of the decimal point.
Array Element Type
The data type of the current array element.
Array Element Precision
The precision for the current array element. Used when creating the target data.
Array Element Scale
The scale for the current array element. Used when creating the target data.
Struct Name
The struct name for the current struct field.
Element Struct Name
The element struct name for the current array of structs field.
Description
Optional description of the field and its usage.
Note: The output field properties that appear depend on the data type.

Aggregate Options

All output fields except for parent fields have aggregation options that you can set. The following table describes the aggregate options:
Property
Description
Use this field to aggregate values in an output field array.
Select this property to aggregate output data into the current field.
Output Field
If you are aggregating, specify the sibling array with fields to aggregate. The array must be a child of the current output field or group.

Aggregating values in an output field array

If the Hierarchy Processor transformation output contains an array, you might want to add an output field to aggregate values in the array. The output field must be a sibling of the array field.
For example, the Output group contains an array with orders information. The Orders array contains the OrderPrice field, which stores the price for each order. You want to find the total order price for each company.
The following image shows the output fields:
The image shows the Output Fields panel of the Hierarchy Processor tab of the Hierarchy Processor transformation. The Output Fields panel contains one output group with the following fields: CompanyName (string), Orders (array), and TotalOrderPrice (double). The Orders array struct contains the following fields: OrderPrice (double), OrderDate (string), Items (array), and OrderAddress (struct).
To find the total order price, add a field called TotalOrderPrice to the output group.
Edit the TotalOrderPrice field. Select Use this field to aggregate values in an output field array, and select the Orders array as the output field for which you want to aggregate values. Configure the following expression for the TotalOrderPrice field:
SUM(:fld.{Output.Orders.Orders.OrderPrice})

Configuring data sources

You can configure a data source for the output group and for all array and struct fields in the output. A data source identifies the input group or incoming array that populates the primitive child fields for the output group or field.
If you select a field as a data source, you have access to the following objects:
You can configure multiple data sources for an output group or field. If you do this, you must configure a join condition to join the data.
You can configure filters to exclude certain records. You can also specify group by fields for aggregating the data and order by fields for sorting records.
The data sources for hierarchical output group and fields can vary based on the output data structure.

Inheriting data sources from the parent

When you configure the data source for an array or struct field, you can populate the children of the field either with the incoming data or by inheriting the parent's data sources.
When you use the incoming data, the incoming data is used to populate the children of the array or struct.
When you inherit the parent's data sources, the data transformed into the parent output field populates the children of the array or struct. This preserves the data transformations (for example, joins and filters), in the parent fields. You can apply filters to the field to further filter the data, but you can't configure data sources, joins, group by fields, or order by fields.
For example, you are reading data from a relational table of customer records in which the customer ID is unique. The incoming data contains the following records:
CustID,Name,Street,City,State,ZIP
00234,Ravindra Singh,123 6th St. Apt. 5A,Boston,MA,02134
14416,Melissa Clark,11 Winding Way,Watch Hill,RI,02891
You want to write the customer address fields to a struct.
The following image shows the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following string fields: CustID, Name, Street, City, State, and ZIP. The Output Fields panel contains one output group with the following fields: CustID (string), Name (string), and Address (struct). The Address struct contains the following string fields: Street, City, State, and ZIP.
In the Output Fields panel, set the data source for the Output group to Input and the data source for the Address struct to Inherit parent's data sources (Output). When you run the mapping, the Hierarchy Processor transformation creates one record for each occurrence of CustID in the input data and populates the struct with the address data that corresponds to the customer ID in the output:
{
"CustID":"00234",
"Name":"Ravindra Singh",
"Address":{
"Street":"123 6th St. Apt. 5A",
"City":"Boston",
"State":"MA",
"ZIP":"02134"
}
}
{
"CustID":"14416",
"Name":"Melissa Clark",
"Address":{
"Street":"11 Winding Way",
"City":"Watch Hill",
"State":"RI",
"ZIP":"02891"
}
}
If you set the data source for the Address struct to Input, then you must also configure the following filter condition on the struct to get the same output: :fld.{Input.CustID} = :fld.{Output.CustID} AND :fld.{Input.Name} = :fld.{Output.Name}. For more information about configuring filter conditions, see Configure filter conditions.
When the output field is an array that inherits its parent's data, the Hierarchy Processor transformation creates an array with one element.

Configure data source in hierarchical output example

When you add a nested array to the output and preserve the incoming field, the records that get created vary based on how you configure the data source for the output group.
For example, you want to extract the description information from a nested array of maintenance records in a JSON file. The description information is in an array of strings. You want the output data to also be in an array of strings.
The following sample data shows an incoming record:
[
{
"vehicle": [
{
"make": "Toyota",
"model": "Corolla",
"insurance": {
"company": "Allstate",
"policy_num": "AS12876"
},
"maintenance": [
{
"date": "01/01/2020",
"description": ["oil filter1", "oil filter2"]
},
{
"date": "01/08/2020",
"description": ["tire rotation1", "tire rotation2"]
}
]
},
{
"make": "Toyota",
"model": "RAV4",
"insurance": {
"company": "Allstate",
"policy_num": "AS2033"
},
"maintenance": [
{
"date": "01/02/2020",
"description": ["air filter replacement1", "air filter replacement2"]
},
{
"date": "01/08/2020",
"description": ["battery replacement1", "battery replacement2"]
}
]
}
]
}
]
In the Hierarchy Processor transformation, you add the description array to the output group and choose Preserve Incoming Field.
The following image shows the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following field: vehicle (array). The vehicle array contains the following fields: make (string), model (string) insurance (struct), and maintenance (array). The insurance struct contains the following fields: company (string) and policy_num (string). The maintenance array contains the following fields: date (string) and description (array of strings). The Output Fields panel contains the following field: description (array of strings).
When the Hierarchy Processor transformation creates the output array, it sets the data source for the description array to Input.vehicle.vehicle.maintenance.maintenance.desc.elem, indicating that the information for the output array comes from the elements in the desc array in the Input group. By default, the Hierarchy Processor transformation sets the data source for the Output group to Input.
The data source configuration for the output group determines how the Hierarchy Processor transformation creates the output records.
To combine all descriptions into one output record, keep the data source for the output group as Input. This produces the following output:
{"description":["battery replacement2","battery replacement1","air filter replacement2","air filter replacement1","tire rotation2","tire rotation1","oil filter2","oil filter1"]}
To create one output record for each occurrence of vehicle in the incoming data, set the data source to Input.vehicle.vehicle. In this case, the output data contains one record for each vehicle:
{"description":["tire rotation2","tire rotation1","oil filter2","oil filter1"]}
{"description":["battery replacement2","battery replacement1","air filter replacement2","air filter replacement1"]}
To create one output record for each occurrence of maintenance in the incoming data, set the data source to Input.vehicle.vehicle.maintenance.maintenance. In this case, the output contains one record for each maintenance record:
{"description":["oil filter2","oil filter1"]}
{"description":["tire rotation2","tire rotation1"]}
{"description":["air filter replacement2","air filter replacement1"]}
{"description":["battery replacement2","battery replacement1"]}

Data source conflicts

If you convert hierarchical data to hierarchical data and you add multi-level arrays to the output, you must ensure that the data sources for the output group or fields do not conflict. A conflict occurs when you select both an incoming array and one of its descendant arrays to be data sources for the same output group or field.
If you have a data source conflict, the transformation remains invalid until you resolve the conflict. Additionally, you cannot configure joins, filters, order by fields, or group by fields until you resolve the conflict.

Example of a data source conflict

A data source conflict occurs if you select both Array1 and Array2 as data sources for the output group in the following image:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following field: Array1 (array). Array1 contains the following fields: Field1 (string), Field2 (string), and Array2 (array). Array2 contains the following fields: Field3 (string) and Field4 (string). The Output Fields panel contains one output group with the same fields as the input group.
The conflict occurs because there is no way to determine which data source provides the data for Field1 and Field2. In this case, the Hierarchy Processor transformation displays a conflicting data sources error.
To resolve the conflict, remove one of the data sources from the Output group.

Modify data sources for output group or fields

The output group and all array and struct fields in the output require a data source. When you add an input group or incoming field to the output, the Hierarchy Processor transformation usually sets a data source for you. You can manually add or edit a data source.
To configure a data source for the output group:
  1. 1Click the Data Sources icon for the output group.
  2. 2Examine if there is already a data source configured for the output group. If it requires any changes, continue with these steps, otherwise you can exit the dialog box.
  3. 3Click the Add Data Source icon to add a new data source.
  4. 4Click the ellipsis icon to select the input group or array to use as a data source.
  5. 5Validate the configuration.
  6. 6Click Save.
Tip: Click the trash icon to remove any incorrect or unneeded data source.
To configure a data source for struct or array fields:
  1. 1Click the Data Sources icon for the output group.
  2. 2Examine if there is already a data source configured for the struct or array. If it requires any changes, continue with these steps, otherwise you can exit the dialog box.
  3. 3Select the type of data source to use:
  4. 4Validate the configuration.
  5. 5Click Save.
Tip: Click the trash icon to remove any incorrect or unneeded data source.

Configuring output data

For hierarchical output, you can configure data source join and filter conditions as well as group by and order by fields. You can aggregate on both the input and output data.

Configure join conditions

When the output data format is hierarchical, you can define join conditions for the data sources. You must configure a join condition if an output group or field has multiple data sources. Configure a join condition to join the data from the input groups or incoming fields.
Configure the join conditions for the output groups on the Hierarchy Processor tab.
    1Click the Join Conditions icon for the output group.
    2Add a join condition.
    3Select the left data source.
    4Select the join type:
    Note: If you select an outer join on a large data set, you might need to increase the Spark driver memory in the mapping task. For more information about Spark session properties, see Tasks.
    5Select the right data source.
    6Click Configure Join Condition.
    7Select fields and built-in functions to create the expression.
    8Validate the expression.
    9Click Save.

Configure filter conditions

You can define filter conditions to project a subset of the input data in the Hierarchy Processor transformation. You can filter based on incoming fields or output fields.
You can configure a filter condition to read data from primitive fields into an output array or struct field when the data in the array or struct field must correspond to the data in a sibling field in the output group.

Filter configuration example

You want to convert relational data to a JSON file. The incoming data is in a relational table that contains orders information. The orders table contains multiple rows for each order because each order can contain several products.
The incoming data looks like the following data:
OrderNumber,ProductName,ProductType,NumberOfItems,PricePerItem
12345,M&Ms Candies Chocolate Peanut Party Size - 38 Oz,Candy,2,14.49
12345,Stella Parm Shredded Cup - 20 Oz,Dairy,1,10.99
12345,AHA Sparkling Water Blueberry Pomegranate - 8-12 Fl. Oz.,Beverages,1,3.33
23456,Weetabix Biscuit Cereal Whole Grain 2 Count - 14 Oz,Breakfast & Cereal,2,4.99
23456,Producers Milk Lowfat 1% - Half Gallon,Dairy,1,2.79
23456,Egglands Best Eggs Cage Free Large Brown - 12 Count,Eggs,1,4.99
You want to read the product details into an array, where the product details are associated with a particular order number.
The following image shows the structure of the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following fields: OrderNumber, ProductName, ProductType, NumberOfItems, and PricePerItem. The Output Fields panel contains one output group with the following fields: OrderNumber (string) and ProductDetails (array). The ProductDetals array contains the following fields: ProductName, ProductType, NumberOfItems, and PricePerItem.
In the Output Fields panel, set the data source for the Output group to Input, and configure the group by field as Input.OrderNumber to remove duplicate records from the output. Set the data source for the ProductDetails array to Input.
To ensure that the details in the ProductDetails array correspond to the order number in the output, configure the following filter condition for the array:
:fld.{Input.OrderNumber}= :fld.{Output.OrderNumber}
To further refine the records, use an AND condition in the filter. For example, to exclude records in which the product type is "Candy," configure the following filter condition:
:fld.{Input.OrderNumber}= :fld.{Output.OrderNumber} AND :fld.{Input.ProductType} != 'Candy'
The output contains one record for each order, and incoming records with the product type "Candy" are excluded.
The output data contains the following records:
{
"OrderNumber":"12345",
"ProductDetails":[
{
"ProductName":"AHA Sparkling Water Blueberry Pomegranate - 8-12 Fl. Oz.",
"ProductType":"Beverages",
"NumberOfItems":"1",
"PricePerItem":"3.33"
},
{
"ProductName":"Stella Parm Shredded Cup - 20 Oz",
"ProductType":"Dairy",
"NumberOfItems":"1",
"PricePerItem":"10.99"
}
]
}
{
"OrderNumber":"23456",
"ProductDetails":[
{
"ProductName":"Egglands Best Eggs Cage Free Large Brown - 12 Count",
"ProductType":"Eggs",
"NumberOfItems":"1",
"PricePerItem":"4.99"
},
{
"ProductName":"Producers Milk Lowfat 1% - Half Gallon",
"ProductType":"Dairy",
"NumberOfItems":"1",
"PricePerItem":"2.79"
},
{
"ProductName":"Weetabix Biscuit Cereal Whole Grain 2 Count - 14 Oz",
"ProductType":"Breakfast & Cereal",
"NumberOfItems":"2",
"PricePerItem":"4.99"
}
]
}

Configure group by fields

For hierarchical output, you can group primitive incoming fields for aggregation to produce one row for each group of input data.
To group fields for aggregation:
    1Click the Group By fields icon for the output group or array.
    2Examine if there is already a grouping present. If it requires any changes, continue with these steps, otherwise you can exit the dialog box.
    3Click the Add New Group by Field icon to add a new field for grouping.
    4Clip the ellipsis icon to select the primitive field to add to the grouping.
    5Validate the configuration.
    6Click Save.

Configure order by fields

When the output data format is hierarchical, you can define how the output is sorted.
Note: The following conditions must be true for the sort operation to take effect:
Configure order by fields on the Hierarchy Processor tab.
    1Click the Order By fields icon for the output group or the array or struct field.
    2Add the incoming fields to order by and sort the data in ascending or descending order.
    3Rearrange the fields to adjust the sort order.
    4Validate the configuration.
    5Click Save.

Expression configuration

You can define expressions in the Hierarchy Processor transformation to create customized relational or hierarchical output, but not for flattened output. You also use expressions to define filter conditions.
The Hierarchy Processor transformation can process information from different data sets. Some of the field names might not be unique among the different data sets. As a result, you can't simply reference the field by its name, because the same field name might be used in a different data set, or within the hierarchy of the same data set.
The syntax of the expressions in the Hierarchy Processor transformation differs from that used in the Expression transformation.
To reference a field in a Hierarchy Processor transformation, use the following syntax:
:fld.{input_group_name.field_name}.field_name
The following table describes the syntax in more detail:
Syntax part
Description
.fld.
Denotes the Hierarchy Processor transformation expression syntax.
input_group_name
Name of the input group or dataset.
field_name
Name of the field, including the full path name if it's not a top-level field.
If any field is of the type array, include the array name. If an array is primitive and has no array name, use elem as the array name.
For fields within a struct or an array, the actual field name is specified outside of the closing brace.
.field_name
Include the field_name portion only when referencing a field within a struct or an array. Follow these guidelines:
  • - For fields within a struct, the field_name portion uses the format: .structName.fieldname
  • - For fields within an array, the field_name portion uses the format: .fieldName

Configure expressions

You can configure expressions when the output format is relational or hierarchical. For flattened output, can't configure the default expression once you add the field.
To configure an expression in the Hierarchy Processor transformation, follow these steps:
    1Perform one of the following actions to create the output fields:
    1. aTo use an incoming field as an output, click the Add link that appears when you hover over the incoming field.
    2. bTo create a new field, click New Field in the Output Fields panel near the output group name or near an array name.
    2Click the expression in the Output Fields section.
    For newly created fields, the link appears as Configure. The Field Expression dialog box appears.
    3Enter the expression in the expression editor. You can add incoming fields, functions, and variables to the expression by clicking the Add link next to the object that you want to use. You can also type in the expression manually.
    The expression can contain constants, variables, built-in functions, and user-defined functions. You can nest functions to create a complex expression.
    Note: If you modify the references to any of the fields, your expression will fail.
    4Click Validate.
    5Correct any errors.
    6Click OK.
    You can close the editor and troubleshoot an invalid expression at another time.

Running a mapping with JSON data

To run a mapping that contains a Hierarchy Processor transformation with JSON-formatted data, you need to use a mapping task.

Reading JSON input

When you read JSON data, the input files can be based on a schema with multiple lines or on a schema with a single line.
The following sample shows a JSON schema on a single line:
{"Name":"Tom","Street":"2100 Seaport Blvd","City":"Redwood City","State":"CA","Country":"USA","Zip":"94063"}
The following sample shows a JSON schema that spans across multiple lines:
{
"Name": "Tom",
"Surname": "Day",
"City": "Redwood City",
"State": "CA",
"Country": "USA",
"Zip": "94063"
}
By default, the Hierarchy Processor transformation reads each JSON schema as a single line. To read input that spans across multiple lines, you can configure the formatting options in the Source transformation to read multiple-line JSON files.

Writing JSON output

When you write JSON data, you can write each output record to a separate file, or you can write all output records to one file.
By default, each output record is written to a separate file. To write the output records to one JSON-formatted file, set the following Spark session property in the mapping task:
Session Property Name
Session Property Value
spark.sql.shuffle.partitions
1

Relational to hierarchical example

You need to create a purchase order file in hierarchical format using customer sales data from two purchase order tables and the customer address table.
Use the Hierarchy Processor transformation to create purchase orders in hierarchical format.
The POHeader table contains basic information about the orders placed by customers:
OrderNumber
Comment
OrderDate
ConfirmDate
1
AppD for POD4
2020-10-01 00:00:00.0
2020-10-02 00:00:00.0
2
GoJS for IICS
2020-10-12 00:00:00.0
2020-10-12 00:00:00.0
The Address table contains customer address information for each order:
OrderNumber
AddressType
Name
Street
City
State
Country
Zip
1
ShipTo
Tom
2100 Seaport Blvd
Redwood City
CA
USA
94063
1
BillTo
Tom
2100 Seaport Blvd
Redwood City
CA
USA
94063
2
ShipTo
Bill
1630 S Delaware St
San Mateo
CA
USA
94402
2
BillTo
Bill
PO Box 313
San Mateo
CA
USA
94402
The PODetail table contains details about the customer purchase orders:
OrderNumber
ItemNum
ProductName
Quantity
Price
Comment
ShipDate
PartNum
1
1
AppD Agent for JVM
60
500
JVM agents
2020-10-15 00:00:00.0
1
1
3
ELB agents
10
200
ELB agents
2020-10-15 00:00:00.0
3
1
2
MySQL agents
2
120
MySQL agents
2020-10-16 00:00:00.0
2
1
4
MySQL agents
2
120
MySQL agents
2020-10-01 00:00:00.0
2
1
5
MySQL agents
2
120
MySQL agents
2020-10-01 00:00:00.0
2
2
1
GOJS OEM Edition
2
20000
GOJS Dev
2020-10-19 00:00:00.0
101
2
2
GOJS Professional Service
5
5000
GOJS Dev
2020-10-19 00:00:00.0
102
Perform the following steps to create purchase orders in hierarchical format:
  1. 1Step 1. Design the mapping.
  2. 2Step 2. Build the output group and create a struct.
  3. 3Step 3. Create an array of structs.
  4. 4Step 4. Aggregate the output data.
  5. 5Step 5. Create an array of structs and join data sources.
  6. 6Step 6. Run the mapping.

Step 1. Design the mapping

The first step is to configure the Hierarchy Processor in the Mapping Designer with the Source and Target transformations.
Perform the following steps in the Mapping Designer:
  1. 1Add a Hierarchy Processor transformation and change the output data format to Hierarchical.
  2. 2Add the POHeader, PODetail, and Address tables as source objects.
  3. 3Connect the source objects to the Hierarchy Processor transformation in the data flow.
  4. 4In the Hierarchy Processor transformation, add the PurchaseOrder output group and connect the target object in the data flow.
Your mapping should look like the following image:
The mapping contains three Source transformations that are connected to a Hierarchy Processor transformation that is connected to one Target transformation.

Step 2. Build the output group and create a struct

Once you configure the basic mapping, create the output group next.
Perform the following steps to create the output group with the basic purchase order data and add the ship-to address:
  1. 1Add all the incoming fields from POHeader to the PurchaseOrder output group.
  2. 2Add a new output field with the following properties:
  3. Property
    Value
    Child Of
    PurchaseOrder
    Name
    shipToAddress
    Type
    struct
    Struct Name
    address_struct
  4. 3Add all the incoming fields from Address to the shipToAddress struct in the output group:
  5. The list of incoming fields on the left includes the Address table. The output group on the right contains the empty shipToAddress struct.
  6. 4Delete the following fields that you do not need in the output group:
  7. 5Add a filter condition for the PurchaseOrder.shipToAddress struct:
  8. :fld.{Address.OrderNumber}=:fld.{PurchaseOrder.OrderNumber} AND :fld.{Address.AddressType}='ShipTo'.

Step 3. Create an array of structs

Continue to configure the output by adding the purchase order details in the items array of structs. Configure the data processing strategies to sort by item number, group by part number, and aggregate the incoming quantity and price.
Perform the following steps
  1. 1Add a new output field with the following properties:
  2. Property
    Value
    Child Of
    PurchaseOrder
    Name
    Items_arr
    Type
    array
    Array Element Type
    struct
    Element Struct Name
    item_str
  3. 2Add all the incoming fields from PODetail to the Items_arr array in the output group.
  4. 3Delete the following field that you do not need in the output group: PurchaseOrder.Items_arr.OrderNumber.
  5. 4Add a filter condition for the PurchaseOrder.Items_arr array: :fld.{PODetail.OrderNumber}=:fld.{PurchaseOrder.OrderNumber}.
  6. 5Configure a group by field for the PurchaseOrder.Items_arr array: PODetail.PartNum.
  7. 6Configure an order by field in ascending order for the PurchaseOrder.Items_arr array: PODetail.ItemNum.
  8. 7Update the field expression for PODetail.Quantity in the PurchaseOrder.Items_arr array: SUM(:fld.{PODetail.Quantity}) to aggregate quantity.
  9. 8Update the field expression for PODetail.Price in the PurchaseOrder.Items_arr array: SUM(:fld.{PODetail.Price}) to aggregate price.
The following image shows the data configuration icons and expressions for the Items_arr array in the output group.
The Items_arr output group on the right displays active icons for data sources, filter, group by, and order by. The Quantity and Price fields show the aggregate expressions.

Step 4. Aggregate the output data

You aggregate the output data to calculate the total price.
Perform the following steps to aggregate all the items in a particular purchase order:
  1. 1Add a new output field with the following properties:
  2. Property
    Value
    Child Of
    PurchaseOrder
    Name
    TotalPrice
    Type
    bigint
    Aggregate Options: This field will aggregate values in an output field array
    Enabled
    Output Field
    Items_arr
    The following image shows the aggregate options for the TotalPrice output field:
    Aggregate Options indicates that the Items_arr array output values will be used to aggregate into TotalPrice.
  3. 2Configure the following field expression for PurchaseOrder.TotalPrice to aggregate the total price:
  4. SUM(:fld.{PurchaseOrder.Items_arr.item_str.Quantity}*:fld.{PurchaseOrder.Items_arr.item_str.Price})

Step 5. Create an array of structs and join data sources

Add and configure the same-day items array of structs. Using a filter, a join, and a field expression, you output only the items that were ordered and shipped on the same date.
Perform the following steps:
  1. 1Add a new output field with the following properties:
  2. Property
    Value
    Child Of
    PurchaseOrder
    Name
    SameDayItems
    Type
    array
    Array Element Type
    struct
    Element Struct Name
    sameday_str
  3. 2Add all the incoming fields from PODetail to the SameDayItems array in the output group.
  4. 3Delete the following field that you do not need in the output group: PurchaseOrder.SameDayItems.OrderNumber.
  5. 4Add POHeader as a data source for PurchaseOrder.SameDayItems array.
  6. 5Add a join condition for the PurchaseOrder.SameDayItems array with the following properties:
  7. Property
    Value
    Left Group
    POHeader
    Join Type
    Inner
    Right Group
    PODetail
    Join Condition
    :fld.{POHeader.OrderDate}=:fld.{PODetail.ShipDate} AND :fld.{POHeader.OrderNumber}=:fld.{PODetail.OrderNumber}
    The following image shows the data sources and join condition for SameDayItems:
    The Data Configuration dialog shows the selected data sources: POHeader and PODetail. The join condition shows Left Group: POHeader, Join Type: Inner, Right Group: PODetail.
  8. 6Add a filter condition for the PurchaseOrder.SameDayItems array:
  9. :fld.{PODetail.OrderNumber}=:fld.{PurchaseOrder.OrderNumber}
  10. .

Step 6. Run the mapping

The final step is to create and run the mapping task to produce the JSON output.
Perform the following steps:
  1. 1Create a mapping task.
  2. 2Run the mapping task.
  3. 3Review your output.
Tip: For more information about mapping tasks, see the Tasks guide.
The following JSON shows the PurchaseOrder target output after you run the mapping:
{
"OrderNumber": "1",
"Comment": "AppD for POD4",
"OrderDate": "2018-10-01 00:00:00.0",
"ConfirmDate": "2018-10-02 00:00:00.0",
"address_struct": {
"Name": "Tom",
"Street": "2100 Seaport blvd",
"City": "Redwood City",
"State": "CA",
"Country": "USA",
"Zip": "94063"
},
"Items_arr": [{
"itemNum": "1",
"ProductName": "AppD Agent for JVM",
"Quantity": 60,
"price": 500,
"comment": "JVM agents",
"shipDate": "2018-10-15 00:00:00.0",
"PartNum": "1"
}, {
"itemNum": "2",
"ProductName": "MySQL agents",
"Quantity": 6,
"price": 360,
"comment": "MySQL agents",
"shipDate": "2018-10-15 00:00:00.0",
"PartNum": "2"
}, {
"itemNum": "3",
"ProductName": "ELB agents",
"Quantity": 10,
"price": 200,
"comment": "ELB agents",
"shipDate": "2018-10-16 00:00:00.0",
"PartNum": "3"
}],
"TotalPrice": 34160
} {
"OrderNumber": "2",
"Comment": "GoJS for IICS",
"OrderDate": "2018-10-12 00:00:00.0",
"ConfirmDate": "2018-10-12 00:00:00.0",
"address_struct": {
"Name": "Bill",
"Street": "23rd Ave",
"City": "San Mateo",
"State": "CA",
"Country": "USA",
"Zip": "94401"
},
"Items_arr": [{
"itemNum": "1",
"ProductName": "GOJS OEM Edition",
"Quantity": 2,
"price": 20000,
"comment": "GOJS Dev",
"shipDate": "2018-10-19 00:00:00.0",
"PartNum": "101"
}, {
"itemNum": "2",
"ProductName": "GOJS Prefessional Service",
"Quantity": 5,
"price": 5000,
"comment": "GOJS Dev",
"shipDate": "2018-10-19 00:00:00.0",
"PartNum": "102"
}],
"TotalPrice": 65000
}

Hierarchical to hierarchical example

You want to create a customer order file in hierarchical format, using an existing file of hierarchical data.
The existing customer order file CompanyOrders contains the names of companies that have placed orders and information about each order, including the price, date, shipping address, and ID numbers of ordered items.
The following image shows the structure of the CompanyOrders file:
The CompanyOrders file contains a CompanyName string and Orders array of structs. The Orders struct contains OrderPrice, OrderDate, Street, CIty, State, Country, ZipCode, and Items array of structs. The Items struct contains ItemId, ItemName, ItemPrice, and ItemQuantity.
You want to restructure the shipping address into a struct and add a field to calculate the total price of all orders for each company.
Perform the following steps to create and configure the target file:
  1. 1Step 1. Design the mapping.
  2. 2Step 2. Configure the output group.
  3. 3Step 3. Create an output field to aggregate the total price.
  4. 4Step 4. Create an output struct for the order address.
  5. 5Step 5. Run the mapping.

Step 1. Design the mapping

The first step is to configure the Hierarchy Processor in the Mapping Designer with the Source and Target transformations.
Perform the following steps in the Mapping Designer:
  1. 1Add the CompanyOrders file as a source object.
  2. 2Add a Hierarchy Processor transformation to the mapping and connect CompanyOrders as an input source.
  3. 3In the Hierarchy Processor transformation and change the output data format to Hierarchical. This creates an output group.
  4. 4Add a Target transformation to the mapping, and connect the Hierarchy Processor transformation output to this target object.
The following image shows how the mapping should look like:
A mapping in the Mapping Designer shows the CompanyOrders source connected to the Hierarchy Processor transformation input. The Hierarchy processor transformation is connected to the target transformation.

Step 2. Configure the output group

Once you configure the basic mapping, create the output group by adding all the incoming fields.
Perform the following steps in the Hierarchy Processor transformation:
  1. 1Add all the incoming fields from the input to the output group. Set Add to Preserve incoming field.
  2. The following image shows the Add Field dialog:
    The "Add Field" dialog box shows the incoming field set to Input, Add set to Preserve incoming field, and Output Group or Field set to Output.
  3. 2Verify that the data source for the output group is set to Input.

Step 3. Create an output field to aggregate the total price

Create an output field in the Hierarchy Processor transformation that calculates the total price of all orders for each company.
Perform the following steps:
  1. 1Add a new output field with the following properties:
  2. Property
    Value
    Child Of
    Output
    Name
    TotalOrdersPrice
    Type
    double
    Aggregate Options
    Enabled
    Output Field
    Orders
    The following image shows the aggregate options for the TotalOrdersPrice output field:
    The "New output field" dialog box shows the properties for the TotalOrdersPrice field. The Aggregate Options are highlighted where the "use this field to aggregate values in an output field array" checkbox is selected and the Output Field is set to Orders.
  3. 2Configure the following field expression for Output.TotalOrdersPrice to aggregate the total price of all orders for a company:
  4. SUM(:fld.{Output.Orders.Orders.OrderPrice})

Step 4. Create an output struct for the order address

Create a structure for the order address in the output and remove fields from the output that you do not need.
Perform the following steps:
  1. 1Add a new output field with the following properties:
  2. Property
    Value
    Child Of
    Output.Orders.Orders
    Name
    OrderAddress
    Type
    struct
    Struct Name
    Address
  3. 2Add all the incoming fields from Orders to OrderAddress. Set Add to Add primitive single occurring children.
  4. 3Set the data source for OrderAddress to Use Output.
  5. 4Delete the following fields from the OrderAddress struct that you do not need:
  6. 5Delete the following fields from the Orders output group that you do not need:
The following image shows the OrderAddress struct: The Hierarchy Processor transformation output fields shows the final hierarchical structure of the output. The OrderAddress struct is highlighted, where OrderAddress is a struct that contains the string values Street, City, State, Country, and ZipCode.

Step 5. Run the mapping

The final step is to create and run the mapping task to produce the JSON output.
Perform the following steps:
  1. 1Create a mapping task.
  2. 2Run the mapping task.
  3. 3Review your output.
Tip: For more information about mapping tasks, see the Tasks guide.