The Hierarchy Processor transformation can process the following types of hierarchical output:
•Relational to hierarchical. Convert up to five relational input groups to one hierarchical output group.
•Hierarchical to hierarchical. Convert one or more hierarchical input groups to one hierarchical output group with a different schema.
Defining hierarchical output with the Hierarchy Processor transformation
You can map the incoming fields to output fields to configure the output data structure for hierarchical output.
When the output data format is hierarchical, consider the following guidelines:
•There is a specific order that you should follow when performing operations.
•You can rename or delete output groups.
•You can aggregate both the input and output data.
The following image shows the Hierarchy Processor tab with relational input and hierarchical output:
1Output format. Select Hierarchical to construct a hierarchical schema for incoming fields.
2Input groups, incoming fields. Use these fields to map to the output fields.
3Output fields. Use these fields to create the complex output file.
4Add incoming field to output group. Use to add fields to the output group.
5Output field names. Click on a field name to modify the field name or data type.
6Data Configuration icons. Use to configure the output groups and fields.
7Expression. Click on an expression to view or customize the output field expression.
8Add or delete output field. Use to create or delete output fields.
Tip: You can resize the Incoming Fields or Output Fields panels to better see the information.
Define hierarchical output
To define a Hierarchy Processor transformation with hierarchical output, perform the following tasks:
1Select the Hierarchical output format.
2 Configure the transformation output by adding incoming fields to the output or by manually adding fields to the output.
3Configure the output groups and fields.
aConfigure data sources.
bOptionally, join data sources.
cOptionally, customize the output field expressions.
dOptionally, add filters to the transformation.
eOptionally, define output order using order by fields.
fOptional, aggregate input data using group by fields.
gOptionally, aggregate output data.
Order of operations
You should perform operations in a certain order when working with hierarchical output.
When the output data format is hierarchical, perform the operations in the following order:
1Join. Specify join conditions in the data sources.
2Filter. Optionally, specify filter conditions to include only a subset of the source data.
3Group By. Optionally, specify the incoming fields for aggregate expressions.
4Order By. Optionally, specify the incoming fields to create sorted output.
Renaming and deleting output groups
When the output data format is hierarchical, you can rename or delete input groups.
When you modify the input group, the following configurations are also updated if they refer to the input group:
•Input data source names
•Join data source names
•Order by fields
•Group by fields
However, you must modify the following configurations manually:
•Expressions
•Join conditions
•Filter conditions
Note: When the output data format is hierarchical, you cannot change the input group name.
Adding incoming fields to hierarchical output groups
You can add individual incoming fields or all fields in an input group to the output group, You can add incoming fields to an output group or to an array or struct field. You can rename or delete output fields as needed.
Click the Add link on the Hierarchy Processor tab next to the incoming field or input group that you want to add.
You can add fields in the following ways based on the incoming field's data type and the output type:
Add incoming field
Adds the selected incoming field to the selected output group or field.
The "Add incoming field" option is available when you add incoming fields with primitive data types to the output.
Add input group
Adds all incoming fields in the group to the selected output group or field.
The "Add input group" option is available when you add incoming fields with primitive data types to the output.
Add primitive single occurring children
Adds all primitive single occurring children under the field to the output, including the primitive, single occurring child fields that are nested under a struct. Adding primitive single occurring children doesn't add children that are nested under arrays.
You can add primitive single occurring children when you select an incoming struct or array field, and the output is hierarchical. For more information, see Add primitive single occurring children.
Preserve incoming field
Preserves the hierarchical structure of the selected field in the output. For example, if you add an array of structs to the output group, then the output group contains an array of structs with the same structure.
The "Preserve incoming field" option is available when you select an incoming struct or array field, and the output is hierarchical. For more information, see Preserve incoming fields.
Flatten selected array
Flattens an array of primitives into a primitive field. Creates one output record for each element in the array.
The "Flatten selected array" option is available when you select an incoming struct or array field, and the output is hierarchical. For more information, see Flatten selected array.
Add selected array as struct
Flattens an array of structs into a struct field. Creates one output record for each element the array.
The "Add selected array as struct" option is available when you select an incoming struct or array field, and the output is hierarchical. For more information, see Add selected array as struct.
Add primitive single occurring children
You can add primitive single occurring children when you are adding incoming struct and array fields to the output. The "Add primitive single occurring children" option is available when you add incoming struct or array fields to hierarchical output. It adds all primitive single occurring children under the field to the output group, including the primitive, single occurring child fields that are nested under a struct.
You can add primitive single occurring children only for incoming fields with child objects.
If the field you select contains an array, the array and its children are not added because an array can contain multiple elements.
Example
You want to extract the make, model, company, and policy number from an array of vehicle records. Each output record should contain information about one vehicle.
Add the vehicle array to the output group and choose Add Primitive Single Occurring Children.
The following image shows the incoming and output fields:
Note that the date field is not added to the output group because it is under a child array of the selected field and is not single occurring.
Preserve incoming fields
When you add incoming struct and array fields to the output, you can preserve the incoming field. When you preserve the incoming field, it's copied to hierarchical output without changing the structure.
When you add a nested array and select "Preserve incoming fields", the data source configuration for the output group determines how the records are created.
Example of preserving incoming fields
You want to extract the description information from a nested array of maintenance records. The description information is in an array of strings. You want the output data to also be in an array of strings.
Add the description array to the output group and choose Preserve Incoming Field.
The following image shows the incoming and output fields:
When Data Integration creates the output array, it sets the data source for the description array to Input.vehicle.vehicle.maintenance.maintenance.desc.elem, indicating that the information for the output array comes from the elements in the desc array in the Input group. The data source for the Output group determines the structure of the output records.
By default, Data Integration sets the data source for the Output group to Input. When you run the mapping, Data Integration collates all descriptions into one output record. To write the descriptions to separate records for each vehicle, set the data source to Input.vehicle.vehicle. To write the descriptions to separate records for each maintenance record, set the data source to Input.vehicle.vehicle.maintenance.maintenance. For more information about data source configuration, see Configuring data sources.
Flatten selected array
When you add an incoming array of primitives to the output, you can flatten the selected array into a field of the same data type.
When you flatten a selected array, it creates one record for each element in the array.
Example of flattening an array
You want to extract the description information from a nested array of maintenance records. The description information is in an array of strings. You want to flatten the output into a string field.
Add the description array to the output group and choose Flatten Selected Array.
The following image shows the incoming and output fields:
The output contains one record for each occurrence of description in the incoming data.
For example, the incoming data contains the following record:
Uou can create and modify the output group and output fields for hierarchical output.
After you add incoming fields to an output group, you can select an output field name to modify the fields. You can also add and define output fields manually.
Name and Type
The following table describes the properties for hierarchical output fields:
Property
Description
Child Of
The parent field or group that this field belongs to. The name structure describes the group, parent fields, and struct name. For example: OUTGROUP.Grandparent.Parent.struct_name
Name
The name of the current output group or field.
Type
The data type of the current field. For hierarchical output, you can choose either a primitive or complex data type.
Precision
The total number of significant digits in the field.
Scale
The number of digits to the right of the decimal point.
Array Element Type
The data type of the current array element.
Array Element Precision
The precision for the current array element. Used when creating the target data.
Array Element Scale
The scale for the current array element. Used when creating the target data.
Struct Name
The struct name for the current struct field.
Element Struct Name
The element struct name for the current array of structs field.
Description
Optional description of the field and its usage.
Note: The output field properties that appear depend on the data type.
Aggregate Options
All output fields except for parent fields have aggregation options that you can set. The following table describes the aggregate options:
Property
Description
Use this field to aggregate values in an output field array.
Select this property to aggregate output data into the current field.
Output Field
If you are aggregating, specify the sibling array with fields to aggregate. The array must be a child of the current output field or group.
Aggregating values in an output field array
If the Hierarchy Processor transformation output contains an array, you might want to add an output field to aggregate values in the array. The output field must be a sibling of the array field.
For example, the Output group contains an array with orders information. The Orders array contains the OrderPrice field, which stores the price for each order. You want to find the total order price for each company.
The following image shows the output fields:
To find the total order price, add a field called TotalOrderPrice to the output group.
Edit the TotalOrderPrice field. Select Use this field to aggregate values in an output field array, and select the Orders array as the output field for which you want to aggregate values. Configure the following expression for the TotalOrderPrice field:
SUM(:fld.{Output.Orders.Orders.OrderPrice})
Configuring data sources
You can configure a data source for the output group and for all array and struct fields in the output. A data source identifies the input group or incoming array that populates the primitive child fields for the output group or field.
If you select a field as a data source, you have access to the following objects:
•All its primitive children
•All its ancestors’ primitive children
•Individual elements of its array children
•Individual elements of its ancestors’ array children
You can configure multiple data sources for an output group or field. If you do this, you must configure a join condition to join the data.
You can configure filters to exclude certain records. You can also specify group by fields for aggregating the data and order by fields for sorting records.
The data sources for hierarchical output group and fields can vary based on the output data structure.
Inheriting data sources from the parent
When you configure the data source for an array or struct field, you can populate the children of the field either with the incoming data or by inheriting the parent's data sources.
When you use the incoming data, the incoming data is used to populate the children of the array or struct.
When you inherit the parent's data sources, the data transformed into the parent output field populates the children of the array or struct. This preserves the data transformations (for example, joins and filters), in the parent fields. You can apply filters to the field to further filter the data, but you can't configure data sources, joins, group by fields, or order by fields.
For example, you are reading data from a relational table of customer records in which the customer ID is unique. The incoming data contains the following records:
You want to write the customer address fields to a struct.
The following image shows the incoming and output fields:
In the Output Fields panel, set the data source for the Output group to Input and the data source for the Address struct to Inherit parent's data sources (Output). When you run the mapping, the Hierarchy Processor transformation creates one record for each occurrence of CustID in the input data and populates the struct with the address data that corresponds to the customer ID in the output:
If you set the data source for the Address struct to Input, then you must also configure the following filter condition on the struct to get the same output: :fld.{Input.CustID} = :fld.{Output.CustID} AND :fld.{Input.Name} = :fld.{Output.Name}. For more information about configuring filter conditions, see Configure filter conditions.
When the output field is an array that inherits its parent's data, the Hierarchy Processor transformation creates an array with one element.
Configure data source in hierarchical output example
When you add a nested array to the output and preserve the incoming field, the records that get created vary based on how you configure the data source for the output group.
For example, you want to extract the description information from a nested array of maintenance records in a JSON file. The description information is in an array of strings. You want the output data to also be in an array of strings.
The following sample data shows an incoming record:
In the Hierarchy Processor transformation, you add the description array to the output group and choose Preserve Incoming Field.
The following image shows the incoming and output fields:
When the Hierarchy Processor transformation creates the output array, it sets the data source for the description array to Input.vehicle.vehicle.maintenance.maintenance.desc.elem, indicating that the information for the output array comes from the elements in the desc array in the Input group. By default, the Hierarchy Processor transformation sets the data source for the Output group to Input.
The data source configuration for the output group determines how the Hierarchy Processor transformation creates the output records.
To combine all descriptions into one output record, keep the data source for the output group as Input. This produces the following output:
To create one output record for each occurrence of vehicle in the incoming data, set the data source to Input.vehicle.vehicle. In this case, the output data contains one record for each vehicle:
To create one output record for each occurrence of maintenance in the incoming data, set the data source to Input.vehicle.vehicle.maintenance.maintenance. In this case, the output contains one record for each maintenance record:
If you convert hierarchical data to hierarchical data and you add multi-level arrays to the output, you must ensure that the data sources for the output group or fields do not conflict. A conflict occurs when you select both an incoming array and one of its descendant arrays to be data sources for the same output group or field.
If you have a data source conflict, the transformation remains invalid until you resolve the conflict. Additionally, you cannot configure joins, filters, order by fields, or group by fields until you resolve the conflict.
Example of a data source conflict
A data source conflict occurs if you select both Array1 and Array2 as data sources for the output group in the following image:
The conflict occurs because there is no way to determine which data source provides the data for Field1 and Field2. In this case, the Hierarchy Processor transformation displays a conflicting data sources error.
To resolve the conflict, remove one of the data sources from the Output group.
Modify data sources for output group or fields
The output group and all array and struct fields in the output require a data source. When you add an input group or incoming field to the output, the Hierarchy Processor transformation usually sets a data source for you. You can manually add or edit a data source.
To configure a data source for the output group:
1Click the Data Sources icon for the output group.
2Examine if there is already a data source configured for the output group. If it requires any changes, continue with these steps, otherwise you can exit the dialog box.
3Click the Add Data Source icon to add a new data source.
4Click the ellipsis icon to select the input group or array to use as a data source.
5Validate the configuration.
6Click Save.
Tip: Click the trash icon to remove any incorrect or unneeded data source.
To configure a data source for struct or array fields:
1Click the Data Sources icon for the output group.
2Examine if there is already a data source configured for the struct or array. If it requires any changes, continue with these steps, otherwise you can exit the dialog box.
3Select the type of data source to use:
- Inherit parent's data sources: Use the same data sources as the parent for the struct or array.
- Use input group or incoming fields: Select an input group or incoming field to use as a data source.
4Validate the configuration.
5Click Save.
Tip: Click the trash icon to remove any incorrect or unneeded data source.
Configuring output data
For hierarchical output, you can configure data source join and filter conditions as well as group by and order by fields. You can aggregate on both the input and output data.
Configure join conditions
When the output data format is hierarchical, you can define join conditions for the data sources. You must configure a join condition if an output group or field has multiple data sources. Configure a join condition to join the data from the input groups or incoming fields.
Configure the join conditions for the output groups on the Hierarchy Processor tab.
1Click the Join Conditions icon for the output group.
2Add a join condition.
3Select the left data source.
4Select the join type:
- Inner. Includes rows with matching join conditions. Discards rows that do not match the join conditions.
- Left Outer. Includes all rows from the right pipeline and the matching rows from the left pipeline. Discards the unmatched rows from the left pipeline.
- Right Outer. Includes all rows from the left pipeline and the matching rows from the right pipeline. Discards the unmatched rows from the right pipeline.
- Full Outer. Includes rows with matching join conditions and all incoming data from the left pipeline and right pipeline.
Note: If you select an outer join on a large data set, you might need to increase the Spark driver memory in the mapping task. For more information about Spark session properties, see Tasks.
5Select the right data source.
6Click Configure Join Condition.
7Select fields and built-in functions to create the expression.
8Validate the expression.
9Click Save.
Configure filter conditions
You can define filter conditions to project a subset of the input data in the Hierarchy Processor transformation. You can filter based on incoming fields or output fields.
You can configure a filter condition to read data from primitive fields into an output array or struct field when the data in the array or struct field must correspond to the data in a sibling field in the output group.
Filter configuration example
You want to convert relational data to a JSON file. The incoming data is in a relational table that contains orders information. The orders table contains multiple rows for each order because each order can contain several products.
The incoming data looks like the following data:
OrderNumber,ProductName,ProductType,NumberOfItems,PricePerItem 12345,M&Ms Candies Chocolate Peanut Party Size - 38 Oz,Candy,2,14.49 12345,Stella Parm Shredded Cup - 20 Oz,Dairy,1,10.99 12345,AHA Sparkling Water Blueberry Pomegranate - 8-12 Fl. Oz.,Beverages,1,3.33 23456,Weetabix Biscuit Cereal Whole Grain 2 Count - 14 Oz,Breakfast & Cereal,2,4.99 23456,Producers Milk Lowfat 1% - Half Gallon,Dairy,1,2.79 23456,Egglands Best Eggs Cage Free Large Brown - 12 Count,Eggs,1,4.99
You want to read the product details into an array, where the product details are associated with a particular order number.
The following image shows the structure of the incoming and output fields:
In the Output Fields panel, set the data source for the Output group to Input, and configure the group by field as Input.OrderNumber to remove duplicate records from the output. Set the data source for the ProductDetails array to Input.
To ensure that the details in the ProductDetails array correspond to the order number in the output, configure the following filter condition for the array:
To further refine the records, use an AND condition in the filter. For example, to exclude records in which the product type is "Candy," configure the following filter condition:
:fld.{Input.OrderNumber}= :fld.{Output.OrderNumber} AND :fld.{Input.ProductType} != 'Candy'
The output contains one record for each order, and incoming records with the product type "Candy" are excluded.
For hierarchical output, you can group primitive incoming fields for aggregation to produce one row for each group of input data.
To group fields for aggregation:
1Click the Group By fields icon for the output group or array.
2Examine if there is already a grouping present. If it requires any changes, continue with these steps, otherwise you can exit the dialog box.
3Click the Add New Group by Field icon to add a new field for grouping.
4Clip the ellipsis icon to select the primitive field to add to the grouping.
5Validate the configuration.
6Click Save.
Configure order by fields
When the output data format is hierarchical, you can define how the output is sorted.
Note: The following conditions must be true for the sort operation to take effect:
•The Hierarchy Processor transformation is connected directly to the Target transformation.
•All sort fields are connected to the Target transformation.
•The data types in the connected order by fields match the data types in the target fields.
Configure order by fields on the Hierarchy Processor tab.
1Click the Order By fields icon for the output group or the array or struct field.
2Add the incoming fields to order by and sort the data in ascending or descending order.
3Rearrange the fields to adjust the sort order.
4Validate the configuration.
5Click Save.
Expression configuration
You can define expressions in the Hierarchy Processor transformation to create customized relational or hierarchical output, but not for flattened output. You also use expressions to define filter conditions.
The Hierarchy Processor transformation can process information from different data sets. Some of the field names might not be unique among the different data sets. As a result, you can't simply reference the field by its name, because the same field name might be used in a different data set, or within the hierarchy of the same data set.
The syntax of the expressions in the Hierarchy Processor transformation differs from that used in the Expression transformation.
To reference a field in a Hierarchy Processor transformation, use the following syntax:
:fld.{input_group_name.field_name}.field_name
The following table describes the syntax in more detail:
Syntax part
Description
.fld.
Denotes the Hierarchy Processor transformation expression syntax.
input_group_name
Name of the input group or dataset.
field_name
Name of the field, including the full path name if it's not a top-level field.
If any field is of the type array, include the array name. If an array is primitive and has no array name, use elem as the array name.
For fields within a struct or an array, the actual field name is specified outside of the closing brace.
.field_name
Include the field_name portion only when referencing a field within a struct or an array. Follow these guidelines:
- For fields within a struct, the field_name portion uses the format: .structName.fieldname
- For fields within an array, the field_name portion uses the format: .fieldName
Configure expressions
You can configure expressions when the output format is relational or hierarchical. For flattened output, can't configure the default expression once you add the field.
To configure an expression in the Hierarchy Processor transformation, follow these steps:
1Perform one of the following actions to create the output fields:
aTo use an incoming field as an output, click the Add link that appears when you hover over the incoming field.
bTo create a new field, click New Field in the Output Fields panel near the output group name or near an array name.
2Click the expression in the Output Fields section.
For newly created fields, the link appears as Configure. The Field Expression dialog box appears.
3Enter the expression in the expression editor. You can add incoming fields, functions, and variables to the expression by clicking the Add link next to the object that you want to use. You can also type in the expression manually.
The expression can contain constants, variables, built-in functions, and user-defined functions. You can nest functions to create a complex expression.
Note: If you modify the references to any of the fields, your expression will fail.
4Click Validate.
5Correct any errors.
6Click OK.
You can close the editor and troubleshoot an invalid expression at another time.
Running a mapping with JSON data
To run a mapping that contains a Hierarchy Processor transformation with JSON-formatted data, you need to use a mapping task.
Reading JSON input
When you read JSON data, the input files can be based on a schema with multiple lines or on a schema with a single line.
The following sample shows a JSON schema on a single line:
By default, the Hierarchy Processor transformation reads each JSON schema as a single line. To read input that spans across multiple lines, you can configure the formatting options in the Source transformation to read multiple-line JSON files.
Writing JSON output
When you write JSON data, you can write each output record to a separate file, or you can write all output records to one file.
By default, each output record is written to a separate file. To write the output records to one JSON-formatted file, set the following Spark session property in the mapping task:
Session Property Name
Session Property Value
spark.sql.shuffle.partitions
1
Relational to hierarchical example
You need to create a purchase order file in hierarchical format using customer sales data from two purchase order tables and the customer address table.
Use the Hierarchy Processor transformation to create purchase orders in hierarchical format.
The POHeader table contains basic information about the orders placed by customers:
OrderNumber
Comment
OrderDate
ConfirmDate
1
AppD for POD4
2020-10-01 00:00:00.0
2020-10-02 00:00:00.0
2
GoJS for IICS
2020-10-12 00:00:00.0
2020-10-12 00:00:00.0
The Address table contains customer address information for each order:
OrderNumber
AddressType
Name
Street
City
State
Country
Zip
1
ShipTo
Tom
2100 Seaport Blvd
Redwood City
CA
USA
94063
1
BillTo
Tom
2100 Seaport Blvd
Redwood City
CA
USA
94063
2
ShipTo
Bill
1630 S Delaware St
San Mateo
CA
USA
94402
2
BillTo
Bill
PO Box 313
San Mateo
CA
USA
94402
The PODetail table contains details about the customer purchase orders:
OrderNumber
ItemNum
ProductName
Quantity
Price
Comment
ShipDate
PartNum
1
1
AppD Agent for JVM
60
500
JVM agents
2020-10-15 00:00:00.0
1
1
3
ELB agents
10
200
ELB agents
2020-10-15 00:00:00.0
3
1
2
MySQL agents
2
120
MySQL agents
2020-10-16 00:00:00.0
2
1
4
MySQL agents
2
120
MySQL agents
2020-10-01 00:00:00.0
2
1
5
MySQL agents
2
120
MySQL agents
2020-10-01 00:00:00.0
2
2
1
GOJS OEM Edition
2
20000
GOJS Dev
2020-10-19 00:00:00.0
101
2
2
GOJS Professional Service
5
5000
GOJS Dev
2020-10-19 00:00:00.0
102
Perform the following steps to create purchase orders in hierarchical format:
1Step 1. Design the mapping.
2Step 2. Build the output group and create a struct.
3Step 3. Create an array of structs.
4Step 4. Aggregate the output data.
5Step 5. Create an array of structs and join data sources.
6Step 6. Run the mapping.
Step 1. Design the mapping
The first step is to configure the Hierarchy Processor in the Mapping Designer with the Source and Target transformations.
Perform the following steps in the Mapping Designer:
1Add a Hierarchy Processor transformation and change the output data format to Hierarchical.
2Add the POHeader, PODetail, and Address tables as source objects.
3Connect the source objects to the Hierarchy Processor transformation in the data flow.
4In the Hierarchy Processor transformation, add the PurchaseOrder output group and connect the target object in the data flow.
Your mapping should look like the following image:
Step 2. Build the output group and create a struct
Once you configure the basic mapping, create the output group next.
Perform the following steps to create the output group with the basic purchase order data and add the ship-to address:
1Add all the incoming fields from POHeader to the PurchaseOrder output group.
2Add a new output field with the following properties:
Property
Value
Child Of
PurchaseOrder
Name
shipToAddress
Type
struct
Struct Name
address_struct
3Add all the incoming fields from Address to the shipToAddress struct in the output group:
4Delete the following fields that you do not need in the output group:
- PurchaseOrder.shipToAddress.OrderNumber
- PurchaseOrder.shipToAddress.AddressType
5Add a filter condition for the PurchaseOrder.shipToAddress struct:
:fld.{Address.OrderNumber}=:fld.{PurchaseOrder.OrderNumber} AND :fld.{Address.AddressType}='ShipTo'.
Step 3. Create an array of structs
Continue to configure the output by adding the purchase order details in the items array of structs. Configure the data processing strategies to sort by item number, group by part number, and aggregate the incoming quantity and price.
Perform the following steps
1Add a new output field with the following properties:
Property
Value
Child Of
PurchaseOrder
Name
Items_arr
Type
array
Array Element Type
struct
Element Struct Name
item_str
2Add all the incoming fields from PODetail to the Items_arr array in the output group.
3Delete the following field that you do not need in the output group: PurchaseOrder.Items_arr.OrderNumber.
4Add a filter condition for the PurchaseOrder.Items_arr array: :fld.{PODetail.OrderNumber}=:fld.{PurchaseOrder.OrderNumber}.
5Configure a group by field for the PurchaseOrder.Items_arr array: PODetail.PartNum.
6Configure an order by field in ascending order for the PurchaseOrder.Items_arr array: PODetail.ItemNum.
7Update the field expression for PODetail.Quantity in the PurchaseOrder.Items_arr array: SUM(:fld.{PODetail.Quantity}) to aggregate quantity.
8Update the field expression for PODetail.Price in the PurchaseOrder.Items_arr array: SUM(:fld.{PODetail.Price}) to aggregate price.
The following image shows the data configuration icons and expressions for the Items_arr array in the output group.
Step 4. Aggregate the output data
You aggregate the output data to calculate the total price.
Perform the following steps to aggregate all the items in a particular purchase order:
1Add a new output field with the following properties:
Property
Value
Child Of
PurchaseOrder
Name
TotalPrice
Type
bigint
Aggregate Options: This field will aggregate values in an output field array
Enabled
Output Field
Items_arr
The following image shows the aggregate options for the TotalPrice output field:
2Configure the following field expression for PurchaseOrder.TotalPrice to aggregate the total price:
Step 5. Create an array of structs and join data sources
Add and configure the same-day items array of structs. Using a filter, a join, and a field expression, you output only the items that were ordered and shipped on the same date.
Perform the following steps:
1Add a new output field with the following properties:
Property
Value
Child Of
PurchaseOrder
Name
SameDayItems
Type
array
Array Element Type
struct
Element Struct Name
sameday_str
2Add all the incoming fields from PODetail to the SameDayItems array in the output group.
3Delete the following field that you do not need in the output group: PurchaseOrder.SameDayItems.OrderNumber.
4Add POHeader as a data source for PurchaseOrder.SameDayItems array.
5Add a join condition for the PurchaseOrder.SameDayItems array with the following properties:
Property
Value
Left Group
POHeader
Join Type
Inner
Right Group
PODetail
Join Condition
:fld.{POHeader.OrderDate}=:fld.{PODetail.ShipDate} AND :fld.{POHeader.OrderNumber}=:fld.{PODetail.OrderNumber}
The following image shows the data sources and join condition for SameDayItems:
6Add a filter condition for the PurchaseOrder.SameDayItems array:
You want to create a customer order file in hierarchical format, using an existing file of hierarchical data.
The existing customer order file CompanyOrders contains the names of companies that have placed orders and information about each order, including the price, date, shipping address, and ID numbers of ordered items.
The following image shows the structure of the CompanyOrders file:
You want to restructure the shipping address into a struct and add a field to calculate the total price of all orders for each company.
Perform the following steps to create and configure the target file:
1Step 1. Design the mapping.
2Step 2. Configure the output group.
3Step 3. Create an output field to aggregate the total price.
4Step 4. Create an output struct for the order address.
5Step 5. Run the mapping.
Step 1. Design the mapping
The first step is to configure the Hierarchy Processor in the Mapping Designer with the Source and Target transformations.
Perform the following steps in the Mapping Designer:
1Add the CompanyOrders file as a source object.
2Add a Hierarchy Processor transformation to the mapping and connect CompanyOrders as an input source.
3In the Hierarchy Processor transformation and change the output data format to Hierarchical. This creates an output group.
4Add a Target transformation to the mapping, and connect the Hierarchy Processor transformation output to this target object.
The following image shows how the mapping should look like:
Step 2. Configure the output group
Once you configure the basic mapping, create the output group by adding all the incoming fields.
Perform the following steps in the Hierarchy Processor transformation:
1Add all the incoming fields from the input to the output group. Set Add to Preserve incoming field.
The following image shows the Add Field dialog:
2Verify that the data source for the output group is set to Input.
Step 3. Create an output field to aggregate the total price
Create an output field in the Hierarchy Processor transformation that calculates the total price of all orders for each company.
Perform the following steps:
1Add a new output field with the following properties:
Property
Value
Child Of
Output
Name
TotalOrdersPrice
Type
double
Aggregate Options
Enabled
Output Field
Orders
The following image shows the aggregate options for the TotalOrdersPrice output field:
2Configure the following field expression for Output.TotalOrdersPrice to aggregate the total price of all orders for a company:
SUM(:fld.{Output.Orders.Orders.OrderPrice})
Step 4. Create an output struct for the order address
Create a structure for the order address in the output and remove fields from the output that you do not need.
Perform the following steps:
1Add a new output field with the following properties:
Property
Value
Child Of
Output.Orders.Orders
Name
OrderAddress
Type
struct
Struct Name
Address
2Add all the incoming fields from Orders to OrderAddress. Set Add to Add primitive single occurring children.
3Set the data source for OrderAddress to Use Output.
4Delete the following fields from the OrderAddress struct that you do not need:
- Output.Orders.Orders.OrderAddress.OrderPrice
- Output.Orders.Orders.OrderAddress.OrderDate
5Delete the following fields from the Orders output group that you do not need:
- Output.Orders.Orders.Street
- Output.Orders.Orders.City
- Output.Orders.Orders.State
- Output.Orders.Orders.Country
- Output.Orders.Orders.ZipCode
The following image shows the OrderAddress struct:
Step 5. Run the mapping
The final step is to create and run the mapping task to produce the JSON output.
Perform the following steps:
1Create a mapping task.
2Run the mapping task.
3Review your output.
Tip: For more information about mapping tasks, see the Tasks guide.