Transformations > Hierarchy Processor transformation > Processing flattened output
  

Processing flattened output

The Hierarchy Processor transformation can convert a hierarchical input group to a flattened denormalized output group.
For example, you have a shop maintenance file that contains customer and vehicle information for all your customers. You want to denormalize the vehicle maintenance data and exclude personal information about the customers.

Defining flattened output with the Hierarchy Processor transformation

Use the Hierarchy Processor tab to map incoming fields to output fields. The flattened output option allows you to create denormalized output.
The following image shows the Hierarchy Processor tab with hierarchical input and flattened output:
The Hierarchy Processor tab appears with Output Format Flattened selected. The transformation contains selectable incoming fields in the left panel and an output group and fields in the right panel. Links to output fields provide ways to modify output fields. Field expressions are also visible in the left panel.
  1. 1Output format. Select Flattened to convert hierarchical input into denormalized output.
  2. 2Incoming fields. View the incoming data schema and field types.
  3. 3Output fields. View the output fields as you create the output file.
  4. 4Configure the output. Select incoming fields to build the flattened output file schema.
  5. 5Output field names. Click on a field name to modify the field name.
  6. 6Expression. View the field expressions to determine the input to output field mappings.
  7. 7Delete. Use to delete output fields.
Tip: Use the maximize icon and resize the Incoming Fields panel or Output Fields panel to see the information you need.
To define a Hierarchy Processor transformation with flattened output, perform the following tasks:
    1Select the Flattened output format.
    2 Configure the transformation output by selecting incoming fields to add to the output.
    3Optional. Rename output fields.
    4Optional. Delete output fields.

Adding incoming fields to flattened output groups

You can add incoming fields to the output individually or you can add an entire input group. After you add the incoming fields to the output group, you can rename or delete output fields as needed.
To add output fields, select the check box next to the input field or input group that you want to add. Selecting a parent automatically selects all child items, but you can deselect the entries that you do not want. You can also delete any unneeded entries from the output.

Flatten hierarchical data

When you add incoming hierarchical fields to the output, all fields in the output schema are automatically flattened.

Example of flattened hierarchical data

The incoming fields are contained within a hierarchy of structs and arrays. You just need four fields in the output: OrderNumber, Name, ProductName, and Quantity.
Select the appropriate check boxes to add the incoming fields to the output.
The following image shows the input and output:

Renaming flattened output group and fields

After you add incoming fields to the output group in the flattened format, you can change the names of these fields if necessary. You can also change the name of the output group.
To change the name of any output field, click on the field name. To change the name of the output group, click on the output group name.
The following table describes the properties for the output fields:
Property
Description
Name
The name of the output field.
Type
The data type of the current field. You cannot change the data type.
Precision
The total number of significant digits in the field. You cannot change the precision.
Scale
The number of digits to the right of the decimal point. You cannot change the scale.

Expression format

When you use the flattened output, the expression is fixed to the default and can't be changed. But if you understand the syntax of the default expression, you can easily identify the source location of each output field.
Tip: To add fields to the output group, use the check boxes next to the incoming field names. This differs from the Add link that you use for relational or hierarchical output.
The Hierarchy Processor transformation can process information from different data sets. Some of the field names might not be unique among the different data sets. As a result, you can't simply reference the field by its name, because the same field name might be used in a different data set, or within the hierarchy of the same data set.
The syntax of the expressions in the Hierarchy Processor transformation differs from that used in the Expression transformation.
To reference a field in a Hierarchy Processor transformation, use the following syntax:
:fld.{input_group_name.field_name}.field_name
The following table describes the syntax in more detail:
Syntax part
Description
.fld.
Denotes the Hierarchy Processor transformation expression syntax.
input_group_name
Name of the input group or dataset.
field_name
Name of the field, including the full path name if it's not a top-level field.
If any field is of the type array, include the array name. If an array is primitive and has no array name, use elem as the array name.
For fields within a struct or an array, the actual field name is specified outside of the closing brace.
.field_name
Include the field_name portion only when referencing a field within a struct or an array. Follow these guidelines:
  • - For fields within a struct, the field_name portion uses the format: .structName.fieldname
  • - For fields within an array, the field_name portion uses the format: .fieldName

Running a mapping with JSON data

To run a mapping that contains a Hierarchy Processor transformation with JSON-formatted data, you need to use a mapping task.

Reading JSON input

When you read JSON data, the input files can be based on a schema with multiple lines or on a schema with a single line.
The following sample shows a JSON schema on a single line:
{"Name":"Tom","Street":"2100 Seaport Blvd","City":"Redwood City","State":"CA","Country":"USA","Zip":"94063"}
The following sample shows a JSON schema that spans across multiple lines:
{
"Name": "Tom",
"Surname": "Day",
"City": "Redwood City",
"State": "CA",
"Country": "USA",
"Zip": "94063"
}
By default, the Hierarchy Processor transformation reads each JSON schema as a single line. To read input that spans across multiple lines, you can configure the formatting options in the Source transformation to read multiple-line JSON files.

Writing JSON output

When you write JSON data, you can write each output record to a separate file, or you can write all output records to one file.
By default, each output record is written to a separate file. To write the output records to one JSON-formatted file, set the following Spark session property in the mapping task:
Session Property Name
Session Property Value
spark.sql.shuffle.partitions
1

Hierarchical to flattened example

You want to convert hierarchical data to relational data and write the data to a target file in denormalized format.
A shop maintenance file contains the customer and vehicle information for customers. The file is in hierarchical JSON format and is generated by your company's shop application.
The following JSON script shows the shop maintenance source input before you run the mapping:
{
"people": [{
"personal": {
"age": 20,
"gender": "M",
"name": {
"first": "John",
"last": "Doe"
}
},
"vehicles": [{
"type": "car",
"model": "Honda Civic",
"insurance": {
"policy_num": "HA12345"
},
"maintenance": [{
"desc": "oil change",
"cost": "111.50",
"summary": [{
"line1": "0w20",
"line2": "synthetic"
}, {
"line1": "2.0L 4-cyl",
"line2": "4.4 quarts"
}]
}, {
"desc": "new tires",
"cost": "425.00",
"summary": [{
"line1": "235/40R18",
"line2": "4 tires"
}, {
"line1": "All Season",
"line2": "No spare"
}]
}]
}, {
"type": "truck",
"model": "Dodge Ram",
"insurance": {
"policy_num": "DR12345"
},
"maintenance": [{
"desc": "new tires",
"cost": "299.99",
"summary": [{
"line1": "275/60R20",
"line2": "2 tires"
}, {
"line1": "All Season",
"line2": "No spare"
}]
}, {
"desc": "oil change",
"cost": "111.50",
"summary": [{
"line1": "5w30",
"line2": "conventional"
}, {
"line1": "5.7L V8",
"line2": "7.0 quarts"
}]
}]
}],
"source": "internet"
}, {
"personal": {
"age": 24,
"gender": "F",
"name": {
"first": "Jane",
"last": "Roberts"
}
},
"vehicles": [{
"type": "car",
"model": "Toyota Camry",
"insurance": {
"policy_num": "TC98765"
},
"maintenance": [{
"desc": "tires rotated",
"cost": "389.50",
"summary": [{
"line1": "4 tires",
"line2": "leak repairs"
}]
}, {
"desc": "oil change",
"cost": "59.50",
"summary": [{
"line1": "0w20",
"line2": "special"
}]
}]
}, {
"type": "car",
"model": "Honda Accord",
"insurance": {
"policy_num": "HA98765"
},
"maintenance": [{
"desc": "new air filter",
"cost": "399.50",
"summary": [{
"line1": "17220-6B2-A00",
"line2": "rebuild assembly"
}]
}, {
"desc": "new brakes",
"cost": "799.50",
"summary": [{
"line1": "2-443344586",
"line2": "rear brake kit"
}]
}]
}],
"source": "phone"
}]
}
You want to denormalize the vehicle maintenance data and exclude the customers' personal information.
Perform the following steps to create and configure the target file:
  1. 1Step 1. Design the mapping.
  2. 2Step 2. Configure the output group.
  3. 3Step 3. Run the mapping.

Step 1. Design the mapping

The first step is to configure the Hierarchy Processor in the Mapping Designer with the Source and Target transformations.
Perform the following steps in the Mapping Designer:
  1. 1Add a Source transformation to the mapping with the shop maintenance file as a source object.
  2. 2Add a Hierarchy Processor transformation to the mapping and connect shop maintenance as an input source.
  3. 3In the Hierarchy Processor transformation, select Flattened for the output format.
  4. 4Add a Target transformation to the mapping, and connect the Hierarchy Processor transformation output to this target object.
Your mapping should look like the following image:
A mapping in the Mapping Designer shows the shop maintenance source connected to the Hierarchy Processor transformation input. The Hierarchy processor transformation is connected to the target transformation.

Step 2. Configure the output group

Once you configure the basic mapping, create an output group with the vehicle shop maintenance data.
Perform the following steps in the Hierarchy Processor transformation:
  1. 1Select the top-level input group. This automatically selects all the input fields and adds them to the output.
  2. 2Clear the personal struct. This automatically clears all the elements within the struct and removes them from the output.
  3. Your incoming and output fields should look like the following image: The Hierarchy Processor tab appears with Output Format Flattened selected. The Incoming Fields panel shows one input group with the following fields: people (array) selected; personal (struct) not selected; vehicles (array) selected. The vehicles array contains the type, model, insurance (struct), with policy_num field. The Output Fields panel contains one output group with the following string fields: type, model, policy_num, desc, cost, line1, line2.
  4. 3Click on the output field for summary line1 and rename it to "Summary_line1."
  5. 4Repeat the rename process for summary line2.
  6. The following image shows how you edit the field name in the Edit Output Field dialog box: The Edit Output Field dialog box shows the output field name changed to Summary_line1.
  7. 5Verify the mapping.

Step 3. Run the mapping

The final step is to create and run the mapping task to produce the JSON output.
Perform the following steps:
  1. 1Create a mapping task.
  2. 2Run the mapping task.
  3. 3Review your output.
The following table shows the partially denormalized target output after you run the mapping:
type
model
policy_num
desc
cost
Summary_line1
Summary_line2
source
car
Honda Civic
HA12345
oil change
111.5
0w20
synthetic
internet
car
Honda Civic
HA12345
oil change
111.5
2.0L 4-cyl
4.4 quarts
internet
car
Honda Civic
HA12345
new tires
425
235/40R18
4 tires
internet
car
Honda Civic
HA12345
new tires
425
All Season
No spare
internet
truck
Dodge Ram
DR12345
new tires
299.99
275/60R20
2 tires
internet
truck
Dodge Ram
DR12345
new tires
299.99
All Season
No spare
internet
truck
Dodge Ram
DR12345
oil change
111.5
5w30
conventional
internet
truck
Dodge Ram
DR12345
oil change
111.5
5.7L V8
7.0 quarts
internet
car
Toyota Camry
TC98765
tires rotated
389.5
4 tires
leak repairs
phone
car
Toyota Camry
TC98765
oil change
59.5
0w20
special
phone
car
Honda Accord
HA98765
new air filter
399.5
17220-6B2-A00
rebuild assembly
phone
car
Honda Accord
HA98765
new brakes
799.5
2-443344586
rear brake kit
phone