You can use a Source transformation in advanced mode to read hierarchical data from complex files, such as Avro, JSON, and Parquet files. Advanced mode represents the data as an array, map, or struct.
To read hierarchical data, set the format on the Source tab to a hierarchical format, such as JSON, or to Discover Structure. Use Discover Structure when you want to use an intelligent structure model to define the structure of your data. For more information, see Using intelligent structure models in mappings in advanced mode.
Downstream in the mapping, you can use the hierarchical fields as pass-through fields to convert data from one complex file format to another. For example, you can read hierarchical data from an Avro source and write the data to a JSON target. You can also use the hierarchical fields and their child fields in expressions and conditions in downstream transformations. For information about accessing child fields, see the Function Reference.
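As a rough analogy only, the following Python sketch shows how the three hierarchical types differ and how child fields are reached. The record layout and the names Customer, Address, phones, and attributes are hypothetical, and the access syntax is Python's, not the mapping expression syntax, which is described in the Function Reference.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Address:
    # Struct: a fixed set of named child fields.
    city: str
    zip: str


@dataclass
class Customer:
    name: str
    address: Address            # struct
    phones: List[str]           # array: ordered elements of one type
    attributes: Dict[str, str]  # map: key-value pairs with arbitrary keys


# Hypothetical JSON record, as it might appear in a complex file.
raw = (
    '{"name": "Ada",'
    ' "address": {"city": "Austin", "zip": "73301"},'
    ' "phones": ["555-0100", "555-0101"],'
    ' "attributes": {"tier": "gold"}}'
)

doc = json.loads(raw)
customer = Customer(
    name=doc["name"],
    address=Address(**doc["address"]),
    phones=doc["phones"],
    attributes=doc["attributes"],
)

# Reaching child fields: by name (struct), by position (array), by key (map).
city = customer.address.city
first_phone = customer.phones[0]
tier = customer.attributes["tier"]
```

Passing the customer object along unchanged corresponds to using a hierarchical field as a pass-through field, while the last three lines correspond to using child fields in expressions or conditions.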
You can pass hierarchical fields to the following transformations:
•Target
•Aggregator
•Expression
•Filter
•Hierarchy Processor
•Joiner
•Rank
•Router
•Sequence Generator
•Sorter
Rules and guidelines for reading hierarchical data
Consider the following guidelines when you read hierarchical data:
•You must use an Amazon S3 V2 or Azure Data Lake Storage Gen2 connection to read hierarchical data. For more information, see the help for the appropriate connector.
•To read data from an XML source, use an intelligent structure model in the Source transformation. For information about intelligent structure models, see Components.
•You cannot use a parameter for the source connection or the source object.
•If hierarchical fields contain child fields with decimal data types, the mapping runs using low precision.
•The transformation sets the precision and scale based on the values in the first row of data. Note that this first row is sometimes referred to as row 0.
•To avoid data truncation, make sure that the values in the first row of data use the highest precision and scale that you expect in the data. Also ensure that the first row does not include null values.
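The effect of the first-row guideline can be illustrated with a minimal Python sketch. It is not the product's implementation. The helper names precision_and_scale and fit_to_scale are hypothetical, the sketch handles finite values only, and it assumes that extra fractional digits are truncated rather than rounded.

```python
from decimal import Decimal, ROUND_DOWN


def precision_and_scale(value: Decimal):
    """Return (precision, scale) of a finite decimal value."""
    _, digits, exponent = value.as_tuple()
    scale = max(-exponent, 0)            # digits after the decimal point
    precision = max(len(digits), scale)  # total significant digits
    return precision, scale


def fit_to_scale(value: Decimal, scale: int) -> Decimal:
    """Truncate a value to the given number of fractional digits."""
    quantum = Decimal(1).scaleb(-scale)  # for example, 0.1 when scale is 1
    return value.quantize(quantum, rounding=ROUND_DOWN)


later_row = Decimal("3.14159")

# First row with a small scale: later rows lose fractional digits.
_, narrow_scale = precision_and_scale(Decimal("12.5"))
narrow = fit_to_scale(later_row, narrow_scale)

# First row padded to the scale that the data needs: later rows are preserved.
_, wide_scale = precision_and_scale(Decimal("12.50000"))
wide = fit_to_scale(later_row, wide_scale)
```

In this sketch, a first row of 12.5 yields a scale of 1, so the later value is cut to 3.1, whereas a first row of 12.50000 yields a scale of 5 and the later value is kept in full.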