Complex File Sources

A mapping that runs in the Hadoop environment can process complex files.
You can read files from the local file system or from HDFS. To read large volumes of data, connect a complex file source to a directory of files that have the same format and properties. You can also read compressed binary files.
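For intuition about the directory-read pattern, the following PySpark sketch shows the same idea outside the Developer tool. This is an illustration of the concept, not the Informatica mapping runtime; the HDFS path is a hypothetical example.

```python
from pyspark.sql import SparkSession

# Build a session; on a real cluster this is provided by spark-submit.
spark = SparkSession.builder.appName("complex-file-source").getOrCreate()

# Passing a directory instead of a single file reads every file in it,
# which mirrors a complex file source that points to a directory of
# files with the same format and properties.
df = spark.read.parquet("hdfs:///data/orders/")  # hypothetical HDFS path

# Snappy-compressed Parquet (a compressed binary format) is decoded
# transparently by the built-in codec support.
df.show(5)
```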
A mapping that runs on the Blaze engine or the Hive engine can contain a Data Processor transformation. If the complex file is a flat file, the complex file data object can read it directly, without a Data Processor transformation. If the complex file is a hierarchical file, you must connect the complex file data object to a Data Processor transformation.
A mapping that runs on the Spark engine can process hierarchical data through complex data types. Use a complex file data object that represents the complex files in HDFS. If the complex file contains hierarchical data, you must enable the read operation to project columns as complex data types.
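To see what projecting columns as complex data types amounts to, the following hedged PySpark sketch reads hierarchical JSON and keeps nested fields as struct and array columns instead of flattening them. The file path, record layout, and field names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("hierarchical-read").getOrCreate()

# Hypothetical hierarchical JSON records, one object per line, e.g.:
# {"id": 1, "customer": {"name": "Ana"}, "items": [{"sku": "A1", "qty": 2}]}
df = spark.read.json("hdfs:///data/orders_json/")  # hypothetical path

# Nested objects are inferred as struct columns and arrays as array
# columns; this is the effect of projecting columns as complex data types.
df.printSchema()

# Complex columns are navigated with dot notation and element access.
df.select(col("customer.name"), col("items")[0]["sku"]).show()
```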
The following table shows the complex files that a mapping can process in the Hadoop environment:
File Type   Format         Blaze Engine    Spark Engine    Hive Engine
---------   ------------   -------------   -------------   -------------
Avro        Flat           Supported       Supported       Supported
Avro        Hierarchical   Supported*      Supported**     Supported*
JSON        Flat           Supported*      Supported       Supported*
JSON        Hierarchical   Supported*      Supported**     Supported*
ORC         Flat           Not supported   Supported       Not supported
ORC         Hierarchical   Not supported   Not supported   Not supported
Parquet     Flat           Supported       Supported       Supported
Parquet     Hierarchical   Supported*      Supported**     Supported*
XML         Flat           Supported*      Not supported   Supported*
XML         Hierarchical   Supported*      Not supported   Supported*
* The complex file data object must be connected to a Data Processor transformation.
** The complex file read operation must be enabled to project columns as complex data types.