Complex File Sources

A mapping that runs in the Hadoop environment can process complex files.
You can read files from the local file system or from HDFS. To read large volumes of data, connect a complex file source to a directory of files that have the same format and properties. You can also read compressed binary files.
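For intuition about the directory-read pattern, the following PySpark sketch shows the same idea outside the Developer tool. This is an illustration of the concept, not the Informatica mapping runtime; the HDFS path is a hypothetical example.

```python
from pyspark.sql import SparkSession

# Build a session; on a real cluster this is provided by spark-submit.
spark = SparkSession.builder.appName("complex-file-source").getOrCreate()

# Passing a directory instead of a single file reads every file in it,
# which mirrors a complex file source that points to a directory of
# files with the same format and properties.
df = spark.read.parquet("hdfs:///data/orders/")  # hypothetical HDFS path

# Snappy-compressed Parquet (a compressed binary format) is decoded
# transparently by the built-in codec support.
df.show(5)
```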
A mapping that runs on the Blaze engine or the Hive engine can contain a Data Processor transformation. If the complex file is a flat file, the complex file data object can read it directly, without a Data Processor transformation. If the complex file is a hierarchical file, you must connect the complex file data object to a Data Processor transformation.
A mapping that runs on the Spark engine can process hierarchical data through complex data types. Use a complex file data object that represents the complex files in HDFS. If the complex file contains hierarchical data, you must enable the read operation to project columns as complex data types.
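To see what projecting columns as complex data types amounts to, the following hedged PySpark sketch reads hierarchical JSON and keeps nested fields as struct and array columns instead of flattening them. The file path, record layout, and field names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("hierarchical-read").getOrCreate()

# Hypothetical hierarchical JSON records, one object per line, e.g.:
# {"id": 1, "customer": {"name": "Ana"}, "items": [{"sku": "A1", "qty": 2}]}
df = spark.read.json("hdfs:///data/orders_json/")  # hypothetical path

# Nested objects are inferred as struct columns and arrays as array
# columns; this is the effect of projecting columns as complex data types.
df.printSchema()

# Complex columns are navigated with dot notation and element access.
df.select(col("customer.name"), col("items")[0]["sku"]).show()
```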
The following table shows the complex files that a mapping can process in the Hadoop environment:
File Type   Format         Blaze Engine    Spark Engine    Hive Engine
---------   ------------   -------------   -------------   -------------
Avro        Flat           Supported       Supported       Supported
Avro        Hierarchical   Supported*      Supported**     Supported*
JSON        Flat           Supported*      Supported       Supported*
JSON        Hierarchical   Supported*      Supported**     Supported*
ORC         Flat           Not supported   Supported       Not supported
ORC         Hierarchical   Not supported   Not supported   Not supported
Parquet     Flat           Supported       Supported       Supported
Parquet     Hierarchical   Supported*      Supported**     Supported*
XML         Flat           Supported*      Not supported   Supported*
XML         Hierarchical   Supported*      Not supported   Supported*
* The complex file data object must be connected to a Data Processor transformation.
** The complex file read operation must be enabled to project columns as complex data types.