You can select the format of the Hadoop file and configure the formatting options when you read from or write to flat or complex files on the local system or HDFS. You can use the Avro, Parquet, JSON, ORC, or flat file format to read or write data.
The following table describes the formatting options for Avro, Parquet, JSON, ORC, and delimited flat files in the Source and Target transformations:
Property
Description
Schema Source
The schema of the source or target file.
Select one of the following options to specify a schema:
- Read from data file. Imports the schema from a file in Hadoop Files.
- Import from schema file. Imports the schema from a schema definition file in the agent machine.
Schema File
The schema definition file on the agent machine from which you want to upload the schema.
You cannot upload a schema file when you create a target at runtime.
The Schema File option appears if you select Import from schema file.
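For reference, a schema definition file for the Avro format is a JSON document. The following is a minimal sketch of such a file; the record and field names (orders, id, amount) are illustrative and not taken from this guide:

```json
{
  "type": "record",
  "name": "orders",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "amount", "type": ["null", "double"], "default": null}
  ]
}
```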
Read and write flat file formats
You can read from or write to delimited flat files. However, you cannot use Hadoop Files V2 Connector to read from or write to fixed-width flat files.
You need to enable staging before you can read from or write to delimited flat files. When you set the staging property, Data Integration, by default, creates a flat file locally in a temporary folder to stage the data before it reads from or writes to delimited flat files.
Enable staging to use flat file formats
You can set Data Integration to use staging for read, lookup, and write operations that involve data with delimited flat file formats.
If you do not set the staging property, Data Integration does not perform staging and fails to read from or write to a flat file. This property is specific to flat files and has no impact while processing other data formats.
Staging properties
You can set staging for read, lookup, and write operations.
You need to set the following staging property in the Secure Agent properties based on the operation that you want to configure:
Operation
Property to set
Read operation
INFA_DTM_RDR_STAGING_ENABLED_CONNECTORS
Lookup operation
INFA_DTM_LKP_STAGING_ENABLED_CONNECTORS
Write operation
INFA_DTM_STAGING_ENABLED_CONNECTORS
Set the staging property
Perform the following tasks to set the staging property for the Tomcat in the Secure Agent properties:
1. In Administrator, click Runtime Environments.
2. Edit the Secure Agent for which you want to set the property.
3. In the System Configuration Details section, select the Service as Data Integration Server and the type as Tomcat.
4. Set the value of the Tomcat property to the plugin ID of Hadoop Files V2 Connector.
You can find the plugin ID in the manifest file located in the following directory: <Secure Agent installation directory>/downloads/<Hadoop Files V2 package>/CCIManifest.txt
When you run the mapping, a temporary flat file is created in the following directory in your machine based on the operation for which you set the property:
To use a custom temporary directory, add the following property and the required path value to the JVMOptions of the Secure Agent: "-Dhadoopfiles.tmpdir=/home/tmp/". The Secure Agent uses this designated location to create temporary files. If you do not provide a location, the Secure Agent uses the Java temporary directory.
Data Integration logs the following messages in the session log if staging is performed successfully through the flat file:
• Read operation: Staging mode is enabled to read data.
• Lookup operation: Staging mode is enabled to read data.
• Write operation: The INFA_DTM_STAGING is successfully enabled to use the flat file to create local staging files.
For rules and guidelines that apply to flat file formats, see Flat file formats.
Flat file formatting options
The following table describes the formatting options for flat files:
Property
Description
Flat File Type
The type of flat file.
Select the Delimited option to read a flat file that contains column delimiters.
The fixed-width option does not apply to Hadoop Files.
Delimiter
Character used to separate columns of data in a delimited flat file. You can set values as comma, tab, colon, semicolon, or others.
You cannot set a tab as a delimiter directly in the Delimiter field. To set a tab as a delimiter, you must type the tab character in any text editor. Then, copy and paste the tab character in the Delimiter field.
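Outside the product, you can verify that a file is tab-delimited before you import it. The following standalone Python sketch (it uses only the standard csv module and is not part of the connector) parses a record separated by literal tab characters:

```python
import csv
import io

# A sample tab-delimited file; "\t" is the literal tab character
# that you would paste into the Delimiter field.
data = io.StringIO("id\tname\tcity\n1\tAlice\tOslo\n")

reader = csv.reader(data, delimiter="\t")
rows = list(reader)
print(rows[1])  # ['1', 'Alice', 'Oslo']
```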
EscapeChar
Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string data in a delimited flat file.
When you write data to Hadoop Files and specify a qualifier, by default, the qualifier is considered the escape character. Otherwise, the character that you specify as the escape character is used.
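To see the difference between an escape character in unquoted data and a qualifier that doubles as the escape in quoted data, here is a standalone Python sketch using the standard csv module. This is an analogy to illustrate the two behaviors, not the connector's own implementation:

```python
import csv
import io

# Unquoted data: the escape character precedes an embedded delimiter.
out = io.StringIO()
writer = csv.writer(out, delimiter=",", quoting=csv.QUOTE_NONE, escapechar="\\")
writer.writerow(["a,b", "c"])
print(out.getvalue())  # a\,b,c

# Quoted data: the qualifier itself escapes embedded quote characters,
# analogous to the qualifier acting as the escape character.
out2 = io.StringIO()
writer2 = csv.writer(out2, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer2.writerow(['say "hi"', "c"])
print(out2.getvalue())  # "say ""hi""",c
```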
Qualifier
Quote character that defines the boundaries of data in a delimited flat file. You can set qualifier as single quote or double quote.
Qualifier Mode
The qualifier behavior when you write data to a delimited flat file. It determines whether the data in the staging file is enclosed in double quotes.
The following qualifier modes are available:
- MINIMAL
- ALL
- NON_NUMERIC
- ALL_NON_NULL
Select ALL to enclose all entries in double quotes. The other options do not apply to Hadoop Files.
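The ALL mode quotes every field that is written. As an analogy outside the product, Python's standard csv module expresses the same behavior with csv.QUOTE_ALL:

```python
import csv
import io

out = io.StringIO()
# QUOTE_ALL encloses every field in the qualifier character,
# which corresponds to the ALL qualifier mode described above.
writer = csv.writer(out, quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerow([1, "abc", "x,y"])
print(out.getvalue())  # "1","abc","x,y"
```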
Disable escape character when a qualifier is set
Not applicable.
Code Page
Select the code page that the Secure Agent must use to read or write data to a delimited flat file.
Select UTF-8 for mappings.
Select one of the following options for mappings in advanced mode:
- UTF-8
- MS Windows Latin 1
- Shift-JIS
- ISO 8859-15 Latin 9 (Western European)
- ISO 8859-3 Southeast European
- ISO 8859-5 Cyrillic
- ISO 8859-9 Latin 5 (Turkish)
- IBM EBCDIC International Latin-1
Header Line Number
Not applicable.
First Data Row
Specify the line number from which you want the Secure Agent to read data in a delimited flat file. You must enter a value that is greater than or equal to 1.
To read data from the header, the values of the Header Line Number and First Data Row fields must be the same. Default is 1.
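As an illustration of the First Data Row semantics, the following standalone Python sketch (not the connector itself) shows that a value of 2 skips line 1 and starts reading at line 2:

```python
import io

# Sample delimited file: line 1 is the header, data starts on line 2.
sample = io.StringIO("id,name\n1,Alice\n2,Bob\n")

first_data_row = 2  # value of the First Data Row field

lines = sample.read().splitlines()
# Skip every line before the first data row (line numbers are 1-based).
data = [line.split(",") for line in lines[first_data_row - 1:]]
print(data)  # [['1', 'Alice'], ['2', 'Bob']]
```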
Target Header
Select whether you want to write data to the target with or without a header in the delimited flat file. You can select the With Header or Without Header option.
This property is not applicable when you read data from a Hadoop Files source.
Distribution Column
Not applicable.
Max Rows To Preview
Not applicable.
Row Delimiter
Not applicable.
Rules and guidelines for file formats
You must set the appropriate source and target properties when you select the file format types.
File path
Consider the following general guidelines when you specify the file path property:
•The Object field in the source properties from which you want to read data must be the same as the source object specified in the File path field in the advanced source properties.
•If you write data to a complex file target and the File Path field is mapped to any of the source fields, the file name in the Object field in the target properties must be different from the target File Name specified in the advanced target properties.
Create target
When you configure a Target transformation to create a new complex file target, consider the following guidelines:
•You can configure the format options and specify the file format type of the target object. If you select None as the format type, the Secure Agent writes data in binary format. Default is None. You cannot write data to the ORC file format even though it is listed in the Formatting Options.
•You cannot write data to a complex file object in Avro, Parquet, or JSON file format using the Create Target option if one of the field names in the source schema is FilePath.
Flat file format
Consider the following guidelines when you enable the staging property to read from or write to delimited flat files:
•To read and write Null values as Null, add ReadNullAsNullCSVStaging property for the read operation and WriteNullAsNullCSVStaging for the write operation in the Optional Connection Properties in the Hadoop Files connection.
•You can configure only the following advanced properties in the Source transformation to read from delimited flat files:
- File path
- File pattern
- Allow recursive read
- Allow wildcard characters
•You can configure only the file directory, file name, and overwrite target fields in the advanced properties in the Target transformation to write to delimited flat files.
JSON format
Consider the following guidelines for JSON files:
•You cannot read or write nested or multi-line indented JSON files.
You can use the following JSON file structure to read data from and write data to a complex file object in JSON file format: