Hadoop Files formatting options

You can select the format of the Hadoop file and configure the formatting options when you read from or write to flat or complex files on the local system or on HDFS. You can read and write data in the Avro, Parquet, JSON, ORC, and delimited flat file formats.
The following table describes the formatting options for Avro, Parquet, JSON, ORC, and delimited flat files in the Source and Target transformations:
Property
Description
Schema Source
The schema of the source or target file.
Select one of the following options to specify a schema:
  • Read from data file. Imports the schema from a file in Hadoop Files.
  • Import from schema file. Imports the schema from a schema definition file on the agent machine.
Schema File
The schema definition file on the agent machine from which you want to upload the schema.
You cannot upload a schema file when you create a target at runtime.
The Schema File option appears if you select Import from schema file.

Read and write flat file formats

You can read from or write to delimited flat files. However, you cannot use Hadoop Files V2 Connector to read from or write to fixed-width flat files.
You need to enable staging before you can read from or write to delimited flat files. When you set the staging property, Data Integration, by default, creates a flat file locally in a temporary folder to stage the data before it reads from or writes to delimited flat files.

Enable staging to use flat file formats

You can set Data Integration to use staging for read, lookup, and write operations that involve data with delimited flat file formats.
If you do not set the staging property, Data Integration does not perform staging and fails to read from or write to a flat file. This property is specific to flat files and has no impact while processing other data formats.

Staging properties

You can set staging for read, lookup, and write operations.
You need to set the following staging property in the Secure Agent properties based on the operation that you want to configure:
| Operation | Property to set |
|---|---|
| Read operation | INFA_DTM_RDR_STAGING_ENABLED_CONNECTORS |
| Lookup operation | INFA_DTM_LKP_STAGING_ENABLED_CONNECTORS |
| Write operation | INFA_DTM_STAGING_ENABLED_CONNECTORS |
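
As a rough illustration of the table above, the following Python sketch maps each operation to the Secure Agent property that enables staging for it. The property names come from the table. The helper name `staging_properties` and the plugin ID value are hypothetical placeholders, not part of the product.

```python
# Illustrative only: the property names come from the table above.
# The function name and the plugin ID value are hypothetical placeholders.
STAGING_PROPERTY_BY_OPERATION = {
    "read": "INFA_DTM_RDR_STAGING_ENABLED_CONNECTORS",
    "lookup": "INFA_DTM_LKP_STAGING_ENABLED_CONNECTORS",
    "write": "INFA_DTM_STAGING_ENABLED_CONNECTORS",
}


def staging_properties(operations, plugin_id):
    """Return the property/value pairs to set for the given operations."""
    return {STAGING_PROPERTY_BY_OPERATION[op]: plugin_id for op in operations}


if __name__ == "__main__":
    # "<plugin ID>" stands in for the real plugin ID of the connector.
    for name, value in staging_properties(["read", "write"], "<plugin ID>").items():
        print(f"{name}={value}")
```

Each operation is enabled independently, so a mapping that only reads flat files needs only the read property.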

Set the staging property

Perform the following steps to set the staging property for the Tomcat type in the Secure Agent properties:
  1. In Administrator, click Runtime Environments.
  2. Edit the Secure Agent for which you want to set the property.
  3. In the System Configuration Details section, select Data Integration Server as the service and Tomcat as the type.
  4. Set the value of the Tomcat property to the plugin ID of Hadoop Files V2 Connector.
     You can find the plugin ID in the manifest file at the following location: <Secure Agent installation directory>/downloads/<Hadoop Files V2 package>/CCIManifest.txt
When you run the mapping, a temporary flat file is created in the following directory in your machine based on the operation for which you set the property:
To use a specific temporary directory, add the following property with the required path value to the JVMOptions of the Secure Agent: "-Dhadoopfiles.tmpdir=/home/tmp/". The Secure Agent creates temporary files in this location. If you do not provide a location, the Secure Agent uses the Java temporary directory.
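
The fallback described above can be sketched in Python as follows. This is only a model of the documented behavior, not the Secure Agent's implementation: the function name `resolve_staging_dir` is hypothetical, and Python's `tempfile.gettempdir()` is used as a stand-in for the Java temporary directory.

```python
import tempfile

# Option prefix taken from the documented JVM option.
TMPDIR_FLAG = "-Dhadoopfiles.tmpdir="


def resolve_staging_dir(jvm_options):
    """Return the directory named by -Dhadoopfiles.tmpdir, or a default.

    `jvm_options` is a list of JVM option strings. When the option is absent
    or empty, the function falls back to Python's temporary directory, which
    here stands in for the Java temporary directory (an assumption made for
    illustration only).
    """
    for option in jvm_options:
        option = option.strip().strip('"')
        if option.startswith(TMPDIR_FLAG):
            path = option[len(TMPDIR_FLAG):]
            if path:
                return path
    return tempfile.gettempdir()


if __name__ == "__main__":
    print(resolve_staging_dir(['"-Dhadoopfiles.tmpdir=/home/tmp/"']))
    print(resolve_staging_dir(["-Xmx512m"]))
```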
Data Integration logs the following messages in the session log if staging is done successfully through the flat file:
For rules and guidelines that apply to flat file formats, see Flat file formats.

Flat file formatting options

The following table describes the formatting options for flat files:
Property
Description
Flat File Type
The type of flat file.
Select the Delimited option to read a flat file that contains column delimiters.
The fixed-width option does not apply to Hadoop Files.
Delimiter
Character used to separate columns of data in a delimited flat file. You can set the delimiter to a comma, tab, colon, semicolon, or another character.
You cannot set a tab as a delimiter directly in the Delimiter field. To set a tab as a delimiter, type the tab character in any text editor, and then copy and paste it into the Delimiter field.
EscapeChar
Character that immediately precedes a column delimiter character embedded in an unquoted string, or that immediately precedes the quote character in a quoted string, in a delimited flat file.
When you write data to Hadoop Files and specify a qualifier, the qualifier is used as the escape character by default. Otherwise, the character specified as the escape character is used.
Qualifier
Quote character that defines the boundaries of data in a delimited flat file. You can set the qualifier to a single quote or a double quote.
Qualifier Mode
The qualifier behavior when you write data to a delimited flat file. It determines whether the data in the staging file is enclosed in double quotes.
The following qualifier modes appear:
  • MINIMAL
  • ALL
  • NON_NUMERIC
  • ALL_NON_NULL
Select ALL to enclose all entries in double quotes. The other options do not apply to Hadoop Files.
Disable escape character when a qualifier is set
Not applicable.
Code Page
Select the code page that the Secure Agent must use to read data from or write data to a delimited flat file.
Select UTF-8 for mappings.
Select one of the following options for mappings in advanced mode:
  • UTF-8
  • MS Windows Latin 1
  • Shift-JIS
  • ISO 8859-15 Latin 9 (Western European)
  • ISO 8859-3 Southeast European
  • ISO 8859-5 Cyrillic
  • ISO 8859-9 Latin 5 (Turkish)
  • IBM EBCDIC International Latin-1
Header Line Number
Not applicable.
First Data Row
Specify the line number from which you want the Secure Agent to start reading data in a delimited flat file. You must enter a value that is greater than or equal to one.
To read data from the header, the value of the Header Line Number and the First Data Row fields should be the same. Default is 1.
Target Header
Select whether to write the delimited flat file target with or without a header. You can select the With Header or Without Header option.
This property is not applicable when you read data from a Hadoop Files source.
Distribution Column
Not applicable.
Max Rows To Preview
Not applicable.
Row Delimiter
Not applicable.
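
To make the Delimiter, Qualifier, Qualifier Mode, First Data Row, and Target Header options more concrete, the following Python sketch imitates them with the standard `csv` module. It is an analogy under stated assumptions, not the connector's implementation: the function names `read_delimited` and `write_delimited` are hypothetical, and `csv.QUOTE_ALL` is used here only because it behaves like the ALL qualifier mode described above.

```python
import csv
import io


def read_delimited(text, delimiter=",", qualifier='"', first_data_row=1):
    """Parse delimited text and return records from `first_data_row` onward.

    `first_data_row` is 1-based. Records are counted as parsed by the csv
    module, which matches line numbers when no field contains a line break.
    """
    if first_data_row < 1:
        raise ValueError("first_data_row must be greater than or equal to 1")
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=qualifier)
    return [row for number, row in enumerate(reader, start=1) if number >= first_data_row]


def write_delimited(header, rows, delimiter=",", with_header=True):
    """Write rows with every entry enclosed in double quotes (like ALL mode)."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_ALL,  # quote every field, numeric or not
        lineterminator="\n",
    )
    if with_header:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # A tab delimiter is written as "\t" in code.
    source = 'id\tname\n1\t"Ann\tLee"\n'
    print(read_delimited(source, delimiter="\t", first_data_row=2))
    print(write_delimited(["id", "name"], [[1, "x,y"]]), end="")
```

In this sketch, setting `first_data_row=2` skips the header line, and the qualifier keeps a delimiter that appears inside a quoted value from splitting the column.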

Rules and guidelines for file formats

You must set the appropriate source and target properties when you select the file format types.

File path

Consider the following general guidelines when you specify the file path property:

Create target

When you configure a Target transformation to create a new complex file target, consider the following guidelines:

Flat file format

Consider the following guidelines when you enable the staging property to read from or write to delimited flat files:

JSON format

Consider the following guidelines for JSON files: