Hadoop Files V2 sources in mappings

To read data from a flat or complex file, configure a Hadoop Files V2 object as the Source transformation in a mapping. You can configure a Source transformation to represent a single flat or complex file source.
Specify the name and description of the Hadoop Files V2 source. Configure the source and advanced properties for the source object.
The following table describes the Hadoop Files V2 source properties that you can configure in a Source transformation:
Source Property
Description
Connection
Name of the source connection. Select an existing connection or create a connection parameter.
Source Type
Type of source object. Select Single or Parameter as the source type.
Object
Select the source object from which you want to read data. Though selecting a source object is mandatory, the agent ignores this object. The agent processes the source object specified in File Path in advanced source properties.
Format
File format of the source object.
You can select from the following file format types:
  • None
  • Flat file
  • Avro
  • Parquet
  • JSON
Default is None. If you select None as the format type, the Secure Agent reads data in binary format. In advanced mode, None is not applicable.
Note: You cannot read data from ORC file format types even though they are listed in the Formatting Options.
Parameter
The parameter for the source object. Create or select the parameter for the source object.
Note: The Parameter property appears only if you select Parameter as the source type.
The following table describes the Hadoop Files V2 source advanced properties that you can configure in a Source transformation:
Advanced Property
Description
File path
Mandatory. Location of the file or directory from which you want to read data. Maximum length is 255 characters. If the path is a directory, all the files in the directory must have the same file format.
If the file or directory is in HDFS, enter the path without the node URI. For example, /user/lib/testdir specifies the location of a directory in HDFS. The path must not contain more than 512 characters.
If the file or directory is in the local system, enter the fully qualified path. For example, /user/testdir specifies the location of a directory in the local system.
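As a minimal sketch of the "path without the node URI" rule, the following Python snippet strips the scheme and NameNode address from a full HDFS location so that only the path remains. The helper name to_file_path and the host namenode:8020 are hypothetical and used only for illustration. They are not part of the connector.

```python
from urllib.parse import urlparse


def to_file_path(location: str) -> str:
    """Return only the path portion of a location, dropping any node URI.

    Hypothetical helper for illustration; not part of the connector.
    """
    # urlparse splits "hdfs://host:port/path" into scheme, netloc, and path.
    # A plain path such as "/user/testdir" is returned unchanged.
    return urlparse(location).path


# "namenode:8020" is an assumed NameNode address.
hdfs_path = to_file_path("hdfs://namenode:8020/user/lib/testdir")
local_path = to_file_path("/user/testdir")
print(hdfs_path)
print(local_path)
```

The resulting value is what you would enter in the File path field.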
File Format
Mandatory. Name and format of the file from which you want to read data.
Specify the value in the following format: <filename>.<format>
For example, customer.avro
Allow Wildcard Characters
Indicates whether you want to use wildcard characters for the source directory name or the source file name.
If you select this option, you can use the asterisk (*) wildcard character in the source directory name or the source file name in the File path field.
Allow Incremental Read
Indicates whether you want to read only the newly added files or files that have changed since the last time the mapping task ran.
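The selection that an incremental read describes can be pictured with the following Python sketch. It assumes that "changed" means a modification time later than the previous run. This page does not state how the Secure Agent tracks changes, so the FileInfo type, the files_to_read function, and the timestamp comparison are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record describing one source file."""
    path: str
    modified_at: float  # modification time, in seconds since the epoch


def files_to_read(files: Iterable[FileInfo], last_run_at: float) -> List[str]:
    """Return paths of files added or changed after the previous run.

    Assumption: a file counts as new or changed when its modification
    time is later than the time of the previous run.
    """
    return [f.path for f in files if f.modified_at > last_run_at]


source_files = [
    FileInfo("/user/lib/testdir/old.avro", modified_at=100.0),
    FileInfo("/user/lib/testdir/new.avro", modified_at=250.0),
]
selected = files_to_read(source_files, last_run_at=200.0)
print(selected)
```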
Allow Recursive Read
Indicates whether you want to use wildcard characters to read complex files of the Parquet, Avro, or JSON formats recursively from the specified folder and its subfolders and files.
You can use the wildcard character as part of the file or directory name. For example, you can use a wildcard character to recursively read data from the following folders:
  • /myfolder*/ returns all files within any folder or subfolder that has the pattern myfolder in its path name.
  • /myfolder*/*.csv returns all .csv files within any folder or subfolder that has the pattern myfolder in its path name.
  • /myfolder*/ with the file name abc* returns all files that have the pattern abc in their names within any folder or subfolder that has the pattern myfolder in its path name.
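To get a feel for which paths such patterns select, the following Python sketch approximates the matching with the standard fnmatch module, where the asterisk also matches across folder separators. This is only an approximation for illustration. The connector's own matching rules might differ, and the sample paths are made up.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matching_files(paths: Iterable[str], pattern: str) -> List[str]:
    """Return the paths that match a shell-style wildcard pattern.

    In fnmatch, "*" matches any run of characters, including "/",
    which approximates reading from a folder and its subfolders.
    """
    return [p for p in paths if fnmatchcase(p, pattern)]


# Made-up sample paths.
candidates = [
    "/myfolder1/a.csv",
    "/myfolder1/sub/b.csv",
    "/myfolder2/c.txt",
    "/other/d.csv",
]

csv_files = matching_files(candidates, "/myfolder*/*.csv")
print(csv_files)
```

Here the two .csv files under /myfolder1 are selected, while /myfolder2/c.txt and /other/d.csv are not.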
File Format
Specifies a file format of a complex file source. Select one of the following options:
  • Binary
  • Custom Input
  • Sequence File Format
Default is Binary.
Input Format
The class name for files that use a custom input format. If you select Custom Input in the File Format field, you must specify the fully qualified name of a class that implements the InputFormat interface.
To read files that use the Avro format, use the following input format:
com.informatica.avro.AvroToXML
Input Format Parameters
Parameters for the input format class. Enter name-value pairs separated with a semicolon. Enclose the parameter name and value within double quotes.
For example, use the following syntax:
"param1"="value1";"param2"="value2"
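If you generate this value from a script, the syntax above can be assembled as in the following Python sketch. The function name format_input_params is hypothetical, and the sketch assumes that names and values contain no double quotes or semicolons, because this page does not describe an escaping rule.

```python
from typing import Mapping


def format_input_params(params: Mapping[str, str]) -> str:
    """Build a string of quoted name-value pairs separated by semicolons.

    Assumption: names and values contain no double quotes or semicolons.
    """
    return ";".join(f'"{name}"="{value}"' for name, value in params.items())


formatted = format_input_params({"param1": "value1", "param2": "value2"})
print(formatted)
```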
Compression Format
Compression format of the source files.
Select one of the following options:
  • None
  • Auto
  • DEFLATE
  • gzip
  • bzip2
  • Lzo
  • Snappy
  • Custom
Custom Compression Codec
Required if you select Custom as the compression format. Specify the fully qualified name of a class that implements the CompressionCodec interface.
isSchemaValidationSupported
Determines whether to validate the data against a predefined schema to ensure that the data structure and data types conform to the expected format.
Select this option to validate the incoming data. Clear this option to read data without schema validation.
Tracing Level
Sets the amount of detail that appears in the log file.
You can choose terse, normal, verbose initialization, or verbose data. Default is normal.