Operation | Support |
---|---|
Read | Yes |
Write | Yes |
Connection property | Description |
---|---|
Connection Name | Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + - Maximum length is 255 characters. |
User Name | Required to read data from HDFS. Enter a user name that has access to the single-node HDFS location to read data from or write data to. |
NameNode URI | The URI to access HDFS. Use the following format to specify the name node URI in Cloudera, Amazon EMR, and Hortonworks distributions: hdfs://<namenode>:<port>/ where <namenode> is the host name or IP address of the name node and <port> is the port on which the name node listens. If the Hadoop cluster is configured for high availability, copy the fs.defaultFS value from the core-site.xml file and append / to form the name node URI. For example, the following snippet shows the fs.defaultFS value in a sample core-site.xml file: <property> <name>fs.defaultFS</name> <value>hdfs://nameservice1</value> <source>core-site.xml</source> </property> In this snippet, the fs.defaultFS value is hdfs://nameservice1 and the corresponding name node URI is hdfs://nameservice1/ Note: Specify either the name node URI or the local path. Do not specify the name node URI if you want to read data from or write data to a local file system path. |
Local Path | A local file system path to read and write data. Enter the fully qualified path. For example, /user/testdir specifies the location of a directory in the local system. Default value for Local Path is NA. |
Configuration Files Path | The directory that contains the Hadoop configuration files. Note: Copy the core-site.xml, hdfs-site.xml, and hive-site.xml files from the Hadoop cluster and add them to a folder on the Linux machine. |
Keytab File | The file that contains encrypted keys and Kerberos principals to authenticate the machine. |
Principal Name | Name of the Kerberos principal to authenticate with. Users assigned to the superuser privilege can perform all the tasks that a user with the administrator privilege can perform. |
Impersonation Username | You can enable different users to run jobs in a Hadoop cluster that uses Kerberos authentication or connect to sources and targets that use Kerberos authentication. To enable different users to run jobs or connect to big data sources and targets, you must configure user impersonation. |
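
The two NameNode URI forms described in the table can be sketched in Python as follows. This is only an illustration, not part of the connector: the function names, the host `nn.example.com`, and port `8020` are hypothetical, and the sample XML assumes the usual `<configuration>` wrapper around the `<property>` element shown in the table.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real core-site.xml is copied from the Hadoop cluster.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice1</value>
  </property>
</configuration>"""


def name_node_uri(namenode: str, port: int) -> str:
    """Build the URI in the hdfs://<namenode>:<port>/ format."""
    return f"hdfs://{namenode}:{port}/"


def name_node_uri_from_core_site(xml_text: str) -> str:
    """High-availability case: take fs.defaultFS and append a trailing slash."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            value = (prop.findtext("value") or "").strip()
            if not value:
                raise ValueError("fs.defaultFS has no value")
            # Append "/" only if it is not already present.
            return value if value.endswith("/") else value + "/"
    raise ValueError("fs.defaultFS not found in the configuration")


if __name__ == "__main__":
    print(name_node_uri("nn.example.com", 8020))
    print(name_node_uri_from_core_site(SAMPLE_CORE_SITE))
```

For the sample above, the second call yields `hdfs://nameservice1/`, which matches the value given in the NameNode URI description.
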
Advanced Property | Description |
---|---|
File path | Mandatory. Location of the file or directory from which you want to read data. Maximum length is 255 characters. If the path is a directory, all the files in the directory must have the same file format. If the file or directory is in HDFS, enter the path without the node URI. For example, /user/lib/testdir specifies the location of a directory in HDFS. The path must not contain more than 512 characters. If the file or directory is in the local system, enter the fully qualified path. For example, /user/testdir specifies the location of a directory in the local system. |
File Pattern | Mandatory. Name and format of the file from which you want to read data. Specify the value in the following format: <filename>.<format> For example, customer.avro |
Allow Wildcard Characters | Indicates whether you want to use wildcard characters for the source directory name or the source file name. If you select this option, you can use the asterisk (*) wildcard character for the source directory name or the source file name in the File path field. |
Allow Recursive Read | Indicates whether you want to use wildcard characters to read complex files of the Parquet, Avro, or JSON formats recursively from the specified folder and its subfolders and files. You can use the wildcard character as part of the file or directory name. |
File Format | Specifies a file format of a complex file source. Select one of the available options. Default is Binary. |
Input Format | The class name for files of the input file format. If you select Input Format in the File Format field, you must specify the fully qualified class name implementing the InputFormat interface. To read files that use the Avro format, use the following input format: com.informatica.avro.AvroToXML |
Input Format Parameters | Parameters for the input format class. Enter name-value pairs separated with a semicolon. Enclose the parameter name and value within double quotes. For example, use the following syntax: "param1"="value1";"param2"="value2" |
Compression Format | Compression format of the source files. Select one of the available options. |
Custom Compression Codec | Required if you use custom compression format. Specify the fully qualified class name implementing the CompressionCodec interface. |
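
The Input Format Parameters syntax (`"param1"="value1";"param2"="value2"`) can be produced and checked with a small helper. The sketch below is an assumption-laden illustration, not connector code: `format_params` and `parse_params` are hypothetical names, and the parser assumes that names and values contain neither double quotes nor semicolons.

```python
import re

# One quoted name, an equals sign, one quoted value.
_PAIR = re.compile(r'^"([^"]*)"="([^"]*)"$')


def format_params(params: dict) -> str:
    """Join name-value pairs as "name"="value", separated by semicolons."""
    for name, value in params.items():
        if any(ch in str(name) + str(value) for ch in '";'):
            raise ValueError("names and values must not contain \" or ;")
    return ";".join(f'"{name}"="{value}"' for name, value in params.items())


def parse_params(text: str) -> dict:
    """Split a parameter string back into a dict, rejecting malformed pairs."""
    result = {}
    for chunk in text.split(";"):
        match = _PAIR.match(chunk.strip())
        if match is None:
            raise ValueError(f"malformed name-value pair: {chunk!r}")
        result[match.group(1)] = match.group(2)
    return result


if __name__ == "__main__":
    text = format_params({"param1": "value1", "param2": "value2"})
    print(text)
    print(parse_params(text))
```

Validating the string before pasting it into the property catches missing quotes or a stray separator early.
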
Advanced Property | Description |
---|---|
File Directory | Optional. The directory location of one or more output files. Maximum length is 255 characters. If you do not specify a directory location, the output files are created at the location specified in the connection. If the directory is in HDFS, enter the path without the node URI. For example, /user/lib/testdir specifies the location of a directory in HDFS. The path must not contain more than 512 characters. If the file or directory is in the local system, enter the fully qualified path. For example, /user/testdir specifies the location of a directory in the local system. |
File Name | Optional. Renames the output file. The file name is not applicable when you read or write multiple Hadoop Files V2s. |
Overwrite Target | Indicates whether the Secure Agent must first delete the target data before writing data. If you select the Overwrite Target option, the Secure Agent deletes the target data before writing data. If you do not select this option, the Secure Agent creates a new file in the target and writes the data to the file. |
File Format | Specifies a file format of a complex file target. Select one of the available options. Default is Binary. |
Output Format | The class name for files of the output format. If you select Output Format in the File Format field, you must specify the fully qualified class name implementing the OutputFormat interface. |
Output Key Class | The class name for the output key. If you select Output Format in the File Format field, you must specify the fully qualified class name for the output key. Note: Hadoop Files V2 generates the key in ascending order. |
Output Value Class | The class name for the output value. If you select Output Format in the File Format field, you must specify the fully qualified class name for the output value. You can use any custom writable class that Hadoop supports. Determine the output value class based on the type of data that you want to write. Note: When you use custom output formats, the value part of the data that is streamed to the complex file data object write operation must be in a serialized form. |
Compression Format | Compression format of the target files. Select one of the available options. |
Custom Compression Codec | Required if you use custom compression format. Specify the fully qualified class name implementing the CompressionCodec interface. |
Sequence File Compression Type | Optional. The compression format for sequence files. Select one of the available options. |
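
The File path and File Directory properties expect an HDFS location without the node URI. As a minimal sketch, assuming you start from a full `hdfs://` location, the path portion can be extracted with the Python standard library. The function name and the sample host and port are hypothetical.

```python
from urllib.parse import urlparse


def path_without_node_uri(location: str) -> str:
    """Return only the path part of an hdfs:// location.

    Locations that are already plain paths are returned unchanged.
    """
    parsed = urlparse(location)
    if parsed.scheme == "hdfs":
        # An empty path means the root directory of the file system.
        return parsed.path or "/"
    return location


if __name__ == "__main__":
    print(path_without_node_uri("hdfs://nameservice1/user/lib/testdir"))
    print(path_without_node_uri("hdfs://nn.example.com:8020/user/lib/testdir"))
    print(path_without_node_uri("/user/testdir"))
```

The first two calls give `/user/lib/testdir`, the form used in the HDFS examples in the tables, and the local path is passed through as is.
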