Microsoft Azure Data Lake Storage Gen2 targets in mappings

In a mapping, you can use a Microsoft Azure Data Lake Storage Gen2 object as a target.
When you use Microsoft Azure Data Lake Storage Gen2 target objects, you can select a Microsoft Azure Data Lake Storage Gen2 collection as the target. You can configure Microsoft Azure Data Lake Storage Gen2 target properties on the Target page of the Mapping wizard. When you write data to Microsoft Azure Data Lake Storage Gen2, you can use the create target field to create a target at run time. When you create a new target based on the source, you must remove all the binary fields from the field mapping.
The following table describes the Microsoft Azure Data Lake Storage Gen2 target properties that you can configure in a Target transformation:
Property
Description
Connection
Name of the target connection. Select a target connection or click New Parameter to define a new parameter for the target connection.
If you want to override the parameter at run time, select the Allow parameter to be overridden at run time option when you create the parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
When you switch between a non-parameterized and a parameterized Microsoft Azure Data Lake Storage Gen2 connection, the advanced property values are retained.
Target Type
Select Single Object or Parameter.
Object
Name of the target object. You can select an existing object or create a new target at runtime.
When you select Create New at Runtime, enter a name for the target object and select the source fields that you want to use. By default, all source fields are used.
The target name can contain alphanumeric characters. The only special characters that you can use in the file name are a period (.), an underscore (_), an at sign (@), a dollar sign ($), and a percent sign (%).
Ensure that the headers and the file data do not contain special characters.
You can use parameters defined in a parameter file in the target name. When you select the Create Target option, you cannot parameterize the target at runtime.
Note: When you write data to a flat file created at runtime, the target flat file contains a blank line at the end of the file.
Parameter
Select an existing parameter for the target object or click New Parameter to define a new parameter for the target object.
The Parameter property appears only if you select Parameter as the target type.
When you parameterize the target object, specify the complete object path including the file system in the default value of the parameter.
If you want to override the parameter at run time, select the Allow parameter to be overridden at run time option when you create the parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties. Ensure that the parameter file is in the correct format.
Format
The file format that the Microsoft Azure Data Lake Storage Gen2 Connector uses to write data to Microsoft Azure Data Lake Storage Gen2.
You can select from the following file format types:
Default is None.
If you select None as the format type, Microsoft Azure Data Lake Storage Gen2 Connector writes data to Microsoft Azure Data Lake Storage Gen2 files in binary format.
When you write binary data to a Microsoft Azure Data Lake Storage Gen2 target created at run time, ensure that the incoming field that contains the binary data is named data. If the field name differs, use field rules or an Expression transformation to change the field name to data.
For more information, see File formatting options.
Operation
The target operation. Select Insert to insert data into a Microsoft Azure Data Lake Storage Gen2 target.
Note: When you use the Create Target option and specify an object name with an extension that does not match the Format Type under Formatting Options, the Secure Agent ignores the format type that you specified under Formatting Options.
For example, if you select Parquet format type and specify customer.avro in the object name in the Target Object dialog box, the Secure Agent ignores Parquet and creates an Avro target file.
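The naming rule for the Object property and the extension rule in the note above can be sketched in Python. This is only an illustration, not the connector's implementation: the function names are hypothetical, the extension map is an assumption (the text confirms only the .avro case), and falling back to the selected format for an unrecognized extension is also an assumption.

```python
import re
from pathlib import PurePosixPath

# Allowed characters per the Object property: alphanumerics plus . _ @ $ %
_VALID_NAME = re.compile(r"[A-Za-z0-9._@$%]+")

# Assumption: illustrative extension-to-format map. Only ".avro" is
# confirmed by the documentation text; the other entries are guesses.
_EXTENSION_FORMATS = {
    ".avro": "Avro",
    ".parquet": "Parquet",
    ".orc": "ORC",
    ".json": "JSON",
}


def is_valid_target_name(name: str) -> bool:
    """Return True if the name uses only the permitted characters."""
    return _VALID_NAME.fullmatch(name) is not None


def effective_format(object_name: str, selected_format: str) -> str:
    """Return the format implied by the file extension, if recognized.

    Assumption: an unrecognized or missing extension leaves the
    selected format type in effect.
    """
    suffix = PurePosixPath(object_name).suffix.lower()
    return _EXTENSION_FORMATS.get(suffix, selected_format)
```

With these hypothetical helpers, effective_format("customer.avro", "Parquet") yields "Avro", which mirrors the example in the note, and a name that contains a space fails the is_valid_target_name check.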
The following table describes the advanced target properties for Microsoft Azure Data Lake Storage Gen2:
Advanced Target Property
Description
Concurrent Threads¹
Number of concurrent connections to use when loading data to Microsoft Azure Data Lake Storage Gen2. When writing a large file, you can spawn multiple threads to process data. Configure Block Size to divide a large file into smaller parts.
Default is 4. Maximum is 10.
Filesystem Name Override
Overrides the default file system name.
Directory Override
Microsoft Azure Data Lake Storage Gen2 directory that you use to write data. Default is root directory. The Secure Agent creates the directory if it does not exist. The directory path specified at run time overrides the path specified while creating a connection.
You can specify an absolute or a relative directory path:
  • Absolute path. The Secure Agent searches this directory path in the specified file system. Example: Dir1/Dir2
  • Relative path. The Secure Agent searches this directory path in the native directory path of the object. Example: /Dir1/Dir2
    When you use the relative path, the imported object path is added to the file path used during the metadata fetch at runtime.
Do not specify a root directory (/) to override the directory.
File Name Override
Target object. Select the file to which you want to write data. The file specified at run time overrides the file specified in Object.
Write Strategy
Applicable to complex and flat files.
When you create a mapping, you can use either the overwrite or the append write strategy for flat files. However, you can use only the overwrite strategy for complex files.
When you create a mapping in advanced mode, you can use either the overwrite or the append write strategy for both flat files and complex files.
When you create a new target at runtime and use the append strategy, the mapping creates a new target file and writes the data to the file. The mapping appends data in subsequent runs.
When you append data for mappings in advanced mode, the data is appended as a new part file in the existing target directory.
The maximum size of data that you can append is 450 MB.
Default is overwrite.
Block Size¹
Applicable to flat, Avro, and Parquet file formats. Divides a large file into smaller parts of the specified block size. When you write a large file, divide the file into smaller parts and configure concurrent connections to spawn the required number of threads to process data in parallel.
Specify an integer value for the block size.
Default value in bytes is 8388608.
Compression Format
Compresses and writes data to the target based on the format you specify.
Select one of the following options:
  • None. Select to write Avro, ORC, Parquet, and Delta files that use Snappy compression.
  • Gzip. Select to write flat files, Parquet, and Delta files that use Gzip compression.
You cannot write compressed JSON files.
When the task runs, the file extensions .gz or .snappy do not appear in the target object name.
Timeout Interval
Not applicable.
Interim Directory¹
Optional. Applicable to flat files and JSON files.
Path to the staging directory on the Secure Agent machine.
Specify the staging directory where you want to stage the files when you write data to Microsoft Azure Data Lake Storage Gen2. Ensure that the directory has sufficient space and you have write permissions to the directory.
Default staging directory is /tmp.
You cannot specify an interim directory for mappings in advanced mode.
You cannot specify an interim directory when you use the Hosted Agent.
Forward Rejected Rows¹
Configure the transformation to either pass rejected rows to the next transformation or drop them.
¹ Doesn't apply to mappings in advanced mode.
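To see how Block Size and Concurrent Threads interact, the following Python sketch does the arithmetic for a single file. It uses the documented defaults (block size 8388608 bytes, 4 threads, maximum 10 threads). The function name and the idea that no more threads are busy than there are blocks are assumptions made for illustration. The sketch does not describe the connector's actual scheduler.

```python
DEFAULT_BLOCK_SIZE = 8_388_608  # bytes, the documented default
DEFAULT_THREADS = 4             # documented default for Concurrent Threads
MAX_THREADS = 10                # documented maximum for Concurrent Threads


def plan_write(file_size: int,
               block_size: int = DEFAULT_BLOCK_SIZE,
               threads: int = DEFAULT_THREADS) -> tuple[int, int]:
    """Return (number_of_blocks, busy_threads) for one file.

    Hypothetical helper: it only shows the division of a file into
    blocks and caps the busy threads at the block count.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    if not 1 <= threads <= MAX_THREADS:
        raise ValueError(f"threads must be between 1 and {MAX_THREADS}")
    # Integer ceiling division: the last block may be partially filled.
    blocks = (file_size + block_size - 1) // block_size
    return blocks, min(threads, blocks)
```

Under these assumptions, a 100 MiB file (104857600 bytes) with the default settings splits into 13 blocks shared by 4 threads, while a file that fits in one block keeps only one thread busy.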

Specifying a target

You can use an existing target or create a target to hold the results of a mapping. If you choose to create the target, the agent creates the target when you run the task.
To specify the target properties, follow these steps:
    1. Select the Target transformation in the mapping.
    2. On the Incoming Fields tab, configure field rules to specify the fields to include in the target.
    3. To specify the target, click the Target tab.
    4. Select the target connection.
    5. For the target type, choose Single Object or Parameter.
    6. Specify the target object or parameter.
       Note: The Handle Special Characters option is not applicable to mappings in advanced mode.
    7. Click Formatting Options if you want to configure the formatting options for the file, and click OK.
    8. Click Select and choose a target object. You can select an existing target object or create a new target object at run time and specify the object name.
    9. Specify Advanced properties for the target, if needed.

Target time stamps

When you create a target at run time in a mapping, you can append time stamp information to the file name to show when the file is created.
When you specify the file name for the target file, include special characters based on Linux STRFTIME function formats that the mapping task uses to include time stamp information in the file name. The time stamp is based on the organization's time zone.
You cannot append time stamp information to the file name for mappings in advanced mode.
The following table describes some common STRFTIME function formats that you might use in a mapping or mapping task:
Special Character   Description
%d                  Day as a two-digit decimal number, with a range of 01-31.
%m                  Month as a two-digit decimal number, with a range of 01-12.
%y                  Year as a two-digit decimal number without the century, with a range of 00-99.
%Y                  Year including the century, for example 2015.
%T                  Applicable only to flat files. Time in 24-hour notation, equivalent to %H:%M:%S.
%H                  Hour in 24-hour clock notation, with a range of 00-23.
%l                  Hour in 12-hour clock notation, with a range of 01-12.
%M                  Minute as a decimal number, with a range of 00-59.
%S                  Second as a decimal number, with a range of 00-60.
%p                  Either AM or PM.
Note: For complex files, instead of %T you can use the equivalent %H_%M_%S.
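To preview what a time-stamped file name might look like, you can expand the same format codes with Python's strftime. This is a rough sketch: the function name and the sample pattern are hypothetical, and the mapping task applies the organization's time zone, which this sketch does not model. The pattern avoids %T and %l because Python's support for those codes depends on the platform.

```python
from datetime import datetime


def preview_target_name(pattern: str, when: datetime) -> str:
    """Expand strftime codes in a target file name pattern.

    Hypothetical helper for previewing names only; the task itself
    resolves the time stamp when it creates the file.
    """
    return when.strftime(pattern)


# Fixed sample time so the result is reproducible.
sample_time = datetime(2015, 3, 7, 14, 5, 9)
flat_file_name = preview_target_name("customer_%Y%m%d_%H_%M_%S.csv", sample_time)
print(flat_file_name)  # customer_20150307_14_05_09.csv
```

The %H_%M_%S form used here is the one that the note above recommends for complex files, and it also works in flat file names.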

Target partitioning

You can configure partitioning to optimize the mapping performance at run time when you write data to Microsoft Azure Data Lake Storage Gen2. You can configure target partitioning only in mappings.
The partition type controls how the agent distributes data among partitions at partition points. With partitioning, the Secure Agent distributes rows of target data across the number of threads that you define as partitions.
For example, if there are three partitions in the source, the Secure Agent writes separate files for each partition in the Microsoft Azure Data Lake Storage Gen2 target in the following format:
<target>
<target_1>
<target_2>
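The naming pattern shown above can be reproduced with a few lines of Python. The function name is hypothetical, and the sketch assumes that the suffix is appended to the whole target name exactly as shown, with no special handling of file extensions.

```python
def partition_target_names(target: str, partitions: int) -> list[str]:
    """List the file names written for a partitioned target.

    Follows the <target>, <target_1>, <target_2> pattern: the first
    partition keeps the plain name and the rest get a numeric suffix.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return [target if i == 0 else f"{target}_{i}" for i in range(partitions)]


print(partition_target_names("customer", 3))
# ['customer', 'customer_1', 'customer_2']
```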
Consider the following rules and guidelines for target partitioning: