
Configuring Azure data sync tasks

Configure an Azure data sync task to read and synchronize data between a source and Microsoft Azure SQL Data Warehouse.
To configure an Azure data sync task, click New on the navigation bar, and then complete the following steps:
  1. Configure the source.
  2. Configure the target.
  3. Configure field mappings.
  4. Save and run the Azure data sync task.
As you configure the task, you can click Save to save your work at any time. When you finish configuring the Azure data sync task, you can save and run the task or you can run it later from the Explore page. If you want to edit a task or complete an unfinished task, you can access the task on the Explore page.

Configuring the source

Configure the source object that contains the data you want to write to Azure SQL Data Warehouse. If you don't want to write all of the data, you can configure data filters to specify which rows to include or exclude.
When you select an object, the Data Preview area shows the first ten rows of the first eight columns in the object. It also displays the total number of columns in the object. To view all of the columns, you can download a file that displays all columns and the first ten rows. If the page has more than one object, you can select the object in the Data Preview area to display its data.
The Data Preview area does not display certain Unicode characters. If the data contains binary data, the Data Preview area shows the following text:
BINARY DATA
    1. On the Source page, select a connection.
    2. Select the source object.
    3. To specify advanced properties, click Advanced.
    Note that not all connection types have advanced connection properties.
    4. Optionally, create a data filter to reduce the number of source rows that the Azure data sync task reads.
    To create a row in the data filter table, click the Add icon.
    5. Optionally, to view all of the source columns, click Download to View All Columns.
    The file shows the first ten rows of the source.
    6. Click Next.

Microsoft Azure advanced source connection properties

When you select a Microsoft Azure data warehouse connection as the source, you can configure advanced Microsoft Azure connection properties.
The following table describes the advanced Microsoft Azure connection properties:
Azure Blob Container Name
    Name of the container in Microsoft Azure Storage to use for staging data before extracting it from the Microsoft Azure data warehouse.
Field Delimiter
    Character used to separate fields in the staging file. Default is 0x1e. You can select one of the following field delimiters from the list: ~ ` | . TAB 0x1e
Number of concurrent connections to Blob Store
    Number of concurrent connections used to extract data to the staging area in Microsoft Azure Blob Storage. Default is 4.
Pre SQL
    Pre-SQL command to run before reading data from the source.
Post SQL
    Post-SQL command to run after writing data to the target.
SQL Override
    SQL statement that overrides the default query used to read data from the Microsoft Azure data warehouse object. You can also define constraints. See the examples after this table.
On Pre-Post SQL Error
    Determines the behavior when a task that includes pre-SQL or post-SQL commands encounters errors. You can select one of the following options:
    - Continue. The task continues regardless of errors.
    - Stop. The task stops when errors occur while running the pre-SQL or post-SQL commands.
Quote Character
    The task skips the specified character when it reads data from the Microsoft Azure data warehouse.
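For example, the following sketch shows the kinds of statements you might enter in the Pre SQL, Post SQL, and SQL Override properties. The table and column names (sales_staging, orders, task_audit, and so on) are hypothetical; substitute objects from your own data warehouse:

    -- Pre SQL (hypothetical): clear a staging table before the task reads data
    TRUNCATE TABLE sales_staging;

    -- SQL Override (hypothetical): replace the default query with a filtered column subset
    SELECT order_id, amount FROM orders WHERE amount > 0;

    -- Post SQL (hypothetical): record the run after the task writes to the target
    INSERT INTO task_audit (task_name, run_date) VALUES ('azure_sync', GETDATE());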

Netezza advanced source connection properties

When you select a Netezza connection as the source, you can configure advanced Netezza connection properties.
The following table describes the advanced Netezza connection properties:
Pipe Directory Path
    Path where the task creates the pipe for the external table.
Delimiter
    Delimiter that separates successive input fields. You can enter any value supported by the Netezza Performance Server. The value can be a part of the data for the Netezza source. Default is |.
Null Value
    NullValue parameter of an external table. The task uses the NullValue internally. Maximum length is one character. Default is blank.
Escape Character
    Escape character of an external table. If the data contains NULL, CR, or LF characters in Char or Varchar fields, add an escape character before those characters in the source data before you extract it. The supported escape character is the backslash (\).
Socket Buffer Size
    Socket buffer size, in bytes. To increase performance, set the value between 25% and 50% of the DTM buffer size. You might need to test different settings for optimal performance. Enter a value between 4096 and 2147483648 bytes. Default is 8388608 bytes.
SQL Query Override
    Overrides the default query. Enter the SQL query that the task must use to query data from the Netezza source. See the example after this table.
Owner Name
    The Netezza schema name.
Source Table Name
    Overrides the source table name. For example, you can parameterize the source table name to override the table name in the task.
Pre SQL
    Pre-SQL command to run before reading data from the source.
Post SQL
    Post-SQL command to run after writing data to the target.
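For example, a SQL Query Override for a Netezza source might look like the following sketch. The table and column names are hypothetical:

    -- Hypothetical override: read only active customer rows from the Netezza source
    SELECT customer_id, customer_name, region
    FROM customers
    WHERE status = 'ACTIVE';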

Teradata advanced source connection properties

When you select a Teradata connection as the source, you can configure advanced Teradata connection properties.
The following table describes the advanced Teradata connection properties:
Driver Tracing Level
    Determines Teradata PT API tracing at the driver level:
    - TD_OFF. Teradata PT API disables tracing.
    - TD_OPER. Teradata PT API enables tracing for driver-specific activities for Teradata.
    - TD_OPER_CLI. Teradata PT API enables tracing for activities involving CLIv2.
    - TD_OPER_NOTIFY. Teradata PT API enables tracing for activities involving the Notify feature.
    - TD_OPER_OPCOMMON. Teradata PT API enables tracing for activities involving the operator common library.
    - TD_OPER_ALL. Teradata PT API enables all driver-level tracing.
    Default is TD_OFF.
Infrastructure Tracing Level
    Determines Teradata PT API tracing at the infrastructure level:
    - TD_OFF. Teradata PT API disables tracing.
    - TD_OPER. Teradata PT API enables tracing for driver-specific activities for Teradata.
    - TD_OPER_CLI. Teradata PT API enables tracing for activities involving CLIv2.
    - TD_OPER_NOTIFY. Teradata PT API enables tracing for activities involving the Notify feature.
    - TD_OPER_OPCOMMON. Teradata PT API enables tracing for activities involving the operator common library.
    - TD_OPER_ALL. Teradata PT API enables all driver-level tracing.
    Default is TD_OFF.
    You must enable the driver tracing level before you can enable the infrastructure tracing level.
Trace File Name
    File name and path of the Teradata PT API trace file. Default path is $PM_HOME. Default file name is <Name of the TPT Operator>_timestamp. For example, EXPORTER_20091221.
Query Band Expression
    The query band expression to pass to the Teradata PT API. A query band expression is a set of name-value pairs that identify a query's originating source. In the expression, each name-value pair is separated by a semicolon and the expression ends with a semicolon. For example, ApplicationName=Informatica;Version=9.0.1;ClientUser=A;.
Spool Mode
    Determines the spool mode that Teradata PT API uses to extract data from Teradata. You can select one of the following spool modes:
    - Spool. Teradata PT API spools data while extracting data from Teradata.
    - NoSpool. Teradata PT API does not spool data while extracting data from Teradata. If the database does not support the NoSpool option, Teradata PT API uses the Spool option.
    - NoSpoolOnly. Teradata PT API does not spool data while extracting data from Teradata.
    Default is Spool.
SQL Override Query
    The SQL statement that overrides the default query used to read data from the Teradata source. The data type, number, and order of columns in the select clause must match the Teradata source object. See the example after this table.
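For example, if the Teradata source object contains the columns order_id, order_date, and amount in that order, an override query must return columns with the same data types, count, and order. The names below are hypothetical:

    -- Hypothetical SQL override: the select clause matches the source object's
    -- column data types, count, and order (order_id, order_date, amount)
    SELECT order_id, order_date, amount
    FROM orders
    WHERE order_date >= DATE '2020-01-01';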

Configuring the target

Define the target object in Azure SQL Data Warehouse to which you want to write the source data.
When you select an object, the Data Preview area shows the first ten rows of the first eight columns in the object. It also displays the total number of columns in the object. To view all of the columns, you can download a file that displays all columns and the first ten rows. If the page has more than one object, you can select the object in the Data Preview area to display its data.
The Data Preview area does not display certain Unicode characters. If the data contains binary data, the Data Preview area shows the following text:
BINARY DATA
    1. On the Target page, enter the following information:
    Connection
        Connection to Azure SQL Data Warehouse. Select the connection for the target.
    Target Object
        Object into which you want to load data. Select the target object, or click Create Target to create a target object the first time the task runs.
    Task Operation
        Select one of the following task operation types:
        - Insert
        - Update
        - Upsert
        - Delete
    Update columns
        If a database target does not include a primary key column and the task operation is update, upsert, or delete, choose the fields that you want to use as temporary primary key columns.
    2. Optionally, to preview all target columns, click Download to View All Columns.
    The file shows the first ten rows of the target.
    3. Click Next.

Mapping fields

Configure field mappings to define the source data that the Azure data sync task writes to the target.
You must map at least one source field to a target field. Map columns with compatible datatypes or create field expressions to convert datatypes appropriately.
Depending on the task operation, the Azure data sync task requires certain fields to be included in the field mapping. By default, the Azure data sync task maps the required fields. If you configure the field mapping, ensure that the required fields remain mapped. If you do not map the required fields, the Azure data sync task fails.
The following table shows the required fields for each applicable task operation for a database target:
Primary Keys
    Task operations: Delete, Update, Upsert
    Map primary key columns to enable the Azure data sync task to identify the records to delete, update, or upsert in a database target.
Non-null fields
    Task operations: Insert, Update, Upsert
    Map all fields that cannot be null in the database.
    1. To match fields with similar names, click Automatch.
    Or, you can select and drag the source fields to the applicable target fields.
    2. Optionally, configure a field expression, as shown in the example after these steps. For more information, see Field expressions.
    3. Click Next.
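For example, to convert a source string field to a date before writing it to a date column in the target, you might use a field expression like the following sketch. The field name order_date_str is hypothetical, and the available functions depend on the supported expression language; see Field expressions for details:

    TO_DATE(order_date_str, 'YYYY-MM-DD')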

Saving and running the Azure data sync task

After you configure an Azure data sync task, save it. You can then run the task immediately or run it later from the Explore page.
    1. To save an Azure data sync task, configure the following fields:
    Task Name
        Name of the Azure data sync task. Task names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
        Maximum length is 100 characters. Task names are not case sensitive.
    Location
        Project folder in which the task resides. The default location is the last folder that you viewed on the Explore page.
    Runtime Environment
        Runtime environment that contains the Secure Agent that runs the task.
    Description
        Description of the Azure data sync task. Maximum length is 4000 characters.
    2. Click Save.
    3. Optionally, to run the Azure data sync task, click Run.
After you save the task, you can assign tags to it or create a schedule for it on the Explore page.