Configuring Azure data sync tasks
Configure an Azure data sync task to read data from a source and synchronize it with Microsoft Azure SQL Data Warehouse.
To configure an Azure data sync task, click New on the navigation bar, and then complete the following steps:
1. Configure the source.
2. Configure the target.
3. Configure field mappings.
4. Save and run the Azure data sync task.
As you configure the task, you can click Save to save your work at any time. When you finish configuring the Azure data sync task, you can save and run the task or you can run it later from the Explore page. If you want to edit a task or complete an unfinished task, you can access the task on the Explore page.
Configuring the source
Configure the source object that contains the data that you want to write to Azure Data Warehouse. If you don't want to write all of the data, you can configure data filters to specify which rows to include or exclude.
When you select an object, the Data Preview area shows the first ten rows of the first eight columns in the object. It also displays the total number of columns in the object. To view all of the columns, you can download a file that displays all columns and the first ten rows. If the page has more than one object, you can select the object in the Data Preview area to display its data.
The Data Preview area does not display certain Unicode characters. If the data contains binary data, the Data Preview area shows the following text:
BINARY DATA
1. On the Source page, select a connection.
2. Select the source object.
3. To specify advanced properties, click Advanced.
Note that not all connection types have advanced connection properties.
4. Optionally, create a data filter to reduce the number of source rows that the Azure data sync task reads, as shown in the example after these steps.
To create a row in the data filter table, click the Add icon.
5. Optionally, to view all of the source columns, click Download to View All Columns.
The file shows the first ten rows of the source.
6. Click Next.
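Data filters are defined as rows in the data filter table rather than as typed SQL, but each filter row effectively narrows the source read in the same way as a SQL WHERE condition. The following is a minimal sketch of the equivalent condition; the sales_orders object and its column names are hypothetical:

```sql
-- Hypothetical equivalent of a data filter on a sales_orders source:
-- read only active orders created on or after January 1, 2023.
SELECT *
FROM sales_orders
WHERE order_status = 'Active'
  AND order_date >= '2023-01-01';
```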
Microsoft Azure advanced source connection properties
When you select a Microsoft Azure data warehouse connection as the source, you can configure advanced Microsoft Azure connection properties.
The following table describes advanced Microsoft Azure connection properties:
Property | Description
---|---
Azure Blob Container Name | Name of the container in Microsoft Azure Storage to use for staging before extracting data from the Microsoft Azure data warehouse.
Field Delimiter | Character used to separate fields in the file. Default is 0x1e. You can select one of the following field delimiters from the list: ~ ` \| . TAB 0x1e
Number of concurrent connections to Blob Store | Number of concurrent connections used to extract data to the staging area in Microsoft Azure Blob Storage. Default is 4.
Pre SQL | Pre-SQL command to run before reading data from the source.
Post SQL | Post-SQL command to run after writing data to the target.
SQL Override | When you read data from a Microsoft Azure data warehouse object, you can configure an SQL override and define constraints (see the example after this table).
On Pre-Post SQL Error | Determines the behavior when a task that includes pre-SQL or post-SQL commands encounters errors. Select one of the following options: <br>- Continue. The task continues regardless of errors. <br>- Stop. The task stops when errors occur while executing pre-SQL or post-SQL commands.
Quote Character | The task skips the specified character when it reads data from the Microsoft Azure data warehouse.
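For illustration only, the following sketch shows what the SQL Override, Pre SQL, and Post SQL properties might contain for a Microsoft Azure data warehouse source. The table, column, and task names are assumptions, not generated values:

```sql
-- Hypothetical SQL Override: replaces the default read query.
SELECT order_id, customer_id, order_total
FROM dbo.sales_orders
WHERE order_total > 0;

-- Hypothetical Pre SQL command, run before the task reads from the source.
UPDATE dbo.sync_audit
SET last_run_start = GETDATE()
WHERE task_name = 'orders_sync';

-- Hypothetical Post SQL command, run after the task writes to the target.
UPDATE dbo.sync_audit
SET last_run_end = GETDATE()
WHERE task_name = 'orders_sync';
```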
Netezza advanced source connection properties
When you select a Netezza connection for the source, you can configure advanced Netezza connection properties.
The following table describes advanced Netezza connection properties:
Property | Description
---|---
Pipe Directory Path | Path where the task creates the pipe for the external table.
Delimiter | Delimiter that separates successive input fields. You can enter any value supported by the Netezza Performance Server. The value can be a part of the data for the Netezza source. Default is \|.
Null Value | NullValue parameter of an external table. The task uses the NullValue internally. Maximum length is one character. Default is blank.
Escape Character | Escape character of an external table. If the data in a Char or Varchar field contains NULL, CR, or LF characters, add an escape character before those characters in the source data before you extract it. The supported escape character is backslash (\).
Socket Buffer Size | Socket buffer size, which you can adjust to increase performance. Set it to 25% to 50% of the DTM buffer size. You might need to test different settings for optimal performance. Enter a value between 4096 and 2147483648 bytes. Default is 8388608 bytes.
SQL Query Override | Overrides the default query. Enter the SQL query that the task must use to query data from the Netezza source (see the example after this table).
Owner Name | The Netezza schema name.
Source Table Name | Overrides the source table name. For example, you can parameterize the source table name to override the table name in the task.
Pre SQL | Pre-SQL command to run before reading data from the source.
Post SQL | Post-SQL command to run after writing data to the target.
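As an illustration, a SQL Query Override for a Netezza source might look like the following. The schema, table, and column names are assumptions:

```sql
-- Hypothetical SQL Query Override for a Netezza source:
-- read only EMEA customers instead of the full table.
SELECT cust_id, cust_name, region
FROM sales_schema.customers
WHERE region = 'EMEA';
```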
Teradata advanced source connection properties
When you select a Teradata connection as the source, you can configure advanced Teradata connection properties.
The following table describes advanced Teradata connection properties:
Property | Description
---|---
Driver Tracing Level | Determines Teradata PT API tracing at the driver level: <br>- TD_OFF. Teradata PT API disables tracing. <br>- TD_OPER. Teradata PT API enables tracing for driver-specific activities for Teradata. <br>- TD_OPER_CLI. Teradata PT API enables tracing for activities involving CLIv2. <br>- TD_OPER_NOTIFY. Teradata PT API enables tracing for activities involving the Notify feature. <br>- TD_OPER_OPCOMMON. Teradata PT API enables tracing for activities involving the operator common library. <br>- TD_OPER_ALL. Teradata PT API enables all driver-level tracing. <br>Default is TD_OFF.
Infrastructure Tracing Level | Determines Teradata PT API tracing at the infrastructure level: <br>- TD_OFF. Teradata PT API disables tracing. <br>- TD_OPER. Teradata PT API enables tracing for driver-specific activities for Teradata. <br>- TD_OPER_CLI. Teradata PT API enables tracing for activities involving CLIv2. <br>- TD_OPER_NOTIFY. Teradata PT API enables tracing for activities involving the Notify feature. <br>- TD_OPER_OPCOMMON. Teradata PT API enables tracing for activities involving the operator common library. <br>- TD_OPER_ALL. Teradata PT API enables all driver-level tracing. <br>Default is TD_OFF. You must enable the driver tracing level before you can enable the infrastructure tracing level.
Trace File Name | File name and path of the Teradata PT API trace file. Default path is $PM_HOME. Default file name is <Name of the TPT Operator>_timestamp. For example, EXPORTER_20091221.
Query Band Expression | The query band expression to be passed to the Teradata PT API. A query band expression is a set of name-value pairs that identify a query's originating source. In the expression, each name-value pair is separated by a semicolon and the expression ends with a semicolon. For example, ApplicationName=Informatica;Version=9.0.1;ClientUser=A;.
Spool Mode | Determines the spool mode that Teradata PT API uses to extract data from Teradata. You can select one of the following spool modes: <br>- Spool. Teradata PT API spools data while extracting data from Teradata. <br>- NoSpool. Teradata PT API does not spool data while extracting data from Teradata. If the database does not support the NoSpool option, Teradata PT API uses the Spool option. <br>- NoSpoolOnly. Teradata PT API does not spool data while extracting data from Teradata. <br>Default is Spool.
SQL Override Query | The SQL statement that overrides the default query used to read data from the Teradata source. The data type, number, and order of columns in the select clause must match the Teradata source object (see the example after this table).
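For example, a SQL Override Query for a Teradata source might look like the following sketch, with hypothetical database, table, and column names. The select list must match the source object's columns in data type, number, and order:

```sql
-- Hypothetical SQL Override Query for a Teradata source.
-- The select list matches the source object's columns in
-- data type, number, and order.
SELECT emp_id, emp_name, hire_date
FROM hr_db.employees
WHERE hire_date >= DATE '2020-01-01';
```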
Configuring the target
Define the target object in Azure Data Warehouse in which you want to write the source data.
When you select an object, the Data Preview area shows the first ten rows of the first eight columns in the object. It also displays the total number of columns in the object. To view all of the columns, you can download a file that displays all columns and the first ten rows. If the page has more than one object, you can select the object in the Data Preview area to display its data.
The Data Preview area does not display certain Unicode characters. If the data contains binary data, the Data Preview area shows the following text:
BINARY DATA
1. On the Target page, enter the following information:
Field | Description
---|---
Connection | Connection to Azure Data Warehouse. Select the connection for the target.
Target Object | Object into which you want to load data. Select the target object, or click Create Target to create a target object the first time the task runs (see the example after these steps).
Task Operation | Select one of the following task operation types: Insert, Update, Upsert, or Delete.
Update Columns | If a database target does not include a primary key column and the task operation is update, upsert, or delete, choose the fields that you want to use as temporary primary key columns.
2. Optionally, to preview all target columns, click Download to View All Columns.
The file shows the first ten rows of the target.
3. Click Next.
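Create Target generates the target object for you the first time the task runs. Purely as a sketch of what a compatible target table in Azure SQL Data Warehouse could look like, with an assumed table name, columns, and distribution choice:

```sql
-- Hypothetical target table in Azure SQL Data Warehouse. Create Target
-- builds the object automatically; this DDL only illustrates a
-- compatible shape, not what the task necessarily generates.
CREATE TABLE dbo.sales_orders_target
(
    order_id    INT NOT NULL,
    customer_id INT NOT NULL,
    order_total DECIMAL(18,2)
)
WITH
(
    DISTRIBUTION = HASH (order_id),
    CLUSTERED COLUMNSTORE INDEX
);
```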
Mapping fields
Configure field mappings to define the source data that the Azure data sync task writes to the target.
You must map at least one source field to a target field. Map columns with compatible datatypes or create field expressions to convert datatypes appropriately.
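Field expressions are written in the task's expression language. As an alternative way to picture a datatype conversion, a similar effect can often be achieved by casting a column in a source SQL override so that it matches the target field's datatype. A sketch with hypothetical table and column names:

```sql
-- Hypothetical illustration of converting a datatype at the source:
-- cast a string order_date_str column to DATE so that it maps cleanly
-- to a DATE column in the target.
SELECT order_id,
       CAST(order_date_str AS DATE) AS order_date
FROM dbo.sales_orders;
```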
Depending on the task operation, the Azure data sync task requires certain fields to be included in the field mapping. By default, the Azure data sync task maps the required fields. If you configure the field mapping, ensure that the required fields remain mapped. If you do not map the required fields, the Azure data sync task fails.
The following table shows the required fields for each applicable task operation for a database target:
Required field | Task operations | Description
---|---|---
Primary Keys | Delete, Update, Upsert | Map primary key columns to enable the Azure data sync task to identify records to delete, update, or upsert in a database target.
Non-null fields | Insert, Update, Upsert | Map all fields that cannot be null in the database.
1. To match fields with similar names, click Automatch.
Or, you can select and drag the source fields to the applicable target fields.
2. Optionally, configure a field expression. For more information, see Field expressions.
3. Click Next.
Saving and running the Azure data sync task
After you configure your Azure data sync task, you can save it. After you save your Azure data sync task, you can run the task immediately or you can run it later from the Explore page.
1. To save an Azure data sync task, configure the following fields:
Field | Description
---|---
Task Name | Name of the Azure data sync task. Task names can contain alphanumeric characters, spaces, and the following special characters: _ . + - <br>Maximum length is 100 characters. Task names are not case sensitive.
Location | Project folder in which the task resides. The default location is the last folder that you viewed on the Explore page.
Runtime Environment | Runtime environment that contains the Secure Agent to run the task.
Description | Description of the Azure data sync task. Maximum length is 4000 characters.
2. Click Save.
3. Optionally, to run the Azure data sync task, click Run.
After you save the task, you can assign tags to the task or create a schedule for the task on the Explore page.