To write data to a Google BigQuery target, configure a Google BigQuery object as the Target transformation in a mapping.
Specify the name and description of the Google BigQuery target. Configure the target and advanced properties for the target object in mappings.
The following table describes the target properties that you can configure for a Google BigQuery target:
Property
Description
Connection
Name of the Google BigQuery V2 target connection. Select a target connection or click New Parameter to define a new parameter for the target connection.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Target Type
Type of the Google BigQuery target objects available.
You can write data to a single or multiple Google BigQuery target objects. You can also parameterize the object.
Parameter
Select an existing parameter for the target object or click New Parameter to define a new parameter for the target object. The Parameter property appears only if you select Parameter as the target type.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Does not apply when you perform a data driven operation.
Object
Name of the Google BigQuery target object based on the target type selected.
Create New at Runtime
Creates a target table at runtime in Google BigQuery. The target table can also contain clustered columns.
Enter a name and a path for the target object, and select the source fields that you want to use. By default, all source fields are used.
You must specify a valid dataset ID for the Path attribute.
The target name can contain alphanumeric characters and the underscore character (_). No other special characters are allowed. You cannot parameterize the target at runtime.
For more information about how to create a target table with clustered columns, see Clustering order.
Operation
You can select one of the following operations:
- Insert
- Update
- Upsert (Update or Insert)
- Delete
- Data Driven
Note: If you use complex connection mode, you cannot configure update, upsert, or delete operations.
Data Driven Condition
Flags rows for an insert, update, delete, or reject operation based on the data driven expression you specify.
You must specify the data driven condition for non-CDC sources. For CDC sources, you must leave the field empty as the rows in the CDC source tables are already marked with the operation types.
Note: Appears only when you select Data Driven as the operation type.
Update Columns
Specifies the temporary primary key columns to update, upsert, or delete target data. If the Google BigQuery target does not include a primary key column, and the mapping performs an update, upsert, or delete operation, click Add to add a temporary key.
You can select multiple columns. By default, no columns are specified.
The following table describes the advanced properties that you can configure for a Google BigQuery target:
Property
Description
UpdateMode
Determines the mode that the Secure Agent uses to update rows in the Google BigQuery target.
You can select one of the following modes:
- Update As Update. The Secure Agent updates all rows flagged for update if the entries exist.
- Update Else Insert. The Secure Agent first updates all rows flagged for update if the entries exist in the target. If the entries do not exist, the Secure Agent inserts the entries.
Default is Update as Update.
Not applicable when you perform a data driven operation.
Enable Data Driven¹
Implements data driven operation to honor flagged rows for an insert, update, delete, or reject operation based on the data driven condition.
Select this option when you select Data Driven as the target operation.
Enable Merge
Implements the Merge query to perform an update, upsert, delete, or data driven operation on a Google BigQuery target table.
If you select the Enable Data Driven property, you must select this option.
Default is disabled.
Target Dataset ID
Optional. Overrides the Google BigQuery dataset name that you specified in the connection.
Target Table Name
Optional. Overrides the Google BigQuery target table name that you specified in the Target transformation.
Note: If you specify an update override query, Google BigQuery V2 Connector ignores this property.
Target Staging Dataset¹
Optional. Overrides the Google BigQuery staging dataset name that you specified in the connection and the Target Dataset ID target advanced property.
Create Disposition
Specifies whether Google BigQuery V2 Connector must create the target table if it does not exist.
You can select one of the following values:
- Create if needed. If the table does not exist, Google BigQuery V2 Connector creates the table.
- Create never. If the table does not exist, Google BigQuery V2 Connector does not create the table and displays an error message.
Create disposition is applicable only when you perform an insert operation on a Google BigQuery target.
Write Disposition
Specifies how Google BigQuery V2 Connector must write data in bulk mode if the target table already exists.
You can select one of the following values:
- Write append. If the target table exists, Google BigQuery V2 Connector appends the data to the existing data in the table.
- Write truncate. If the target table exists, Google BigQuery V2 Connector overwrites the existing data in the table.
- Write empty. If the target table exists and contains data, Google BigQuery V2 Connector displays an error and does not write the data to the target. Google BigQuery V2 Connector writes the data to the target only if the target table does not contain any data.
Write disposition is applicable for bulk mode.
Write disposition is applicable only when you perform an insert operation on a Google BigQuery target.
Write Mode
Specifies the mode to write data to the Google BigQuery target.
You can select one of the following modes:
- Bulk. Google BigQuery V2 Connector first writes the data to a staging file in Google Cloud Storage. When the staging file contains all the data, Google BigQuery V2 Connector loads the data from the staging file to the BigQuery target. Google BigQuery V2 Connector then deletes the staging file unless you configure the task to persist the staging file.
- Streaming¹. Google BigQuery V2 Connector directly writes data to the BigQuery target. Google BigQuery V2 Connector writes the data into the target row by row.
- CDC¹. Applies only when you capture changed data from a CDC source. In CDC mode, Google BigQuery V2 Connector captures changed data from any CDC source and writes the changed data to a Google BigQuery target table.
Default is Bulk mode.
Streaming mode is not applicable when you perform a data driven operation.
Streaming Template Table Suffix¹
Specify the suffix to add to the individual target tables that Google BigQuery V2 Connector creates based on the template target table.
This property applies to streaming mode.
If you select the Enable Merge option, Google BigQuery V2 Connector ignores this property.
Streaming mode is not applicable when you perform a data driven operation.
Rows per Streaming Request¹
Specifies the number of rows that Google BigQuery V2 Connector streams to the BigQuery target for each request.
Default is 500 rows.
The maximum row size that Google BigQuery V2 Connector can stream to the Google BigQuery target for each request is 10 MB.
This property applies to streaming mode.
Streaming mode is not applicable when you perform a data driven operation.
Staging File Name
Name of the staging file that Google BigQuery V2 Connector creates in the Google Cloud Storage before it loads the data to the Google BigQuery target.
This property applies to bulk mode.
Data format of the staging file
Specifies the data format of the staging file. You can select one of the following data formats:
- Avro¹
- JSON (Newline Delimited). Supports flat and record data with nested and repeated fields.
- Parquet¹
- CSV¹. Supports flat data.
Note: In a .csv file, columns of the Timestamp data type are represented as floating point numbers that cause the milliseconds value to differ.
Only JSON format is applicable for mappings in advanced mode.
This property applies to bulk and CDC modes.
Avro and Parquet formats are not applicable when you perform a data driven operation.
Persist Staging File After Loading
Indicates whether Google BigQuery V2 Connector must persist the staging file in the Google Cloud Storage after it writes the data to the Google BigQuery target. You can persist the staging file if you want to archive the data for future reference.
By default, Google BigQuery V2 Connector deletes the staging file in Google Cloud Storage.
This property applies to bulk mode.
Enable Staging File Compression¹
Select this option to compress the staging file before Google BigQuery V2 Connector writes the data to Google Cloud Storage and to decompress the staging file before it loads the data to the Google BigQuery target.
You can enable staging file compression to reduce cost and transfer time.
Job Poll Interval In Seconds¹
The number of seconds after which Google BigQuery V2 Connector polls the status of the write job operation.
Default is 10.
Number of Threads for Uploading Staging File¹
The number of files that Google BigQuery V2 Connector must create to upload the staging file in bulk mode.
Local Stage File Directory¹
Specifies the directory on your local machine where Google BigQuery V2 Connector stores the files temporarily before writing the data to the staging file in Google Cloud Storage.
This property applies to bulk mode.
Note: This property is not applicable when you use a serverless runtime environment.
Allow Quoted Newlines¹
Indicates whether Google BigQuery V2 Connector must allow quoted data sections that contain newline characters in a .csv file.
Field Delimiter¹
Indicates whether Google BigQuery V2 Connector must allow field separators for the fields in a .csv file.
Allow Jagged Rows¹
Indicates whether Google BigQuery V2 Connector must accept the rows without trailing columns in a .csv file.
Use Default Column Values
Applicable when the selected data format for the staging file is CSV and the mapping contains unconnected ports. Includes the default column values for the unconnected ports from the staging file when it creates the target. This is applicable when you have defined a default constraint value in the Google BigQuery source column. When you do not enable this option, the agent creates a target only with the connected ports and populates null or empty strings for the unconnected ports.
Update Override¹
Optional. Overrides the update SQL statement that the Secure Agent generates to update the Google BigQuery target.
To use the update override query, you must set the Operation property to Update and the UpdateMode property to Update As Update.
Use the following format to define an update override query:
UPDATE `<project_name>.<dataset_name>.<table_name>` as <alias_name>
SET <alias_name>.<col_name1>=:<temp_table>.<col_name1>, <alias_name>.<col_name2>=:<temp_table>.<col_name2> FROM <dataset_name>.:<temp_table> WHERE <conditional expression>
For example,
UPDATE `project1.custdataset.cust_table1` as ab
SET ab.fld_str=:custtemp.fld_str, ab.fld_int=:custtemp.fld_int FROM custdataset.:custtemp WHERE ab.fld_string_req = :custtemp.fld_string_req
Not applicable when you perform a data driven operation.
pre SQL¹
SQL statement that you want to run before writing data to the target.
For example, if you want to select records from the database before you write the records into the table, specify the following pre-SQL statement:
SELECT * FROM `api-project-80697026669.EMPLOYEE.RegionNation` LIMIT 1000
post SQL¹
SQL statement that you want to run after writing the data into the target.
For example, if you want to update records in a table after you write the records into the target table, specify the following post-SQL statement:
UPDATE `api-project-80697026669.EMPLOYEE.RegionNation` SET phoneNumber.number=1000011, phoneNumber.areaCode=100 WHERE fullname='John Doe'
Suppress post SQL on Error¹
Indicates whether the Secure Agent must abort the post-SQL query execution in case the task fails to write data to the Google BigQuery target table due to errors.
Default is disabled.
pre SQL Configuration¹
Specify a pre-SQL configuration.
For example,
DestinationTable:PRESQL_TGT2,
DestinationDataset:EMPLOYEE,
FlattenResults:False,
WriteDisposition:WRITE_TRUNCATE,
UseLegacySql:False
post SQL Configuration¹
Specify a post-SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,
DestinationDataset:EMPLOYEE,
FlattenResults:True,
UseLegacySQL:False
Quote Char¹
Specifies the quote character to skip when you write data to Google BigQuery. When you write data to Google BigQuery and the source table contains the specified quote character, the task fails. Change the quote character value to a value that does not exist in the source table.
Default is double quotes.
Truncate target table
Truncates the Google BigQuery target table before loading data to the target.
Default is disabled.
Allow Duplicate Inserts
Indicates that the Secure Agent can insert duplicate rows into the Google BigQuery target.
Applicable only when you perform a data driven operation and DD_INSERT is specified in the data driven condition.
Default is not selected.
Disable Duplicate Update Rows¹
Determines whether the Secure Agent must process only one of the incoming rows when multiple incoming rows attempt to update the same target row.
Select this option to configure the mapping to process only one of the incoming rows and ignore the rest of the incoming rows.
Default is disabled.
Label¹
You can assign a label for the transformation to organize and filter the associated jobs in the Google Cloud Platform Log Explorer.
Billing Project ID
The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs query and load jobs.
If you omit the project ID here, the Secure Agent runs query and load jobs in the Google Cloud project corresponding to the Project ID value specified in the Google BigQuery V2 connection.
Forward Rejected Rows
Applicable only when you configure the DD_REJECT constant in the data driven condition to reject all the rows.
Otherwise, this property is not applicable for Google BigQuery V2 Connector.
¹Doesn't apply to mappings in advanced mode.
Write modes
When you use Google BigQuery V2 Connector, you can write data by using bulk mode or streaming mode. Before you choose a mode, see the Google documentation to understand the cost implications and trade-offs for each mode.
You can write data to a Google BigQuery target by using one of the following modes:
Bulk mode
Use bulk mode when you want to write large volumes of data in a cost-efficient manner.
In bulk mode, Google BigQuery V2 Connector first writes the data to a staging file in Google Cloud Storage. When the staging file contains all the data, Google BigQuery V2 Connector loads the data from the staging file to the Google BigQuery target.
When you enable staging file compression, Google BigQuery V2 Connector compresses the size of the staging file before it writes data to Google Cloud Storage. Google BigQuery V2 Connector writes the compressed file to Google Cloud Storage and then submits a load job to the Google BigQuery target.
Note: Enabling compression reduces the time that Google BigQuery V2 Connector takes to write data to Google Cloud Storage. However, performance degrades when Google BigQuery V2 Connector writes the data from Google Cloud Storage to the Google BigQuery target.
After writing the data into the Google BigQuery target, Google BigQuery V2 Connector deletes the staging file unless you configure the task or mapping to persist the staging file. You can choose to persist the staging file if you want to archive the data for future reference.
If a job fails, Google BigQuery V2 Connector deletes the staging file unless you configure the task or mapping to persist the staging file.
Streaming mode
Use streaming mode when you want the Google BigQuery target data to be immediately available for querying and real-time analysis. Evaluate Google's streaming quota policies and billing policies before you use streaming mode.
In streaming mode, Google BigQuery V2 Connector directly writes data to the Google BigQuery target. Google BigQuery V2 Connector appends the data into the Google BigQuery target.
You can configure the number of rows that you want Google BigQuery V2 Connector to stream in one request. If you want to stream a larger number of rows than the maximum permissible limit prescribed by Google, you can write the data to multiple smaller target tables instead of one large target table. You can create a template table based on which Google BigQuery must create multiple tables. You can define a unique suffix for each table. Google BigQuery creates each table based on the template table and adds the suffix to uniquely identify each table.
CDC mode
Use CDC mode only when you capture changed data from a CDC source. In CDC mode, you can configure Google BigQuery V2 Connector to capture changed data from any CDC source and write the changed data to a Google BigQuery target table.
Optimize write performance in staging mode
You can configure Data Integration to create a flat file for staging when you write data to a Google BigQuery target in bulk mode. You can set Data Integration to optimize the staging performance.
Data Integration first writes the data to a flat file located in the local staging file directory. When the staging file contains all the data, Data Integration loads the data from the staging file to the Google BigQuery target.
In the Google BigQuery advanced target properties, set the Local Stage File Directory property to a directory on your local machine where you want to create the flat file and set the Data Format of the staging file property to CSV.
When you run the mapping, the flat file is created in the local stage file directory that you specified.
Note: If you do not specify a local stage file directory, the flat file is created in the temp directory in the Linux or Windows machine where the Secure Agent runs.
When the mapping run is completed, the Secure Agent deletes the local staging file.
Enabling Google BigQuery V2 Connector to optimize the staging performance
Perform the following tasks to set the staging property, INFA_DTM_STAGING_ENABLED_CONNECTORS, for the Tomcat in the Secure Agent properties:
1. In Administrator, click Runtime Environments.
The Runtime Environments page appears.
2. Select the Secure Agent for which you want to set the custom configuration property.
3. In the Actions column, click the Edit Secure Agent icon corresponding to the Secure Agent that you want to edit.
The Edit Secure Agent page appears.
4. In the System Configuration Details section, select Data Integration Server as the service and Tomcat as the type.
5. Set the value of the Tomcat property INFA_DTM_STAGING_ENABLED_CONNECTORS to the plugin ID of the Google BigQuery V2 Connector.
You can find the plugin ID in the manifest file located in the following directory:
6. Click Save.
7. Restart the Secure Agent.
You can check the session logs. If the flat file is created successfully, Data Integration logs the following message in the session log: INFA_DTM_STAGING mode is enabled for the write operation.
If you do not set the staging property, Data Integration performs staging without the optimized settings, which might impact the performance of the task.
Rules and guidelines when you optimize the write performance
Consider the following rules when you enable the staging property:
•If you run a mapping enabled for SQL ELT optimization, the mapping runs without SQL ELT optimization.
•When the mapping writes a column of the String data type that contains null values to a column of the String data type set to Required constraint in the Google BigQuery target table, the job fails and does not write the data to any of the target columns.
•When you write Numeric data to the Google BigQuery target, the Numeric data in the local staging flat file contains trailing zeroes. However, the Secure Agent writes the Numeric data correctly in the Google BigQuery target table.
•When you write data of the Binary data type to the Google BigQuery target, ensure that the size or precision of the Binary data does not exceed 78643200 bytes.
•When you write data with a precision of more than 15 digits in the float data type, the data becomes corrupted.
Mapping tasks with CDC sources
You can use Google BigQuery V2 Connector to capture changed data from any CDC source and write the changed data to a Google BigQuery target. Add the CDC sources in mappings, and then run the associated mapping tasks to write the changed data to the target. When you capture changed data from a CDC source, you can only configure a single Google BigQuery V2 target transformation in a mapping. You can configure multiple Google BigQuery V2 targets to write changed data from a CDC source. You can configure multiple pipelines in a mapping to write changed data from multiple CDC sources to multiple Google BigQuery V2 targets.
When the mapping task processes the changed data from a CDC source such as Oracle Express CDC V2, Google BigQuery V2 Connector creates a state table and a staging table in Google BigQuery. When the changed data is received from the CDC source, Google BigQuery V2 Connector uploads the changed data to the staging table. Then, it generates a Job_Id and writes the Job_Id to the state table along with the restart information. Google BigQuery V2 Connector then merges the staging table with the target table in Google BigQuery.
Each time you run the mapping task, Google BigQuery V2 Connector creates the state table, if it does not exist, to store the state information. Google BigQuery V2 Connector uses the following naming convention for the state table name:
Your organization needs to replicate real-time changed data from a mission-critical production system to minimize intrusive, non-critical work, such as offline reporting or analytical operations. You can use Google BigQuery V2 Connector to capture changed data from any CDC source and write the changed data to a Google BigQuery target. Add the CDC sources in mappings, and then run the associated mapping tasks to write the changed data to the target.
1. In Data Integration, click New > Mapping > Create.
The New Mapping dialog box appears.
2. Enter a name and description for the mapping.
3. On the Source transformation, specify a name and description in the general properties.
4. On the Source tab, select any configured CDC connection and specify the required source properties.
5. On the Target transformation, specify a name and description in the general properties.
6. On the Target tab, perform the following steps to configure the target properties:
a. In the Connection field, select the Google BigQuery V2 connection.
b. In the Target Type field, select the type of the target object.
c. In the Object field, select the required target object.
d. In the Operation field, select Data Driven to properly handle insert, update, and delete records from the source.
e. In the Data Driven Condition field, leave the field empty.
f. In the Update Column field, select the key columns to upsert or update data to or delete data from Google BigQuery.
g. In the Advanced Properties section, you must select CDC in the Write Mode property.
h. You can only configure the following advanced target properties for CDC mode:
▪ Target Dataset ID
▪ Target Table Name
▪ Job Poll Interval In Seconds
▪ pre SQL
▪ post SQL
▪ pre SQL Configuration
▪ post SQL Configuration
7. On the Field Mapping tab, map the incoming fields to the target fields. You can manually map an incoming field to a target field or automatically map fields based on the field names.
8. In the Actions menu, click New Mapping Task.
The New Mapping Task page appears.
9. In the Definition tab, enter the task name and select the configured mapping.
10. In the CDC Runtime tab, specify the required properties for the selected CDC source.
For more information about the CDC Runtime properties, see the source properties for the selected CDC source.
11. On the Runtime Options tab, add the following properties in the Advanced Session Properties section:
a. Select Commit on End of File from the menu, and keep the property disabled.
b. Select Recovery Strategy and set Resume from last checkpoint as the value of the property.
12. Click Save > Run the mapping task.
Alternatively, you can create a schedule that runs the mapping task on a recurring basis without manual intervention. You can define the schedule to minimize the time between mapping task runs.
After you run the task, you can monitor the status of the task and view the logs in Monitor.
Rules and guidelines for Google BigQuery V2 CDC target
Consider the following guidelines when working with a Google BigQuery V2 change data capture (CDC) target:
•Informatica recommends that the Secure Agent, the CDC source, and PowerExchange for CDC are configured in the same region as Google BigQuery.
•To increase performance and avoid run-time environment memory issues, increase the Java heap size in the JVM option for type DTM. Set JVMOption1 to -Xmx1024m in the System Configuration Details section of the Secure Agent and restart the Secure Agent.
•To improve performance, specify a higher commit interval for the Maximum Rows Per Commit property on the CDC Runtime page in the mapping task wizard. However, in case of failure, recovery takes more time for a higher commit interval.
•It is recommended to use update queries on the CDC source database only if the Google BigQuery target table is partitioned and clustered, as shown in the example after this list.
•You must define a column as required in the Google BigQuery target table.
•If you define a column as required in the Google BigQuery target table, you must map a column in the CDC source to the required column in the Google BigQuery target in the mapping.
•When you use a Google BigQuery V2 connection in complex mode, you cannot write changed data from a CDC source to a Google BigQuery V2 target.
•When you capture changed data from a CDC source and the Google BigQuery V2 target contains a repeated column of the Record data type, the mapping fails.
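For reference, a partitioned and clustered Google BigQuery target table can be defined with GoogleSQL DDL similar to the following sketch. This only illustrates the table layout that the update-query guideline above assumes; the project, dataset, table, and column names are hypothetical:
CREATE TABLE `project1.custdataset.cust_table1`
(
  cust_id INT64,
  region STRING,
  order_date DATE,
  amount NUMERIC
)
PARTITION BY order_date
CLUSTER BY region, cust_id;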
Upsert task operation
When you perform an upsert operation on a Google BigQuery target, you must configure the upsert fields for the target table. You can use an ID field for standard objects. Ensure that you include the upsert field in the field mappings for the task.
Rules and Guidelines
Consider the following rules and guidelines when you perform an upsert operation on a Google BigQuery target without using Merge query:
•You cannot use the streaming mode to write data to a Google BigQuery target.
•When you configure a Google BigQuery V2 connection to use simple or hybrid connection mode, you cannot configure upsert operations for columns of the Record data type and repeated columns.
•When you perform an upsert operation on a Google BigQuery target and multiple incoming rows attempt to update the same target row, ensure that you select the Disable Duplicate Update Rows and Enable Merge target advanced properties.
Data driven operation for mappings
When you flag rows for an insert, update, delete, or reject operation based on the data driven condition for a Google BigQuery target in a mapping, you must select the Enable Data Driven and Enable Merge properties.
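For example, a data driven condition similar to the following expression flags rows based on a hypothetical OP_TYPE indicator column and rejects rows with a null CUST_ID. The column names are illustrative only and must be present in the field mapping:
IIF(ISNULL(CUST_ID), DD_REJECT, IIF(OP_TYPE = 'D', DD_DELETE, IIF(OP_TYPE = 'U', DD_UPDATE, DD_INSERT)))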
Rules and Guidelines
Consider the following rules and guidelines when you perform a data driven operation on a Google BigQuery target:
•You cannot parameterize the target connection or object.
•You cannot use the Google BigQuery V2 connection in complex mode.
•When you configure a data driven condition, ensure that you map the fields used in the condition in the field mapping.
•Ensure that you do not specify a column of Record data type, repeated columns, or Record data type with repeated columns in the data driven condition.
•When you use a Google BigQuery connection in simple mode, ensure that you do not specify a column of Byte, Record data type, or repeated columns in the data driven condition.
•When you use a Google BigQuery connection in hybrid mode, ensure that you do not specify a column of Byte, Record data type with repeated columns, or Primitive data type with repeated columns in the data driven condition.
•When you define the DD_UPDATE constant in the data driven condition, ensure that there are no null values in the update column.
•When you use the DD_REJECT constant in the data driven condition to reject all the rows, ensure that you have selected the Persist Staging File After Loading advanced target property.
•Ensure that the target table is not an external table.
•When you use the Google BigQuery V2 connection in simple connection mode and specify an update column of record data type with nested fields, you must ensure that you select JSON as the staging file format.
•You need to specify a condition for a data driven operation. If you keep the condition field empty, the mapping fails.
•You cannot use Disable Duplicate Update Rows target advanced property to perform a data driven operation.
Using Merge query for update, upsert, and delete operations
You can implement the Merge query to perform the following operations on a Google BigQuery target:
•Update
•Upsert
•Delete
To implement Merge query, select the Enable Merge option in the advanced target properties.
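For reference, Google BigQuery V2 Connector generates the Merge query internally; the following GoogleSQL sketch only illustrates the general shape of an upsert performed with a MERGE statement. The project, dataset, table, column, and staging table names are illustrative and do not reflect the exact query that the connector generates:
MERGE `project1.custdataset.cust_table1` AS tgt
USING `project1.custdataset.cust_staging` AS stg
ON tgt.cust_id = stg.cust_id
WHEN MATCHED THEN
  UPDATE SET tgt.fld_str = stg.fld_str, tgt.fld_int = stg.fld_int
WHEN NOT MATCHED THEN
  INSERT (cust_id, fld_str, fld_int)
  VALUES (stg.cust_id, stg.fld_str, stg.fld_int)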
Rules and Guidelines
Consider the following rules and guidelines when you use Merge query:
•When you configure a Google BigQuery V2 connection to use simple connection mode and select CSV as the staging file format, the Google BigQuery target table must not contain columns of the Record data type.
•When you configure a Google BigQuery V2 connection to use simple connection mode, the Google BigQuery target table must not contain repeated columns.
•When you configure a Google BigQuery V2 connection to use hybrid connection mode, the Google BigQuery target table must not contain repeated column as a key field.
Determine the order of processing for multiple targets
You can configure a mapping to write to multiple targets in a single pipeline, with each target configured for any write operation. The order of the target operation is not deterministic.
However, if you want the target operations to be processed in a specific order, such as delete, update, and insert, you need to set certain properties in the Secure Agent and in the task properties.
Set -DEnableSingleCommit=true in the Secure Agent properties
Perform the following tasks to set the property for the Secure Agent:
1. Open Administrator and select Runtime Environments.
2. Select the Secure Agent for which you want to set the property.
3. On the upper-right corner of the page, click Edit.
4. In the System Configuration Details section, select DTM as the type for the Data Integration Server service.
5. Edit the JVM options and set the property to -DEnableSingleCommit=true.
Set the EnableSingleCommit property in the task properties
Perform the following tasks to set the property in the task:
1. On the Runtime Options page in the mapping task properties, navigate to the Advanced Session Properties section.
2. From the Session Property Name list, select Custom Properties, and set the Session Property Value to Yes.
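For reference, the combined configuration might look like the following entries, assuming the custom session property name matches the EnableSingleCommit flag set on the Secure Agent; verify the exact names in your environment:
JVM option for the DTM type: -DEnableSingleCommit=true
Custom session property in the mapping task: EnableSingleCommit=Yes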
Clustering order
You can create a target table with clustered columns at runtime. Clustering organizes the data based on the specified columns into optimally-sized storage blocks, which enhances the query performance.
The clustering order determines the sort order of the data within the table. To define the clustering order, enter up to four fields, each separated by a comma.
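For example, to cluster the data first by region and then by customer ID, you might enter the following value in the Clustering Order property, assuming the target contains fields named region and cust_id:
region,cust_id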
You can use the following data types in the clustering columns:
•Bignumeric
•Bool
•Date
•Datetime
•INT64
•Numeric
•String
•Timestamp
Rules and guidelines
Consider the following rules and guidelines when you configure clustered tables in the Google BigQuery target:
•When you set the clustering order, the mapping can fail for the following issues:
- The number of clustering fields exceeds the defined limit.
- An incorrect name for the clustering field.
- The clustering field is of the Byte or Float data type.
The related error message for the mapping failure is not logged in the session log. You can find the related error message logged in the Tomcat log.
•When you use simple connection mode in a mapping, you need to provide the clustering field name for the Record data type in the Clustering Order property using an underscore (_) instead of a dot (.). For example, provide the field name as Master_String instead of Master.String. If you use a dot in the clustering field name, the mapping fails.
Stop on errors
You can configure the maximum number of load job errors at the target that a mapping task can encounter before it fails.
The error threshold can include malformed rows and data conversion errors in the load job, but the errors resulting from update, delete, or merge queries are excluded.
You can set the error threshold value in the Stop on errors session property in a mapping task. You can set the value from 0 to 2147483647. Values less than 0, alphabetic characters, and special characters are not allowed.
To configure the stop on error functionality, ensure that the mapping task is configured with the following conditions:
•The Enable BigQuery Storage API property is not selected in the connection.
•Bulk is selected as the write mode.
•CSV or JSON is selected as the data format.
The mapping task maintains an independent error count for each transformation and table partition. When the error threshold is exceeded in any of the transformations in a mapping, the mapping fails.
The following examples describe the mapping behavior based on the error threshold you set:
•If you don't set a value or set the value to 0, the mapping fails at the first error.
•If you set the value to 3, the mapping runs successfully for the first two errors and then fails when the third error occurs.
If you want to parameterize this session property, set the Stop on Errors session property to $PMSessionErrorThreshold and provide its threshold value in the DTM custom configurations of the Secure Agent. This threshold value applies to all mapping tasks and you cannot override it with a different value for different mappings.
Rules and guidelines
Consider the following rules and guidelines when you configure the stop on error functionality in a mapping task:
•You cannot configure the stop on error functionality in the following mapping scenarios:
- Mapping in advanced mode.
- Mapping enabled with staging optimization.
- Mapping enabled with SQL ELT optimization.
•You cannot access the Error Rows File for the mapping task from the Results section.