
Property | Description |
|---|---|
Configuration Name | Name of the configuration. |
Description | (Optional) Description of the configuration. |
Domain | The domain where you want to use this configuration. The list of repositories within the selected domain appears in the Repository field. |
Repository Type | The type of repository where you want to use this configuration. |
Repository | The repository where you want to use this configuration. Important: The selected repository must be assessed to create the configuration. |
Connection property | Description |
|---|---|
Schema | The Amazon Redshift schema name to be used when creating the object. The schema name is similar to the schema name specified in the SCHEMAMAP.properties file. Default is schema_changeit. |
Property name | Description |
|---|---|
Schema Name | Overrides the default schema name. |
S3 Bucket Name | Amazon S3 bucket name for staging the data. You can also specify the bucket name with the folder path. If you provide an Amazon S3 bucket name that is in a different region than the Amazon Redshift cluster, you must configure the REGION attribute in the Unload command options. |
Read Mode | Specifies the read mode to read data from the Amazon Redshift source. You can specify the Direct or Staging read mode. Default is Staging. |
Fetch Size | Determines the number of rows to read in one resultant set from Amazon Redshift. Applies only when you select the Direct read mode. Default is 10000. |
Enable Compression | Compresses the staging files into the Amazon S3 staging directory. The task performance improves when the Secure Agent compresses the staging files. Default is enabled. |
Staging Directory Location | Location of the local staging directory. When you run a task in the Secure Agent runtime environment, specify a directory path that is available on the corresponding Secure Agent machine in the runtime environment. For example, C:\Temp. Ensure that you have the write permissions to the directory. |
Unload Options | Unload command options. Enter the options to the Unload command to extract data from Amazon Redshift and create staging files on Amazon S3, such as DELIMITER, ESCAPE, NULL, PARALLEL, AWS_IAM_ROLE, and REGION. For example: DELIMITER = \036;ESCAPE = OFF;NULL=text;PARALLEL = ON;AWS_IAM_ROLE=arn:aws:iam::<account ID>:role/<role-name>;REGION = ap-south-1 For the AWS_IAM_ROLE option, provide the Amazon Resource Name (ARN) of an Amazon Redshift role. Also, specify a directory on the machine that hosts the Secure Agent. An illustrative UNLOAD statement appears after this table. |
Temporary Credential Duration | The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds. Default is 900 seconds. If you require more than 900 seconds, you can set the time duration up to a maximum of 12 hours in the AWS console and then enter the same time duration in this property. |
Treat NULL Value as NULL | Retains the null values when you read data from Amazon Redshift. |
Encryption Type | Encrypts the data in the Amazon S3 staging directory. You can specify the following encryption types:
Default is None. |
Download S3 Files In Multiple Parts | Downloads large Amazon S3 objects in multiple parts. When the file size of an Amazon S3 object is greater than 8 MB, you can choose to download the object in multiple parts in parallel. Default is 5 MB. |
Multipart Download Threshold Size | The maximum threshold size to download an Amazon S3 object in multiple parts. Default is 5 MB. |
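To show how the Unload command options in the table above map onto an actual statement, the following is a minimal sketch of a Redshift UNLOAD that stages data to Amazon S3. The schema, table, bucket, and IAM role ARN are placeholders, and the exact statement that the Secure Agent generates may differ.

```sql
-- Illustrative only: placeholder names throughout.
UNLOAD ('SELECT * FROM my_schema.my_table')                   -- source query
TO 's3://my-staging-bucket/staging_prefix_'                   -- S3 Bucket Name
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'    -- AWS_IAM_ROLE option
DELIMITER AS '|'                                              -- DELIMITER option
NULL AS 'text'                                                -- NULL option
PARALLEL ON                                                   -- PARALLEL option
REGION 'ap-south-1';                                          -- REGION option
```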
Property | Description |
|---|---|
S3 Bucket Name | Amazon S3 bucket name for writing the files to Amazon Redshift target. You can also specify the bucket name with the folder path. If you provide an Amazon S3 bucket name that is in a different region than the Amazon Redshift cluster, you must configure the REGION attribute in the Copy Options property. |
Enable Compression | Compresses the staging files before writing the files to Amazon Redshift. The task performance improves when the Secure Agent compresses the staged files. Default is enabled. |
Staging Directory Location | Location of the local staging directory. When you run a task in the Secure Agent runtime environment, specify a directory path that is available on the corresponding Secure Agent machine in the runtime environment. For example, C:\Temp. Ensure that you have the write permissions on the directory. |
Batch Size | Minimum number of rows in a batch. Enter a number greater than 0. Default is 2000000. |
Max Errors per Upload Batch for INSERT | Number of error rows that causes an upload insert batch to fail. Enter a positive integer. Default is 1. If the number of errors is equal to or greater than the property value, the Secure Agent writes the entire batch to the error file. |
Require Null Value For Char and Varchar | Replaces the string value with NULL when you write data to Amazon Redshift columns of Char and Varchar data types. Default is an empty string. Note: When you run a mapping to write null values to a table that contains a single column of the Int, Bigint, numeric, real, or double data type, the mapping fails. You must provide a value other than the default value in the Require Null Value For Char And Varchar property. |
WaitTime In Seconds For S3 File Consistency | Number of seconds to wait for the Secure Agent to make the staged files consistent with the list of files available on Amazon S3. Default is 0. |
Copy Options | Copy command options. Enter the options to the Copy command to write data from Amazon S3 to the Amazon Redshift target when the default delimiter comma (,) or double-quote (") is used in the data. Provide the Amazon Redshift Role Amazon Resource Name (ARN). You can add options such as DELIMITER, ACCEPTINVCHARS, QUOTE, COMPUPDATE, AWS_IAM_ROLE, and REGION. For example: DELIMITER = \036;ACCEPTINVCHARS = #;QUOTE = \037;COMPUPDATE = ON;AWS_IAM_ROLE=arn:aws:iam::<account ID>:role/<role-name>;REGION = ap-south-1 Specify a directory on the machine that hosts the Secure Agent. Note: If you do not add the options to the Copy command manually, the Secure Agent uses the default values. An illustrative COPY statement appears after this table. |
S3 Server Side Encryption | Indicates that Amazon S3 encrypts data during upload. Provide a customer master key ID in the connection property to enable this property. Default is not selected. |
S3 Client Side Encryption | Indicates that the Secure Agent encrypts data using a private key. Provide a master symmetric key ID in the connection property to enable this property. If you enable both server-side and client-side encryptions, the Secure Agent ignores the server-side encryption. |
Analyze Target Table | Runs an ANALYZE command on the target table. The query planner on Amazon Redshift updates the statistical metadata to build and choose optimal plans to improve the efficiency of queries. |
Vacuum Target Table | Recovers disk space and sorts rows in a specified table or all tables in the database. You can select the following recovery options:
Default is None. |
Prefix to retain staging files on S3 | Retains staging files on Amazon S3. Provide both a directory prefix and a file prefix separated by a slash (/) or only a file prefix to retain staging files on Amazon S3. For example, backup_dir/backup_file or backup_file. |
Success File Directory | Directory for the Amazon Redshift success file. Specify a directory on the machine that hosts the Secure Agent. |
Error File Directory | Directory for the Amazon Redshift error file. Specify a directory on the machine that hosts the Secure Agent. |
Transfer Manager Thread Pool Size | Number of threads to write data in parallel. Default is 10. |
Preserve record order on write | Retains the order of the records when you read data from a CDC source and write data to an Amazon Redshift target. Use this property when you create a mapping to capture the changed records from a CDC source. This property enables you to avoid inconsistencies between the CDC source and target. |
Minimum Upload Part Size | Minimum size of the Amazon Redshift object to upload an object. Default is 5 MB. |
Number of files per batch | Calculates the number of staging files per batch. If you do not provide the number of files, Amazon Redshift V2 Connector calculates the number of staging files. When you connect to Amazon Redshift serverless and do not provide the number of files, the default is 20 files per batch. |
Schema Name | Overrides the default schema name. |
Recovery Schema Name | Schema that contains recovery information stored in the infa_recovery_table table on the target system to resume the extraction of the changed data from the last checkpoint. |
Temporary Credential Duration | The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds. Default is 900 seconds. If you require more than 900 seconds, you can set the time duration up to a maximum of 12 hours in the AWS console and then enter the same time duration in this property. |
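The Copy command options in the table above correspond to clauses of a Redshift COPY statement, and the Analyze Target Table and Vacuum Target Table options run ANALYZE and VACUUM after the load. The following is a minimal sketch with placeholder schema, table, bucket, and role ARN values; the statements that the Secure Agent actually generates may differ.

```sql
-- Illustrative only: placeholder names throughout.
COPY my_schema.my_table
FROM 's3://my-staging-bucket/staging_prefix_'                 -- S3 Bucket Name
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'    -- AWS_IAM_ROLE option
DELIMITER AS '|'                                              -- DELIMITER option
ACCEPTINVCHARS AS '#'                                         -- ACCEPTINVCHARS option
COMPUPDATE ON                                                 -- COMPUPDATE option
REGION 'ap-south-1';                                          -- REGION option

-- Post-load maintenance when Analyze Target Table and Vacuum Target Table are enabled:
ANALYZE my_schema.my_table;
VACUUM FULL my_schema.my_table;
```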
Property | Description |
|---|---|
Read Mode | Specifies the read mode to read data from the Amazon Redshift source. You can specify the Direct or Staging read mode. Default is Staging. |
Fetch Size | Determines the number of rows to read in one resultant set from Amazon Redshift. Applies only when you select the Direct read mode. Default is 10000. |
S3 Bucket Name | Amazon S3 bucket name for staging the data. You can also specify the bucket name with the folder path. If you provide an Amazon S3 bucket name that is in a different region than the Amazon Redshift cluster, you must configure the REGION attribute in the Unload command options. |
Enable Compression | Compresses the staging files into the Amazon S3 staging directory. The task performance improves when the Secure Agent compresses the staging files. Default is selected. |
Staging Directory Location | Location of the local staging directory. When you run a task in the Secure Agent runtime environment, specify a directory path that is available on the corresponding Secure Agent machine in the runtime environment. For example, C:\Temp. Ensure that you have the write permissions on the directory. |
Unload Options | Unload command options. Add options to the Unload command to extract data from Amazon Redshift and create staging files on Amazon S3, such as DELIMITER, ESCAPE, NULL, PARALLEL, AWS_IAM_ROLE, and REGION. Provide an Amazon Redshift Role Amazon Resource Name (ARN). For example: DELIMITER = \036;ESCAPE = OFF;NULL=text;PARALLEL = ON;AWS_IAM_ROLE=arn:aws:iam::<account ID>:role/<role-name>;REGION = ap-south-1 Specify a directory on the machine that hosts the Secure Agent. Note: If you do not add the options to the Unload command manually, the Secure Agent uses the default values. |
Treat NULL Value as NULL | Retains the null values when you read data from Amazon Redshift. |
Encryption Type | Encrypts the data in the Amazon S3 staging directory. You can select the following encryption types:
Default is None. |
Download S3 Files in Multiple Parts | Downloads large Amazon S3 objects in multiple parts. When the file size of an Amazon S3 object is greater than 8 MB, you can choose to download the object in multiple parts in parallel. Default is 5 MB. |
Multipart Download Threshold Size | The maximum threshold size to download an Amazon S3 object in multiple parts. Default is 5 MB. |
Schema Name | Overrides the default schema name. Note: You cannot configure a custom query when you use the schema name. |
Temporary Credential Duration | The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds. Default is 900 seconds. If you require more than 900 seconds, you can set the time duration up to a maximum of 12 hours in the AWS console and then enter the same time duration in this property. |
Connection property | Description |
|---|---|
Schema | The Databricks Delta schema name to be used when creating the object. The schema name is similar to the schema name specified in the SCHEMAMAP.properties file. Default is schema_changeit. |
Property name | Description |
|---|---|
Schema Name | Overrides the schema specified in the connection. Default is $$dbd_schema. |
Staging Location | Relative directory path to store the staging files. If the Databricks cluster is deployed on AWS, use the path relative to the Amazon S3 staging bucket. If the Databricks cluster is deployed on Azure, use the path relative to the Azure Data Lake Store Gen2 staging filesystem name. Default is $$dbd_staging_loc. |
Database Name | Overrides the database name provided in the connection and the database name provided during metadata import. Note: To read from multiple objects, ensure that you have specified the database name in the connection properties. |
Job Timeout | Maximum time in seconds that is taken by the Spark job to complete processing. If the job is not completed within the time specified, the Databricks cluster terminates the job and the mapping fails. If the job timeout is not specified, the mapping shows success or failure based on the job completion. |
Job Status Poll Interval | Poll interval in seconds at which the Secure Agent checks the status of the job completion. Default is 30 seconds. |
DB REST API Timeout | The maximum time in seconds for which the Secure Agent retries the REST API calls to Databricks when there is a network connection error or the REST endpoint returns a 5xx HTTP error code. Default is 10 minutes. |
DB REST API Retry Interval | The time interval in seconds at which the Secure Agent must retry the REST API call when there is a network connection error or the REST endpoint returns a 5xx HTTP error code. This value does not apply to the Job status REST API. Use the Job Status Poll Interval value for the Job status REST API. Default is 30 seconds. |
Property Name | Description |
|---|---|
Schema Name | Overrides the schema specified in the connection. Default is $$dbd_schema. |
Staging Location | Relative directory path to store the staging files. If the Databricks cluster is deployed on AWS, use the path relative to the Amazon S3 staging bucket. If the Databricks cluster is deployed on Azure, use the path relative to the Azure Data Lake Store Gen2 staging filesystem name. Default is $$dbd_staging_loc. |
Target Database Name | Overrides the database name provided in the connection and the database selected in the metadata browser for existing targets. |
Write Disposition | Overwrites or adds data to the existing data in a table. You can select from the following options:
|
DTM Staging File Size | The size of the flat file that Data Integration creates locally in a temporary folder to stage the data before writing to Databricks. Default is 50 MB. |
Job Timeout | Maximum time in seconds that is taken by the Spark job to complete processing. If the job is not completed within the time specified, the job terminates and the mapping fails. If the job timeout is not specified, the mapping shows success or failure based on the job completion. |
Job Status Poll Interval | Poll interval in seconds at which the Secure Agent checks the status of the job completion. Default is 30 seconds. |
DB REST API Timeout | The maximum time in seconds for which the Secure Agent retries the REST API calls to Databricks when there is a network connection error or the REST endpoint returns a 5xx HTTP error code. Default is 10 minutes. |
DB REST API Retry Interval | The time interval in seconds at which the Secure Agent must retry the REST API call when there is a network connection error or the REST endpoint returns a 5xx HTTP error code. This value does not apply to the Job status REST API. Use the Job Status Poll Interval value for the Job status REST API. Default is 30 seconds. |
Property Name | Description |
|---|---|
Schema Name | Overrides the schema specified in the connection. Default is $$dbd_schema. |
Staging Location | Relative directory path to store the staging files. If the Databricks cluster is deployed on AWS, use the path relative to the Amazon S3 staging bucket. If the Databricks cluster is deployed on Azure, use the path relative to the Azure Data Lake Store Gen2 staging filesystem name. Default is $$dbd_staging_loc. |
Database Name | Overrides the database specified in the connection. |
Job Timeout | Maximum time in seconds that is taken by the Spark job to complete processing. If the job is not completed within the time specified, the Databricks cluster terminates the job and the mapping fails. If the job timeout is not specified, the mapping shows success or failure based on the job completion. |
Job Status Poll Interval | Poll interval in seconds at which the Secure Agent checks the status of the job completion. Default is 30 seconds. |
DB REST API Timeout | The maximum time in seconds for which the Secure Agent retries the REST API calls to Databricks when there is a network connection error or the REST endpoint returns a 5xx HTTP error code. Default is 10 minutes. |
DB REST API Retry Interval | The time interval in seconds at which the Secure Agent must retry the REST API call when there is a network connection error or the REST endpoint returns a 5xx HTTP error code. This value does not apply to the Job status REST API. Use the Job Status Poll Interval value for the Job status REST API. Default is 30 seconds. |
Connection property | Description |
|---|---|
Source Dataset ID | Name of the dataset that contains the source table and target table that you want to connect to. Default is datasetID_changeit. Note: Google BigQuery supports the datasets that reside only in the US region. |
Property name | Description |
|---|---|
Source Dataset ID | Overrides the Google BigQuery dataset name that you specified in the connection. Default is $$sf_database. |
Source Staging Dataset | Overrides the Google BigQuery staging dataset name that you specified in the connection and the Source Dataset ID source advanced property. |
Number of Rows to Read | Specifies the number of rows to read from the Google BigQuery source table. |
Job Poll Interval In Seconds | The number of seconds after which Google BigQuery Connector polls the status of the read job operation. Default is 10. |
Read Mode | Specifies the read mode to read data from the Google BigQuery source. You can select the Direct or Staging read mode. When you use hybrid or complex connection mode, you cannot use direct mode to read data from the Google BigQuery source. Default is Staging mode. |
Use EXPORT DATA Statement to stage | Uses the EXPORT DATA statement to export data from Google BigQuery to Google Cloud Storage. If the query contains an ORDER BY clause, the specified order is maintained when you export the data. This property applies to staging mode. Default is false. An illustrative EXPORT DATA statement appears after this table. |
Number of Threads for Downloading Staging Files | Specifies the number of files that Google BigQuery Connector downloads at a time to enable parallel download. This property applies to staging mode. Default is 1. |
Local Stage File Directory | Specifies the directory on your local machine where Google BigQuery Connector stores the Google BigQuery source data temporarily before it reads the data. This property applies to staging mode. Default is $PMTempDir. |
Staging File Name | Name of the staging file where data from the Google BigQuery source table is exported to Google Cloud Storage. This property applies to staging mode. |
Persist Extract Staging File After Download | Indicates whether Google BigQuery V2 Connector must persist the staging file after it reads data from the staging file. By default, Google BigQuery V2 Connector deletes the staging file. |
Persist Destination Table | Indicates whether Google BigQuery V2 Connector must persist the query results table after it reads data from the query results table. By default, Google BigQuery V2 Connector deletes the query results table. |
Data format of the staging file | Specifies the data format of the staging file. You can select the Avro, JSON, CSV, or Parquet data format. In a .csv file, columns of the Timestamp data type are represented as floating point numbers that cause the milliseconds value to differ. Default is JSON. |
Enable Staging File Compression | Indicates whether to compress the size of the staging file in Google Cloud Storage before Google BigQuery Connector reads data from the staging file. You can enable staging file compression to reduce cost and transfer time. This property applies to staging mode. Default is false. |
Retry Options | Comma-separated list to specify the following retry options:
Default is $$gbq_retry_options. |
Use Legacy SQL for SQL Override | Indicates that the SQL Override query is specified in legacy SQL. Use the following format to specify a legacy SQL query for the SQL Override Query property: SELECT <Col1, Col2, Col3> FROM [projectID:datasetID.tableName] Clear this option to define a standard SQL override query. Use the following format to specify a standard SQL query for the SQL Override Query property: SELECT * FROM `projectID.datasetID.tableName` Default is false. |
pre SQL Configuration | Specify a pre SQL configuration. For example, DestinationTable:PRESQL_SRC,DestinationDataset:EMPLOYEE, FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False |
post SQL Configuration | Specify a post SQL configuration. For example, DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE, FlattenResults:True,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False |
Label | You can assign a label for the transformation to organize and filter the associated jobs in the Google Cloud Platform Log Explorer. |
Billing Project ID | The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs query and extract jobs. If you omit the project ID here, the Secure Agent runs query and extract jobs in the Google Cloud project corresponding to the Project ID value specified in the Google BigQuery V2 connection. |
Number of Spark Partitions | Specifies the maximum number of partitions that the Spark engine splits the data into. Default is 1. |
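When the Use EXPORT DATA Statement to stage option is enabled, the source data is exported from Google BigQuery to Google Cloud Storage with an EXPORT DATA statement. The following is a minimal sketch of such a statement with placeholder bucket, project, dataset, table, and column names; the statement that the connector actually issues may differ.

```sql
-- Illustrative only: placeholder bucket, project, dataset, table, and column names.
EXPORT DATA OPTIONS (
  uri = 'gs://my-staging-bucket/staging/file_*.csv',  -- staging files in Google Cloud Storage
  format = 'CSV',                                     -- data format of the staging file
  overwrite = true
) AS
SELECT *
FROM `my-project.my_dataset.my_table`
ORDER BY id;                                          -- ORDER BY order is preserved in the export
```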
Property name | Description |
|---|---|
Target Dataset ID | Overrides the Google BigQuery dataset name that you specified in the connection. Default is $$gbq_datasetID. |
Write Mode | Specifies the mode to write data to the Google BigQuery target. You can select the Bulk, Streaming, or CDC mode. Default is Bulk mode. |
Data format of the staging file | Specifies the data format of the staging file. You can select the Avro, JSON, CSV, or Parquet data format. In a .csv file, columns of the Timestamp data type are represented as floating point numbers that cause the milliseconds value to differ. Only JSON format is applicable for mappings in advanced mode. This property applies to bulk and CDC mode. Avro and Parquet formats are not applicable when you perform a data driven operation. Default is JSON. |
Enable Staging File Compression | Select this option to compress the size of the staging file before Google BigQuery writes the data to the Google Cloud Storage and decompress the staging file before it loads the data to the Google BigQuery target. You can enable staging file compression to reduce cost and transfer time. Default is false. |
Local Stage File Directory | Specifies the directory on your local machine where Google BigQuery V2 Connector stores the files temporarily before writing the data to the staging file in Google Cloud Storage. This property applies to bulk mode. Default is $PMTempDir. |
Use Default Column Values | Applicable when the selected data format for the staging file is CSV and the mapping contains unconnected ports. Includes the default column values for the unconnected ports from the staging file to create the target. This is applicable when you have defined the default constraint value in the Google BigQuery source column. When you do not enable this option, the agent creates a target only with the connected ports. The agent populates null or empty strings for unconnected ports. Default is true. |
Enable Merge | Implements the Merge query to perform an update, upsert, delete, or data driven operation on a Google BigQuery target table. If you select the Enable Data Driven property, you must select this option. Default is disabled. An illustrative MERGE query appears after this table. |
Target Staging Dataset | Optional. Overrides the Google BigQuery staging dataset name that you specified in the connection and the Target Dataset ID target advanced property. |
Create Disposition | Specifies whether Google BigQuery V2 Connector must create the target table if it does not exist. You can select one of the following values:
Create disposition is applicable only when you perform an insert operation on a Google BigQuery target. |
Write Disposition | Specifies how Google BigQuery V2 Connector must write data in bulk mode if the target table already exists. You can select one of the following values:
Write disposition is applicable for bulk mode. Write disposition is applicable only when you perform an insert operation on a Google BigQuery target. |
Streaming Template Table Suffix | Specify the suffix to add to the individual target tables that Google BigQuery V2 Connector creates based on the template target table. This property applies to streaming mode. If you select the Enable Merge option, Google BigQuery V2 Connector ignores this property. Streaming mode is not applicable when you perform a data driven operation. |
Rows per Streaming Request | Specifies the number of rows that Google BigQuery V2 Connector streams to the BigQuery target for each request. Default is 500 rows. The maximum row size that Google BigQuery V2 Connector can stream to the Google BigQuery target for each request is 10 MB. This property applies to streaming mode. Streaming mode is not applicable when you perform a data driven operation. |
Staging File Name | Name of the staging file that Google BigQuery V2 Connector creates in the Google Cloud Storage before it loads the data to the Google BigQuery target. This property applies to bulk mode. |
Persist Staging File After Loading | Indicates whether Google BigQuery V2 Connector must persist the staging file in the Google Cloud Storage after it writes the data to the Google BigQuery target. You can persist the staging file if you want to archive the data for future reference. By default, Google BigQuery V2 Connector deletes the staging file in Google Cloud Storage. This property applies to bulk mode. |
Job Poll Interval In Seconds | The number of seconds after which Google BigQuery V2 Connector polls the status of the write job operation. Default is 10. |
Number of Threads for Uploading Staging File | The number of files that Google BigQuery V2 Connector must create to upload the staging file in bulk mode. |
Allow Quoted Newlines | Indicates whether Google BigQuery V2 Connector must allow the quoted data sections with newline character in a .csv file. |
Field Delimiter | Indicates whether Google BigQuery V2 Connector must allow field separators for the fields in a .csv file. |
Allow Jagged Rows | Indicates whether Google BigQuery V2 Connector must accept the rows without trailing columns in a .csv file. |
Suppress post SQL on Error | Indicates whether the Secure Agent must abort the post-SQL query execution in case the task fails to write data to the Google BigQuery target table due to errors. Default is disabled. |
pre SQL Configuration | Specify a pre-SQL configuration. For example, DestinationTable:PRESQL_TGT2, DestinationDataset:EMPLOYEE, FlattenResults:False, WriteDisposition:WRITE_TRUNCATE, UseLegacySql:False |
post SQL Configuration | Specify a post-SQL configuration. For example, DestinationTable:POSTSQL_SRC, DestinationDataset:EMPLOYEE, FlattenResults:True, UseLegacySQL:False |
Quote Char | Specifies the quote character to skip when you write data to Google BigQuery. When you write data to Google BigQuery and the source table contains the specified quote character, the task fails. Change the quote character value to a value that does not exist in the source table. Default is double quotes. |
Allow Duplicate Inserts | Indicates that the Secure Agent can insert duplicate rows into the Google BigQuery target. Applicable only when you perform a data driven operation and DD_INSERT is specified in the data driven condition. Default is not selected. |
Disable Duplicate Update Rows | Determines whether the Secure Agent must process only one of the incoming rows and ignore the rest when multiple incoming rows attempt to update the same target row. Select this option to configure the mapping to process only one of the incoming rows and ignore the rest of the incoming rows. Default is disabled. |
Label | You can assign a label for the transformation to organize and filter the associated jobs in the Google Cloud Platform Log Explorer. |
Billing Project ID | The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs query and load jobs. If you omit the project ID here, the Secure Agent runs query and load jobs in the Google Cloud project corresponding to the Project ID value specified in the Google BigQuery V2 connection. |
Forward Rejected Rows | Applicable only when you configure DD_REJECT constant in the data driven condition to reject all the rows. Otherwise, this property is not applicable for Google BigQuery V2 Connector. |
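When the Enable Merge option is selected, update, upsert, delete, and data driven operations are implemented as a MERGE query against the target table. The following is a minimal sketch of such a query with placeholder project, dataset, table, and column names; the query that the connector generates depends on the configured operation and key fields.

```sql
-- Illustrative only: placeholder names; the generated MERGE depends on the operation and keys.
MERGE `my-project.my_dataset.target_table` T
USING `my-project.my_dataset.staging_table` S
ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET T.name = S.name, T.updated_at = S.updated_at
WHEN NOT MATCHED THEN
  INSERT (id, name, updated_at)
  VALUES (S.id, S.name, S.updated_at);
```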
Property name | Description |
|---|---|
Source Dataset ID | Overrides the Google BigQuery dataset name that you specified in the connection. Default is $$gbq_datasetID. |
Source Staging Dataset | Overrides the Google BigQuery staging dataset name that you specified in the Lookup transformation. |
Job Poll Interval In Seconds | The number of seconds after which Google BigQuery V2 Connector polls the status of the read job operation. Default is 10. |
Read Mode | Specifies the read mode to read data from the Google BigQuery source. You can select the Direct or Staging read mode. When you use hybrid or complex connection mode, you cannot use direct mode to read data from the Google BigQuery source. Default is Staging mode. |
Use EXPORT DATA Statement to stage | Uses the EXPORT DATA statement to export data from Google BigQuery to Google Cloud Storage. If the query contains an ORDER BY clause, the specified order is maintained when you export the data. This property applies to staging mode. Default is true. |
Number of Threads for Downloading Staging Files | Specifies the number of files that Google BigQuery Connector downloads at a time to enable parallel download. This property applies to staging mode. Default is 1. |
Local Stage File Directory | Specifies the directory on your local machine where Google BigQuery V2 Connector stores the Google BigQuery source data temporarily before it reads the data. This property applies to staging mode. Default is $PMTempDir. |
Data format of the staging file | Specifies the data format of the staging file. You can select the Avro, JSON, CSV, or Parquet data format. In a .csv file, columns of the Timestamp data type are represented as floating point numbers that cause the milliseconds value to differ. Only JSON format is applicable for mappings in advanced mode. This property applies to bulk and CDC mode. Avro and Parquet formats are not applicable when you perform a data driven operation. Default is JSON. |
Enable Staging File Compression | Indicates whether to compress the size of the staging file in Google Cloud Storage before Google BigQuery V2 Connector reads data from the staging file. You can enable staging file compression to reduce cost and transfer time. This property applies to staging mode. Default is false. |
Retry Options | Comma-separated list to specify the following retry options:
Default is $$gbq_retry_options. |
Use Legacy SQL for SQL Override | Indicates that the SQL Override query is specified in legacy SQL. Use the following format to specify a legacy SQL query for the SQL Override Query property: SELECT <Col1, Col2, Col3> FROM [projectID:datasetID.tableName] Clear this option to define a standard SQL override query. Use the following format to specify a standard SQL query for the SQL Override Query property: SELECT * FROM `projectID.datasetID.tableName` |
Number of Rows to Read | Specifies the number of rows to read from the Google BigQuery source table. |
Staging File Name | Name of the staging file where data from the Google BigQuery source table is exported to Google Cloud Storage. This property applies to staging mode. |
Persist Extract Staging File After Download | Indicates whether Google BigQuery V2 Connector must persist the staging file in Google Cloud Storage after it reads data from the staging file. You can persist the staging file if you want to archive the data for future reference. By default, Google BigQuery V2 Connector deletes the staging file in Google Cloud Storage. This property applies to staging mode. |
Persist Destination Table | Indicates whether Google BigQuery V2 Connector must persist the query results table after it reads data from the query results table. By default, Google BigQuery V2 Connector deletes the query results table. |
pre SQL Configuration | Specify a pre SQL configuration. For example, DestinationTable:PRESQL_SRC,DestinationDataset:EMPLOYEE, FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False |
post SQL Configuration | Specify a post SQL configuration. For example, DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE, FlattenResults:True,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False |
Label | You can assign a label for the transformation to organize and filter the associated jobs in the Google Cloud Platform Log Explorer. |
Billing Project ID | The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs query and extract jobs. If you omit the project ID here, the Secure Agent runs query and extract jobs in the Google Cloud project corresponding to the Project ID value specified in the Google BigQuery V2 connection. |
Number of Spark Partitions | Specifies the maximum number of partitions that the Spark engine splits the data into. Default is 1. |
Connection property | Description |
|---|---|
Schema | The Microsoft Azure Synapse SQL schema name to be used when creating the object. The schema name is similar to the schema name specified in the SCHEMAMAP.properties file. Default is schema_changeit. |
Property | Description |
|---|---|
Azure Blob Container Name | The name of the container in Microsoft Azure Blob Storage. The container name must not contain special characters. Default is $$blob_container_name. |
ADLS FileSystem Name | Required if you select ADLS Gen2 storage in the connection properties. The name of the file system in Microsoft Azure Data Lake Storage Gen2. The file system name must not contain special characters. You can also specify the path of the directory under the file system. Use only a forward slash to specify the directory path. Default is $$adls_folder_name. |
Schema Name Override | Overrides the schema specified in the connection. Default is $$syn_schema. |
Staging File Format | Type of file format to use when you stage the files in Azure storage. Select one of the following formats:
|
Field Delimiter | Character used to separate fields in the file. Default is 0x1e. You can specify 'TAB' or 0-256 single-char printable and non-printable ASCII characters. Non-printable characters must be specified in hexadecimal. Note: Multi-char ASCII characters except TAB are not applicable. You cannot use the following non-printable characters: 0x00, 0x0A, 0x1B, 0x0D, and 0x1F. |
Number of Concurrent Connections to Blob Store | Number of concurrent connections to extract data from the Microsoft Azure Blob Storage. When reading a large-size blob, you can spawn multiple threads to process data. Configure Blob Part Size to partition a large-size blob into smaller parts. Default is 4. Maximum is 10. |
Blob Part Size | Partitions a blob into smaller parts each of specified part size. When reading a large-size blob, consider partitioning the blob into smaller parts and configure concurrent connections to spawn required number of threads to process data in parallel. Default is 8 MB. |
On Pre-Post SQL Error | Determines the behavior when a task that includes pre-SQL or post-SQL commands encounters errors. Select one of the following options:
|
Quote Character | Specifies the quote character to skip when you read data from Microsoft Azure Synapse SQL. The quote character that you specify must not exist in the source table. If it exists, enter a different quote character value. Default is 0x1f. |
Interim Directory | Optional. Path to the staging directory in the Secure Agent machine. Specify the staging directory where you want to stage the files when you read data from Microsoft Azure SQL Data Warehouse. Ensure that the directory has sufficient space and you have read permissions to the directory. Default staging directory is /tmp. You cannot specify an interim directory when you use the Hosted Agent. |
Advanced Property | Description |
|---|---|
Azure Blob Container Name | The name of the container in Microsoft Azure Blob Storage. The container name must not contain special characters. Default is $$blob_container_name. |
ADLS FileSystem Name | Required if you select ADLS Gen2 storage in the connection properties. The name of the file system in Microsoft Azure Data Lake Storage Gen2. The file system name must not contain special characters. You can also specify the path of the directory under the file system. Use only a forward slash to specify the directory path. Default is $$adls_folder_name. |
Copy Method | The method to load data from the staging location to Microsoft Azure Synapse SQL. You can select the Polybase or Copy command option. Default is Polybase. |
Copy Command Options | Options for the copy command in key=value format. Specify each option on a new line. An illustrative copy statement appears after this table. |
Schema Name Override | Overrides the schema specified in the connection. Default is $$syn_schema. |
Staging File Format | Type of file format to use when you stage the files in Azure storage. Select one of the following formats:
|
Field Delimiter | Character used to separate fields in the file. Default is 0x1e. You can specify 'TAB' or 0-256 single-char printable and non-printable ASCII characters. Non-printable characters must be specified in hexadecimal. Note: Multi-char ASCII characters except TAB are not applicable. You cannot use the following non-printable characters: 0x00, 0x0A, 0x1B, 0x0D, and 0x1F. |
Number of Concurrent Connections to Blob Storage | Number of concurrent connections to extract data from the Microsoft Azure Blob Storage. When reading a large-size blob, you can spawn multiple threads to process data. Default is 4. Maximum is 10. |
On Pre-Post SQL Error | Determines the behavior when a task that includes pre-SQL or post-SQL commands encounters errors. You can select any of the following options:
|
Batch Size | Minimum number of rows in a batch. Enter a number greater than 0. Default is 2000000. |
Reject Threshold | Number of errors within a batch that causes a batch to fail. Enter a positive integer. If the number of errors is equal to or greater than the property value, the Secure Agent rejects the entire batch to the error file and marks the session failed. Note: When you do not set the reject threshold, the mapping fails when an error is encountered. |
Quote Character | Specifies the quote character to skip when you write data to Microsoft Azure Synapse SQL. The quote character that you specify must not exist in the source table. If it exists, enter a different quote character value. |
Compression Format | Compresses the staging files in the .Gzip format. Default is None. |
Interim Directory | Optional. Path to the staging directory in the Secure Agent machine. Specify the staging directory where you want to stage the files when you write data to Microsoft Azure Synapse SQL. Ensure that the directory has sufficient space and you have write permissions to the directory. Default staging directory is /tmp. You cannot specify an interim directory when you use the Hosted Agent. |
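When the Copy Method is the copy command, the Copy Command Options are entered in key=value format, one option per line. As a sketch, assuming those options map onto the WITH clause of a Microsoft Azure Synapse SQL COPY INTO statement, a load might look like the following. The target table, storage path, and the MAXERRORS and COMPRESSION options are placeholders, and the statement that the Secure Agent actually generates may differ.

```sql
-- Illustrative only: placeholder target table and staging path.
-- Copy Command Options such as MAXERRORS=10 and COMPRESSION=GZIP map to the WITH clause.
COPY INTO dbo.my_target
FROM 'https://mystorageaccount.blob.core.windows.net/my-container/staging/'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = '0x1e',   -- matches the default field delimiter
    MAXERRORS = 10,
    COMPRESSION = 'Gzip'
);
```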
Property name | Description |
|---|---|
Schema Name Override | Overrides the schema specified in the connection. Default is $$syn_schema. |
Azure Blob Container Name | The name of the container in Microsoft Azure Blob Storage. The container name must not contain special characters. Default is $$blob_container_name. |
ADLS FileSystem Name | Required if you select ADLS Gen2 storage in the connection properties. The name of the file system in Microsoft Azure Data Lake Storage Gen2. The file system name must not contain special characters. You can also specify the path of the directory under the file system. Use only a forward slash to specify the directory path. Default is $$adls_folder_name. |
Staging File Format | Type of file format to use when you stage the files in Azure storage. Select one of the following formats:
|
Field Delimiter | Character used to separate fields in the file. Default is 0x1e. You can specify 'TAB' or 0-256 single-char printable and non-printable ASCII characters. Non-printable characters must be specified in hexadecimal. Note: Multi-char ASCII characters except TAB are not applicable. You cannot use the following non-printable characters: 0x00, 0x0A, 0x1B, 0x0D, and 0x1F. |
Number of concurrent connections to Blob Store | Number of concurrent connections to extract data from the Microsoft Azure Blob Storage. When reading a large-size blob, you can spawn multiple threads to process data. Configure Blob Part Size to partition a large-size blob into smaller parts. Default is 4. Maximum is 10. |
Blob Part Size | Partitions a blob into smaller parts each of specified part size. When reading a large-size blob, consider partitioning the blob into smaller parts and configure concurrent connections to spawn required number of threads to process data in parallel. Default is 8 MB. |
On Pre-Post SQL Error | Determines the behavior when a task that includes pre-SQL or post-SQL commands encounters errors. You can select any of the following options:
|
Quote Character | The Secure Agent skips the specified character when you read data from Microsoft Azure SQL Data Warehouse. Default is 0x1f. |
Interim Directory | Optional. Path to the staging directory in the Secure Agent machine. Specify the staging directory where you want to stage the files when you read data from Microsoft Azure SQL Data Warehouse. Ensure that the directory has sufficient space and you have write permissions to the directory. Default staging directory is /tmp. You cannot specify an interim directory when you use the Hosted Agent. |
Connection property | Description |
|---|---|
Schema | The PostgreSQL schema name to be used when creating the object. The schema name is similar to the schema name specified in the SCHEMAMAP.properties file. Default is schema_changeit. |
Property name | Description |
|---|---|
Schema Name | Overrides the schema name of the source object. Default is $$pg_schema. |
Property name | Description |
|---|---|
Schema Name | Overrides the schema name of the target object. Default is $$pg_schema. |
Property name | Description |
|---|---|
Schema Name | Overrides the schema name of the lookup object. Default is $$pg_schema. |
Connection property | Description |
|---|---|
Database | The Snowflake database name to be used when creating the object. The database name is similar to the database name specified in the DATABASEMAP.properties file. Default is dbname_changeit. |
Schema | The Snowflake schema name to be used when creating the object. The schema name is similar to the schema name specified in the SCHEMAMAP.properties file. Default is schema_changeit. |
Property name | Description |
|---|---|
Database | Overrides the database specified in the connection. Default is $$sf_database. |
Schema | Overrides the schema specified in the connection. Default is $$sf_schema. |
Warehouse | Overrides the Snowflake warehouse name specified in the connection. |
Role | Overrides the Snowflake role assigned to the user that you specified in the connection. |
Property Name | Description |
|---|---|
Database | Overrides the database that you used to import the object. Default is $$sf_database. |
Schema | Overrides the schema that you used to import the object. Default is $$sf_schema. |
Warehouse | Overrides the Snowflake warehouse name specified in the connection. The warehouse name in the mapping overrides the warehouse name you specify in the connection. |
Role | Overrides the Snowflake role assigned to the user specified in the connection. |
Batch Row Size | The number of rows written to a single file in the agent location. When the number of rows written to the file reaches the value specified, the agent flushes the data queue and starts processing the write commands. |
Number of local staging files | The number of files that represents a single batch of data. The default number of files considered is 64. After the agent uploads the specified number of local staging files to the Snowflake user stage, Snowflake unloads the data to the target table. |
Additional Write Runtime Parameters | The additional runtime parameters that you can use when you write to Snowflake. You can enter multiple write runtime parameters, separated by ampersand (&). |
Rejected File Path | The filename and path of the file on the agent machine where you want to write the rejected records. For example, \rejectedfiles\reject7 |
Forward Rejected Rows | Determines whether the transformation passes rejected rows to the next transformation or drops rejected rows. By default, the agent forwards rejected rows to the next transformation. |
Property Name | Description |
|---|---|
Database | Overrides the database specified in the connection. Default is $$sf_database. |
Schema | Overrides the schema specified in the connection. Default is $$sf_schema. |
Warehouse | Overrides the Snowflake warehouse name specified in the connection. |
Role | Overrides the Snowflake role assigned to the user that you specified in the connection. |