
Google BigQuery

When you create a Google BigQuery connection, configure the connection properties.

Feature snapshot

Operation  Support
Read       Yes
Write      Yes

Before you begin

Before you configure the connection properties, you'll need to get information from your Google account.
The following video shows you how to get information from your Google account:
https://infa.media/3e4LzdW

Connection properties

The following table describes the Google BigQuery connection properties:
Property
Description
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
Maximum length is 255 characters.
Service Account ID
The client_email value in the JSON file that you download after you create a service account.
Service Account Key
The private_key value in the JSON file that you download after you create a service account.
Project ID
The project_id value in the JSON file that you download after you create a service account.
If you have created multiple projects with the same service account, enter the ID of the project that contains the dataset that you want to connect to.
Storage Path
Path in Google Cloud Storage where the agent creates a local stage file to store the data temporarily.
Applies to tasks that read or write large volumes of data. Use this property when you read data in staging mode or write data in bulk mode.
You can enter either the bucket name, or the bucket name and folder name.
Use one of the following formats:
- gs://<bucket_name>
- gs://<bucket_name>/<folder_name>
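For example, the following sketch checks a storage path against the two documented formats. It is a hypothetical helper for illustration, not part of INFACore, and the bucket-name pattern is simplified relative to the full Google Cloud Storage naming rules.

import re

# Accepts gs://<bucket_name> or gs://<bucket_name>/<folder_name>.
# The bucket-name pattern is simplified; see the Google Cloud Storage naming rules.
_STORAGE_PATH = re.compile(r"^gs://[a-z0-9][a-z0-9._-]{1,61}[a-z0-9](/.+)?$")

def is_valid_storage_path(path: str) -> bool:
    """Return True if the path matches one of the documented formats."""
    return bool(_STORAGE_PATH.match(path))

assert is_valid_storage_path("gs://my-bucket")
assert is_valid_storage_path("gs://my-bucket/staging")
assert not is_valid_storage_path("s3://my-bucket")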
Connection mode
The mode that you want to use to read data from or write data to Google BigQuery.
Select one of the following connection modes:
- Simple. Flattens each field within the Record data type field as a separate field.
- Hybrid. Displays all the top-level fields in the Google BigQuery table including Record data type fields. The Google BigQuery connection displays the top-level Record data type field as a single field of the String data type.
- Complex. Displays all the columns in the Google BigQuery table as a single field of the String data type.
Default is Simple.
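To make the three modes concrete, the following sketch shows how one row with a top-level Record field would surface in each mode. This is a conceptual illustration in Python, not product code; the actual flattened field names are determined by the connector.

import json

# A Google BigQuery row with a top-level Record field, as nested Python data.
row = {"id": 1, "address": {"city": "Austin", "zip": "78701"}}

# Simple mode: each sub-field of the Record becomes a separate field.
simple = {"id": 1, "address_city": "Austin", "address_zip": "78701"}

# Hybrid mode: top-level fields stay, but the Record collapses to one String field.
hybrid = {"id": 1, "address": json.dumps(row["address"])}

# Complex mode: the entire row collapses to a single String field.
complex_row = {"row": json.dumps(row)}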
Schema Definition File Path
Directory on the Secure Agent machine where the Secure Agent must create a JSON file with the sample schema of the Google BigQuery table. The JSON file name is the same as the Google BigQuery table name.
Alternatively, you can specify a storage path in Google Cloud Storage where the Secure Agent must create a JSON file with the sample schema of the Google BigQuery table. You can download the JSON file from the specified storage path in Google Cloud Storage to a local machine.
Use Legacy SQL For Custom Query
Select this option to use legacy SQL to define a custom query. If you clear this option, you must use standard SQL to define a custom query.
Note: Not applicable when you configure the Google BigQuery connection in hybrid or complex mode.
Dataset Name for Custom Query
When you define a custom query, you must specify a Google BigQuery dataset.
Region Id
The region name where the Google BigQuery dataset that you want to access resides.
Note: Ensure that the bucket name, or the bucket name and folder name, that you specify in the Storage Path property resides in the specified region.
For more information about the regions supported by Google BigQuery, see Dataset locations.
Staging Dataset
The Google BigQuery dataset name where you want to create the staging table to stage the data. You can define a Google BigQuery dataset that is different from the source or target dataset.
Optional Properties
Specifies whether you can configure source and target functionality through custom properties.
You can select one of the following options:
- None. If you do not want to configure any custom properties, select None.
- Required. If you want to specify custom properties to configure the source and target functionalities, select Required.
Default is None.
Provide Optional Properties
Comma-separated key-value pairs of custom properties in the Google BigQuery connection to configure certain source and target functionalities.
Appears when you select Required in the Optional Properties.
For more information about the list of custom properties that you can specify, see the Informatica Knowledge Base article: https://kb.informatica.com/faq/7/Pages/26/632722.aspx
Note: Ensure that you specify valid credentials in the connection properties. The test connection succeeds even if you specify incorrect credentials.
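Because the test connection can pass with bad credentials, you might verify the key file independently before configuring the connection. The following sketch uses Google's google-cloud-bigquery client library (not part of INFACore) and the same JSON key file that supplies the client_email, private_key, and project_id values; the file name is a placeholder.

from google.cloud import bigquery
from google.oauth2 import service_account

# Load the JSON key file downloaded for the service account.
creds = service_account.Credentials.from_service_account_file("service_account.json")
client = bigquery.Client(credentials=creds, project=creds.project_id)

# Any authenticated call fails fast on an invalid key; listing one dataset is cheap.
list(client.list_datasets(max_results=1))
print("Credentials OK for project:", creds.project_id)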

Retry Strategy

When you read data from Google BigQuery in staging mode, you can configure a retry strategy for the Google BigQuery connection to use when it fails to connect to the Google BigQuery source.
The following table describes the retry properties for the Google BigQuery connection:
Property
Description
Enable Retry
Indicates that the Secure Agent attempts to retry the connection when there is a failure.
Select this option to enable connection retry.
Default is unselected.
Maximum Retry Attempts
The maximum number of retry attempts that the Secure Agent performs to receive the response from the Google BigQuery endpoint.
If the Secure Agent fails to connect to Google BigQuery within the maximum retry attempts, the connection fails.
Default is 6.
Appears when you select the Enable Retry property.
Initial Retry Delay
The initial wait time in seconds before the Secure Agent attempts to retry the connection.
Default is 1.
Appears when you select the Enable Retry property.
Retry Delay Multiplier
The multiplier that the Secure Agent uses to exponentially increase the wait time between successive retry attempts up to the maximum retry delay time.
Default is 2.0.
Appears when you select the Enable Retry property.
Maximum Retry Delay
The maximum wait time in seconds that the Secure Agent waits between successive retry attempts.
Default is 32.
Appears when you select the Enable Retry property.
Total Timeout
The total time in seconds that the Secure Agent spends on retry attempts, after which the connection fails.
Default is 50.
Appears when you select the Enable Retry property.
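With the default values above, the waits grow as 1, 2, 4, 8, 16, and 32 seconds, and the 50-second total timeout cuts the sequence short. The following sketch computes the schedule that these properties imply; it illustrates exponential backoff and is not the Secure Agent's implementation.

def retry_schedule(max_attempts=6, initial_delay=1.0, multiplier=2.0,
                   max_delay=32.0, total_timeout=50.0):
    """Yield the wait in seconds before each retry attempt."""
    delay, elapsed = initial_delay, 0.0
    for _ in range(max_attempts):
        if elapsed + delay > total_timeout:
            break  # the total timeout caps the overall retry window
        yield delay
        elapsed += delay
        delay = min(delay * multiplier, max_delay)

print(list(retry_schedule()))  # [1.0, 2.0, 4.0, 8.0, 16.0]; a sixth wait would exceed 50 seconds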

Read properties

The following table describes the advanced source properties that you can configure in the Python code to read from Google BigQuery:
Property
Description
Source Dataset ID
Optional. Overrides the Google BigQuery dataset name that you specified in the Source transformation.
Source Table Name
Optional. Overrides the Google BigQuery table name that you specified in the Source transformation.
Source Staging Dataset
Optional. Overrides the Google BigQuery staging dataset name that you specified in the connection and the Source Dataset ID source advanced property.
Number of Rows to Read
Specifies the number of rows to read from the Google BigQuery source table.
Allow Large Results
Determines whether Google BigQuery V2 Connector must produce arbitrarily large result tables to query large source tables.
If you select this option, you must specify a destination table to store the query results.
Query Results Table Name
Required if you select the Allow Large Results option.
Specifies the destination table name to store the query results. If the table is not present in the dataset, Google BigQuery V2 Connector creates the destination table with the name that you specify.
Job Poll Interval in Seconds
The number of seconds after which Google BigQuery V2 Connector polls the status of the read job operation.
Default is 10.
Read Mode
Specifies the read mode to read data from the Google BigQuery source.
You can select one of the following read modes:
- Direct. In direct mode, Google BigQuery V2 Connector reads data directly from the Google BigQuery source table.
  Note: When you use hybrid or complex connection mode, you cannot use direct mode to read data from the Google BigQuery source.
- Staging. In staging mode, Google BigQuery V2 Connector exports data from the Google BigQuery source into Google Cloud Storage. After the export is complete, Google BigQuery V2 Connector downloads the data from Google Cloud Storage into the local stage file and then reads data from the local stage file.
Default is Direct mode.
Use EXPORT DATA Statement to stage
Indicates whether Google BigQuery V2 Connector uses the EXPORT DATA statement to stage the data. Use this option to support the ORDER BY clause in a custom query or SQL override query.
This property applies to staging mode.
Number of Threads for Downloading Staging Files
Specifies the number of files that Google BigQuery V2 Connector downloads at a time to enable parallel download.
This property applies to staging mode.
Local Stage File Directory
Specifies the directory on your local machine where Google BigQuery V2 Connector stores the Google BigQuery source data temporarily before it reads the data.
This property applies to staging mode.
Staging File Name
Name of the staging file where data from the Google BigQuery source table is exported to Google Cloud Storage.
This property applies to staging mode.
Data Format of the staging file
Specifies the data format of the staging file. You can select one of the following data formats:
- Avro
- JSON (Newline Delimited). Supports flat and record data with nested and repeated fields.
- CSV. Supports flat data.
  Note: In a .csv file, columns of the Timestamp data type are represented as floating-point numbers, which causes the milliseconds value to differ.
- Parquet
This property applies to staging mode.
Enable Staging File Compression
Indicates whether to compress the size of the staging file in Google Cloud Storage before Google BigQuery V2 Connector reads data from the staging file.
You can enable staging file compression to reduce cost and transfer time.
This property applies to staging mode.
Persist Extract Staging File After Download
Indicates whether Google BigQuery V2 Connector must persist the staging file after it reads data from the staging file.
By default, Google BigQuery V2 Connector deletes the staging file.
Persist Destination Table
Indicates whether Google BigQuery V2 Connector must persist the query results table after it reads data from the query results table.
By default, Google BigQuery V2 Connector deletes the query results table.
Pre SQL
SQL statement that you want to run before reading data from the source.
For example, if you want to select records in the database before you read the records from the table, specify the following pre SQL statement:
SELECT * FROM [api-project-80697026669:EMPLOYEE.DEPARTMENT] LIMIT 1000;
Pre SQL Configuration
Specify a pre SQL configuration.
For example,
DestinationTable:PRESQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
Post SQL
SQL statement that you want to run after reading data from the source.
For example, if you want to update records in a table after you read the records from a source table, specify the following post SQL statement:
UPDATE [api-project-80697026669.EMPLOYEE.PERSONS_TGT_DEL]
SET phoneNumber.number=1000011, phoneNumber.areaCode=100 where fullname='John Doe'
Post SQL Configuration
Specify a post SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:True,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
SQL Override Query
Overrides the default SQL query used to read data from the Google BigQuery source.
Note: When you specify an SQL override query, you must specify a dataset name in the Source Dataset ID advanced source property.
Ensure that the list of selected columns, data types, and the order of the columns that appear in the query match the columns, data types, and order in which they appear in the source object.
Ensure that you map all the columns in the SQL override query to the target.
To use the ORDER BY clause in an SQL override query in staging mode, select the Use EXPORT DATA Statement to stage advanced source property. When staging optimization is enabled in a mapping, the columns mapped in the SQL override query must match the columns in the source object.
Use Legacy SQL for SQL Override
Indicates that the SQL Override query is specified in legacy SQL.
Use the following format to specify a legacy SQL query for the SQL Override Query property:
SELECT <Col1, Col2, Col3> FROM [projectID:datasetID.tableName]
Clear this option to define a standard SQL override query.
Use the following format to specify a standard SQL query for the SQL Override Query property:
SELECT * FROM `projectID.datasetID.tableName`
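For comparison, the following sketch shows the same placeholder table referenced in both dialects and runs the standard SQL form with Google's google-cloud-bigquery client library (not part of INFACore):

from google.cloud import bigquery

legacy_sql = "SELECT col1, col2 FROM [my-project:my_dataset.my_table]"
standard_sql = "SELECT col1, col2 FROM `my-project.my_dataset.my_table`"

client = bigquery.Client()
# Set use_legacy_sql=True to run the legacy SQL form instead.
job_config = bigquery.QueryJobConfig(use_legacy_sql=False)
rows = client.query(standard_sql, job_config=job_config).result()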
Retry Options
Comma-separated list to specify the following retry options:
- Retry Count. The number of retry attempts to read data from Google BigQuery.
- Retry Interval. The time in seconds to wait between each retry attempt.
- Retry Exceptions. The list of exceptions, separated by the pipe (|) character, for which the retries are made.
Use the following format to specify the retry options, for example:
RetryCount:5,RetryInterval:1,RetryExceptions:java.net.ConnectException|java.io.IOException
Note: The retry options are available for preview. Preview functionality is supported for evaluation purposes but is unwarranted and is not production-ready. Informatica recommends that you use it in non-production environments only. Informatica intends to include the preview functionality in an upcoming release for production use, but might choose not to in accordance with changing market or technical circumstances. For more information, contact Informatica Global Customer Support. To use the functionality, your organization must have the appropriate licenses.
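The following sketch parses a value in this format into a dictionary. It is a hypothetical helper shown for clarity; the connector parses the string itself.

def parse_retry_options(value: str) -> dict:
    """Split 'Key:value' pairs on commas; exceptions split further on '|'."""
    options = dict(pair.split(":", 1) for pair in value.split(","))
    options["RetryCount"] = int(options.get("RetryCount", 0))
    options["RetryInterval"] = int(options.get("RetryInterval", 0))
    options["RetryExceptions"] = options.get("RetryExceptions", "").split("|")
    return options

opts = parse_retry_options(
    "RetryCount:5,RetryInterval:1,RetryExceptions:java.net.ConnectException|java.io.IOException")
# {'RetryCount': 5, 'RetryInterval': 1,
#  'RetryExceptions': ['java.net.ConnectException', 'java.io.IOException']}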
Number of Spark Partitions
Specifies the maximum number of partitions that the Spark engine splits the data into.
Default is 1.
Billing Project ID
The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs the query.
If you do not specify a value for the Billing Project ID property, the Secure Agent runs the query in the Google Cloud project that contains the Google BigQuery objects based on the Project ID value specified in the Google BigQuery V2 connection.
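The split between the billing project and the data project mirrors the behavior of Google's client library, where the client's project is billed while the query references a table in another project. A sketch with placeholder project names:

from google.cloud import bigquery

# Queries are billed to "billing-project" but read a table in "data-project".
client = bigquery.Client(project="billing-project")
query = "SELECT COUNT(*) FROM `data-project.my_dataset.my_table`"
row_count = list(client.query(query).result())[0][0]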

Write properties

The following table describes the advanced target properties that you can configure in the Python code to write to Google BigQuery:
Property
Description
Target Dataset ID
Optional. Overrides the Google BigQuery dataset name that you specified in the connection.
Update Mode
Determines the mode that the Secure Agent uses to update rows in the Google BigQuery target.
You can select one of the following modes:
- Update As Update. The Secure Agent updates all rows flagged for update if the entries exist.
- Update Else Insert. The Secure Agent first updates all rows flagged for update if the entries exist in the target. If the entries do not exist, the Secure Agent inserts the entries.
Default is Update As Update.
Not applicable when you perform a data driven operation.
Enable Data Driven
Implements data driven operation to honor flagged rows for an insert, update, delete, or reject operation based on the data driven condition.
Select this option when you select Data Driven as the target operation.
Enable Merge
Implements the Merge query to perform an update, upsert, delete, or data driven operation on a Google BigQuery target table.
If you select the Enable Data Driven property, you must select this option.
Default is not selected.
Use Default Column Values
Applicable when the selected data format for the staging file is CSV and the mapping contains unconnected ports. Includes the default column values for the unconnected ports from the staging file when the agent creates the target. This applies when you have defined a default constraint value for the Google BigQuery source column. When you do not enable this option, the agent creates the target only with the connected ports and populates null or empty strings for the unconnected ports.
Update Override
Optional. Overrides the update SQL statement that the Secure Agent generates to update the Google BigQuery target.
Use the following format to define an update override query:
UPDATE `<project_name>.<dataset_name>.<table_name>` as <alias_name>
SET <alias_name>.<col_name1>=:<temp_table>.<col_name1>, <alias_name>.<col_name2>=:<temp_table>.<col_name2> FROM <dataset_name>.:<temp_table> WHERE <conditional expression>
For example,
UPDATE `project1.custdataset.cust_table1` as ab SET ab.fld_str=:custtemp.fld_str, ab.fld_int=:custtemp.fld_int FROM custdataset.:custtemp WHERE ab.fld_string_req = :custtemp.fld_string_req
Not applicable when you perform a data driven operation.
Target Table Name
Optional. Overrides the Google BigQuery target table name that you specified in the Target transformation.
Note: If you specify an update override query, Google BigQuery V2 Connector ignores this property.
Target Staging Dataset
Optional. Overrides the Google BigQuery staging dataset name that you specified in the connection and the Target Dataset ID target advanced property.
Create Disposition
Specifies whether Google BigQuery V2 Connector must create the target table if it does not exist.
You can select one of the following values:
- Create if needed. If the table does not exist, Google BigQuery V2 Connector creates the table.
- Create never. If the table does not exist, Google BigQuery V2 Connector does not create the table and displays an error message.
Create disposition is applicable only when you perform an insert operation on a Google BigQuery target.
Write Disposition
Specifies how Google BigQuery V2 Connector must write data in bulk mode if the target table already exists.
You can select one of the following values:
- Write append. If the target table exists, Google BigQuery V2 Connector appends the data to the existing data in the table.
- Write truncate. If the target table exists, Google BigQuery V2 Connector overwrites the existing data in the table.
- Write empty. If the target table exists and contains data, Google BigQuery V2 Connector displays an error and does not write the data to the target. Google BigQuery V2 Connector writes the data to the target only if the target table does not contain any data.
Write disposition is applicable in bulk mode, and only when you perform an insert operation on a Google BigQuery target.
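Create disposition and write disposition correspond to standard BigQuery load-job settings. The following sketch shows the equivalent values in Google's google-cloud-bigquery client library (for illustration; the connector sets these for you):

from google.cloud import bigquery

job_config = bigquery.LoadJobConfig(
    create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED,  # "Create if needed"
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,      # "Write truncate"
)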
Write Mode
Specifies the mode to write data to the Google BigQuery target.
You can select one of the following modes:
- Bulk. Google BigQuery V2 Connector first writes the data to a staging file in Google Cloud Storage. When the staging file contains all the data, Google BigQuery V2 Connector loads the data from the staging file to the BigQuery target. Google BigQuery V2 Connector then deletes the staging file unless you configure the task to persist the staging file.
- Streaming. Google BigQuery V2 Connector writes data directly to the BigQuery target, row by row.
- CDC. Applies only when you capture changed data from a CDC source. In CDC mode, Google BigQuery V2 Connector captures changed data from any CDC source and writes the changed data to a Google BigQuery target table.
Default is Bulk mode.
Streaming mode is not applicable when you perform a data driven operation.
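Bulk and streaming map onto BigQuery's two ingestion styles, shown below with Google's google-cloud-bigquery client library. This is an illustration of the distinction only; the connector stages bulk data through Google Cloud Storage rather than a local file, and the table and file names are placeholders.

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"

# Bulk-style: load the fully staged file in a single load job.
with open("staged_rows.json", "rb") as staged:
    config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON)
    client.load_table_from_file(staged, table_id, job_config=config).result()

# Streaming-style: send rows directly to the target, row by row.
errors = client.insert_rows_json(table_id, [{"id": 1, "name": "John Doe"}])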
Streaming Template Table Suffix
Specify the suffix to add to the individual target tables that Google BigQuery V2 Connector creates based on the template target table.
This property applies to streaming mode.
If you select the Enable Merge option, Google BigQuery V2 Connector ignores this property.
Streaming mode is not applicable when you perform a data driven operation.
Rows per Streaming Request
Specifies the number of rows that Google BigQuery V2 Connector streams to the BigQuery target for each request.
Default is 500 rows.
The maximum row size that Google BigQuery V2 Connector can stream to the Google BigQuery target for each request is 10 MB.
This property applies to streaming mode.
Streaming mode is not applicable when you perform a data driven operation.
Staging File Name
Name of the staging file that Google BigQuery V2 Connector creates in Google Cloud Storage before it loads the data to the Google BigQuery target.
This property applies to bulk mode.
Data Format of the staging file
Specifies the data format of the staging file. You can select one of the following data formats:
- Avro
- JSON (Newline Delimited). Supports flat and record data with nested and repeated fields.
- Parquet
- CSV. Supports flat data.
  Note: In a .csv file, columns of the Timestamp data type are represented as floating-point numbers, which causes the milliseconds value to differ.
This property applies to bulk and CDC mode.
Avro and Parquet formats are not applicable when you perform a data driven operation.
Persist Staging File After Loading
Indicates whether Google BigQuery V2 Connector must persist the staging file in Google Cloud Storage after it writes the data to the Google BigQuery target. You can persist the staging file if you want to archive the data for future reference.
By default, Google BigQuery V2 Connector deletes the staging file in Google Cloud Storage.
This property applies to bulk mode.
Enable Staging File Compression
Select this option to compress the size of the staging file before Google BigQuery V2 Connector writes the data to Google Cloud Storage, and to decompress the staging file before it loads the data to the Google BigQuery target.
You can enable staging file compression to reduce cost and transfer time.
Job Poll Interval in Seconds
The number of seconds after which Google BigQuery V2 Connector polls the status of the write job operation.
Default is 10.
Number of Threads for Uploading Staging file
The number of files that Google BigQuery V2 Connector must create to upload the staging file in bulk mode.
Local Stage File Directory
Specifies the directory on your local machine where Google BigQuery V2 Connector stores the files temporarily before writing the data to the staging file in Google Cloud Storage.
This property applies to bulk mode.
Allow Quoted Newlines
Indicates whether Google BigQuery V2 Connector must allow quoted data sections that contain newline characters in a .csv file.
Field Delimiter
Indicates whether Google BigQuery V2 Connector must allow field separators for the fields in a .csv file.
Quote Character
Specifies the quote character to skip when you write data to Google BigQuery. When you write data to Google BigQuery and the source table contains the specified quote character, the task fails. Change the quote character value to a value that does not exist in the source table.
Default is double quotes.
Allow Jagged Rows
Indicates whether Google BigQuery V2 Connector must accept the rows without trailing columns in a .csv file.
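The four .csv properties above correspond to standard BigQuery CSV load settings. For illustration, the equivalents in Google's google-cloud-bigquery client library:

from google.cloud import bigquery

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    allow_quoted_newlines=True,  # Allow Quoted Newlines
    field_delimiter=",",         # Field Delimiter
    quote_character='"',         # Quote Character (default is double quotes)
    allow_jagged_rows=True,      # Allow Jagged Rows
)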
Pre SQL
SQL statement that you want to run before writing data to the target.
For example, if you want to select records from the database before you write the records into the table, specify the following pre-SQL statement:
SELECT * FROM `api-project-80697026669.EMPLOYEE.RegionNation` LIMIT 1000
Pre SQL Configuration
Specify a pre-SQL configuration.
For example,
DestinationTable:PRESQL_TGT2,DestinationDataset:EMPLOYEE,
FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
Post SQL
SQL statement that you want to run after writing the data into the target.
For example, if you want to update records in a table after you write the records into the target table, specify the following post-SQL statement:
UPDATE [api-project-80697026669.EMPLOYEE.PERSONS_TGT_DEL]
SET phoneNumber.number =1000011, phoneNumber.areaCode=100 where fullname='John Doe'
Suppress post SQL on Error
Indicates whether the Secure Agent must abort the post-SQL query execution in case the task fails to write data to the Google BigQuery target table due to errors.
Default is not selected.
Post SQL Configuration
Specify a post-SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:True,UseLegacySQL:False
Truncate target table
Truncates the Google BigQuery target table before loading data to the target.
Default is not selected.
Allow Duplicate Inserts
Applicable when the selected data format for the staging file is CSV and the mapping contains both connected and unconnected ports.
Includes the values from the connected ports and considers the default column values from the unconnected ports in the staging file while loading to the target. For unconnected ports, the agent considers the default value only if you have defined a default constraint value in the Google BigQuery target table.
When you do not enable the Use Default Column Values option, the agent populates the target with values from the connected ports only.
This property doesn't apply when you create a new target at runtime. Also, when the selected write disposition is Write Truncate, the agent removes the field from the target table that is not part of the schema and removes the default constraint for the target column.
Disable Duplicate Update Rows
Determines whether the Secure Agent must process only one of the incoming rows and ignore the rest when multiple incoming rows attempt to update the same target row.
Select this option to configure the mapping to process only one of the incoming rows and ignore the rest of the incoming rows.
Default is not selected.
Forward Rejected Rows
Applicable only when you configure the DD_REJECT constant in the data driven condition to reject all the rows.
Otherwise, this property is not applicable for Google BigQuery V2 Connector.
Billing Project ID
The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs the query.
If you do not specify a value for the Billing Project ID property, the Secure Agent runs the query in the Google Cloud project that contains the Google BigQuery objects based on the Project ID value specified in the Google BigQuery V2 connection.