Connect to Google BigQuery

Let's configure the Google BigQuery V2 connection properties to connect to Google BigQuery.

Before you begin

Before you configure a connection, ensure that you download the Google service account key file in JSON format. The service account key file is created when you create a Google service account.
You require the client email, private key, and project ID from the service account key JSON file to create a Google BigQuery connection.
The following video shows you how to get the information you need from your Google BigQuery account:
https://infa.media/3e4LzdW
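As a rough illustration, the three values can also be read from the key file with a short script. This is a minimal sketch, not part of the product: the function name connection_values_from_key and the sample key content are hypothetical, and the sample values are placeholders. Only the field names client_email, private_key, and project_id come from the service account key file.

```python
import json

# Fields that the connection needs, mapped to the connection property they fill.
FIELD_TO_PROPERTY = {
    "client_email": "Service Account Email",
    "private_key": "Service Account Key",
    "project_id": "Project ID",
}


def connection_values_from_key(key_json_text):
    """Return the three connection values found in a service account key."""
    key = json.loads(key_json_text)
    missing = [field for field in FIELD_TO_PROPERTY if not key.get(field)]
    if missing:
        raise ValueError("Key file is missing: " + ", ".join(missing))
    return {prop: key[field] for field, prop in FIELD_TO_PROPERTY.items()}


# Placeholder key content for illustration only; a real key file holds
# your own values and additional fields.
sample_key_text = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
    "client_email": "example-sa@example-project.iam.gserviceaccount.com",
})

values = connection_values_from_key(sample_key_text)
print(values["Service Account Email"])
print(values["Project ID"])
```

To use a real key file, pass the text of the downloaded JSON file to the function instead of the sample string.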

Connection details

The following table describes the basic connection properties:
Property
Description
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
Maximum length is 255 characters.
Description
Description of the connection. Maximum length is 4000 characters.
Type
Google BigQuery V2
Use Secret Vault
Stores sensitive credentials for this connection in the secrets manager that is configured for your organization.
This property appears only if secrets manager is set up for your organization.
This property is not supported by Mass Ingestion.
When you enable the secret vault in the connection, you can select which credentials the Secure Agent retrieves from the secrets manager. If you don't enable this option, the credentials are stored in the repository or on a local Secure Agent, depending on how your organization is configured.
For information about how to configure and use a secrets manager, see Secrets manager configuration.
Runtime Environment
The name of the runtime environment where you want to run tasks.
Select a Secure Agent, Hosted Agent, or serverless runtime environment.
You cannot run an application ingestion, database ingestion, or streaming ingestion task on a Hosted Agent or serverless runtime environment.
Service Account Email
The client_email value from the Google service account key JSON file.
Service Account Key
The private_key value from the Google service account key JSON file.
Project ID
The project_id value from the Google service account key JSON file.
If you have created multiple projects with the same service account, enter the ID of the project that contains the dataset that you want to connect to.
Note: If you want to validate the credentials for the Service Account Email, Service Account Key, and Project ID during a test connection, set the flag CredentialValidation:true in the Provide Optional Properties field in advanced settings.
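The value for the Provide Optional Properties field can be assembled as in the following sketch. The helper name format_optional_properties is hypothetical. The key:value form is inferred from the CredentialValidation:true flag in the note above, and the comma separator follows the description of the Provide Optional Properties field in the advanced settings. CredentialValidation is the only property name taken from this page.

```python
def format_optional_properties(props):
    """Join properties into a comma-separated list of key:value pairs."""
    parts = []
    for key, value in props.items():
        # Write Python booleans in the lowercase form used by the flag.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}:{value}")
    return ",".join(parts)


print(format_optional_properties({"CredentialValidation": True}))
# prints: CredentialValidation:true
```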

Advanced settings

The following table describes the advanced connection properties:
Property
Description
Enable BigQuery Storage API
Select this option to use Google BigQuery Storage to stage the files when you read or write data.
Default is unselected.
Storage Path
Path in Google Cloud Storage where the agent creates a local stage file to store the data temporarily. The agent uses this storage when it reads data in staging mode or writes data in bulk mode.
Use one of the following formats:
  • gs://<bucket_name>
  • gs://<bucket_name>/<folder_name>
When you enable cross-region replication in Google BigQuery, enter a Google Cloud Storage path that supports multi-region storage.
This property is not applicable if you use Google BigQuery Storage to stage the files.
Connection Mode
The mode that you want to use to read data from or write data to Google BigQuery.
Select one of the following connection modes:
  • Simple. Flattens each field within the Record data type field as a separate field in the mapping.
  • Hybrid¹. Displays all the top-level fields in the Google BigQuery table including Record data type fields. Google BigQuery V2 Connector displays the top-level Record data type field as a single field of the String data type in the mapping.
  • Complex¹. Displays all the columns in the Google BigQuery table as a single field of the String data type in the mapping.
Default is Simple.
This property is applicable if you use Google Cloud Storage to stage the files.
Use Legacy SQL for Custom Query¹
Select this option to use legacy SQL to define a custom query. If you clear this option, use standard SQL to define a custom query.
This property is applicable if you use Google Cloud Storage to stage the files.
This property doesn't apply if you configure the Google BigQuery V2 connection in hybrid or complex mode.
Dataset Name for Custom Query¹
When you define a custom query, specify a Google BigQuery dataset.
Schema Definition File Path¹
Directory on the Secure Agent machine where the Secure Agent creates a JSON file with the sample schema of the Google BigQuery table. The JSON file name is the same as the Google BigQuery table name.
Alternatively, you can specify a storage path in Google Cloud Storage where the Secure Agent creates a JSON file with the sample schema of the Google BigQuery table. You can download the JSON file from the specified storage path in Google Cloud Storage to a local machine.
The schema definition file is required if you configure complex connection mode in the following scenarios:
  • You add a Hierarchy Builder transformation in a mapping to read data from relational sources and write data to a Google BigQuery target.
  • You add a Hierarchy Parser transformation in a mapping to read data from a Google BigQuery source and write data to relational targets.
When you use a serverless runtime environment, specify a storage path in Google Cloud Storage.
This property is applicable if you use Google Cloud Storage to stage the files.
Region ID
The region name where the Google BigQuery dataset that you want to access resides.
Note: Ensure that the bucket, or the bucket and folder, that you specify in the Storage Path property resides in the specified region.
For more information about the regions supported by Google BigQuery, see Dataset locations.
Staging Dataset¹
The Google BigQuery dataset name where you want to create the staging table to stage the data. You can define a Google BigQuery dataset that is different from the source or target dataset.
This property is applicable if you use Google Cloud Storage to stage the files.
Provide Optional Properties¹
Comma-separated key-value pairs of custom properties in the Google BigQuery V2 connection to configure certain source and target functionalities.
For more information about the custom properties that you can specify, see the Optional Properties configuration Knowledge Base article.
Enable Retry¹
Select this option if you want the Secure Agent to attempt a retry to receive the response from the Google BigQuery endpoint.
You can configure the retry strategy to read data from Google BigQuery in direct or staging mode and write data to Google BigQuery in bulk mode.
The retry strategy is not applicable in the CDC and streaming modes when you write data to a Google BigQuery target.
The connection retry option also applies to a connection configured to use the proxy server to connect to the endpoint.
Default is unselected.
Maximum Retry Attempts
Appears only if you select the Enable Retry property.
The maximum number of retry attempts that the Secure Agent performs to receive the response from the Google BigQuery endpoint.
If the Secure Agent fails to connect to Google BigQuery within the maximum retry attempts, the connection fails.
Default is 6 attempts.
Initial Retry Delay
Appears only if you select the Enable Retry property.
The initial wait time in seconds before the Secure Agent attempts to retry the connection.
Default is 1 second.
Retry Delay Multiplier
Appears only if you select the Enable Retry property.
The multiplier that the Secure Agent uses to exponentially increase the wait time between successive retry attempts up to the maximum retry delay time.
Default multiplier is 2.0. You can also use fractional values.
Maximum Retry Delay
Appears only if you select the Enable Retry property.
The maximum wait time in seconds that the Secure Agent waits between successive retry attempts.
Default is 32 seconds.
Total Timeout
Appears only if you select the Enable Retry property.
The total duration in seconds for which the Secure Agent retries the connection. After this time elapses, the connection fails.
Default is 50 seconds.
¹ Doesn't apply to mappings in advanced mode.
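To see how the retry properties interact, the following sketch computes a nominal wait schedule from the default values in the table. It assumes a plain exponential backoff in which each wait is the previous wait times the multiplier, capped at the maximum retry delay, and in which retries stop when either the maximum number of attempts or the total timeout is reached. The function retry_delays is hypothetical, and the sketch counts only wait time toward the timeout. This page doesn't state how the Secure Agent accounts for request time or whether it adds jitter, so treat the result as an approximation.

```python
def retry_delays(max_attempts=6, initial_delay=1.0, multiplier=2.0,
                 max_delay=32.0, total_timeout=50.0):
    """Return the nominal wait in seconds before each retry attempt."""
    delays = []
    elapsed = 0.0
    delay = initial_delay
    for _ in range(max_attempts):
        # Cap each wait at the maximum retry delay.
        wait = min(delay, max_delay)
        # Assumption: stop when the next wait would exceed the total timeout.
        if elapsed + wait > total_timeout:
            break
        delays.append(wait)
        elapsed += wait
        delay *= multiplier
    return delays


print(retry_delays())
# prints: [1.0, 2.0, 4.0, 8.0, 16.0]
```

Under these assumptions, the default total timeout of 50 seconds ends the retries before the sixth wait of 32 seconds, because the first five waits already add up to 31 seconds.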