Google BigQuery V2 sources in mappings

To read data from Google BigQuery, configure a Google BigQuery object as the Source transformation in a mapping.
Specify the name and description of the Google BigQuery source. Configure the source and advanced properties for the source object in mappings.
The following table describes the source properties that you can configure for a Google BigQuery source:
Property
Description
Connection
Name of the Google BigQuery V2 source connection. Select a source connection, or click New Parameter to define a new parameter for the source connection.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter.
Source Type
Type of the Google BigQuery source object.
Select Single Object, Multiple Objects, Query, or Parameter.
When you select single object as the source type, you can choose a table or view. For the other source object types, you can choose a table.
Object
Name of the Google BigQuery source object based on the source type selected.
Parameter
A parameter file where you define values that you want to update without having to edit the task.
Select an existing parameter for the source object or click New Parameter to define a new parameter for the source object. The Parameter property appears only if you select Parameter as the source type.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
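For illustration only, a parameter file entry that supplies a value at runtime might look like the following sketch. The section header and parameter name are assumptions, not values from this guide; check the parameter file documentation for the exact format that your version supports:
[Global]
$$Source_Object=EMPLOYEE_DETAILS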
Query
Click Define Query and enter a valid custom query.
The Query property appears only if you select Query as the source type.
You can parameterize a custom query object at runtime in a mapping.
Select the source advanced property Use EXPORT DATA Statement to stage to use the ORDER BY clause in a custom query in staging mode.
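For example, a custom query that relies on an ORDER BY clause in staging mode might look like the following sketch. The project, dataset, table, and column names are illustrative:
SELECT EMP_ID, EMP_NAME FROM `myproject.EMPLOYEE.EMP_DETAILS` ORDER BY EMP_ID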
Filter
Configure a simple filter or an advanced filter to remove rows at the source. You can improve efficiency by filtering early in the data flow.
A simple filter includes a field name, operator, and value. Use an advanced filter to define a more complex filter condition, which can include multiple conditions using the AND or OR logical operators.
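For example, an advanced filter condition that combines two conditions with a logical operator might look like the following sketch. The project, dataset, table, and column names are illustrative:
`myproject.EMPLOYEE.EMP_DETAILS`.SALARY > 50000 AND `myproject.EMPLOYEE.EMP_DETAILS`.DEPT = 'Sales'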
The following table describes the advanced properties that you can configure for a Google BigQuery source:
Property
Description
Source Dataset ID
Optional. Overrides the Google BigQuery dataset name that you specified in the Source transformation.
Source Table Name
Optional. Overrides the Google BigQuery table name that you specified in the Source transformation.
Source Staging Dataset
Optional. Overrides the Google BigQuery staging dataset name that you specified in the connection and the Source Dataset ID source advanced property.
Number of Rows to Read
Specifies the number of rows to read from the Google BigQuery source table.
Allow Large Results
Determines whether Google BigQuery V2 Connector must produce arbitrarily large result tables to query large source tables.
If you select this option, you must specify a destination table to store the query results.
Query Results Table Name
Required if you select the Allow Large Results option.
Specifies the destination table name to store the query results. If the table is not present in the dataset, Google BigQuery V2 Connector creates the destination table with the name that you specify.
Job Poll Interval In Seconds
The number of seconds after which Google BigQuery V2 Connector polls the status of the read job operation.
Default is 10.
Read Mode
Specifies the read mode to read data from the Google BigQuery source.
You can select one of the following read modes:
  • Direct. In direct mode, Google BigQuery V2 Connector reads data directly from the Google BigQuery source table.
  Note: When you use hybrid and complex connection mode, you cannot use direct mode to read data from the Google BigQuery source.
  • Staging. In staging mode, Google BigQuery V2 Connector exports data from the Google BigQuery source into Google Cloud Storage. After the export is complete, Google BigQuery V2 Connector downloads the data from Google Cloud Storage into the local stage file and then reads data from the local stage file.
Default is Direct mode.
Use EXPORT DATA statement to stage
Uses the EXPORT DATA statement to export data from Google BigQuery to Google Cloud Storage.
If the query contains an ORDER BY clause, the specified order is maintained when you export the data.
This property applies to staging mode.
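The connector generates the EXPORT DATA statement internally. For reference only, a BigQuery EXPORT DATA statement takes roughly the following shape; the bucket, dataset, table, and column names are illustrative, and this is not the exact statement that the connector issues:
EXPORT DATA OPTIONS (
  uri = 'gs://my-bucket/staging/emp-*.csv',
  format = 'CSV',
  overwrite = true
) AS
SELECT EMP_ID, EMP_NAME FROM `myproject.EMPLOYEE.EMP_DETAILS` ORDER BY EMP_ID;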
Number of Threads for Downloading Staging Files
Specifies the number of files that Google BigQuery V2 Connector downloads at a time to enable parallel download.
This property applies to staging mode.
Data format of the staging file
Specifies the data format of the staging file. You can select one of the following data formats:
  • Avro
  • JSON (Newline Delimited). Supports flat and record data with nested and repeated fields.
  • CSV. Supports flat data.
  Note: In a .csv file, columns of the Timestamp data type are represented as floating point numbers that cause the milliseconds value to differ.
  • Parquet
This property applies to staging mode.
Local Stage File Directory
Specifies the directory on your local machine where Google BigQuery V2 Connector stores the Google BigQuery source data temporarily before it reads the data.
This property applies to staging mode.
Staging File Name
Name of the staging file where data from the Google BigQuery source table is exported to Google Cloud Storage.
This property applies to staging mode.
Enable Staging File Compression
Indicates whether to compress the staging file in Google Cloud Storage before Google BigQuery V2 Connector reads data from the staging file.
You can enable staging file compression to reduce cost and transfer time.
This property applies to staging mode.
Persist Extract Staging File After Download
Indicates whether Google BigQuery V2 Connector must persist the staging file after it reads data from the staging file.
By default, Google BigQuery V2 Connector deletes the staging file.
Persist Destination Table
Indicates whether Google BigQuery V2 Connector must persist the query results table after it reads data from the query results table.
By default, Google BigQuery V2 Connector deletes the query results table.
pre SQL
SQL statement that you want to run before reading data from the source.
For example, if you want to select records in the database before you read the records from the table, specify the following pre SQL statement:
SELECT * FROM [api-project-80697026669:EMPLOYEE.DEPARTMENT] LIMIT 1000;
post SQL
SQL statement that you want to run after reading data from the source.
For example, if you want to update records in a table after you read the records from a source table, specify the following post SQL statement:
UPDATE [api-project-80697026669.EMPLOYEE.PERSONS_TGT_DEL]
SET phoneNumber.number=1000011, phoneNumber.areaCode=100 where fullname='John Doe'
pre SQL Configuration
Specify a pre SQL configuration.
For example,
DestinationTable:PRESQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
post SQL Configuration
Specify a post SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:True,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
SQL Override Query
Overrides the default SQL query used to read data from the Google BigQuery source.
Note: When you specify a SQL override query, you must specify a dataset name in the Source Dataset ID advanced source property.
Ensure that the list of selected columns, data types, and the order of the columns in the query match the columns, data types, and order in which they appear in the source object.
Ensure that you map all the columns in the SQL override query, and only those columns, to the target.
Select the source advanced property Use EXPORT DATA Statement to stage to use the ORDER BY clause in a SQL override query in staging mode. When staging optimization is enabled in a mapping, the columns mapped in the SQL override query must match the columns in the source object.
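For example, if the source object contains the columns EMP_ID, EMP_NAME, and EMP_SALARY in that order, a standard SQL override query might look like the following sketch. The project, dataset, table, and column names are illustrative:
SELECT EMP_ID, EMP_NAME, EMP_SALARY FROM `myproject.EMPLOYEE.EMP_DETAILS` WHERE EMP_SALARY > 50000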
Use Legacy SQL For SQL Override
Indicates that the SQL Override query is specified in legacy SQL.
Use the following format to specify a legacy SQL query for the SQL Override Query property:
SELECT <Col1, Col2, Col3> FROM [projectID:datasetID.tableName]
Clear this option to define a standard SQL override query.
Use the following format to specify a standard SQL query for the SQL Override Query property:
SELECT * FROM `projectID.datasetID.tableName`
Label
You can assign a label for the transformation to organize and filter the associated jobs in the Google Cloud Platform Log Explorer.
For more information about labels and their usage requirements, see Assign a label to the transformations.
Billing Project ID
The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs query and extract jobs.
If you omit the project ID here, the Secure Agent runs query and extract jobs in the Google Cloud project corresponding to the Project ID value specified in the Google BigQuery V2 connection.
Retry Options
A comma-separated list that specifies the following retry options when you set Direct as the read mode:
  • Retry Count. The number of retry attempts to read data from Google BigQuery.
  • Retry Interval. The time in seconds to wait between each retry attempt.
  • Retry Exceptions. The list of exceptions, separated by the pipe (|) character, for which retries are made.
Use the following format to specify the retry options:
RetryCount:5,RetryInterval:1,RetryExceptions:java.net.ConnectException|java.io.IOException
Note: The retry options are available for preview. Preview functionality is supported for evaluation purposes but is unwarranted and is not production-ready. Informatica recommends that you use it in non-production environments only. Informatica intends to include the preview functionality in an upcoming release for production use, but might choose not to in accordance with changing market or technical circumstances. For more information, contact Informatica Global Customer Support. To use the functionality, your organization must have the appropriate licenses.
You can set the tracing level in the advanced session properties to determine the amount of detail that logs contain.
The following table describes the tracing levels that you can configure:
Tracing Level
Description
Terse
The Secure Agent logs initialization information, error messages, and notification of rejected data.
Normal
The Secure Agent logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. Summarizes session results, but not at the level of individual rows.
Verbose Initialization
In addition to normal tracing, the Secure Agent logs additional initialization details, names of index and data files used, and detailed transformation statistics.
Verbose Data
In addition to verbose initialization tracing, the Secure Agent logs each row that passes into the mapping. Also notes where the Secure Agent truncates string data to fit the precision of a column and provides detailed transformation statistics.
When you configure the tracing level to verbose data, the Secure Agent writes row data for all rows in a block when it processes a transformation.

Read modes

When you use Google BigQuery V2 Connector, you can read data by using direct mode or staging mode. Before you choose a mode, see the Google documentation to understand the cost implications and trade-offs for each mode.
You can read data from a Google BigQuery source by using one of the following modes:
Direct Mode
Use direct mode when the volume of data that you want to read is small. In direct mode, Google BigQuery V2 Connector directly reads data from a Google BigQuery source. You can configure the number of rows that you want Google BigQuery V2 Connector to read in one request.
Staging Mode
Use staging mode when you want to read large volumes of data in a cost-efficient manner.
In staging mode, Google BigQuery V2 Connector first exports the data from the Google BigQuery source into Google Cloud Storage. After the export is complete, Google BigQuery V2 Connector downloads the data from Google Cloud Storage into a local stage file. You can configure the local stage file directory in the advanced source properties. Google BigQuery V2 Connector then reads the data from the local stage file.
When you enable staging file compression, Google BigQuery V2 Connector compresses the staging file in Google Cloud Storage. Google BigQuery V2 Connector then downloads the staging file and decompresses it before it reads the file. To improve performance and download data in parallel, you can configure the number of threads for downloading the staging files.
If a job fails, Google BigQuery V2 Connector deletes the staging file unless you configure the task or mapping to persist the staging file.

Optimize read performance in staging mode

To optimize staging performance when you read data from a Google BigQuery source, you can configure Data Integration to create a flat file for staging.
To enhance the read operation performance, set the INFA_DTM_RDR_STAGING_ENABLED_CONNECTORS staging property for the Secure Agent. Data Integration first copies the data from the Google BigQuery source into a flat file located in the local staging file directory. When the staging file contains all the data, Data Integration reads the data from the staging file.
You can optimize the staging performance when you read data from single or multiple Google BigQuery objects.

Enabling Google BigQuery V2 Connector to optimize the read performance

Perform the following tasks to set the staging property:
  1. In Administrator, click Runtime Environments. The Runtime Environments page appears.
  2. Select the Secure Agent for which you want to set the custom configuration property.
  3. In the Actions column, click the Edit Secure Agent icon for the Secure Agent that you want to edit. The Edit Secure Agent page appears.
  4. In the System Configuration Details section, select Data Integration Server as the service and Tomcat as the type.
  5. Set the value of the Tomcat property INFA_DTM_RDR_STAGING_ENABLED_CONNECTORS to the plugin ID of Google BigQuery V2 Connector.
     You can find the plugin ID in the manifest file located in the following directory:
     <Secure Agent installation directory>/<GoogleBigQueryV2 package>/CCIManifest
  6. Click Save.
  7. Restart the Secure Agent.
  8. In the Google BigQuery V2 connection, set the UseRFC4180CSVParser:true custom property in the Provide Optional Properties connection property.
You can check the session logs. If the flat file is created successfully, Data Integration logs the following message in the session log:
The reader is configured to run in [DTM_STAGING_CSV] mode.
In the Google BigQuery advanced source properties, set the read mode as staging and set the Data Format of the staging file property to CSV.
When you enable the staging mode to read source data, you can see the following message in the logs:
READER_1_1_1> SDKS_38636 [2022-07-26 14:59:29.056] Plug-in #601601: DTM Staging is enabled for connector for Source Instance [Source].
When you disable the staging mode to read source data, you can see the following message in the logs:
READER_1_1_1> SDKS_38637 [2022-07-26 16:46:04.312] Plug-in #601601: DTM Staging is disabled for connector for Source Instance [Source].

Rules and guidelines when you optimize the read performance

Consider the following rules when you enable the staging property:

Custom query source type

You can use a custom query as a source object when you use a Google BigQuery V2 connection.
You might want to use a custom query as the source when a source object is large. You can use the custom query to reduce the number of fields that enter the data flow. You can configure a custom query to read data from one project while having the Google BigQuery V2 connection set up in another project. You can also create a parameter for the source type when you design your mapping so that you can define the query in the Mapping Task wizard.
To use a custom query as a source, select Query as the source type when you configure the source transformation and then use valid and supported SQL to define the query.
You can use legacy SQL or standard SQL to define a custom query. To define a legacy SQL custom query, select the Use Legacy SQL For Custom Query option when you create the Google BigQuery V2 connection. Clear the option to define a standard SQL custom query. For more information about Google BigQuery legacy SQL functions and operators, see Legacy SQL functions and operators.
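For example, the same custom query in legacy SQL and in standard SQL might look like the following sketches. The project, dataset, table, and column names are illustrative:
Legacy SQL: SELECT EMP_ID, EMP_NAME FROM [myproject:EMPLOYEE.EMP_DETAILS]
Standard SQL: SELECT EMP_ID, EMP_NAME FROM `myproject.EMPLOYEE.EMP_DETAILS`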

Rules and guidelines for Google BigQuery custom queries

When you configure a custom query, consider the following guidelines:

Adding multiple source objects

When you create a Source transformation, you can select Multiple Objects as the Google BigQuery V2 source type and then configure a join to combine the tables. You can define an advanced relationship or a query to join the tables. You must use standard SQL to define the query that joins the tables.
    1. In the Source transformation, select Multiple Objects as the source type.
    2. From the Actions menu, click Add Source Object.
    3. Select the source object that you want to add from the displayed list and click OK.
    4. From the Related Objects Actions menu, select Advanced Relationship.
    5. In the Advanced Relationship window, you can click Add Object to add more objects.
    6. Set your own conditions or specify a query to define the relationship between the tables.
    Note: When you configure a join expression, select the fields and define a join condition or a query syntax. You must prefix the Project ID before the Dataset ID in the join condition to combine multiple tables. For an illustration of a join query with the Project ID prefix, see the sketch after these steps.
    Note: If you configure a filter, prefix the Project ID before the Dataset ID in the filter condition. For example, provide the simple filter condition as:
    `Project.Dataset.Table`.column
    Note: If you specify a SQL override query for multiple source tables, you must use the following format for the SQL override query:
    select `project_id.dataset.table`.col1 AL_dataset_table_col1,`project_id.dataset.table`.col2 AL_dataset_table_col2, `project_id.dataset1.table1`.col1 AL_dataset1_table1_col1,`project_id.dataset1.table1`.col2 AL_dataset1_table1_col2 from `project_id.dataset.table` <join condition> `project_id.dataset1.table1` ON `project_id.dataset.table`.col = `project_id.dataset1.table1`.col1 where <condition>
    In the example, AL represents the alias prefix for the column names.
    7. Click OK.
    [Image: an advanced join condition defined between the Google BigQuery V2 tables]
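A join query with the Project ID prefix might look like the following sketch, where P represents the Project ID, D represents the Dataset ID, and T1 and T2 represent the Table Names. All names are illustrative, and the column references follow the fully qualified format shown in the SQL override example above:
SELECT `P.D.T1`.col1, `P.D.T2`.col2 FROM `P.D.T1` JOIN `P.D.T2` ON `P.D.T1`.col1 = `P.D.T2`.col1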

Rules and guidelines for adding multiple source objects

Consider the following rules and guidelines when you add multiple source objects: