Google BigQuery
You can use the Google BigQuery resource to collect metadata from the assets in Google BigQuery.
Objects Extracted
The Google BigQuery resource extracts metadata from the following assets in a Google BigQuery data source:
- Project
- Dataset
- Table
- View
Permissions to Configure the Resource
Make sure that you perform the following steps before you configure the Google BigQuery resource:
- Assign the dataViewer role to the service account that you use to access the Google Cloud Platform project.
- Configure the bigquery.tables.list and bigquery.jobs.get permissions for the service account that you use to access the Google Cloud Platform project.
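You can grant the role and the permissions with the gcloud CLI. The following commands are a sketch: the project name my-project, the service account catalog-scanner@my-project.iam.gserviceaccount.com, and the custom role ID catalogScanner are placeholder values, and a custom role is only one way to supply the two individual permissions.

# Grant the BigQuery Data Viewer role to the service account.
gcloud projects add-iam-policy-binding my-project --member="serviceAccount:catalog-scanner@my-project.iam.gserviceaccount.com" --role="roles/bigquery.dataViewer"

# Create a custom role that carries the two required permissions, then bind it.
gcloud iam roles create catalogScanner --project=my-project --permissions=bigquery.tables.list,bigquery.jobs.get
gcloud projects add-iam-policy-binding my-project --member="serviceAccount:catalog-scanner@my-project.iam.gserviceaccount.com" --role="projects/my-project/roles/catalogScanner"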
Connect to a Google BigQuery Data Source Enabled for SSL
To connect to a Google BigQuery data source enabled for SSL, perform the following steps:
1. Download the Google BigQuery SSL certificates using a web browser.
Note: Make sure that you import the Google Trust Services certificate in the certification path.
2. Copy the certificates to the <INFA_HOME>/services/shared/security/ directory.
3. Go to the <INFA_HOME>/source/java/jre/bin directory and run the following keytool command to import each copied certificate as a trusted certificate in the Informatica domain keystore:
keytool -import -file <INFA_HOME>/services/shared/security/<certificate>.cer -alias <alias name> -keystore <INFA_HOME>/services/shared/security/infa_truststore.jks -storepass <Informatica domain keystore password>
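If you prefer the command line to a browser for step 1, you can retrieve the server certificates with openssl, and you can confirm the import afterward with keytool -list. This is a sketch: bigquery.googleapis.com is the default public endpoint, bigquery.cer is a placeholder file name, and the first command extracts only the first certificate in the chain, so repeat it for each certificate, including the Google Trust Services root.

# Fetch the first certificate that the BigQuery endpoint presents.
openssl s_client -connect bigquery.googleapis.com:443 -showcerts </dev/null | openssl x509 -outform PEM > bigquery.cer

# After step 3, verify that the certificate is in the domain keystore.
keytool -list -alias <alias name> -keystore <INFA_HOME>/services/shared/security/infa_truststore.jks -storepass <Informatica domain keystore password>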
Resource Connection Properties
The General tab includes the following properties:
Property | Description
---|---
Project ID | Name of the Google Cloud Platform project that you want to access.
Private Key | The private key associated with the service account.
Client Email | The client email address associated with the service account.
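These three values typically come from the JSON key file that Google Cloud generates for the service account. The field names in the following excerpt follow the standard key-file schema; the values are placeholders:

{
  "project_id": "my-project",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "catalog-scanner@my-project.iam.gserviceaccount.com"
}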
The Metadata Load Settings tab includes the following properties:
Property | Description
---|---
Enable Source Metadata | Extracts metadata from the data source.
Scan Hidden Datasets | Extracts metadata from hidden and anonymous datasets.
Dataset | Select the datasets from which you want to import metadata for Google BigQuery tables in the project. Default is all datasets.
Source Metadata Filter | Include or exclude tables and views from the resource run. Use semicolons (;) to separate table names and view names. For more information about the filter field, see Source Metadata and Data Profile Filter.
Case Sensitive | Specifies whether the resource is case sensitive. Select the check box to configure the resource as case sensitive, or clear it to configure the resource as case insensitive. Default is True.
Memory | Specifies the memory required to run the scanner job. Select a value based on the size of the data set that you plan to import into the catalog. Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How-To Library Articles tab in the Informatica Documentation Portal.
JVM Options | JVM parameters that you can set to configure the scanner container. See the example after this table. Use the following arguments:<br>- -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the scanner log level to DEBUG, ERROR, or INFO. Default is INFO.<br>- -Dscanner.container.core=<number of cores>. Increases the number of cores for the scanner container. The value must be a number.<br>- -Dscanner.yarn.app.environment=<key=value>. Key-value pairs to set in the YARN environment. Use a comma to separate key-value pairs.<br>- -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default is 1.
Track Data Source Changes | View metadata source change notifications in Enterprise Data Catalog.
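For example, to raise the scanner log level and allocate four cores to the scanner container, you can combine the documented arguments in the JVM Options field. The values below are illustrative, not recommendations:

-Dscannerloglevel=DEBUG -Dscanner.container.core=4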
Note: The Google BigQuery resource does not support the following features:
- During profiling, columns of the Geography and Byte data types are skipped.
- The resource does not read multiple records when you use a legacy SQL query.
- Tables partitioned by a field are not supported.
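If you are unsure whether a query runs as legacy SQL, the bq command line tool makes the dialect explicit. The queries below are a sketch; my-project, my_dataset, and my_table are placeholder names. Legacy SQL references tables with square brackets, while standard SQL uses backticks:

bq query --use_legacy_sql=true "SELECT name FROM [my-project:my_dataset.my_table]"
bq query --use_legacy_sql=false "SELECT name FROM \`my-project.my_dataset.my_table\`"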