Google BigQuery

You can use the Google BigQuery resource to collect metadata from the assets in Google BigQuery.

Objects Extracted

The Google BigQuery resource extracts metadata from the following assets in a Google BigQuery data source:

Permissions to Configure the Resource

Make sure that you perform the following steps before you configure the Google BigQuery resource:

Connect to a Google BigQuery Data Source Enabled for SSL

By default, all cloud resources are SSL-enabled. To connect to a Google BigQuery data source enabled for SSL, perform the following steps:
  1. Download the Google BigQuery SSL certificates using a web browser.
     Note: Make sure that you import the Google Trust Services certificate in the certification path.
  2. Copy the certificates to the <INFA_HOME>/services/shared/security/ directory.
  3. Go to the <INFA_HOME>/java/jre/bin directory and run the following keytool command to import each copied certificate as a trusted certificate in the Informatica domain keystore:
     keytool -import -file <INFA_HOME>/services/shared/security/<certificate>.cer -alias <alias name> -keystore <INFA_HOME>/services/shared/security/infa_truststore.jks -storepass <Informatica domain keystore password>
Note: If the proxy server used to connect to the data source is SSL enabled, you must download the proxy server certificates on the Informatica domain machine.
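As an alternative to downloading the certificates with a web browser, you can retrieve the certificate presented by the server from the command line. The following is a sketch using openssl; the bigquery.googleapis.com endpoint and the output file name are assumptions, and you should verify the full certification path before importing:

```shell
# Retrieve the certificate presented by the BigQuery endpoint (endpoint is an example).
# -showcerts prints the full chain; the openssl x509 step keeps only the first (leaf)
# certificate, so save intermediate and root certificates separately if you need them.
openssl s_client -connect bigquery.googleapis.com:443 \
  -servername bigquery.googleapis.com -showcerts </dev/null \
  | openssl x509 -outform PEM > bigquery_googleapis.cer
```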
You can also import certificates from the default Java truststore (cacerts) into the Informatica domain truststore. If you are unable to connect to the data source, run the following command:
$INFAINSTALL/java/jre/bin/keytool -importkeystore -srckeystore $INFAINSTALL/java/jre/lib/security/cacerts -destkeystore $INFAINSTALL/services/shared/security/infa_truststore.jks -srcstorepass changeit -deststorepass pass2038@infaSSL
For custom certificates, you can use the com.infa.ldm.certificates.importer utility packaged at <INFA_HOME>/services/CatalogService/ScannerBinaries/certificates-importer/certificates-importer.jar.
Pass the following arguments to the utility jar file:
  1. Directory containing the certificates
  2. Path to the truststore
  3. Truststore password
  4. -f. Optional. Provide this argument if you want to overwrite certificates that already exist in the truststore.
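For example, a run of the importer utility might look like the following sketch; the certificate directory and truststore password are placeholders, and the argument order follows the list above:

```shell
# Import all certificates from a directory into the Informatica domain truststore,
# overwriting entries that already exist (-f). Paths and password are placeholders.
java -jar <INFA_HOME>/services/CatalogService/ScannerBinaries/certificates-importer/certificates-importer.jar \
  /tmp/custom_certs \
  <INFA_HOME>/services/shared/security/infa_truststore.jks \
  <truststore password> \
  -f
```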

Basic Information

The General tab includes the following basic information about the resource:
  • Name. The name of the resource.
  • Description. The description of the resource.
  • Resource type. The type of the resource.
  • Execute On. You can choose to execute on the default catalog server or offline.

Resource Connection Properties

The General tab includes the following properties:
  • Project ID. Name of the Google Cloud Platform project that you want to access.
  • Private Key. The private key associated with the service account.
  • Client Email. The client email address associated with the service account.
  • Connect through a proxy server. Specifies whether to connect to the data source through a proxy server. Default is Disabled.
  • Proxy Host. Host name or IP address of the proxy server.
  • Proxy Port. Port number of the proxy server.
  • Proxy User Name. Authenticated user name to connect to the proxy server. Required for an authenticated proxy.
  • Proxy Password. Password for the authenticated user name. Required for an authenticated proxy.
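The Project ID, Private Key, and Client Email values correspond to fields in the JSON key file that Google Cloud generates for a service account. A truncated sketch, with placeholder names:

```json
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "edc-scanner@my-project.iam.gserviceaccount.com"
}
```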
The Metadata Load Settings tab includes the following properties:
  • Enable Source Metadata. Extracts metadata from the data source.
  • Scan Hidden Datasets. Extracts metadata from hidden and anonymous datasets.
  • Dataset. Select the datasets that you want to use to import metadata from Google BigQuery tables in the project. Default is all datasets.
  • Source Metadata Filter. Include or exclude tables and views from the resource run. Use semicolons (;) to separate the table names and view names. For more information about the filter field, see Source Metadata and Data Profile Filter.
  • Case Sensitive. Specifies whether the resource is configured for case sensitivity. Select one of the following values:
    - True. Select this check box to configure the resource as case sensitive.
    - False. Clear this check box to configure the resource as case insensitive.
    Default is True.
  • Memory. Specifies the memory required to run the scanner job. Select Low, Medium, or High based on the data set size that you plan to import into the catalog.
    Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How-To Library Articles tab in the Informatica Documentation Portal.
  • Custom Options. JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
    - -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the scanner log level to DEBUG, ERROR, or INFO. Default is INFO.
    - -Dscanner.container.core=<number of cores>. Increases the number of cores for the scanner container. The value must be a number.
    - -Dscanner.yarn.app.environment=<key=value>. Key-value pairs that you need to set in the YARN environment. Use a comma to separate the key-value pairs.
    - -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default is 1.
  • Track Data Source Changes. View metadata source change notifications in Enterprise Data Catalog.
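For example, the Custom Options field takes a space-separated list of JVM arguments; the following values are illustrative only:

```
-Dscannerloglevel=DEBUG -Dscanner.container.core=4 -Dscanner.yarn.app.environment=JAVA_HOME=/usr/java/default,SPARK_HOME=/opt/spark
```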

Identity and Access Management (IAM) Authentication for Google BigQuery

Assign users the following roles to run the profile in hybrid connection mode:
Because Google Cloud Storage is used for staging Google BigQuery tables in hybrid connection mode, you must also assign users the following storage roles to run the profile:

Sampling Options

You can perform sampling on the following types of tables:
Use the hybrid connection mode for profiling and metadata extraction from the Google BigQuery data source. The hybrid connection mode uses standard SQL to query Google BigQuery; Google recommends standard SQL over legacy SQL. The simple connection mode uses legacy SQL and should be avoided. Views that are standard SQL compliant are also supported in the hybrid connection mode.
The following table describes the sampling options that determine the number of rows on which to run the profile job for the Google BigQuery resource:
  • First N rows. Runs the profile on the first N rows in the resource. In the Number of First N Sampling Rows field that appears, enter the number of rows to run the profile on.
    Note: For optimal performance, consider using this sampling option.
    Pushdown to Google BigQuery: Yes. Sample pushdown query: LIMIT <N>
  • Random Percentage. Runs the profile on a percentage of data blocks in the data object. In the Random Percentage field that appears, enter the percentage of data blocks to run the profile on.
    Google BigQuery tables are organized into data blocks. The TABLESAMPLE clause randomly selects a percentage of data blocks from the table and reads all of the rows in the selected blocks. The sampling granularity is limited by the number of data blocks.
    Google BigQuery charges you for reading the data that is sampled. Google BigQuery does not cache the results of a query that includes a TABLESAMPLE clause, so each execution incurs the cost of reading the data from storage.
    For more information on table sampling in Google BigQuery, see Table Sampling in Google BigQuery.
    Pushdown to Google BigQuery: Yes. Sample pushdown query: TABLESAMPLE SYSTEM (<N> PERCENT)
  • Limit N rows. Runs the profile based on the number of rows in the data object. In the Number of Rows to Limit field that appears, enter the number of rows to run the profile on. The Data Integration Service reads all rows and then runs a sampling algorithm.
    Pushdown to Google BigQuery: No
  • Auto Random rows. Runs the profile on a random sample of rows. Enterprise Data Catalog computes the number of random rows based on the number of source rows. The Data Integration Service reads all rows and then runs a sampling algorithm.
    Pushdown to Google BigQuery: No
  • Random N rows. Runs the profile on the configured number of random rows. In the Random Sampling Rows field that appears, enter the number of rows that you want to run the profile on. The Data Integration Service reads all rows and then runs a sampling algorithm.
    Pushdown to Google BigQuery: No
  • All rows. Runs the profile on all the rows in the data source.
    Pushdown to Google BigQuery: No
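The two pushdown queries above can be tried directly against BigQuery, for example with the bq command-line tool; the project, dataset, and table names below are placeholders:

```shell
# First N rows: pushdown uses a LIMIT clause.
bq query --use_legacy_sql=false \
  'SELECT * FROM `my_project.my_dataset.my_table` LIMIT 1000'

# Random Percentage: pushdown uses TABLESAMPLE, which here samples about 10 percent
# of the table's data blocks rather than 10 percent of its rows.
bq query --use_legacy_sql=false \
  'SELECT * FROM `my_project.my_dataset.my_table` TABLESAMPLE SYSTEM (10 PERCENT)'
```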
Important: You can run a profile on a Google BigQuery resource that contains reserved keywords in the Google BigQuery table.
Pushdown to the Google BigQuery table does not occur when you run similarity profiling on the Google BigQuery resource with Random Percentage as the sampling option.
Note: The Google BigQuery resource does not support the following features: