Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:
• Verify that the required permissions are assigned to the service account that you use to access the Google Cloud Platform project.
• Configure a connection to the Google BigQuery source system in Administrator.
• Create endpoint catalog sources for connection assignment.
• Optionally, import a relationship inference model if you want to identify pairs of similar columns and relationships between tables within a catalog source.
Verify permissions
To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.
Permissions to extract metadata
Ensure that you have the required permissions to enable metadata extraction.
To assign permissions to extract metadata, choose one of the following role options:
• To use existing roles, assign the BigQuery Data Viewer role or the BigQuery Metadata Viewer role to the service account that you use to access the Google Cloud Platform project.
Note: If you assign the BigQuery Data Viewer role, also grant the bigquery.jobs.create permission. If you assign the BigQuery Metadata Viewer role, also grant the bigquery.jobs.create and bigquery.tables.getData permissions.
• To use minimal permissions, create a custom role with the following permissions and assign the custom role to the service account that you use to access the Google Cloud Platform project:
- resourcemanager.projects.get
- bigquery.datasets.get
- bigquery.routines.get
- bigquery.routines.list
- bigquery.tables.get
- bigquery.tables.list
- bigquery.tables.getData
- bigquery.jobs.create
The bigquery.tables.getData permission is needed to query the __TABLES__ table from a dataset to get information such as description, ID, and last modified date. The bigquery.jobs.create permission is needed to run queries on the dataset.
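If you create a custom role, you can sanity-check it before you run the catalog source job. The following sketch compares a list of granted permissions, which you might retrieve with the gcloud CLI or the testIamPermissions API (the retrieval step is out of scope here), against the minimal set above:

```python
# Minimal permissions for metadata extraction, as listed above.
REQUIRED_PERMISSIONS = {
    "resourcemanager.projects.get",
    "bigquery.datasets.get",
    "bigquery.routines.get",
    "bigquery.routines.list",
    "bigquery.tables.get",
    "bigquery.tables.list",
    "bigquery.tables.getData",
    "bigquery.jobs.create",
}

def missing_permissions(granted):
    """Return the required permissions that are not in `granted`."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))

# Example: a role that lacks the two query-related permissions.
granted = REQUIRED_PERMISSIONS - {"bigquery.jobs.create", "bigquery.tables.getData"}
print(missing_permissions(granted))
# → ['bigquery.jobs.create', 'bigquery.tables.getData']
```

An empty result means the custom role covers the minimal permission set for metadata extraction.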
Permissions to run data profiles
Ensure that you have the required permissions to run profiles.
Grant the following permissions:
• storage.objects.get. Required to read objects from the Google BigQuery source system.
• storage.objects.create. Required to create staging files in the Google Cloud Storage bucket.
Permissions to perform data classification
You can perform data classification with the permissions required to perform metadata extraction.
Permissions to perform relationship discovery
You can perform relationship discovery with the permissions required to perform metadata extraction.
Permissions to perform glossary association
You can perform glossary association with the permissions required to perform metadata extraction.
Create a connection
Create a Google BigQuery connection object in Administrator with the connection details of the Google BigQuery source system.
1. In Administrator, select Connections.
2. Click New Connection.
3. In the Connection Details section, enter the following connection details:
- Connection Name. Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -,
Maximum length is 255 characters.
- Description. Description of the connection. Maximum length is 4000 characters.
4. Select the Google BigQuery V2 connection type.
5. Enter the properties specific to the Google BigQuery connection:
- Runtime Environment. The execution platform that runs tasks. The runtime environment is either a Secure Agent or a serverless runtime environment.
- Authentication Type. Select the Service Account authentication type to access Google BigQuery and configure the authentication-specific parameters.
- Service Account Email. The client_email value from the Google service account key JSON file.
- Service Account Key. The private_key value from the Google service account key JSON file.
- Project ID. The project_id value from the Google service account key JSON file. If you have created multiple projects with the same service account, enter the ID of the project that contains the dataset that you want to connect to.
- Connection mode. The mode that you want to use to read data from or write data to Google BigQuery. Choose the Hybrid connection mode for optimum data profiling results.
- Use Legacy SQL for Custom Query. Select this option to use legacy SQL to define a custom query. If you clear this option, use standard SQL to define a custom query. Clear this option for optimum data profiling results.
- Retry Strategy. Enables or disables retry. When you read data from Google BigQuery in staging mode, you can configure a retry strategy for when the Google BigQuery V2 connection fails to connect to the Google BigQuery source.
6. Click Test Connection.
7. Click Save.
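The Service Account Email, Service Account Key, and Project ID properties come directly from the service account key JSON file that you download from Google Cloud. A minimal sketch of reading them, assuming a key file path of your choosing (the path shown is hypothetical):

```python
import json

def connection_values(key_file_path):
    """Map fields of a Google service account key JSON file to the
    corresponding Google BigQuery V2 connection properties."""
    with open(key_file_path) as f:
        key = json.load(f)
    return {
        "Service Account Email": key["client_email"],
        "Service Account Key": key["private_key"],
        "Project ID": key["project_id"],
    }

# Hypothetical usage:
# values = connection_values("bq-service-account-key.json")
# print(values["Project ID"])
```

Copy each value into the matching connection property without modifying it; the private_key value includes the BEGIN and END marker lines.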
Create endpoint catalog sources for connection assignment
An endpoint catalog source represents a source system that the catalog source references. Before you perform connection assignment, create endpoint catalog sources and run the catalog source jobs.
You can then perform connection assignment to reference the source systems and view complete lineage with source system objects.
Import a relationship inference model
Import a relationship inference model if you want to configure the relationship discovery capability. You can either import a predefined relationship inference model, or import a model file from your local machine.
1. In Metadata Command Center, click Explore on the navigation panel.
2. Expand the menu and select Relationship Inference Model. The following image shows the Explore page with the Relationship Inference Model menu:
3. Select one of the following options:
- Import Predefined Content. Imports a predefined relationship inference model called Column Similarity Model v1.0.
- Import. Imports a relationship inference model file from your local machine. Select this option if you previously downloaded predefined content and the model file is stored on your machine.
To import a file, click Choose File in the Import Relationship Inference Model window and navigate to the model file on your local machine. You can also drag and drop the file.
The imported models appear in the list of relationship inference models on the Relationship Discovery tab.