Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:
• Verify that the required permissions are assigned to the service account that you use to access the Google Cloud Platform project.
• Configure a connection to the Google BigQuery source system in Administrator.
• Create endpoint catalog sources for connection assignment.
• Optionally, import a relationship inference model if you want to identify pairs of similar columns and relationships between tables within a catalog source.
Verify permissions
To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.
Permissions to extract metadata
Ensure that you have the required permissions to enable metadata extraction.
To assign permissions to extract metadata, choose one of the following role options:
• To use existing roles, assign the BigQuery Data Viewer role or the BigQuery Metadata Viewer role to the service account that you use to access the Google Cloud Platform project.
Note: If you assign the BigQuery Data Viewer role, also grant the bigquery.jobs.create permission. If you assign the BigQuery Metadata Viewer role, also grant the bigquery.jobs.create and bigquery.tables.getData permissions.
• To use minimal permissions, create a custom role with the following permissions and assign the custom role to the service account that you use to access the Google Cloud Platform project:
- resourcemanager.projects.get
- bigquery.datasets.get
- bigquery.routines.get
- bigquery.routines.list
- bigquery.tables.get
- bigquery.tables.list
- bigquery.tables.getData
- bigquery.jobs.create
The bigquery.tables.getData permission is needed to query the __TABLES__ table from a dataset to get information such as description, ID, and last modified date. The bigquery.jobs.create permission is needed to run queries on the dataset.
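If you create a custom role, you can sanity-check it before you run the catalog source job. The following sketch compares a list of granted permissions, which you might retrieve with the gcloud CLI or the testIamPermissions API (the retrieval step is out of scope here), against the minimal set above:

```python
# Minimal permissions for metadata extraction, as listed above.
REQUIRED_PERMISSIONS = {
    "resourcemanager.projects.get",
    "bigquery.datasets.get",
    "bigquery.routines.get",
    "bigquery.routines.list",
    "bigquery.tables.get",
    "bigquery.tables.list",
    "bigquery.tables.getData",
    "bigquery.jobs.create",
}

def missing_permissions(granted):
    """Return the required permissions that are not in `granted`."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))

# Example: a role that lacks the two query-related permissions.
granted = REQUIRED_PERMISSIONS - {"bigquery.jobs.create", "bigquery.tables.getData"}
print(missing_permissions(granted))
# → ['bigquery.jobs.create', 'bigquery.tables.getData']
```

An empty result means the custom role covers the minimal permission set for metadata extraction.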
Permissions to run data profiles
Ensure that you have the required permissions to run profiles.
Grant the following permissions:
• storage.objects.get. Required to read objects from the Google BigQuery source system.
• storage.objects.create. Required to create staging files in the Google Cloud Storage bucket.
Permissions to perform data classification
You can perform data classification with the permissions required to perform metadata extraction.
Permissions to perform relationship discovery
You can perform relationship discovery with the permissions required to perform metadata extraction.
Permissions to perform glossary association
You can perform glossary association with the permissions required to perform metadata extraction.
Create a connection
Create a Google BigQuery connection object in Administrator with the connection details of the Google BigQuery source system.
1. In Administrator, select Connections.
2. Click New Connection.
3. In the Connection Details section, enter the following connection details:
- Connection Name. Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -,
Maximum length is 255 characters.
- Description. Description of the connection. Maximum length is 4000 characters.
4. Select the Google BigQuery V2 connection type.
5. Enter the properties specific to the Google BigQuery connection:
- Runtime Environment. The execution platform that runs tasks. The runtime environment is either a Secure Agent or a serverless runtime environment.
- Authentication Type. Select the Service Account authentication type to access Google BigQuery and configure the authentication-specific parameters.
- Service Account Email. The client_email value from the Google service account key JSON file.
- Service Account Key. The private_key value from the Google service account key JSON file.
- Project ID. The project_id value from the Google service account key JSON file. If you have created multiple projects with the same service account, enter the ID of the project that contains the dataset that you want to connect to.
- Connection mode. The mode that you want to use to read data from or write data to Google BigQuery. Choose the Hybrid connection mode for optimum data profiling results.
- Use Legacy SQL for Custom Query. Select this option to use legacy SQL to define a custom query. If you clear this option, use standard SQL to define a custom query. Clear this option for optimum data profiling results.
- Retry Strategy. Enables or disables retry. When you read data from Google BigQuery in staging mode, you can configure a retry strategy for when the Google BigQuery V2 connection fails to connect to the Google BigQuery source.
6. Click Test Connection.
7. Click Save.
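The Service Account Email, Service Account Key, and Project ID properties come directly from the service account key JSON file that you download from Google Cloud. A minimal sketch of reading them, assuming a key file path of your choosing (the path shown is hypothetical):

```python
import json

def connection_values(key_file_path):
    """Map fields of a Google service account key JSON file to the
    corresponding Google BigQuery V2 connection properties."""
    with open(key_file_path) as f:
        key = json.load(f)
    return {
        "Service Account Email": key["client_email"],
        "Service Account Key": key["private_key"],
        "Project ID": key["project_id"],
    }

# Hypothetical usage:
# values = connection_values("bq-service-account-key.json")
# print(values["Project ID"])
```

Copy each value into the matching connection property without modifying it; the private_key value includes the BEGIN and END marker lines.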
Create endpoint catalog sources for connection assignment
An endpoint catalog source represents a source system that the catalog source references. Before you perform connection assignment, create endpoint catalog sources and run the catalog source jobs.
You can then perform connection assignment to reference the source systems and view complete lineage with source system objects.
Import a relationship inference model
Import a relationship inference model if you want to configure the relationship discovery capability. You can either import a predefined relationship inference model, or import a model file from your local machine.
1. In Metadata Command Center, click Explore on the navigation panel.
2. Expand the menu and select Relationship Inference Model. The following image shows the Explore page with the Relationship Inference Model menu:
3. Select one of the following options:
- Import Predefined Content. Imports a predefined relationship inference model called Column Similarity Model v1.0.
- Import. Imports a relationship inference model file from your local machine. Select this option if you previously downloaded predefined content and the model file is stored on your machine.
To import a file, click Choose File in the Import Relationship Inference Model window and navigate to the model file on your local machine. You can also drag and drop the file.
The imported models appear in the list of relationship inference models on the Relationship Discovery tab.