Before you begin

Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:
  • Verify permissions
  • Create a connection
  • Import a relationship inference model

Verify permissions

To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.

Permissions to extract metadata

Ensure that you have the required permissions to enable metadata extraction and to access the Windows or the Linux file system.

Permissions to run data profiles

Ensure that you have the required permissions to run profiles.
Grant the following permissions:

Permissions to perform data classification

You can perform data classification with the permissions required to perform metadata extraction.

Permissions to perform relationship discovery

You can perform relationship discovery with the permissions required to perform metadata extraction.

Permissions to perform glossary association

You can perform glossary association with the permissions required to perform metadata extraction.

Create a connection

Before you configure the Microsoft Azure Data Lake Storage Gen2 catalog source, create a connection object in Administrator.
    1. In Administrator, select Connections.
    2. Click New Connection.
    3. Enter the following connection details:

    | Property | Description |
    | --- | --- |
    | Connection Name | Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + - Maximum length is 255 characters. |
    | Description | Description of the connection. Maximum length is 4000 characters. |
    | Type | Microsoft Azure Data Lake Storage Gen2 |
    | Runtime Environment | The name of the runtime environment where you want to run tasks. |
    | Account Name | Microsoft Azure Data Lake Storage Gen2 account name or the service name. |
    | File System Name | The name of the file system in the Microsoft Azure Data Lake Storage Gen2 account. |
    | Directory Path | The path of a directory without the file system name. Default is /. |

    4. Select the authentication type to connect to Microsoft Azure Data Lake Storage Gen2 and enter the required properties. You can use the following authentication types:
      • Service Principal Authentication
      • Shared Key Authentication
      • Managed Identity Authentication
    5. Click Test Connection.
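
The naming limits in the connection details can be checked before you open Administrator. The following Python sketch is illustrative only: the function name `validate_connection` is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits, which the source does not state.

```python
import re

# Allowed characters per the Connection Name description: alphanumerics,
# spaces, and the special characters _ . + -
# Assumption: "alphanumeric" is limited to ASCII letters and digits.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9 _.+\-]+")
MAX_NAME_LENGTH = 255
MAX_DESCRIPTION_LENGTH = 4000


def validate_connection(name: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not name:
        problems.append("connection name is empty")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append("connection name exceeds 255 characters")
    if name and not _NAME_PATTERN.fullmatch(name):
        problems.append("connection name contains unsupported characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description exceeds 4000 characters")
    return problems


if __name__ == "__main__":
    print(validate_connection("ADLS Gen2_prod-01"))
    print(validate_connection("bad/name"))
```

The check does not cover uniqueness within the organization, which only the service itself can verify.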

Service Principal Authentication

This authentication method uses the client ID, client secret, and tenant ID to connect to Microsoft Azure Data Lake Storage Gen2.
The following table describes the connection properties for the Service Principal Authentication type:
| Property | Description |
| --- | --- |
| Client ID | The client ID of your application. Specify the client ID for your application registered in the Azure Active Directory. |
| Client Secret | The client secret key generated for the client ID. Specify the client secret key to complete the OAuth authentication in the Azure Active Directory. |
| Tenant ID | The directory ID of the Azure Active Directory. |
| Endpoint Suffix | The type of Microsoft Azure endpoints. Select one of the following endpoints: core.windows.net (connects to Azure endpoints); core.usgovcloudapi.net (connects to US government Microsoft Azure Data Lake Storage Gen2 endpoints); core.chinacloudapi.cn (connects to Microsoft Azure Data Lake Storage Gen2 endpoints in the China region). Default is core.windows.net. |
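
To show how the client ID, client secret, and tenant ID fit together, the following Python sketch builds (but does not send) an OAuth 2.0 client-credentials token request. It is not a description of how the connector works internally. The authority host and storage scope are assumptions based on the Microsoft identity platform for the public Azure cloud and are not stated in this document; sovereign clouds use different hosts. The function name `build_token_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: authority host for the public Azure cloud.
AUTHORITY_HOST = "https://login.microsoftonline.com"
# Assumption: scope that requests a token for Azure Storage.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build a client-credentials token request without sending it."""
    url = f"{AUTHORITY_HOST}/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": STORAGE_SCOPE,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Placeholder values; replace with the values from your app registration.
    req = build_token_request("my-tenant", "my-client", "my-secret")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; a failure at that point usually indicates a wrong tenant ID, client ID, or client secret.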

Shared Key Authentication

This authentication method uses the account key to connect to Microsoft Azure Data Lake Storage Gen2.
The following table describes the connection properties for the Shared Key Authentication type:
| Property | Description |
| --- | --- |
| Account Key | The account key for the Microsoft Azure Data Lake Storage Gen2 account. |
| Endpoint Suffix | The type of Microsoft Azure endpoints. Select one of the following endpoints: core.windows.net (connects to Azure endpoints); core.usgovcloudapi.net (connects to US government Microsoft Azure Data Lake Storage Gen2 endpoints); core.chinacloudapi.cn (connects to Microsoft Azure Data Lake Storage Gen2 endpoints in the China region). Default is core.windows.net. |
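
For context on what the account key is used for, the following Python sketch shows the signing step of Azure Storage Shared Key authorization: the Base64-decoded account key signs a request-specific string with HMAC-SHA256. This is an illustration under assumptions, not the connector's implementation. The function name `sign_with_account_key` is hypothetical, and building the canonical string-to-sign is deliberately left out because its exact format is defined by the Azure Storage REST API, not by this document.

```python
import base64
import hashlib
import hmac


def sign_with_account_key(account_name: str, account_key_b64: str, string_to_sign: str) -> str:
    """Return an Authorization header value of the form 'SharedKey account:signature'."""
    # Account keys are distributed Base64-encoded; decode before signing.
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"SharedKey {account_name}:{signature}"


if __name__ == "__main__":
    # Placeholder key and string-to-sign, for illustration only.
    demo_key = base64.b64encode(b"demo-key").decode("ascii")
    print(sign_with_account_key("myaccount", demo_key, "GET\n\n"))
```

Because the key signs every request, anyone who holds it has full access to the storage account, so store it as carefully as a password.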

Managed Identity Authentication

This authentication method uses the identities that are assigned to applications in Azure to access Azure resources in Microsoft Azure Data Lake Storage Gen2.
When you create a Microsoft Azure Data Lake Storage Gen2 connection, select the Azure virtual machine on which you have installed the Secure Agent. If you enable a system-assigned identity, assign the required role or permissions to the Azure virtual machine to run the mappings and tasks. If you enable a user-assigned identity, assign the required role or permissions to the user-assigned identity.
The following table describes the connection properties for the Managed Identity Authentication type:
| Property | Description |
| --- | --- |
| Client ID | The client ID of your application. |
| Endpoint Suffix | The type of Microsoft Azure endpoints. Select one of the following endpoints: core.windows.net (connects to Azure endpoints); core.usgovcloudapi.net (connects to US government Microsoft Azure Data Lake Storage Gen2 endpoints); core.chinacloudapi.cn (connects to Microsoft Azure Data Lake Storage Gen2 endpoints in the China region). Default is core.windows.net. |
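
To illustrate why the Secure Agent must run on an Azure virtual machine, the following Python sketch builds (but does not send) a token request to the Azure Instance Metadata Service, which is reachable only from inside an Azure VM. The endpoint address, API version, and resource URI are assumptions taken from the general Azure managed identity mechanism, not from this document, and the function name `build_imds_token_request` is hypothetical. The sketch assumes that the client ID is needed only to select a user-assigned identity.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Azure Instance Metadata Service token endpoint (VM-local address).
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_token_request(client_id: Optional[str] = None) -> Request:
    """Build a managed identity token request for Azure Storage without sending it."""
    params = {
        "api-version": "2018-02-01",
        "resource": "https://storage.azure.com/",
    }
    # Include the client ID only when a user-assigned identity should be used.
    if client_id:
        params["client_id"] = client_id
    # The Metadata header is required by the metadata service.
    return Request(f"{IMDS_TOKEN_URL}?{urlencode(params)}", headers={"Metadata": "true"})


if __name__ == "__main__":
    req = build_imds_token_request("00000000-0000-0000-0000-000000000000")
    print(req.get_method(), req.full_url)
```

No secret appears in the request: the identity assigned to the virtual machine, together with the roles granted to it, determines what the resulting token can access.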

Import a relationship inference model

Import a relationship inference model if you want to configure the relationship discovery capability. You can either import a predefined relationship inference model, or import a model file from your local machine.
    1. In Metadata Command Center, click Explore on the navigation panel.
    2. Expand the menu and select Relationship Inference Model.
       [Image: the Explore page with the Relationship Inference Model menu and the Import Predefined Content options]
    3. Select one of the following options:
      • Import a predefined relationship inference model.
      • Import a model file from your local machine.
    The imported models appear in the list of relationship inference models on the Relationship Discovery tab.