Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:
• Assign the required permissions.
• Configure a connection to the Microsoft Azure Data Lake Storage Gen2 source system in Administrator.
• Optionally, if you want to identify pairs of similar columns and relationships between tables within a catalog source, import a relationship inference model.
Verify permissions
To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.
Permissions to extract metadata
Ensure that you have the required permissions to enable metadata extraction and to access the Microsoft Azure Data Lake Storage Gen2 source system.
• Create a user account for the Informatica user to access the Microsoft Azure Data Lake Storage Gen2 source system.
• Grant read permission to the user account.
Permissions to run data profiles
Ensure that you have the required permissions to run profiles.
Grant the following permissions:
• Read permission. Required to read the data and metadata.
• Write permission. Required to write the profiling results to the staging location.
• Delete permission. Required to delete the profiling results from the staging location.
• Execute permission. Required to test the connection and to execute queries.
Permissions to perform data classification
You can perform data classification with the permissions required to perform metadata extraction.
Permissions to perform relationship discovery
You can perform relationship discovery with the permissions required to perform metadata extraction.
Permissions to perform glossary association
You can perform glossary association with the permissions required to perform metadata extraction.
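The permission requirements above can be summarized as a small lookup table. The following Python sketch is illustrative only: the capability keys and the `missing_permissions` helper are hypothetical names, not part of any Informatica API. It shows that only data profiling needs more than the read permission.

```python
# Permissions per capability, as listed in the section above.
# The dictionary keys are hypothetical labels chosen for this sketch.
METADATA_EXTRACTION = frozenset({"read"})

REQUIRED_PERMISSIONS = {
    "metadata_extraction": METADATA_EXTRACTION,
    "data_profiling": frozenset({"read", "write", "delete", "execute"}),
    # These three capabilities reuse the metadata extraction permissions.
    "data_classification": METADATA_EXTRACTION,
    "relationship_discovery": METADATA_EXTRACTION,
    "glossary_association": METADATA_EXTRACTION,
}


def missing_permissions(granted, capabilities):
    """Return the permissions still needed to enable the given capabilities."""
    needed = set()
    for capability in capabilities:
        needed |= REQUIRED_PERMISSIONS[capability]
    return needed - set(granted)


# An account with read access only can extract metadata but cannot run profiles.
print(sorted(missing_permissions({"read"}, ["metadata_extraction", "data_profiling"])))
# prints ['delete', 'execute', 'write']
```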
Create a connection
Before you configure the Microsoft Azure Data Lake Storage Gen2 catalog source, create a connection object in Administrator.
1. In Administrator, select Connections.
2. Click New Connection.
3. Enter the following connection details:
| Property | Description |
| --- | --- |
| Connection Name | Name of the connection. Each connection name must be unique within the organization. Maximum length is 255 characters. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -, |
| Description | Description of the connection. Maximum length is 4000 characters. |
| Type | Microsoft Azure Data Lake Storage Gen2 |
| Runtime Environment | The name of the runtime environment where you want to run tasks. |
| Account Name | Microsoft Azure Data Lake Storage Gen2 account name or the service name. |
| File System Name | The name of the file system in the Microsoft Azure Data Lake Storage Gen2 account. |
| Directory Path | The path of a directory without the file system name. Default is /. |
4. Select the authentication type to connect to Microsoft Azure Data Lake Storage Gen2 and enter the required properties. You can use the following authentication types:
- Service Principal Authentication
- Shared Key Authentication
- Managed Identity Authentication
5. Click Test Connection.
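The naming and length limits in the connection details can be checked before you enter the values in Administrator. The following Python sketch is a hypothetical pre-check, not an Informatica API. It assumes that "alphanumeric" means ASCII letters and digits and that the trailing comma in the documented character list is itself an allowed character. Adjust the pattern if your organization behaves differently.

```python
import re

# Alphanumeric characters, spaces, and the special characters _ . + - ,
# Assumption: ASCII only, and the trailing comma in the list is allowed.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9 _.+\-,]+")
MAX_NAME_LENGTH = 255
MAX_DESCRIPTION_LENGTH = 4000


def validate_connection(name, description=""):
    """Return a list of problems with the Connection Name and Description."""
    errors = []
    if len(name) > MAX_NAME_LENGTH:
        errors.append(f"Connection Name exceeds {MAX_NAME_LENGTH} characters")
    if not _NAME_PATTERN.fullmatch(name):
        # Also rejects an empty name, which is an assumption of this sketch.
        errors.append("Connection Name is empty or contains unsupported characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        errors.append(f"Description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    return errors


print(validate_connection("ADLS Gen2_prod-01"))  # prints []
print(validate_connection("adls/gen2"))          # reports unsupported characters
```

The check does not verify uniqueness within the organization, which only Administrator can confirm.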
Service Principal Authentication
This authentication method uses the client ID, client secret, and tenant ID to connect to Microsoft Azure Data Lake Storage Gen2.
The following table describes the connection properties for the Service Principal Authentication type:
| Property | Description |
| --- | --- |
| Client ID | The client ID of your application. Specify the client ID for your application registered in Azure Active Directory. |
| Client Secret | The client secret key generated for the client ID. Specify the client secret key to complete the OAuth authentication in Azure Active Directory. |
| Tenant ID | The directory ID of the Azure Active Directory. |
| Endpoint Suffix | The type of Microsoft Azure endpoints. Select one of the following endpoints: core.windows.net (connects to Azure endpoints), core.usgovcloudapi.net (connects to US government Microsoft Azure Data Lake Storage Gen2 endpoints), or core.chinacloudapi.cn (connects to Microsoft Azure Data Lake Storage Gen2 endpoints in the China region). Default is core.windows.net. |
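To make the role of these properties concrete, the following Python sketch shows how a generic OAuth 2.0 client credentials request and a Data Lake Storage Gen2 account URL are typically assembled from them. It is not a description of how the connector is implemented. The function names are hypothetical, the login host `login.microsoftonline.com` applies to the Azure public cloud only (national clouds use different hosts), and the code builds the request without sending it.

```python
from urllib.parse import urlencode

ENDPOINT_SUFFIXES = (
    "core.windows.net",
    "core.usgovcloudapi.net",
    "core.chinacloudapi.cn",
)


def account_url(account_name, endpoint_suffix="core.windows.net"):
    """Build the Data Lake Storage Gen2 (dfs) endpoint for an account."""
    if endpoint_suffix not in ENDPOINT_SUFFIXES:
        raise ValueError(f"Unsupported endpoint suffix: {endpoint_suffix}")
    return f"https://{account_name}.dfs.{endpoint_suffix}"


def token_request(tenant_id, client_id, client_secret):
    """Return (url, form_body) for a client credentials token request.

    Assumption: Azure public cloud login host.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://storage.azure.com/.default",
    })
    return url, body


# Placeholder values for illustration only.
print(account_url("examplestorage"))
# prints https://examplestorage.dfs.core.windows.net
```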
Shared Key Authentication
This authentication method uses the account key to connect to Microsoft Azure Data Lake Storage Gen2.
The following table describes the connection properties for the Shared Key Authentication type:
| Property | Description |
| --- | --- |
| Account Key | The account key for the Microsoft Azure Data Lake Storage Gen2 account. |
| Endpoint Suffix | The type of Microsoft Azure endpoints. Select one of the following endpoints: core.windows.net (connects to Azure endpoints), core.usgovcloudapi.net (connects to US government Microsoft Azure Data Lake Storage Gen2 endpoints), or core.chinacloudapi.cn (connects to Microsoft Azure Data Lake Storage Gen2 endpoints in the China region). Default is core.windows.net. |
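For background on what the account key is used for, the following Python sketch shows the signing step of Azure Storage Shared Key authorization: the Base64-decoded account key signs a request-specific string with HMAC-SHA256. The helper name and the sample values are hypothetical, and building the canonical string-to-sign is deliberately left out. The connector performs this work for you, so the sketch is for understanding only.

```python
import base64
import hashlib
import hmac


def shared_key_header(account_name, account_key, string_to_sign):
    """Return the Authorization header value for a Shared Key request."""
    key = base64.b64decode(account_key)  # account keys are Base64 encoded
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"SharedKey {account_name}:{signature}"


# Placeholder key and string-to-sign, not real credentials.
sample_key = base64.b64encode(b"not-a-real-account-key").decode("ascii")
print(shared_key_header("examplestorage", sample_key, "GET\n..."))
```

Because the key signs every request, anyone who holds it has full access to the storage account. Store it as carefully as a password.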
Managed Identity Authentication
This authentication method uses the identities that are assigned to applications in Azure to access Azure resources in Microsoft Azure Data Lake Storage Gen2.
When you create a Microsoft Azure Data Lake Storage Gen2 connection, select the Azure virtual machine on which you have installed the Secure Agent. If you enable a system-assigned identity, assign the required role or permissions to the Azure virtual machine to run the mappings and tasks. If you enable a user-assigned identity, assign the required role or permissions to the user-assigned identity.
The following table describes the connection properties for the Managed Identity Authentication type:
| Property | Description |
| --- | --- |
| Client ID | The client ID of your application. |
| Endpoint Suffix | The type of Microsoft Azure endpoints. Select one of the following endpoints: core.windows.net (connects to Azure endpoints), core.usgovcloudapi.net (connects to US government Microsoft Azure Data Lake Storage Gen2 endpoints), or core.chinacloudapi.cn (connects to Microsoft Azure Data Lake Storage Gen2 endpoints in the China region). Default is core.windows.net. |
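As general Azure background, code running on an Azure virtual machine usually obtains a managed identity token from the Azure Instance Metadata Service (IMDS). The following Python sketch builds such a request to show why a client ID is only needed for a user-assigned identity. Whether the Secure Agent calls IMDS in exactly this way is an assumption, and `imds_token_request` is a hypothetical helper. The request is built but not sent, because IMDS is reachable only from inside an Azure virtual machine.

```python
from urllib.parse import urlencode

IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def imds_token_request(client_id=None):
    """Return (url, headers) for a managed identity token request.

    Omit client_id for a system-assigned identity. Pass the client ID of a
    user-assigned identity to select it explicitly.
    """
    params = {
        "api-version": "2018-02-01",
        "resource": "https://storage.azure.com/",
    }
    if client_id:
        params["client_id"] = client_id
    headers = {"Metadata": "true"}  # IMDS rejects requests without this header
    return f"{IMDS_TOKEN_ENDPOINT}?{urlencode(params)}", headers


url, headers = imds_token_request()
print(url)
```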
Import a relationship inference model
Import a relationship inference model if you want to configure the relationship discovery capability. You can either import a predefined relationship inference model, or import a model file from your local machine.
1. In Metadata Command Center, click Explore on the navigation panel.
2. Expand the menu and select Relationship Inference Model. The following image shows the Explore page with the Relationship Inference Model menu:
3. Select one of the following options:
- Import Predefined Content. Imports a predefined relationship inference model called Column Similarity Model v1.0.
- Import. Imports a relationship inference model file from your local machine. Select this option if you previously saved the predefined content to your local machine and the model file is stored there.
To import a file, click Choose File in the Import Relationship Inference Model window and navigate to the model file on your local machine. You can also drag and drop the file.
The imported models appear in the list of relationship inference models on the Relationship Discovery tab.