Microsoft Azure Synapse, formerly known as Azure SQL Data Warehouse, is a big data analytic service that queries and analyzes your data. Create a Microsoft Azure Synapse catalog source to access data from Microsoft Azure SQL Data Warehouse.
Objects extracted
The metadata extraction capability extracts the following objects from a Microsoft Azure Synapse catalog source:
•Database
•Table
•Schema
•View
•Materialized View
•Function
•Stored Procedure
•Column
•External Table
•External Column
Note: Effective in the November 2023 release, Metadata Command Center retrieves the database name from the source system instead of the Azure DW JDBC URL. If you extracted database objects before the November 2023 release, and the name of a database that you entered in the Azure DW JDBC URL uses a different case than the original database name, you might extract duplicate assets when you run the catalog source job again. To ensure that you extract the correct database structure, select Delete in the Metadata Change Option and rerun the catalog source job.
Prerequisites
Use the Microsoft Azure Synapse SQL connector to connect to Microsoft Azure Synapse source system. For information about configuring a connection in Administrator, see Connections in the Cloud Common Services help.
Configure permissions
Before you create a Microsoft Azure Synapse catalog source, configure permissions for the Microsoft Azure SQL Server database user account that you use to connect to Microsoft Azure SQL Server. Metadata Command Center uses SQL Server authentication to connect to the Microsoft Azure SQL Server database. Verify that the user account you use to connect to Microsoft Azure SQL Server has an SQL Server login account.
Permissions to extract metadata
Configure the VIEW DEFINITION and CONNECT permissions for the user account. These permissions allow you to view all schemas from where you can load metadata. For information about configuring permissions for the user account, see the Microsoft Azure Synapse documentation.
Permissions to run data profiles
To perform data profiling on Microsoft Azure Synapse, grant either the db_owner privilege or the following more granular privileges to the user to connect to Microsoft Azure SQL Data Warehouse and perform operations successfully:
- EXEC sp_addrolemember 'db_datareader', '<user>'; // Alternately assign permission to the individual table.
- EXEC sp_addrolemember 'db_datawriter', '<user>'; // Alternately assign permission to the individual table.
- GRANT ALTER ANY EXTERNAL DATA SOURCE TO <user>;
- GRANT ALTER ANY EXTERNAL FILE FORMAT TO <user>;
- GRANT CONTROL TO <user>; // To grant all permissions on the database.
or
GRANT ALTER ANY SCHEMA TO <user>; // To grant permissions only on the schema.
- GRANT CREATE TABLE TO <user>;
Data profiling for Microsoft Azure Synapse objects
Configure data profiling to run profiles on the metadata extracted from a Microsoft Azure Synapse source system.
You can run data profiles on the following Microsoft Azure Synapse objects:
•Tables
•External Tables created in the following file formats on the Microsoft Azure Serverless SQL pool:
- Delimited text
- Parquet
•Views
Configure the following data profiling properties in the Data Profiling tab of the Configuration page in Metadata Command Center:
Property
Description
Azure Blob Container
Name of the Blob container that is used to stage profiled data.
Applies to the Azure Blob storage type.
ADLS FileSystem Name
Name of the file system that is used to stage profiled data.
Applies to the ADLS Gen2 storage type.
Staging File Format
File format to use when you stage the profiled data.
Select one of the following formats:
- Delimited Text. Select if the external table is created in the Delimited text file format.
- Parquet. Select if the external table is created in the Parquet file format.
Applies to external tables created on the Microsoft Azure Serverless SQL pool.
Field Delimiter
Character used to separate fields in a file. Default is 0x1e. Non-printable characters must be specified as hexadecimal characters.
Note: Use the field delimiter that you used while creating the external table in the Delimited text file format.
Quote Character
Specifies the quote character to skip when you read data from Microsoft Azure Synapse SQL.
The quote character that you specify must not exist in the source table. If it exists, enter a different quote character value. Default is 0x1f.
Note: Use the quote character that you used while creating the external table in the Delimited text file format.
You can view the profiling statistics in Data Governance and Catalog. The data profiling task runs profiles on the following data types for Microsoft Azure Synapse objects:
•Bigint
•Bit
•Char
•Date
•Datetime
•Datetime2
•Decimal
•Float
•Int
•Nchar
•Nvarchar
•Real
•Smalldatetime
•Smallint
•Time
•Tinyint
•Uniqueidentifier
•Varchar
Sampling type
Determine the sample rows on which you want to run the data profiling task. You can choose one of the following sampling types for a Microsoft Azure Synapse catalog source:
- All Rows
- Limit N Rows
- Random N Rows
Data classification for Microsoft Azure Synapse objects
Configure data classification for Microsoft Azure Synapse catalog sources to classify and organize data in your organization. You can view the data classification results in Data Governance and Catalog.
Connection properties
When you configure a connection to Microsoft Azure Synapse in Administrator, you can view the connection properties for that connection on the Registration page in Metadata Command Center.
The following table describes the Microsoft Azure Synapse connection properties:
Property
Description
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -,
Maximum length is 255 characters.
Description
Description of the connection. Maximum length is 4000 characters.
Type
Microsoft Azure Synapse SQL
Runtime Environment
The name of the runtime environment where you want to run tasks.
Azure DW JDBC URL
The Microsoft Azure Synapse SQL JDBC connection string.
Use the following string to connect to Microsoft Azure Synapse SQL:
You can include an authentication parameter in the connection string to specify the authentication type. You can configure the following authentication types to connect to Microsoft Azure Synapse SQL:
- Microsoft SQL Server
- Azure Active Directory
- Managed Identity
- Service Principal
If you don't include an authentication parameter in the connection string, the Secure Agent uses Microsoft SQL Server authentication as the authentication type.
Connection string format for Microsoft SQL Server authentication
User name to connect to the Microsoft Azure Synapse SQL account.
- For AAD authentication, provide your AAD user name.
- For Microsoft SQL server authentication, provide your SQL auth user name.
- For service principal authentication, provide the application ID or client ID for your application registered in Azure Active Directory.
This property doesn't apply to Managed Identity authentication.
Azure DW JDBC Password
Password to connect to the Microsoft Azure Synapse SQL account.
- For AAD authentication, provide the password of the AAD user.
- For Microsoft SQL server authentication, provide the password of SQL auth user.
- For service principal authentication, provide the client secret for your application registered in the Azure Active Directory.
This property doesn't apply to Managed Identity authentication.
Azure DW Client ID
Required if you want to use the user-assigned managed identity for Managed Identity Authentication to connect to Microsoft Azure Synapse SQL.
The client ID of the user-assigned managed identity.
If you use system-assigned managed identity, leave the field empty.
Azure DW Schema Name
Name of the schema in Microsoft Azure Synapse SQL.
You can select Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2 as the Azure storage type to stage the data files. Default is Azure Blob. Select your preferred storage type and then configure the storage-specific parameters.
When you select Microsoft Azure Blob as the storage type, you can configure Shared Key Authentication as the authentication type to stage the files.
The following table describes the authentication type that you can configure for Microsoft Azure Blob storage:
Property
Description
Authentication Type
Authentication type to connect to Microsoft Azure Blob storage to stage the files.
You can configure Shared Key Authentication as the authentication type to stage the files.
Shared Key Authentication uses the storage account name and account key to connect to Microsoft Azure Blob storage.
The following table describes the basic connection properties for shared key authentication:
Property
Description
Azure Blob Account Name
Name of the Microsoft Azure Blob Storage account to stage the files.
Azure Blob Account Key
The Microsoft Azure Blob Storage access key to stage the files.
Container Name
The name of the container in the Azure Blob Storage account.
When you select Microsoft Azure Blob as the storage type, you can configure Shared Key Authentication as the authentication type to stage the files.
The following table describes the authentication type that you can configure for Microsoft Azure Blob storage:
Property
Description
Authentication Type
Authentication type to connect to Azure storage to stage the files.
Shared Key Authentication uses the storage account name and account key to connect to Microsoft Azure Data Lake Storage Gen2.
The following table describes the basic connection properties for shared key authentication:
Property
Description
ADLS Gen2 Storage Account Name
Name of the Microsoft Azure Data Lake Storage Gen2 account to stage the files.
ADLS Gen2 Account Key
The Microsoft Azure Data Lake Storage Gen2 access key to stage the files.
File System Name
The name of the file system in the Microsoft Azure Data Lake Storage Gen2 account.
Service Principal Authentication uses the account name, client ID, client secret, and tenant ID to connect to Microsoft Azure Data Lake Storage Gen2.
The following table describes the basic connection properties for service principal authentication:
Property
Description
ADLS Gen2 Storage Account Name
Name of the Microsoft Azure Data Lake Storage Gen2 account to stage the files.
Client ID
The client ID of your application.
Enter the application ID or client ID for your application registered in the Azure Active Directory.
Client Secret
The client secret for your application.
Tenant ID
The directory ID or tenant ID for your application.
File System Name
The name of the file system in the Microsoft Azure Data Lake Storage Gen2 account.
Select Managed Identity Authentication to authenticate using system-assigned or user-assigned identities that are assigned to applications in Azure to access Azure resources in Microsoft Azure Data Lake Storage Gen2.
The following table describes the basic connection properties for managed identity authentication:
Property
Description
ADLS Gen2 Storage Account Name
Name of the Microsoft Azure Data Lake Storage Gen2 account to stage the files.
Client ID
The client ID of your application.
Enter the client ID for the user-assigned managed identity. If the managed identity is system-assigned, leave the field empty.
File System Name
The name of the file system in the Microsoft Azure Data Lake Storage Gen2 account.
Configuration parameters for metadata extraction
Optionally, you can override default context values and job parameters on the Configuration tab.
Note: Click Show Advanced to view all configuration parameters.
The following table describes the configuration parameters that you enter for Catalog Source Configuration Options:
Parameter
Description
Default variables values
Specify a default value for variables used in the programmable objects.
MetaTables Include Filter
Advanced parameter. When you process PL/SQL statements, Metadata Command Center does not read tables or view content by default. If you want to use the content, for example, to process dynamic SQL statements, use the MetaTables Include Filter parameter. This parameter prompts the database for the required metadata. Verify that the user has SELECT permissions for metatables.
Note: Don't use this option to specify filters for tables that you want to include or exclude during the metadata extraction run.
The following table describes the property that you can enter for additional settings:
Note: The Additional Settings section appears when you click Show Advanced.
Property
Description
Expert Parameters
Enter additional configuration options to be passed at runtime. Required if you need to troubleshoot the catalog source job.
Caution: Use expert parameters when it is recommended by Informatica Global Customer Support.