Microsoft Azure Blob Storage

Configure a Microsoft Azure Blob Storage resource type to extract metadata from Microsoft Azure Blob Storage.

Objects Extracted

Enterprise Data Catalog extracts only files from a Microsoft Azure Blob Storage source.

Permissions to Configure the Resource

Configure read permission on the Microsoft Azure Blob Storage data source for the user account that you use to access the data source.

Supported File Types

The Microsoft Azure Blob Storage resource enables you to extract metadata from structured, unstructured, and extended unstructured files.

The structured files supported are:

•AVRO files
•Delimited files
•Text files
•JSON files
•Parquet files
•XML files

The unstructured files supported are:

•Apple files
•Compressed files
•Email

The extended unstructured files are:

•VB files
•ASP files
•TIF files
•LOG files
•CSS files
•ASPX files
•DLL files
•GIF files
•SQL files

Assign read and write permissions to the files to extract metadata.
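
To preview which category a blob is likely to fall under before you run a scan, you can group blob names by extension. The following is a minimal sketch and not part of Enterprise Data Catalog: the function name classify_blob is hypothetical, and the mapping from the file types listed above to extensions (for example, .csv for delimited files and .txt for text files) is an assumption. The unstructured category is left out because the list above does not name specific extensions for it.

```python
from pathlib import PurePosixPath

# Assumed extensions for the structured file types listed above.
STRUCTURED = {"avro", "csv", "txt", "json", "parquet", "xml"}

# Extensions taken from the extended unstructured list above.
EXTENDED_UNSTRUCTURED = {"vb", "asp", "tif", "log", "css", "aspx", "dll", "gif", "sql"}


def classify_blob(blob_name: str) -> str:
    """Return a category label for a blob name, based only on its extension."""
    suffix = PurePosixPath(blob_name).suffix.lower().lstrip(".")
    if not suffix:
        return "no extension"
    if suffix in STRUCTURED:
        return "structured"
    if suffix in EXTENDED_UNSTRUCTURED:
        return "extended unstructured"
    return "other"


if __name__ == "__main__":
    for name in ("sales/2024/orders.parquet", "web/site.ASPX", "data/part-0000"):
        print(name, "->", classify_blob(name))
```

Blobs reported as "no extension" are the ones affected by the Treat Files Without Extension As property.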

Basic Information

The General tab includes the following basic information about the resource:

Information	Description
Name	The name of the resource.
Description	The description of the resource.
Resource type	The type of the resource.
Execute On	You can choose to execute on the default catalog server or offline.

Resource Connection Properties

Provide the Account Key for Microsoft Azure Blob Storage in Informatica Administrator. Use the value listed for key 1 under Access Keys in the Azure portal.

The General tab includes the following properties:

Property	Description
Blob Endpoint URL	Microsoft Azure Blob Storage URL to access a container. Use the value listed for the Primary BLOB Service endpoint URL in the Azure portal.
Account Name	Name of the Microsoft Azure Blob Storage account.
Container Name	Name of the container that contains the blobs.
Source Directory	The source directory from where metadata needs to be extracted.
Shared Access Signature Token	A token that provides access to the Microsoft Azure Blob Storage resource. Use the value for the Shared Access Signature in the Azure portal.
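
To check how the connection properties relate to each other before you enter them, you can assemble them into a single container URL. The following is a minimal sketch that uses only the Python standard library. The function name build_container_url, the account name, and the token values are hypothetical placeholders, and the sketch does not describe how the scanner itself builds its requests.

```python
from urllib.parse import urlsplit, urlunsplit


def build_container_url(blob_endpoint: str, container: str,
                        source_dir: str = "", sas_token: str = "") -> str:
    """Combine the connection property values into one URL for a quick sanity check."""
    parts = urlsplit(blob_endpoint)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("Blob Endpoint URL must be an absolute http(s) URL")

    # Join the endpoint path, container name, and source directory without duplicate slashes.
    segments = [s for s in (parts.path.strip("/"),
                            container.strip("/"),
                            source_dir.strip("/")) if s]
    path = "/" + "/".join(segments)

    # A token copied from the portal may or may not start with "?"; accept both forms.
    query = sas_token.lstrip("?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


if __name__ == "__main__":
    # Placeholder values only; substitute the values from your own storage account.
    print(build_container_url(
        "https://myaccount.blob.core.windows.net/",
        "catalog",
        "raw/2024",
        "?sv=2022-11-02&sig=abc",
    ))
```

If the function raises ValueError, the value entered for Blob Endpoint URL is probably not the full endpoint URL.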

The Metadata Load Settings tab includes the following properties:

Property	Description
Enable Source Metadata	Extracts metadata from the data source.
Blob Prefix	Use this option to sort the blobs based on the prefix of the Microsoft Azure Blob name. This value is case sensitive.
File Types	Select the file types from which you want to extract metadata:
	- All. Extracts metadata from all file types.
	- Select. Extracts metadata from specific file types. Perform the following steps to specify the file types:
	  1. Click Select. The Select Specific File Types dialog box appears.
	  2. Select the required files from the following options:
	     - Extended unstructured formats. Extracts metadata from file types such as audio files, video files, image files, and ebooks.
	     - Structured file types. Extracts metadata from file types such as Avro, Parquet, JSON, XML, text, and delimited files.
	     - Unstructured file types. Extracts metadata from file types such as Microsoft Excel, Microsoft PowerPoint, Microsoft Word, web pages, compressed files, emails, and PDF.
	  3. Click Select.
	Note: You can select the Specific File Types option in the dialog box to select files under all the categories.
Other File Types	Extracts basic file metadata, such as file size, path, and time stamp, from file types that are not present in the File Types property.
Treat Files Without Extension As	Select one of the following options to identify files without an extension: - None - Avro - Parquet
Enter File Delimiter	Specify the file delimiter if the file from which you extract metadata uses a delimiter other than the following delimiters:
	- Comma (,)
	- Horizontal tab (\t)
	- Semicolon (;)
	- Colon (:)
	- Pipe symbol (|)
	Enclose each delimiter in single quotes. For example, '$'. Use a comma to separate multiple delimiters. For example, '$','%','&'.
First Level Directory	Specify a directory or a list of directories under the source directory. If you leave this option blank, Enterprise Data Catalog imports all the files from the specified source directory. To specify a directory or a list of directories, perform the following steps:
	1. Click Select.... The Select First Level Directory dialog box appears.
	2. Use one of the following options to select the required directories:
	   - Select from list. Select the required directories from a list of directories.
	   - Select using regex. Provide an SQL regular expression to select the directories that match the expression.
	Note: If you want to select multiple directories, you must separate the directories with a semicolon (;).
Include Subdirectory	Select this option to import all the files in the subdirectories under the source directory.
Case Sensitive	Specifies that the resource is configured for case sensitivity. Select one of the following values: - True. Select this check box to specify that the resource is configured as case sensitive. - False. Clear this check box to specify that the resource is configured as case insensitive. The default value is True.
Memory	The memory required to run the scanner job. Select one of the following values based on the size of the imported data set: - Low - Medium - High Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How-To Library Articles tab in the Informatica Doc Portal.
Custom Options	JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
	- -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the log level of the scanner to DEBUG, INFO, or ERROR. Default value is INFO.
	- -Dscanner.container.core=<No. of core>. Increases the number of cores for the scanner container. The value must be a number.
	- -Dscanner.yarn.app.environment=<key=value>. Key-value pairs that you need to set in the Yarn environment. Use a comma to separate the key-value pairs.
	- -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default value is 1.
Track Data Source Changes	View metadata source change notifications in Enterprise Data Catalog.
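
The Enter File Delimiter and Custom Options properties both expect specially formatted strings. The following is a minimal sketch of helpers that build those strings. The function names format_delimiters and build_custom_options are hypothetical, and the use of a space to separate multiple JVM arguments is an assumption that the table above does not confirm.

```python
# Delimiters that the table above lists as recognized without extra configuration.
BUILT_IN_DELIMITERS = {",", "\t", ";", ":", "|"}

# Log levels named in the Custom Options description.
LOG_LEVELS = {"DEBUG", "INFO", "ERROR"}


def format_delimiters(delimiters):
    """Quote each custom delimiter in single quotes and join the results with commas."""
    custom = [d for d in delimiters if d not in BUILT_IN_DELIMITERS]
    return ",".join(f"'{d}'" for d in custom)


def build_custom_options(log_level="INFO", cores=None):
    """Build a Custom Options string from a log level and an optional core count."""
    if log_level not in LOG_LEVELS:
        raise ValueError(f"log level must be one of {sorted(LOG_LEVELS)}")
    options = [f"-Dscannerloglevel={log_level}"]
    if cores is not None:
        if not isinstance(cores, int) or cores < 1:
            raise ValueError("cores must be a positive whole number")
        options.append(f"-Dscanner.container.core={cores}")
    # Assumption: multiple JVM arguments are separated by a space.
    return " ".join(options)


if __name__ == "__main__":
    print(format_delimiters(["$", ",", "%", "&"]))
    print(build_custom_options("DEBUG", cores=4))
```

The comma passed to format_delimiters in the example is dropped from the result because it is already in the list of recognized delimiters.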

You can enable data discovery for a Microsoft Azure Blob Storage resource. For more information, see the Enable Data Discovery topic. You can enable composite data domain discovery for a Microsoft Azure Blob Storage resource. For more information, see the Composite Data Domain Discovery topic.