Enterprise Data Catalog Scanner Configuration Guide > Configuring Cloud Resources > Microsoft Azure Data Lake Storage
  

Microsoft Azure Data Lake Storage

Microsoft Azure Data Lake Storage is a scalable data storage and analytics service hosted on Azure.
When you create an Azure Data Lake Storage resource, you can access the files and folders in the following Azure storage products:
Azure Data Lake Store or Data Lake Storage Gen1
To access this repository, Enterprise Data Catalog uses service-to-service or OAuth 2.0 authentication. To use OAuth 2.0 authentication, you must create an Azure Active Directory (AD) application and use the client ID and client key from that application for authentication. Enterprise Data Catalog uses an SDK to access the repository contents.
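The OAuth 2.0 flow behind the Gen1 option is the standard Azure AD client-credentials grant. The sketch below, with placeholder tenant, client ID, and client key values, shows the token request that the client ID and client key feed into; it builds the request body without sending it.

```python
from urllib.parse import urlencode

# Placeholder values; copy the real ones from your Azure AD application.
tenant_id = "my-tenant-id"
client_id = "my-application-id"       # the "Client Id" resource property
client_secret = "my-application-key"  # the "Client Key" resource property

# The OAuth 2.0 token endpoint (the "Auth EndPoint URL" resource property).
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"

# Client-credentials grant body for Data Lake Store Gen1.
body = urlencode({
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": client_secret,
    "resource": "https://datalake.azure.net/",
})
# POSTing this body to token_url returns a JSON document containing an
# access_token, which is then sent as a Bearer token to the Data Lake Store.
```
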
Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 is a hierarchical file system built on Azure Blob storage. When you create an Azure Data Lake Store resource and choose the Azure Data Lake Storage Gen2 option, you must enter the user account ID and one of the keys provided in the Access keys section. In the Azure portal, you can view the two keys generated for each Azure Data Lake Storage Gen2 storage account in the Settings > Access keys section. To access the files and folders in this hierarchical file system, Enterprise Data Catalog uses REST APIs. Azure uses Shared Key authorization to authenticate the requests. In Enterprise Data Catalog, access and runtime are about twice as fast for Azure Data Lake Storage Gen2 as for Data Lake Storage Gen1 storage.
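Shared Key authorization signs each REST request with the storage account key. The simplified sketch below shows the core of that scheme: base64-decode the account key, compute HMAC-SHA256 over a canonicalized request string, and base64-encode the digest. The canonicalization rules are elided; see the Azure Storage documentation for the full string-to-sign format. The key and string-to-sign values here are illustrative only.

```python
import base64
import hashlib
import hmac

def shared_key_signature(account_key_b64: str, string_to_sign: str) -> str:
    """Sign a canonicalized request string with a storage account key.

    Shared Key authorization base64-decodes the account key, computes
    HMAC-SHA256 over the canonicalized headers and resource, and
    base64-encodes the digest.
    """
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")

# Placeholder standing in for key1 or key2 from Settings > Access keys.
demo_key = base64.b64encode(b"not-a-real-account-key").decode("ascii")
signature = shared_key_signature(demo_key, "GET\n\n\n/myaccount/myfilesystem")
# The request then carries: Authorization: SharedKey <account>:<signature>
```
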

Objects Extracted

Permissions to Configure the Resource

If you create a new user, ensure that you configure read permission on the data source for the new user account.

Supported File Types

The Microsoft Azure Data Lake Storage resource enables you to extract metadata from structured, unstructured, and extended unstructured files.
The structured files supported are:
The unstructured files supported are:
The extended unstructured files are:
Assign read and write permissions to the files to extract metadata.

Prerequisites

Before you create the resource, ensure that you have met the following prerequisites:
  1. Merge the certificates in <INFA_HOME>/java/jre/lib/security/cacerts into the <INFA_HOME>/services/shared/security/infa_truststore.jks file.
  2. Move the infa_truststore.jks file to a common location accessible to all the nodes in the cluster.
  3. In the HDFS configuration properties of the Ambari interface, update the infa_truststore.jks file path in the ssl.client.truststore.location property and the infa_truststore.jks password in the ssl.client.truststore.password property.
  4. Restart the Informatica Cluster Service.
Note: Ensure that you configure the required permissions for the ADLS storage in Azure Active Directory.
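The certificate merge in the first step can be done with the JDK's keytool -importkeystore command, which copies every entry from cacerts into the truststore. The sketch below assembles that command; the <INFA_HOME> path and store passwords are placeholders you must substitute.

```python
import shlex

# Placeholder install path and passwords; substitute your actual values.
infa_home = "/opt/informatica"
src_store = f"{infa_home}/java/jre/lib/security/cacerts"
dest_store = f"{infa_home}/services/shared/security/infa_truststore.jks"

# keytool -importkeystore copies every entry from cacerts into infa_truststore.jks.
keytool_cmd = [
    "keytool", "-importkeystore",
    "-srckeystore", src_store,
    "-srcstorepass", "changeit",        # default cacerts password; yours may differ
    "-destkeystore", dest_store,
    "-deststorepass", "truststore-password",
    "-noprompt",
]
print(shlex.join(keytool_cmd))  # run the printed command on a node with the JDK on PATH
```
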
Note: If the proxy server used to connect to the data source is SSL enabled, you must download the proxy server certificates on the Informatica domain machine.

Basic Information

The General tab includes the following basic information about the resource:
Information
Description
Name
The name of the resource.
Description
The description of the resource.
Resource type
The type of the resource.
Execute On
You can choose to execute on the default catalog server or offline.

Resource Connection Properties

The General tab includes the following properties:
Property
Description
Account Name
Enter the storage account name that you created in the Azure portal.
ADLS Source Type
Choose the Data Lake Store Gen 1 or the Data Lake Store Gen 2 option.
Client Id
Enter the client ID to connect to the Microsoft Azure Data Lake Store. Use the value listed for the application ID in the Azure portal.
This option appears when you choose the Data Lake Store Gen 1 option as the ADLS Source Type.
Client Key
Enter the client key to connect to the Microsoft Azure Data Lake Store. Use the Azure Active Directory application key value in the Azure portal as the client key.
This option appears when you choose the Data Lake Store Gen 1 option as the ADLS Source Type.
Directory Name
Directory name of the Azure Data Lake Store.
Auth EndPoint URL
The OAuth 2.0 token endpoint URL in the Azure portal.
This option appears when you choose the Data Lake Store Gen 1 option as the ADLS Source Type.
Storage Account Key
Enter key1 or key2 as the storage account key. Navigate to the Settings > Access keys section in Azure portal to view the storage account keys.
This option appears when you choose the Data Lake Store Gen 2 option as the ADLS Source Type.
Connect through a proxy server
Specifies whether to connect to the data source through a proxy server. Default is Disabled.
This option appears when you choose the Data Lake Store Gen 2 option as the ADLS Source Type.
Proxy Host
Host name or IP address of the proxy server.
This option appears when you choose the Data Lake Store Gen 2 option as the ADLS Source Type.
Proxy Port
Port number of the proxy server.
This option appears when you choose the Data Lake Store Gen 2 option as the ADLS Source Type.
Proxy User Name
Required for authenticated proxy.
Authenticated user name to connect to the proxy server.
This option appears when you choose the Data Lake Store Gen 2 option as the ADLS Source Type.
Proxy Password
Required for authenticated proxy.
Password for the authenticated user name.
This option appears when you choose the Data Lake Store Gen 2 option as the ADLS Source Type.
The Metadata Load Settings tab includes the following properties:
Property
Description
Enable Source Metadata
Extracts metadata from the data source.
File Types
Select any or all of the following file types from which you want to extract metadata:
  - All. Use this option to extract metadata from all file types.
  - Select. Use this option to extract metadata from specific file types. Perform the following steps to specify the file types:
    1. Click Select. The Select Specific File Types dialog box appears.
    2. Select the required file types from the following options:
      - Extended unstructured formats. Use this option to extract metadata from file types such as audio files, video files, image files, and ebooks.
      - Structured file types. Use this option to extract metadata from file types such as Avro, Parquet, JSON, XML, text, and delimited files.
      - Unstructured file types. Use this option to extract metadata from file types such as Microsoft Excel, Microsoft PowerPoint, Microsoft Word, web pages, compressed files, emails, and PDF.
    3. Click Select.
    Note: You can select the Specific File Types option in the dialog box to select files under all the categories.
Enable Exclusion Filter
Filter to exclude folders from the data source during the metadata extraction phase.
This option appears when you choose Azure Data Lake Storage Gen2 V2 as the resource type.
Filter Condition
Filter condition to exclude folders from the data source. Select the filter condition from the following list:
  - Starting With. Excludes all folders that start with the keyword.
  - Ending With. Excludes all folders that end with the keyword.
  - Contains. Excludes all folders that contain the keyword.
  - Named. Excludes all folders that are named as the keyword.
This option appears when you choose Azure Data Lake Storage Gen2 V2 as the resource type.
Filter Value
Filter value or pattern for the filter condition. Specify the value or pattern within double quotes. Use a comma to separate multiple values.
This option appears when you choose Azure Data Lake Storage Gen2 V2 as the resource type.
Is Filter Case Sensitive
Specify if the filter value is case sensitive. Default is True.
This option appears when you choose Azure Data Lake Storage Gen2 V2 as the resource type.
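The exclusion filter semantics described above can be modeled in a few lines. The sketch below is an illustrative reimplementation of the documented behavior, not the product's code; the folder names are hypothetical.

```python
def is_excluded(folder: str, condition: str, values: list[str],
                case_sensitive: bool = True) -> bool:
    """Illustrative model of the exclusion filter conditions described above."""
    name = folder if case_sensitive else folder.lower()
    vals = values if case_sensitive else [v.lower() for v in values]
    if condition == "Starting With":
        return any(name.startswith(v) for v in vals)
    if condition == "Ending With":
        return any(name.endswith(v) for v in vals)
    if condition == "Contains":
        return any(v in name for v in vals)
    if condition == "Named":
        return name in vals
    raise ValueError(f"unknown condition: {condition}")

# With Filter Condition "Starting With" and Filter Value "tmp","stage":
assert is_excluded("tmp_loads", "Starting With", ["tmp", "stage"])
assert not is_excluded("sales", "Starting With", ["tmp", "stage"])
# With Is Filter Case Sensitive set to False:
assert is_excluded("TMP_loads", "Starting With", ["tmp"], case_sensitive=False)
```
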
Other File Types
Extracts basic file metadata, such as file size, path, and time stamp, from file types not listed in the File Types property.
Treat Files Without Extension As
Select one of the following options to identify files without an extension:
  - None
  - Avro
  - Parquet
Enter File Delimiter
Specify the file delimiter if the file from which you extract metadata uses a delimiter other than the following list of delimiters:
  - Comma (,)
  - Horizontal tab (\t)
  - Semicolon (;)
  - Colon (:)
  - Pipe symbol (|)
Enclose the delimiter in single quotes. For example, '$'. Use a comma to separate multiple delimiters. For example, '$','%','&'.
First Level Directory
Specify a directory or a list of directories under the source directory. If you leave this option blank, Enterprise Data Catalog imports all the files from the specified source directory.
To specify a directory or a list of directories, you can perform the following steps:
  1. Click Select.... The Select First Level Directory dialog box appears.
  2. Use one of the following options to select the required directories:
    - Select from list. Select the required directories from a list of directories.
    - Select using regex. Provide an SQL regular expression to select directories that match the expression.
Note: To select multiple directories, separate the directories with a semicolon (;).
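The regex-based selection can be illustrated as follows. Python's re module stands in here for the SQL regular expression dialect the product accepts, and the directory names are hypothetical.

```python
import re

# Hypothetical first-level directories under the source directory.
directories = ["raw_2023", "raw_2024", "staging", "archive_2022"]

# "Select using regex" keeps the directories whose names match the expression.
pattern = re.compile(r"raw_\d{4}")
selected = [d for d in directories if pattern.fullmatch(d)]
print(";".join(selected))  # multiple directories are separated with a semicolon
```
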
Recursive Scan
Recursively scans the subdirectories under the selected first-level directories. Recursive scan is required for partitioned file discovery.
Enable Partitioned File Discovery
Identifies and publishes horizontally partitioned files under the same directory and files organized in hierarchical Hive-style directory structures as a single partitioned file.
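The grouping that partitioned file discovery performs can be sketched for a Hive-style layout. The paths below are hypothetical, and the grouping rule (strip key=value segments to find the table root) is a simplified illustration of the behavior described above, not the scanner's actual algorithm.

```python
from collections import defaultdict

# Hypothetical layout: part files under key=value directories are published
# as a single partitioned file per table root.
paths = [
    "sales/year=2023/month=01/part-00000.parquet",
    "sales/year=2023/month=02/part-00000.parquet",
    "sales/year=2024/month=01/part-00000.parquet",
    "events/part-00000.parquet",
    "events/part-00001.parquet",
]

def table_root(path: str) -> str:
    """Strip key=value partition segments and the part file name to find the
    directory that would be published as one partitioned file."""
    segments = path.split("/")[:-1]  # drop the part file name
    non_partition = [s for s in segments if "=" not in s]
    return "/".join(non_partition)

tables = defaultdict(list)
for p in paths:
    tables[table_root(p)].append(p)

print(sorted(tables))  # → ['events', 'sales']
```

Recursive Scan must be enabled for this discovery to reach the nested year=/month= subdirectories.
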
Non Strict Mode
Detects partitions in Parquet files when compatible schemas are identified in the files.
Case Sensitive
Specifies that the resource is configured for case sensitivity. Select one of the following values:
  - True. Select this check box to specify that the resource is configured as case sensitive.
  - False. Clear this check box to specify that the resource is configured as case insensitive.
The default value is True.
Memory
The memory required to run the scanner job. Select one of the following values based on the data set size imported:
  - Low
  - Medium
  - High
Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article.
Custom Options
JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
  - -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the scanner log level to DEBUG, ERROR, or INFO. Default is INFO.
  - -Dscanner.container.core=<number of cores>. Increases the number of cores for the scanner container. The value must be a number.
  - -Dscanner.yarn.app.environment=<key=value>. Key-value pairs to set in the YARN environment. Use a comma to separate multiple key-value pairs.
  - -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default is 1.
  - -DmaxPartFilesToValidatePerTable=<number>. Validates the specified number of part files in the partitioned table. Default is 10.
  - -DmaxPartFilesToValidatePerPartition=<number>. Validates the specified number of part files for each partition in the partitioned table. Default is 5.
  - -DexcludePatterns=<comma-separated regex patterns>. Excludes files while parsing partition tables based on the regex patterns. By default, file names that start with a period or an underscore are excluded.
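A Custom Options value combines several of these JVM parameters in one space-separated string. The sketch below assembles an example value; the specific settings chosen here are illustrative.

```python
# Illustrative Custom Options value combining JVM parameters listed above.
custom_options = " ".join([
    "-Dscannerloglevel=DEBUG",              # more verbose scanner logging
    "-Dscanner.container.core=4",           # give the scanner container 4 cores
    "-DmaxPartFilesToValidatePerTable=20",  # validate more part files per table
])
print(custom_options)
```
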
Track Data Source Changes
View metadata source change notifications in Enterprise Data Catalog.
Custom Partition Configuration File
Detects custom partitions in the data source. Select the configuration file in JSON format.
This option appears when you choose Azure Data Lake Storage Gen2 V2 as the resource type.
Pruned Partition Configuration File
Specify the configuration file in JSON format for partition pruning.
This option appears when you choose Azure Data Lake Storage Gen2 V2 as the resource type.
Disable Partition Pruning
Option to disable partition pruning.
This option appears when you choose Azure Data Lake Storage Gen2 V2 as the resource type.
You can enable data discovery for an Azure Data Lake Store. For more information, see the Enable Data Discovery topic.
You can enable composite data domain discovery for an Azure Data Lake Store. For more information, see the Composite Data Domain Discovery topic.

Profile Avro files

You can extract metadata, discover Avro partitions, and run profiles on Avro files with multiple-level hierarchy using an Azure Data Lake Storage Gen2 resource on the Spark engine. When you run profiles on Avro files, the data types of assets appear in the profiling results of the Enterprise Data Catalog tool.
The following asset data types appear in the profiling results:
Even if you select Non Strict Mode in the Metadata Load Settings tab of the resource to detect partitions in Avro files, partition discovery happens in strict mode.
If a partition folder contains more than 10 subfolders, or a subfolder contains more than 10 files, some folders are not detected as potential partitions. To avoid this issue, use the -DmaxChildPathsToValidate JVM option to override the default value and increase the number of folders to validate.
You cannot profile Avro files that contain any of the following data types:
Note: Profiling fails for an Avro file that includes any of these data types.