You can use the File System catalog source to extract metadata from a Windows or Linux file system.
Use the File System catalog source to extract metadata from files on a Windows or Linux machine into the catalog. Use the Local File System protocol to import metadata from files on a local machine, or the Remote File System protocol to import metadata from files on a remote Windows machine.
Note: To extract metadata from files located on a remote Linux machine, configure the SFTP File System catalog source in Metadata Command Center.
For more information, see .
Extracted metadata
Metadata Command Center extracts files and folders from a Windows or a Linux file system.
You can extract metadata from the following file types:
•AVRO
•Compressed files
- TAR
- ZIP
•Delimited files
- CSV
- TSV
•JSON
•Microsoft Excel files
- Excel 97-2003 Workbook with XLS extension
- Excel Workbook with XLSX extension
- Excel Macro-Enabled Workbook with XLSM extension
•Parquet
•TXT
•XML
•XSD
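As an illustration of the list above, the following minimal Python sketch checks whether a file name falls into one of the listed file types. It is not part of Metadata Command Center. The XLS, XLSX, and XLSM extensions come from the list; every other extension in the mapping, the `SUPPORTED_TYPES` table, and the `classify` helper are assumptions made for this example, and the product may identify file types differently.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical lookup table for illustration only.
# Only .xls, .xlsx, and .xlsm are named as extensions in the documentation;
# the remaining extensions are conventional ones assumed for this sketch.
SUPPORTED_TYPES = {
    ".avro": "AVRO",
    ".tar": "Compressed file",
    ".zip": "Compressed file",
    ".csv": "Delimited file",
    ".tsv": "Delimited file",
    ".json": "JSON",
    ".xls": "Microsoft Excel file",
    ".xlsx": "Microsoft Excel file",
    ".xlsm": "Microsoft Excel file",
    ".parquet": "Parquet",
    ".txt": "TXT",
    ".xml": "XML",
    ".xsd": "XSD",
}


def classify(path: str) -> Optional[str]:
    """Return the listed file type for a path, or None if it is not listed.

    Matching uses only the final extension and ignores case.
    """
    return SUPPORTED_TYPES.get(PurePath(path).suffix.lower())
```

For example, `classify("reports/Q1.XLSX")` returns `"Microsoft Excel file"`, while `classify("notes.docx")` returns `None`.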
The following table lists the structures associated with the file types that you can extract metadata from:
File Type    Partition structure
Avro         Single partition, multiple partitions, schema merge
CSV          Single partition, multiple partitions, schema merge
JSON         Single partition
Parquet      Single partition, multiple partitions, schema merge
XML          Single partition, multiple partitions, schema merge
Note: File System catalog sources can extract only input files encoded in UTF-8.
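Because only UTF-8 input files are supported, it can help to check the encoding of files before you run extraction. The following is a minimal sketch of such a check in Python, not a feature of Metadata Command Center. The helper names `is_utf8` and `file_is_utf8` are hypothetical, and the sketch reads the whole file into memory, which is an assumption that suits small files only.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path: str) -> bool:
    """Read a file in binary mode and test whether its content is valid UTF-8.

    Assumption: the file is small enough to read in one call.
    """
    with open(path, "rb") as handle:
        return is_utf8(handle.read())
```

Plain ASCII content passes this check because ASCII is a subset of UTF-8. A file saved in an encoding such as Latin-1 fails it as soon as the file contains a non-ASCII character.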
You can extract workbooks, worksheets, and columns from Microsoft Excel files.
You can extract metadata from XML and XSD file formats as XML file objects. Metadata Command Center extracts only elements and attributes from XML and XSD files. If an XML file is larger than 100 KB, Metadata Command Center extracts metadata only from the first 100 KB of the file. For XSD files, Metadata Command Center extracts the complete metadata.
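To make the effect of the 100 KB limit concrete, the following Python sketch collects element names and attribute names from only the first part of an XML document. It illustrates the idea and does not describe how Metadata Command Center implements extraction. The `summarize_xml` helper is hypothetical, and the sketch assumes that 100 KB means 102,400 bytes.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: 1 KB = 1024 bytes.
LIMIT_BYTES = 100 * 1024


def summarize_xml(data: bytes, limit: int = LIMIT_BYTES):
    """Return (element names, (element, attribute) pairs) found in data[:limit].

    A tag that is cut off by the limit is not reported.
    """
    parser = ET.XMLPullParser(events=("start",))
    elements, attributes = set(), set()
    try:
        parser.feed(data[:limit])
        # Newer Python versions provide flush() to force parsing of buffered input.
        if hasattr(parser, "flush"):
            parser.flush()
        for _event, elem in parser.read_events():
            elements.add(elem.tag)
            for name in elem.attrib:
                attributes.add((elem.tag, name))
    except ET.ParseError:
        # Malformed input: keep whatever was collected before the error.
        pass
    return sorted(elements), sorted(attributes)
```

The parser is never closed, so a document that is truncated at the limit does not raise an error. Elements and attributes that appear only after the limit are missing from the result, which mirrors the behavior described above for large XML files.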
Data profiling for File System objects
Configure data profiling to run profiles on the metadata extracted from a Windows or Linux file system. You can run the profiles using either the Local File System protocol or the Remote File System protocol.
You can run data profiles on the following File System objects:
•Delimited file
•Folder
You can view the profiling statistics in Data Governance and Catalog. The data profiling task runs profiles on the String data type for File System objects.