You can extract workbooks, worksheets, and columns from Microsoft Excel files.
Supported file types
You can extract metadata from the following file types:
•AVRO
•CSV
•JSON
•Parquet
•TSV
•TXT
•XML (XML and XSD files)
•Delimited files
•Flat Partition Detection files
•Multilevel Partition Detection files
You can extract metadata from XML and XSD files as XML file objects. Metadata Command Center extracts only elements and attributes from these files. If an XML file is larger than 100 KB, Metadata Command Center extracts metadata from only the first 100 KB of the file. For XSD files, Metadata Command Center extracts the complete metadata.
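The 100 KB rule can be illustrated with a small sketch. The following Python example does not show how Metadata Command Center is implemented. It assumes that 100 KB means 102,400 bytes and uses a hypothetical helper, extract_xml_names, to collect element and attribute names from only the leading part of an XML document.

```python
import xml.etree.ElementTree as ET

# Assumption: "100 KB" is treated as 100 * 1024 bytes.
MAX_XML_BYTES = 100 * 1024


def extract_xml_names(data: bytes, limit: int = MAX_XML_BYTES):
    """Collect element and attribute names from the first `limit` bytes.

    Hypothetical helper, for illustration only.
    """
    parser = ET.XMLPullParser(events=("start",))
    # Feed only the leading slice. The document may be cut off mid-tag;
    # the incremental parser accepts that as long as close() is not called.
    parser.feed(data[:limit])

    elements, attributes = [], []
    for _event, elem in parser.read_events():
        if elem.tag not in elements:
            elements.append(elem.tag)
        for name in elem.attrib:
            if name not in attributes:
                attributes.append(name)
    return elements, attributes
```

Elements and attributes that start after the limit are never seen by the parser, which mirrors the behavior described for large XML files.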
You can extract metadata from the following Microsoft Excel file types:
•Excel 97-2003 Workbook with XLS extension
•Excel Workbook with XLSX extension
•Excel Macro-Enabled Workbook with XLSM extension
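As a rough illustration of the lists above, the following Python sketch sorts object names into Excel files, other supported files, and everything else. The extension sets and the classify_object helper are assumptions made for this example. Delimited files and partition detection files are not identified by extension, so the sketch does not cover them.

```python
from pathlib import PurePosixPath

# Assumption: each listed file type is recognized by its usual extension.
FILE_EXTENSIONS = {".avro", ".csv", ".json", ".parquet", ".tsv", ".txt", ".xml", ".xsd"}
EXCEL_EXTENSIONS = {".xls", ".xlsx", ".xlsm"}


def classify_object(name: str) -> str:
    """Return "excel", "file", or "unknown" for an object name (hypothetical helper)."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix in EXCEL_EXTENSIONS:
        return "excel"
    if suffix in FILE_EXTENSIONS:
        return "file"
    return "unknown"
```

For example, classify_object("reports/2024/sales.XLSX") returns "excel" because the comparison ignores case.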
Data profiling for Google Cloud Storage objects
Configure data profiling to run profiles on the metadata extracted from a Google Cloud Storage source system. You can run data profiles on the following Google Cloud Storage objects:
•AVRO
•CSV
•JSON
•Parquet
You can view the profiling statistics in Data Governance and Catalog. The data profiling task runs profiles on the following data types for AVRO, CSV, JSON, and Parquet file formats:
File format | Supported data types
AVRO | INT, STRING
CSV | STRING
JSON | ARRAY, BIGINT, BOOLEAN, DOUBLE, INTEGER, OBJECT, STRING
Parquet | BOOLEAN, INT32, INT64, FLOAT, DOUBLE, DATE, DECIMAL, STRING
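The supported data types can also be expressed as a lookup table, for example to check in advance which columns a profile would cover. The following Python sketch is an illustration only. The names PROFILED_TYPES, ADVANCED_CLUSTER_FORMATS, is_profiled, and needs_advanced_cluster are hypothetical and are not part of any product API.

```python
# Data types that the profiling task covers, keyed by file format.
PROFILED_TYPES = {
    "AVRO": {"INT", "STRING"},
    "CSV": {"STRING"},
    "JSON": {"ARRAY", "BIGINT", "BOOLEAN", "DOUBLE", "INTEGER", "OBJECT", "STRING"},
    "PARQUET": {"BOOLEAN", "INT32", "INT64", "FLOAT", "DOUBLE", "DATE", "DECIMAL", "STRING"},
}

# Formats that, per the note in this section, require an advanced cluster.
ADVANCED_CLUSTER_FORMATS = {"AVRO", "PARQUET"}


def is_profiled(file_format: str, data_type: str) -> bool:
    """Return True if the data type is profiled for the given file format."""
    return data_type.upper() in PROFILED_TYPES.get(file_format.upper(), set())


def needs_advanced_cluster(file_format: str) -> bool:
    """Return True if profiling this file format requires an advanced cluster."""
    return file_format.upper() in ADVANCED_CLUSTER_FORMATS
```

With this table, is_profiled("Parquet", "DECIMAL") returns True, while is_profiled("CSV", "DATE") returns False because only STRING is listed for CSV.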
Sampling type
You can run the data profiling task on all rows for a Google Cloud Storage catalog source.
Note: To run a profile on an AVRO or Parquet file, connect to an advanced cluster. For more information about advanced clusters, see the Advanced Clusters help.
Data classification for Google Cloud Storage objects
Configure data classification for a Google Cloud Storage catalog source to classify and organize data in your organization. You can view the data classification results in Data Governance and Catalog.