
Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service offered by Amazon Web Services (AWS).
Metadata Command Center extracts files, folders, and buckets from an Amazon S3 source system.
By default, metadata is extracted from Amazon S3 storage, but you can also extract metadata from S3 Scality storage or MinIO.
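Amazon S3 stores objects under flat keys, so a folder corresponds to a key prefix delimited by `/`. As a rough illustration of how folders and files relate within a bucket, the following sketch derives both from a list of object keys. The function name `split_keys` and the sample keys are hypothetical, and the sketch does not represent how Metadata Command Center performs extraction.

```python
from pathlib import PurePosixPath


def split_keys(keys):
    """Derive folder prefixes and file keys from flat S3 object keys."""
    folders, files = set(), []
    for key in keys:
        if key.endswith("/"):
            # A key that ends with the delimiter is a folder marker object.
            folders.add(key.rstrip("/"))
        else:
            files.append(key)
        # Every parent prefix of the key counts as a folder.
        for parent in PurePosixPath(key.rstrip("/")).parents:
            if str(parent) != ".":
                folders.add(str(parent))
    return sorted(folders), sorted(files)


# Hypothetical object keys in a single bucket.
keys = ["sales/2023/q1.csv", "sales/2023/q2.parquet", "hr/staff.xlsx", "readme.txt"]
folders, files = split_keys(keys)
print(folders)
print(files)
```

A key without any `/` delimiter, such as `readme.txt`, sits directly in the bucket and contributes no folder.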
You can extract workbooks, worksheets, and columns from Microsoft Excel files.
Note: In the following scenarios, column names of delimited files are extracted as generic columns, such as Column1 and Column2:

Data profiling for Amazon S3 objects

Configure data profiling to run profiles on the metadata extracted from an Amazon S3 source system. You can run data profiles on the following Amazon S3 objects:
You can view the profiling statistics in Data Governance and Catalog. The data profiling task runs profiles on the following data types for Avro, CSV, and Parquet file formats:
File format    Data types
Avro           INT, STRING
CSV            STRING
Parquet        BOOLEAN, INT32, INT64, FLOAT, DOUBLE, DATE, DECIMAL, STRING
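The format-to-type mapping listed above can be expressed as a small lookup, which may help when checking in advance whether a column is eligible for profiling. This is a minimal sketch: the names `PROFILED_TYPES` and `is_profiled` are hypothetical and are not part of any Informatica API.

```python
# Data types profiled per file format, copied from the list above.
PROFILED_TYPES = {
    "avro": {"INT", "STRING"},
    "csv": {"STRING"},
    "parquet": {
        "BOOLEAN", "INT32", "INT64", "FLOAT",
        "DOUBLE", "DATE", "DECIMAL", "STRING",
    },
}


def is_profiled(file_format, data_type):
    """Return True if the data type is profiled for the given file format."""
    supported = PROFILED_TYPES.get(file_format.lower(), set())
    return data_type.upper() in supported


print(is_profiled("Parquet", "INT64"))
print(is_profiled("CSV", "INT"))
```

File formats that are not in the mapping return False for every data type.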
Sampling type
You can run the data profiling task on all rows for an Amazon S3 source system.
Note: To run a profile on an Avro or Parquet file, connect to an advanced cluster. For more information about advanced clusters, see Advanced Clusters help.

Data classification for Amazon S3 objects

Configure data classification for Amazon S3 catalog sources to classify and organize data in your organization. You can view the data classification results in Data Governance and Catalog.

Data lineage for Amazon S3 objects

Lineage data is available for Amazon S3 assets that connect to the following source systems:
For more information about data lineage, see Data Lineage in the Working With Assets help.