You can run data observability jobs on catalog sources to extract and profile metadata.
Here are a few essential points to know about data observability:
• You can use data observability to identify anomalies in the data that you configure and filter for the catalog source. If you enable metadata extraction filters and additionally apply profiling filters, you can identify anomalies in a subset of the entire data.
• Before you enable data observability for a catalog source, you must enable data profiling for the catalog source.
• The metric filters that you select determine which metadata extraction and profiling jobs run. You can also run data observability jobs to extract metadata for freshness and volume and to detect data profiling anomalies.
Note:
When you enable data observability, metadata is extracted from assets, but that metadata is sufficient to measure only freshness and volume.
• Each time you run a data observability job, Metadata Command Center profiles the data from which metadata is extracted and then detects anomalies in the profiled data. Some data profiling anomalies require multiple job runs before they can be detected:
- One job run is required to detect the Drop from Maximum and the Surge from Minimum anomalies.
- Two job runs are required to detect the 100% or 0% Change Detection and the Schema-based anomalies.
- Three job runs are required to detect the Standard Deviation, the Static Data, and the Breaking Trends anomalies.
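The run counts above reflect how much metric history each detection needs: a change detection must compare two consecutive values, while a standard deviation check needs enough prior runs to form a baseline. The sketch below is a simplified stand-in for illustration, not the product's actual detection algorithm.

```python
from statistics import mean, stdev

# Illustrative sketch only: each data observability job run appends one
# observation (for example, a row count) to the metric history. The
# thresholds and formulas below are hypothetical.
def change_detection(history: list[float]) -> bool:
    """100% or 0% change: needs at least two runs to compare values."""
    if len(history) < 2:
        return False
    prev, curr = history[-2], history[-1]
    return curr == 0 or curr >= 2 * prev  # dropped to zero or doubled

def standard_deviation_anomaly(history: list[float], k: float = 3.0) -> bool:
    """Standard deviation: needs at least three runs to form a baseline."""
    if len(history) < 3:
        return False
    baseline, latest = history[:-1], history[-1]
    return abs(latest - mean(baseline)) > k * stdev(baseline)
```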
• If you modify the profiling filters after you run data observability jobs, the profiled data changes. Historical profiled data and historical anomalies are lost, and subsequent data observability jobs run on the new data. To detect anomalies accurately, keep the data observability filters constant over several runs.
• You can use data observability to observe data that contains up to 50,000 profiled data elements.
• When you run a data observability job, the extracted metadata measures freshness based on when the observed data was last modified. The job also measures the volume of data in tables and data sets. To choose how volume is measured, select the Calculated or the Statistics method in the Metric Configuration field. Statistics is the default method for measuring volume.
You can measure data observability volume using the Calculated or the Statistics methods for the following catalog sources:
- Google BigQuery
- MariaDB
- Microsoft Fabric Data Lakehouse
- Microsoft Fabric Data Warehouse
- Microsoft Azure SQL Server
- Microsoft Azure Synapse Analytics
- Microsoft SQL Server
- Oracle
- SAP S/4HANA Cloud
- Snowflake
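The two volume-measurement methods typically differ in how the row count is obtained. The queries below are illustrative assumptions using example Snowflake identifiers; the exact queries that Metadata Command Center issues are internal to the product.

```python
# Illustrative sketch only: "Calculated" generally means counting rows
# directly, while "Statistics" reads a row count that the database already
# maintains in its catalog. Database, schema, and table names are examples.
CALCULATED = "SELECT COUNT(*) FROM my_db.my_schema.my_table"  # exact count, scans the table

STATISTICS = """
SELECT row_count
FROM my_db.information_schema.tables
WHERE table_schema = 'MY_SCHEMA' AND table_name = 'MY_TABLE'
"""  # catalog statistics, no scan, may lag slightly behind the data
```

The trade-off is precision versus cost: a calculated count is exact but can be expensive on large tables, while statistics are cheap to read but only as current as the database keeps them.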
You can measure data observability volume using only the Calculated method for the following catalog sources: