You can use Metadata Command Center to extract metadata from a source system.
A source system is any system that contains data or metadata. For example, AWS Glue is a source system from which you can extract metadata through an AWS Glue catalog source with Metadata Command Center. A catalog source is an object that represents and contains metadata from the source system.
Before you extract metadata from a source system, create and register a catalog source that represents the source system.
When Metadata Command Center extracts metadata, Data Governance and Catalog displays the extracted metadata and its attributes as technical assets. You can then perform tasks such as analyzing the assets, viewing lineage, and creating links between those assets and their business context.
The following table describes the capabilities of the catalog source:
Capability | Description
Advanced Programming Language Parsing | Parses the source system code in addition to extracting objects from the source system.
Extraction and view process
To extract metadata from a source system, configure the catalog source and run the metadata extraction job in Metadata Command Center. Then view the results in Data Governance and Catalog.
The following image shows the process to extract metadata from a source system:
After you verify prerequisites, perform the following tasks to extract metadata from AWS Glue:
1. Register a catalog source. Create a catalog source object, select the source system, and select the connection object to connect to the Amazon Athena source system.
2. Configure the catalog source. Specify the runtime environment, optionally configure parameters for the metadata extraction capability, and add filters to include or exclude source system assets from metadata extraction.
3. Associate stakeholders. Optionally, associate users with technical assets, giving the users permission to perform actions determined by their roles.
4. Run or schedule the catalog source job.
5. Optionally, assign a connection to referenced source system assets.
After you run the catalog source job, you can view the results in Data Governance and Catalog.
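The include and exclude filters in the configuration step can be pictured with a small Python sketch. This is only an illustration of the general idea: the glob-style patterns, the asset paths, the `is_selected` helper, and the rule that an exclude match overrides an include match are all assumptions made for the example, not the filter syntax or behavior of Metadata Command Center.

```python
from fnmatch import fnmatchcase


def is_selected(asset_path, include, exclude):
    """Hypothetical rule: keep an asset that matches any include pattern
    (or all assets if no include pattern is given) and no exclude pattern."""
    included = any(fnmatchcase(asset_path, p) for p in include) if include else True
    excluded = any(fnmatchcase(asset_path, p) for p in exclude)
    return included and not excluded


# Hypothetical asset paths, used only to exercise the helper.
assets = ["sales_db/orders", "sales_db/tmp_orders", "hr_db/employees"]

selected = [
    a for a in assets
    if is_selected(a, include=["sales_db/*"], exclude=["*/tmp_*"])
]
print(selected)
```

In this sketch only `sales_db/orders` is kept: the `hr_db` asset never matches the include pattern, and the temporary table is removed by the exclude pattern.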
About the AWS Glue catalog source
You can use the AWS Glue catalog source to extract metadata from an AWS Glue source system.
AWS Glue is a serverless ETL (extract, transform, and load) service that helps discover, prepare, and integrate data from multiple sources for analysis, machine learning, and application development.
You use Amazon Athena to query databases and tables created in AWS Glue. You can also use Amazon Athena to create schemas to use in AWS Glue.
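As a rough illustration of querying AWS Glue tables through Amazon Athena, the following Python sketch starts a `SHOW TABLES` query against a database. The `start_show_tables` helper, the database name, and the S3 output location are hypothetical. The helper expects a client object that provides `start_query_execution`, as the boto3 Athena client does. A small stand-in client is used here so the sketch runs without AWS access.

```python
def start_show_tables(athena_client, database, output_location):
    """Start an Athena query that lists the tables in one database."""
    response = athena_client.start_query_execution(
        QueryString="SHOW TABLES",
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


class RecordingClient:
    """Stand-in for a real Athena client; it only records the request."""

    def __init__(self):
        self.calls = []

    def start_query_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"QueryExecutionId": "example-id"}


# Hypothetical database name and results location.
client = RecordingClient()
query_id = start_show_tables(
    client, "example_db", "s3://example-bucket/athena-results/"
)
print(query_id)
```

With real AWS credentials, you would pass `boto3.client("athena")` instead of the stand-in client and then poll the query status before reading results.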
Extracted metadata
You can use the AWS Glue catalog source to extract metadata from an AWS Glue source.
Metadata Command Center extracts the following metadata from the AWS Glue source system:
•Calculation
•Job
•Job instance
Compatible functionalities
AWS Glue jobs can use the Python programming language together with a wide range of modules.
The catalog source is compatible with the following Python functionality in AWS Glue:
•Standard language constructs
•Standard built-in functions
•Partially compatible modules:
Note: Data Governance and Catalog processes only a subset of the library functions in partially compatible modules.
- abs
- adal
- argparse
- array
- ast
- awsglue
- azure
- base64
- binascii
- calendar
- codecs
- collections
- concurrent
- contextlib
- contextvars
- copy
- copyreg
- csv
- dataclasses
- datetime
- decimal
- delta
- difflib
- distutils
- email
- enum
- errno
- fnmatch
- fractions
- functools
- gc
- genericpath
- gettext
- glob
- graphframes
- hashlib
- heapq
- hmac
- importlib
- inspect
- io
- itertools
- json
- keyword
- locale
- logging
- math
- matplotlib
- nt
- numbers
- numpy
- operator
- os
- pandas
- pathlib
- pickle
- pkgutil
- posix
- posixpath
- pprint
- py4j
- pyodbc
- pyspark
- pytz
- random
- re
- reprlib
- requests
- seaborn
- secrets
- shutil
- simplejson
- six
- sklearn
- smtplib
- socket
- ssl
- stat
- string
- struct
- subprocess
- sys
- teradatasql
- textwrap
- threading
- time
- traceback
- types
- typing
- urllib
- urllib3
- uuid
- warnings
- weakref
- xml
- yaml
- zipfile
- zlib
•Custom libraries
Note: Custom libraries are libraries created by a user. You can also use a WHL file for your custom library.
If the catalog source detects an incompatible function or library, it can't process the statement. It skips the statement and continues to process the next one.
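Before you run the catalog source job, it can help to see which modules a job script imports and compare them with the list above. The following Python sketch does this with the standard `ast` module. It is a hypothetical pre-check, not the mechanism that the catalog source uses. The `PARTIALLY_COMPATIBLE` set is only a small excerpt of the list, and the sample script and the `my_company_utils` library name are invented for the example.

```python
import ast
from typing import Set

# Excerpt of the partially compatible modules listed above.
# Extend it with the full list as needed.
PARTIALLY_COMPATIBLE = {
    "awsglue", "pyspark", "pandas", "numpy",
    "json", "sys", "os", "re", "datetime",
}


def top_level_imports(source: str) -> Set[str]:
    """Return the top-level package names imported anywhere in the script."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped on purpose.
            names.add(node.module.split(".")[0])
    return names


def unlisted_imports(source, listed=PARTIALLY_COMPATIBLE):
    """Return imported packages that are not in the listed set, sorted."""
    return sorted(top_level_imports(source) - set(listed))


# Invented script text. It is parsed only, never executed or imported.
SAMPLE_SCRIPT = """
import sys
import boto3
from awsglue.context import GlueContext
from pyspark.context import SparkContext
import my_company_utils.cleaning as cleaning
"""

print(unlisted_imports(SAMPLE_SCRIPT))
```

A module that this check reports is not necessarily incompatible. It might be a custom library, so treat the result as a list of imports to review.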