IBM InfoSphere DataStage is an ETL tool that extracts, transforms, and loads data from multiple sources into target databases or files.
Objects extracted
You can extract metadata from the following DataStage assets:
•Project
•Folder
•Sequence Job
•Parallel Job
•Server Job
•Parallel Shared Container
•Server Shared Container
IBM InfoSphere DataStage jobs consist of stages that you can add to perform specific tasks. Metadata Command Center supports the following stages:
•Parallel job stages:
- Aggregator
- Change Capture
- Checksum
- Complex Flat File
- Copy
- DB2 Connector
- DRS Connector
- Data Set
- FTP
- FTP Plug In
- File Connector
- File Set
- Filter
- Funnel
- Hierarchical Data
Note: The Hierarchical Data stage is partially compatible. You can use it as an internal component of a job, but not as the job input or output.
- Hive Connector
- JDBC Connector
- Join
- Lookup
- Merge
- Modify
- Netezza Connector
- ODBC Connector
- Oracle Connector
- Peek
- Pivot
- Pivot Enterprise
- Remove Duplicates
- Row Generator
- Sequential File
- Slowly Changing Dimensions
- Sort
- Stored Procedure Connector
- Surrogate Key Generator
- Switch
- Teradata Connector
- Transformer
- Unstructured Data
- XML Output
- Local Container
- Shared Container
- Snowflake Connector
•Server job stages:
- Aggregator
- CODBCStage
- DB2 Connector
- FTP
- Hashed File
- Netezza Connector
- ODBC Connector
- Oracle Connector
- OracleOCI
- Sequential File
- Sort
- Stored Procedure Connector
- Teradata Connector
- Transformer
- Local Container
- Shared Container
•Sequence job stages:
- End Loop Activity
- Execute Command
- Job Activity
- Start Loop Activity
- User Variables
Parent sequence jobs don’t appear as separate items in the output, but IBM InfoSphere DataStage catalog sources process the target jobs and include them in the output.
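The handling of sequence jobs can be illustrated with a minimal Python sketch. It is not how Metadata Command Center is implemented. The job names, the `job_types` and `invokes` dictionaries, and the `expand_sequence` function are hypothetical, and the treatment of nested sequence jobs is an assumption made for the example.

```python
def expand_sequence(name, job_types, invokes):
    """Return the target jobs reached from a sequence job, omitting parent sequence jobs.

    job_types maps a job name to its type; invokes maps a sequence job name
    to the jobs it calls. Both structures are hypothetical illustrations.
    """
    targets = []
    visited = set()

    def visit(job):
        if job in visited:
            return  # guard against cycles and repeated targets
        visited.add(job)
        if job_types[job] == "Sequence Job":
            # Assumption: a sequence job is not listed itself; its targets are.
            for target in invokes.get(job, []):
                visit(target)
        else:
            targets.append(job)

    visit(name)
    return targets


# Hypothetical project content used only for illustration.
job_types = {
    "nightly_seq": "Sequence Job",
    "inner_seq": "Sequence Job",
    "load_customers": "Parallel Job",
    "load_orders": "Server Job",
}
invokes = {
    "nightly_seq": ["load_customers", "inner_seq"],
    "inner_seq": ["load_orders"],
}

result = expand_sequence("nightly_seq", job_types, invokes)
```

In this sketch, `result` contains only the parallel and server jobs that the sequence reaches. Neither `nightly_seq` nor `inner_seq` is listed.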
The extracted metadata appears in Data Governance and Catalog as the following assets:
•Project
•Folder
•Job Instance
•Calculation
•Sequence Job
•Parallel Job
•Server Job
Note: Parallel Shared Container and Server Shared Container appear as Calculation under Job Instance in Data Governance and Catalog.
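As a rough illustration, the correspondence between extracted DataStage objects and catalog asset types can be written as a lookup table. The `CATALOG_ASSET_TYPE` dictionary and the `catalog_asset_type` function are hypothetical names, not part of any product API. Only the two shared container mappings are stated in the note above. The one-to-one mappings for the other objects are assumed from the matching names.

```python
# Hypothetical lookup from extracted DataStage object type to catalog asset type.
CATALOG_ASSET_TYPE = {
    # Assumed one-to-one mappings, based on matching names in the two lists.
    "Project": "Project",
    "Folder": "Folder",
    "Sequence Job": "Sequence Job",
    "Parallel Job": "Parallel Job",
    "Server Job": "Server Job",
    # Stated in the note: shared containers appear as Calculation under Job Instance.
    "Parallel Shared Container": "Calculation",
    "Server Shared Container": "Calculation",
}


def catalog_asset_type(datastage_type):
    """Return the catalog asset type for an extracted DataStage object type."""
    try:
        return CATALOG_ASSET_TYPE[datastage_type]
    except KeyError:
        raise ValueError(f"Unknown DataStage object type: {datastage_type}") from None


container_type = catalog_asset_type("Parallel Shared Container")
```

A lookup like this can help when you search the catalog for a shared container: look for a Calculation asset under the relevant Job Instance rather than for an asset named after the container type.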