IBM InfoSphere DataStage
IBM® InfoSphere® DataStage® is an ETL platform that is used to integrate data across multiple systems.
Objects Extracted
The IBM InfoSphere DataStage resource supports the following data sources when they appear as sources or targets in IBM InfoSphere DataStage ETL mappings:
- Oracle
- Teradata
- Flat files
The IBM InfoSphere DataStage resource extracts mappings for server and parallel jobs from the DSX or XML file. If a DSX or XML file includes sequence and standalone jobs, the resource extracts only the server and parallel jobs linked to the sequence jobs.
The resource extracts metadata for the following types of transformations in the mappings:
- Aggregator*
- Filter*
- Join*
- Funnel*
- Sort*
- Lookup*
- Merge*
- Remove Duplicates*
- Difference*
- Copy
- Peek
- SQL Transformer
- Basic Transformer
- Change Capture*
- Pivot
- Switch
- XML Output
Note: * Indicates the transformations for which the resource displays properties in the transformation logic in the Lineage and Impact tab.
Resource Connection Properties
The following table describes the resource connection properties:
| Property | Description |
|---|---|
| File | File in XML or DSX format that includes the operational job metadata. Verify that the file size is not larger than 250 MB. |
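The File property constraints can be checked before you upload the export. The following is a minimal sketch, not part of the product: the function name `check_export_file` is hypothetical, the file extension is used as a rough proxy for the DSX or XML format, and 250 MB is assumed to mean 250 × 1024 × 1024 bytes.

```python
import os

# Assumption: the 250 MB limit is interpreted as 250 * 1024 * 1024 bytes.
MAX_BYTES = 250 * 1024 * 1024
# Assumption: the extension is a reasonable proxy for the file format.
ALLOWED_EXTENSIONS = {".dsx", ".xml"}


def check_export_file(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks usable.

    Pass size_bytes to check a known size without touching the file system.
    """
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the 250 MB limit")
    return problems


# Hypothetical file name and size, used only for illustration.
print(check_export_file("jobs_export.dsx", size_bytes=10_000_000))
```

An empty list means that neither check failed. The sketch does not verify that the file content is a valid DataStage export.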
The following table describes the Additional and Advanced properties for source metadata settings on the Metadata Load Settings tab:
| Property | Description |
|---|---|
| Enable Source Metadata | Extracts metadata from the data source. |
| Variable Values File | Select the file that includes all the parameter values required by the mappings or jobs present in the IBM InfoSphere DataStage DSX or XML file. |
| Auto Assign Connections | Specifies that the connections to the resource must be assigned automatically. |
| Enable Reference Resources | Extracts metadata about assets that are not included in this resource, but referred to in the resource. Examples include source and target tables in PowerCenter mappings, and source tables and files from Tableau reports. |
| Retain Unresolved Reference Assets | Retains unresolved reference assets in the catalog after you assign connections. If you retain unresolved reference assets, you can view the complete lineage. The unresolved assets include deleted files, temporary tables, and other assets that are not present in the primary resource. |
| Detailed Lineage | Extracts and ingests metadata related to transformation logic for assets that include transformations or stages. A transformation or stage indicates generation, modification, or passage of data between source and target connections. The transformation logic displays the mappings or data-flow-relation types between source assets and target assets related to the asset you select in Enterprise Data Catalog. |
| Memory | The memory value required to run a scanner job. Specify one of the following memory values: Note: For details about the memory values, see the Tuning Enterprise Data Catalog Performance How-To Library article. |
| JVM Options | JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters: -Dscannerloglevel=<DEBUG/INFO/ERROR> changes the log level of the scanner to a specific value, such as DEBUG, ERROR, or INFO. Default value is INFO. -Dscanner.container.core=<No. of core> increases the core for the scanner container. The value must be a number. -Dscanner.yarn.app.environment=<key=value> is the key-value pair that you need to set in the YARN environment. Use a comma to separate multiple key-value pairs. -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0> increases the scanner container memory when pmem is enabled. Default value is 1. |
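To illustrate how the JVM Options arguments fit together, the following sketch assembles a value from the four documented arguments. The helper `build_jvm_options` is hypothetical, and separating multiple arguments with a space is an assumption; confirm the expected separator for your version before you use the result.

```python
VALID_LOG_LEVELS = {"DEBUG", "INFO", "ERROR"}
VALID_PMEM_RATIOS = {"1.0", "2.0"}


def build_jvm_options(log_level=None, cores=None, yarn_env=None, pmem_ratio=None):
    """Build a JVM Options string from the documented scanner arguments.

    Assumption: multiple arguments are separated by a single space.
    """
    options = []
    if log_level is not None:
        if log_level not in VALID_LOG_LEVELS:
            raise ValueError(f"log level must be one of {sorted(VALID_LOG_LEVELS)}")
        options.append(f"-Dscannerloglevel={log_level}")
    if cores is not None:
        # The documentation only states that the value must be a number.
        if not isinstance(cores, int) or cores < 1:
            raise ValueError("cores must be a positive integer")
        options.append(f"-Dscanner.container.core={cores}")
    if yarn_env:
        # Multiple key=value pairs are separated by commas.
        pairs = ",".join(f"{key}={value}" for key, value in yarn_env.items())
        options.append(f"-Dscanner.yarn.app.environment={pairs}")
    if pmem_ratio is not None:
        if pmem_ratio not in VALID_PMEM_RATIOS:
            raise ValueError(f"pmem ratio must be one of {sorted(VALID_PMEM_RATIOS)}")
        options.append(
            f"-Dscanner.pmem.enabled.container.memory.jvm.memory.ratio={pmem_ratio}"
        )
    return " ".join(options)


print(build_jvm_options(log_level="DEBUG", cores=2))
```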
Guidelines for Configuring Parameters and Variables
The following conditions apply when you use the IBM InfoSphere DataStage resource to extract metadata:
- The resource supports parameter value substitution when you use parameters at the connection level.
- The parameter file does not support any syntax to isolate substitution parameters that are named identically by the mappings.
- The entries in the Variable Values File property must be in the following format:
variable1_name=variable1_value
variable2_name=variable2_value
...
variableN_name=variableN_value
- IBM InfoSphere DataStage uses substitution variables in parameters used in connections. If you do not provide a variable value, the resource writes the "could not determine the value of a variable" warning message to the log file when you run the resource. The resource does not substitute a value for the variable name in the model.
- You must define the parameter sets in the following format for the resource:
parameterset_name.$parameter1_name=parameter1_value
- The variable names are not case sensitive, and the resource removes any leading or trailing white space characters.
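The guidelines above can be illustrated with a short sketch that reads entries in the Variable Values File format. This is not the resource's own parser: the function names, the sample variable names and values, and the choice to skip lines without an equals sign are assumptions made for illustration. The sketch also assumes that white space is trimmed from both names and values.

```python
import logging

logger = logging.getLogger("variable_values")


def parse_variable_values(text):
    """Parse name=value lines into a dict keyed by the lower-cased name."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue  # assumption: blank or malformed lines are ignored
        # Split on the first "=" only, so values may themselves contain "=".
        name, _, value = line.partition("=")
        # Names are not case sensitive; surrounding white space is removed.
        values[name.strip().lower()] = value.strip()
    return values


def resolve(name, values):
    """Return the value for name, or None with a warning when it is missing."""
    key = name.strip().lower()
    if key not in values:
        logger.warning("could not determine the value of a variable: %s", name)
        return None
    return values[key]


# Hypothetical entries, including one parameter set entry.
sample = """
DB_HOST = ora-prod-01
db_schema=SALES
ConnParams.$DB_USER=etl_user
"""
variables = parse_variable_values(sample)
print(resolve("db_host", variables))
print(resolve("CONNPARAMS.$DB_USER", variables))
print(resolve("missing_var", variables))
```

Because the sketch keys entries by name alone, two mappings that use the same parameter name receive the same value, which mirrors the guideline that identically named parameters cannot be isolated.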
Limitations of the Resource
The following are the limitations of the IBM InfoSphere DataStage resource:
- The resource extracts metadata only from XML or DSX files.
- The resource does not support container transformations.