IBM InfoSphere DataStage

IBM® InfoSphere® DataStage® is an ETL platform that integrates data across multiple systems.

Objects Extracted

The IBM InfoSphere DataStage resource supports the following data sources when they appear as sources or targets in IBM InfoSphere DataStage ETL mappings:

•Oracle
•Teradata
•Flat files

The IBM InfoSphere DataStage resource extracts mappings for server and parallel jobs from the DSX or XML file. If a DSX or XML file includes sequence and standalone jobs, the resource extracts only the server and parallel jobs linked to the sequence jobs.

The resource extracts metadata for the following types of transformations in the mappings:

•Aggregator*
•Filter*
•Join*
•Funnel*
•Sort*
•Lookup*
•Merge*
•Remove Duplicates*
•Difference*
•Copy
•Peek
•SQL Transformer
•Basic Transformer
•Change Capture*
•Pivot
•Switch
•XML Output

Note: An asterisk (*) indicates a transformation for which the resource displays properties in the transformation logic on the Lineage and Impact tab.

Resource Connection Properties

The following table describes the resource connection properties:

Property	Description
File	File in XML or DSX format that includes the operational job metadata. Verify that the file size is not larger than 250 MB.
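
The file requirements above can be checked before you upload the export. The following is a minimal sketch, not part of the product: the function name check_export_file is hypothetical, the check on the .dsx and .xml file extensions is an assumption about how exports are named, and 1 MB is assumed to mean 1024 * 1024 bytes.

```python
import os

# Assumption: 1 MB = 1024 * 1024 bytes. The guide states only "250 MB".
MAX_BYTES = 250 * 1024 * 1024
# Assumption: exports carry a .dsx or .xml extension.
ALLOWED_EXTENSIONS = {".dsx", ".xml"}


def check_export_file(path):
    """Return a list of problems found; an empty list means the file passed."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unexpected extension {ext!r}; expected .dsx or .xml")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    elif os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 250 MB")
    return problems
```

An empty result means that the file has an expected extension, exists, and is within the size limit. The check does not validate the contents of the file.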

The following table describes the Additional and Advanced properties for source metadata settings on the Metadata Load Settings tab:

Property	Description
Enable Source Metadata	Extracts metadata from the data source.
Variable Values File	Select the file that includes all the parameter values required by the mappings or jobs present in the IBM InfoSphere DataStage DSX or XML file.
Auto Assign Connections	Specifies that the connections to the resource must be assigned automatically.
Enable Reference Resources	Extracts metadata about assets that are not included in this resource, but referred to in the resource. Examples include source and target tables in PowerCenter mappings, and source tables and files from Tableau reports.
Retain Unresolved Reference Assets	Retains unresolved reference assets in the catalog after you assign connections. If you retain unresolved reference assets, you can view the complete lineage. The unresolved assets include deleted files, temporary tables, and other assets that are not present in the primary resource.
Detailed Lineage	Extracts and ingests metadata related to transformation logic for assets that include transformations or stages. A transformation or stage indicates generation, modification, or passage of data between source and target connections. The transformation logic displays the mappings or data-flow-relation types between source assets and target assets related to the asset you select in Enterprise Data Catalog.
Memory	The memory value required to run a scanner job. Specify one of the following memory values: Low, Medium, or High. Note: For details about the memory values, see the Tuning Enterprise Data Catalog Performance How-To Library article.
JVM Options	JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters: -Dscannerloglevel=<DEBUG/INFO/ERROR> changes the log level of the scanner to DEBUG, INFO, or ERROR. Default value is INFO. -Dscanner.container.core=<number of cores> increases the number of cores for the scanner container. The value must be a number. -Dscanner.yarn.app.environment=<key=value> sets key-value pairs in the YARN environment. Use a comma to separate multiple key-value pairs. -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0> increases the scanner container memory when pmem is enabled. Default value is 1.
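
A JVM Options value can be assembled from the arguments described above. The following is a minimal sketch under stated assumptions: the helper build_jvm_options is hypothetical, the arguments are assumed to be standard JVM system properties written with a -D prefix, and multiple arguments are assumed to be separated by a single space. Check the separator that your environment expects before you use the result.

```python
def build_jvm_options(log_level="INFO", cores=None, yarn_env=None, pmem_ratio=None):
    """Build a JVM Options string from the arguments listed in this guide.

    Assumptions (not confirmed by the guide): each argument uses the JVM
    -D prefix, and arguments are joined with a single space.
    """
    if log_level not in {"DEBUG", "INFO", "ERROR"}:
        raise ValueError("log_level must be DEBUG, INFO, or ERROR")
    options = [f"-Dscannerloglevel={log_level}"]

    if cores is not None:
        # The guide requires the core count to be a number.
        if not isinstance(cores, int) or cores < 1:
            raise ValueError("cores must be a positive integer")
        options.append(f"-Dscanner.container.core={cores}")

    if yarn_env:
        # Multiple key-value pairs are separated by commas.
        pairs = ",".join(f"{key}={value}" for key, value in yarn_env.items())
        options.append(f"-Dscanner.yarn.app.environment={pairs}")

    if pmem_ratio is not None:
        options.append(
            f"-Dscanner.pmem.enabled.container.memory.jvm.memory.ratio={pmem_ratio}"
        )

    return " ".join(options)
```

For example, build_jvm_options("DEBUG", cores=2) produces a string with the log level argument followed by the core count argument.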

Guidelines for Configuring Parameters and Variables

The following conditions apply when you use the IBM InfoSphere DataStage resource to extract metadata:

•The resource supports parameter value substitution when you use parameters at the connection level.
•The parameter file does not support any syntax to distinguish substitution parameters that have the same name in different mappings.
•The entries in the Variable Values File property must be in the following format:

variable1_name=variable1_value
variable2_name=variable2_value
...
variableN_name=variableN_value

•IBM InfoSphere DataStage uses substitution variables in parameters used in connections. If you do not provide a value for a variable, the resource writes the "could not determine the value of a variable" warning message to the log file when you run the resource, and it does not substitute a value for the variable name in the model.
•You must define parameter sets for the resource in the following format:

parameterset_name.$parameter1_name=parameter1_value

•The variable names are not case sensitive, and the resource removes any leading or trailing white space characters.
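
The guidelines above can be illustrated with a small parser for the contents of a Variable Values File. This is a minimal sketch of the documented rules, not the scanner's own implementation. The function names and the sample names DB_User and ConnParams.$Server are hypothetical. Trimming white space from values and skipping lines without an equal sign are assumptions, because the guide describes trimming without saying whether it applies to values.

```python
import logging

log = logging.getLogger("variable_values")


def parse_variable_values(text):
    """Parse name=value lines into a dict keyed by lower-cased variable name."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and lines without '=' are skipped.
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        name, value = line.split("=", 1)
        # Names are not case sensitive; surrounding white space is removed.
        values[name.strip().lower()] = value.strip()
    return values


def resolve(name, values):
    """Return the value for a variable, or None with a warning if it is missing."""
    key = name.strip().lower()
    if key not in values:
        # Mirrors the warning text quoted in this guide.
        log.warning("could not determine the value of a variable: %s", name)
        return None
    return values[key]
```

With the input lines " DB_User = scott " and "ConnParams.$Server=prod01", a lookup of either "db_user" or "DB_USER" returns "scott", and a lookup of a name that is not in the file logs the warning and returns None.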

Limitations of the Resource

The IBM InfoSphere DataStage resource has the following limitations:

•The resource extracts metadata only from XML or DSX files.
•The resource does not support container transformations.