IBM InfoSphere DataStage

IBM® InfoSphere® DataStage® is an ETL platform that is used to integrate data across multiple systems.

Objects Extracted

The IBM InfoSphere DataStage resource supports the following source and target data sources included in ETL mappings in IBM InfoSphere DataStage:
The IBM InfoSphere DataStage resource extracts mappings for server and parallel jobs from the DSX or XML file. If a DSX or XML file includes sequence and standalone jobs, the resource extracts only the server and parallel jobs linked to the sequence jobs.
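The selection rule described above can be sketched as a small filter. This is only an illustration of the stated behavior, not the scanner's implementation: the function name `jobs_to_extract`, the job-type labels, and the two dictionaries are hypothetical stand-ins for whatever the DSX or XML file actually contains.

```python
def jobs_to_extract(jobs, sequence_links):
    """Return the names of jobs whose mappings would be extracted.

    jobs: hypothetical dict of job name -> type ("server", "parallel", "sequence").
    sequence_links: hypothetical dict of sequence job name -> names of jobs it links to.
    """
    # Only server and parallel jobs yield mappings.
    mappable = {name for name, kind in jobs.items() if kind in ("server", "parallel")}

    # Without sequence jobs in the file, every server and parallel job qualifies.
    has_sequence = any(kind == "sequence" for kind in jobs.values())
    if not has_sequence:
        return sorted(mappable)

    # With sequence jobs present, keep only the jobs linked to a sequence job.
    linked = {name for members in sequence_links.values() for name in members}
    return sorted(mappable & linked)
```

For example, with one sequence job that links to jobs `a` and `b`, a standalone parallel job `c` in the same file is left out of the result.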
The resource extracts metadata for the following types of transformations in the mappings:
Note: An asterisk (*) indicates the transformations for which the resource displays properties in the transformation logic on the Lineage and Impact tab.

Resource Connection Properties

The following table describes the resource connection properties:
Property | Description
File | File in XML or DSX format that includes the operational job metadata. Verify that the file is not larger than 250 MB.
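A pre-upload check for the file requirement can be sketched as follows. It is a minimal illustration under two assumptions that the guide does not state: that 1 MB means 1024 × 1024 bytes, and that the file extension (`.dsx` or `.xml`) reflects the file format. The function name `check_export_file` is hypothetical.

```python
from pathlib import Path

# Assumption: 1 MB = 1024 * 1024 bytes; the guide only says "250 MB".
MAX_BYTES = 250 * 1024 * 1024
# Assumption: the extension reflects the DSX or XML format.
ALLOWED_SUFFIXES = {".dsx", ".xml"}


def check_export_file(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    # Read the size from disk unless the caller supplies it.
    if size_bytes is None:
        size_bytes = p.stat().st_size
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems
```

Calling `check_export_file("jobs.dsx")` on an existing file returns an empty list when both checks pass; the optional `size_bytes` argument lets you test the logic without a file on disk.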
The following table describes the Additional and Advanced properties for source metadata settings on the Metadata Load Settings tab:
Property | Description
Enable Source Metadata | Extracts metadata from the data source.
Variable Values File | Select the file that includes all the parameter values required by the mappings or jobs present in the IBM InfoSphere DataStage DSX or XML file.
Auto Assign Connections | Specifies that the connections to the resource must be assigned automatically.
Enable Reference Resources | Extracts metadata about assets that are not included in this resource but are referred to in the resource. Examples include source and target tables in PowerCenter mappings, and source tables and files from Tableau reports.
Retain Unresolved Reference Assets | Retains unresolved reference assets in the catalog after you assign connections. If you retain unresolved reference assets, you can view the complete lineage. The unresolved assets include deleted files, temporary tables, and other assets that are not present in the primary resource.
Detailed Lineage | Extracts and ingests metadata related to transformation logic for assets that include transformations or stages. A transformation or stage indicates generation, modification, or passage of data between source and target connections. The transformation logic displays the mappings or data-flow-relation types between source assets and target assets related to the asset you select in Enterprise Data Catalog.
Memory | The memory value required to run a scanner job. Specify one of the following memory values:
  • Low
  • Medium
  • High
Note: For details about the memory values, see the Tuning Enterprise Data Catalog Performance How-To Library article.
JVM Options | JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
  • -Dscannerloglevel=<DEBUG/INFO/ERROR>. Sets the scanner log level to DEBUG, INFO, or ERROR. Default value is INFO.
  • -Dscanner.container.core=<No. of core>. Increases the number of cores for the scanner container. The value must be a number.
  • -Dscanner.yarn.app.environment=<key=value>. Key-value pairs that you need to set in the YARN environment. Use a comma to separate multiple key-value pairs.
  • -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default value is 1.
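The JVM Options arguments listed above can be assembled with a small helper such as the one below. It is a sketch, not part of the product: the function name `build_jvm_options` is hypothetical, and two details are assumptions rather than documented behavior, namely that multiple arguments are separated by spaces and that the core count must be a positive integer.

```python
VALID_LOG_LEVELS = {"DEBUG", "INFO", "ERROR"}


def build_jvm_options(log_level=None, cores=None, yarn_env=None, pmem_ratio=None):
    """Assemble a JVM Options string from the arguments described in the table.

    Assumption: arguments are joined with single spaces.
    """
    opts = []
    if log_level is not None:
        level = log_level.upper()
        if level not in VALID_LOG_LEVELS:
            raise ValueError(f"log level must be one of {sorted(VALID_LOG_LEVELS)}")
        opts.append(f"-Dscannerloglevel={level}")
    if cores is not None:
        # Assumption: "must be a number" is read as a positive integer.
        if isinstance(cores, bool) or not isinstance(cores, int) or cores < 1:
            raise ValueError("cores must be a positive integer")
        opts.append(f"-Dscanner.container.core={cores}")
    if yarn_env:
        # Multiple key=value pairs are comma-separated, as the table states.
        pairs = ",".join(f"{key}={value}" for key, value in yarn_env.items())
        opts.append(f"-Dscanner.yarn.app.environment={pairs}")
    if pmem_ratio is not None:
        opts.append(
            f"-Dscanner.pmem.enabled.container.memory.jvm.memory.ratio={pmem_ratio}"
        )
    return " ".join(opts)
```

For instance, `build_jvm_options(log_level="debug", cores=2)` produces a string with the log level and core arguments that you could paste into the JVM Options field.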

Guidelines for Configuring Parameters and Variables

The following conditions apply when you use the IBM InfoSphere DataStage resource to extract metadata:

Limitations of the Resource

The following are the limitations of the IBM InfoSphere DataStage resource: