Cluster Configuration Overview
A cluster configuration is an object in the domain that contains configuration information about the Hadoop cluster. The cluster configuration enables the Data Integration Service to push mapping logic to the Hadoop environment.
Import configuration properties from the Hadoop cluster to create a cluster configuration. You can import directly from the cluster or from an archive file that the Hadoop administrator creates. When you perform the import, the cluster configuration wizard can create Hadoop, HBase, HDFS, and Hive connections to access the Hadoop environment. If you choose to create the connections, the wizard also associates the configuration object with the connections.
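As a rough illustration of what an archive-based import has to read, the following Python sketch collects the *-site.xml files from a .zip archive and parses each one into a dictionary of property names and values. It assumes that the archive is a plain .zip of Hadoop *-site.xml files in the standard `<configuration>`/`<property>` layout. The function names `parse_site_xml` and `read_site_files` are hypothetical and are not part of the product.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def parse_site_xml(xml_data):
    """Parse Hadoop-style site XML (str or bytes) into a {name: value} dict."""
    root = ET.fromstring(xml_data)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> element is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def read_site_files(archive):
    """Return {file name: properties} for every *-site.xml in a .zip archive.

    `archive` may be a file path or a file-like object.
    """
    result = {}
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            base = PurePosixPath(member).name
            if base.endswith("-site.xml"):
                result[base] = parse_site_xml(zf.read(member))
    return result


# Demo: build a tiny archive in memory and read it back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "conf/core-site.xml",
        "<configuration><property><name>fs.defaultFS</name>"
        "<value>hdfs://namenode:8020</value></property></configuration>",
    )
    zf.writestr("conf/README.txt", "not a site file")
buffer.seek(0)

config_sets = read_site_files(buffer)
print(config_sets)
```

Each key in `config_sets` corresponds to one *-site.xml file, which mirrors how the cluster configuration groups properties into configuration sets.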
The cluster configuration displays properties in configuration sets that are based on *-site.xml files on the cluster. You can override the property values, and you can create user-defined properties based on your requirements. When property values change on the cluster, you can refresh the cluster configuration, either directly from the cluster or from an archive file.
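One way to picture the relationship between imported values, overrides, and user-defined properties is as layers within a configuration set. The Python sketch below is a conceptual model only: the `ConfigurationSet` class, its field names, and the merge order are assumptions made for illustration, and they do not describe how the domain actually stores these values.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationSet:
    """Hypothetical model of one configuration set, such as hdfs-site.xml."""
    name: str
    imported: dict = field(default_factory=dict)      # values read from the cluster
    overrides: dict = field(default_factory=dict)     # imported values edited by the user
    user_defined: dict = field(default_factory=dict)  # properties added by the user

    def effective(self):
        """Merge the layers: imported values, then overrides, then user-defined."""
        merged = dict(self.imported)
        merged.update(self.overrides)
        merged.update(self.user_defined)
        return merged


hdfs_site = ConfigurationSet(
    name="hdfs-site.xml",
    imported={"dfs.replication": "3", "dfs.blocksize": "134217728"},
)
hdfs_site.overrides["dfs.replication"] = "2"         # override an imported value
hdfs_site.user_defined["example.custom.flag"] = "on"  # hypothetical user-defined property

effective = hdfs_site.effective()
print(effective)
```

Keeping the imported values separate from the overrides means that the original cluster value is still available for comparison after a property is edited.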
The cluster configuration contains properties for the Hadoop cluster type and version. You can change the version property to any supported version. After you change the version, you must restart the Data Integration Service.
Consider the following high-level process to manage cluster configurations:
1. Import the cluster configuration. Import properties associated with *-site.xml files from the cluster, and optionally create the connections that require the cluster configuration.
2. Edit the cluster configuration. Override imported property values and add user-defined properties.
3. Save the cluster configuration. Generate a .zip archive from the cluster configuration.
4. Refresh the cluster configuration. When property values change on the cluster, refresh the cluster configuration to import the changes.
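
To judge whether a refresh is needed, it can help to compare the property values that were imported earlier with the current values on the cluster. The Python sketch below shows one way to compute such a comparison. The function `diff_properties` is hypothetical, and the sketch makes no claim about how the refresh operation itself detects or applies changes.

```python
def diff_properties(old, new):
    """Compare two {name: value} dicts and report what differs.

    Returns a tuple (added, removed, changed), where `changed` maps each
    property name to a pair (old value, new value).
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


# Values captured at import time versus values read from the cluster now.
imported_earlier = {"dfs.replication": "3", "dfs.blocksize": "134217728", "dfs.permissions.enabled": "true"}
cluster_now = {"dfs.replication": "2", "dfs.blocksize": "134217728", "dfs.namenode.handler.count": "20"}

added, removed, changed = diff_properties(imported_earlier, cluster_now)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

A non-empty result in any of the three dictionaries indicates that the cluster configuration in the domain no longer matches the cluster.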