Big Data

This section describes the changes to big data in 10.2.

Hadoop Connection

Effective in version 10.2, the following changes affect Hadoop connection properties.

You can use the following new properties to configure your Hadoop connection:

Cluster Configuration
  The name of the cluster configuration associated with the Hadoop environment.
  Appears in General Properties.

Write Reject Files to Hadoop
  Select this property to move the reject files to the HDFS location listed in the Reject File Directory property when you run mappings.
  Appears in Reject Directory Properties.

Reject File Directory
  The directory for Hadoop mapping files on HDFS when you run mappings.
  Appears in Reject Directory Properties.

Blaze Job Monitor Address
  The host name and port number for the Blaze Job Monitor.
  Appears in Blaze Configuration.

YARN Queue Name
  The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster.
  Appears in Spark Configuration.
Effective in version 10.2, the following properties are renamed:

ImpersonationUserName
  Previous name: HiveUserName.
  Hadoop impersonation user. The user name that the Data Integration Service impersonates to run mappings in the Hadoop environment.

Hive Staging Database Name
  Previous name: Database Name.
  Namespace for Hive staging tables.
  Appears in Common Properties. Previously appeared in Hive Properties.

HiveWarehouseDirectory
  Previous name: HiveWarehouseDirectoryOnHDFS.
  The absolute HDFS file path of the default database for the warehouse that is local to the cluster.

Blaze Staging Directory
  Previous names: Temporary Working Directory on HDFS, CadiWorkingDirectory.
  The HDFS file path of the directory that the Blaze engine uses to store temporary files.
  Appears in Blaze Configuration.

Blaze User Name
  Previous names: Blaze Service User Name, CadiUserName.
  The owner of the Blaze service and Blaze service logs.
  Appears in Blaze Configuration.

YARN Queue Name
  Previous names: Yarn Queue Name, CadiAppYarnQueueName.
  The YARN scheduler queue name used by the Blaze engine that specifies available resources on a cluster.
  Appears in Blaze Configuration.

BlazeMaxPort
  Previous name: CadiMaxPort.
  The maximum value for the port number range for the Blaze engine.

BlazeMinPort
  Previous name: CadiMinPort.
  The minimum value for the port number range for the Blaze engine.

BlazeExecutionParameterList
  Previous name: CadiExecutionParameterList.
  An optional list of configuration parameters to apply to the Blaze engine.

SparkYarnQueueName
  Previous name: YarnQueueName.
  The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster.

Spark Staging Directory
  Previous name: Spark HDFS Staging Directory.
  The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs.
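
If you maintain scripts that set Hadoop connection options by name, the identifier-style renames above are mechanical to apply. The following Python sketch collects them in a lookup table; the mapping data comes from the rename list above, but the helper itself is hypothetical and is not part of any Informatica tooling. Renames of display labels that contain spaces, such as Blaze Staging Directory, are omitted.

    # Hypothetical helper: translate pre-10.2 Hadoop connection option names
    # to their 10.2 names. The pairs come from the rename list above.
    RENAMED_OPTIONS = {
        "HiveUserName": "ImpersonationUserName",
        "HiveWarehouseDirectoryOnHDFS": "HiveWarehouseDirectory",
        "CadiMaxPort": "BlazeMaxPort",
        "CadiMinPort": "BlazeMinPort",
        "CadiExecutionParameterList": "BlazeExecutionParameterList",
        "YarnQueueName": "SparkYarnQueueName",
    }

    def upgrade_option_names(options):
        """Return a copy of an options dict with pre-10.2 names replaced."""
        return {RENAMED_OPTIONS.get(name, name): value
                for name, value in options.items()}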
Effective in version 10.2, the following properties are removed from the connection and imported into the cluster configuration:

Resource Manager Address
  The service within Hadoop that submits requests for resources or spawns YARN applications.
  Imported into the cluster configuration as the property yarn.resourcemanager.address.
  Previously appeared in Hadoop Cluster Properties.

Default File System URI
  The URI to access the default Hadoop Distributed File System.
  Imported into the cluster configuration as the property fs.defaultFS or fs.default.name.
  Previously appeared in Hadoop Cluster Properties.
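
Both imported properties correspond to standard keys in the Hadoop site files, so you can inspect the values that a cluster configuration picks up by reading yarn-site.xml and core-site.xml directly. The Python sketch below shows that lookup; the /etc/hadoop/conf paths are typical defaults and an assumption here, and the parsing illustrates where the keys live rather than Informatica's actual import logic.

    # Illustrative sketch: read the Hadoop site files that back the imported
    # cluster-configuration properties. The /etc/hadoop/conf paths are an
    # assumption for a typical cluster node.
    import xml.etree.ElementTree as ET

    def site_properties(path):
        """Parse a Hadoop *-site.xml file into a {name: value} dict."""
        root = ET.parse(path).getroot()
        return {prop.findtext("name"): prop.findtext("value")
                for prop in root.iter("property")}

    yarn = site_properties("/etc/hadoop/conf/yarn-site.xml")
    core = site_properties("/etc/hadoop/conf/core-site.xml")

    # The keys named in the list above:
    print(yarn.get("yarn.resourcemanager.address"))                # Resource Manager Address
    print(core.get("fs.defaultFS", core.get("fs.default.name")))   # Default File System URI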
Effective in version 10.2, the following properties are deprecated and are removed from the connection:

Type
  The connection type.
  Previously appeared in General Properties.

Metastore Execution Mode*
  Controls whether to connect to a remote metastore or a local metastore.
  Previously appeared in Hive Configuration.

Metastore Database URI*
  The JDBC connection URI used to access the data store in a local metastore setup.
  Previously appeared in Hive Configuration.

Metastore Database Driver*
  Driver class name for the JDBC data store.
  Previously appeared in Hive Configuration.

Metastore Database User Name*
  The metastore database user name.
  Previously appeared in Hive Configuration.

Metastore Database Password*
  The password for the metastore user name.
  Previously appeared in Hive Configuration.

Remote Metastore URI*
  The metastore URI used to access metadata in a remote metastore setup.
  This property is imported into the cluster configuration as the property hive.metastore.uris.
  Previously appeared in Hive Configuration.

Job Monitoring URL
  The URL for the MapReduce JobHistory server.
  Previously appeared in Hive Configuration.

* These properties are deprecated in 10.2. When you upgrade to 10.2, the property values that you set in a previous release are saved in the repository, but they do not appear in the connection properties.

HBase Connection Properties

Effective in version 10.2, the following properties are removed from the connection and imported into the cluster configuration:

ZooKeeper Host(s)
  Name of the machine that hosts the ZooKeeper server.

ZooKeeper Port
  Port number of the machine that hosts the ZooKeeper server.

Enable Kerberos Connection
  Enables the Informatica domain to communicate with the HBase master server or region server that uses Kerberos authentication.

HBase Master Principal
  Service Principal Name (SPN) of the HBase master server.

HBase Region Server Principal
  Service Principal Name (SPN) of the HBase region server.
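
For orientation, the removed fields line up with standard HBase configuration keys that a cluster configuration can carry. The pairing below is an editorial assumption rather than a statement from this guide; only the key names themselves are standard HBase properties.

    # Assumed correspondence between the removed HBase connection fields and
    # standard hbase-site.xml keys. The pairing is an assumption made for
    # illustration, not taken from the release guide.
    LIKELY_CLUSTER_CONFIG_KEYS = {
        "ZooKeeper Host(s)": "hbase.zookeeper.quorum",
        "ZooKeeper Port": "hbase.zookeeper.property.clientPort",
        "HBase Master Principal": "hbase.master.kerberos.principal",
        "HBase Region Server Principal": "hbase.regionserver.kerberos.principal",
    }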

Hive Connection Properties

Effective in version 10.2, PowerExchange for Hive has the following changes:

HBase Connection Properties for MapR-DB

Effective in version 10.2, the Enable Kerberos Connection property is removed from the HBase connection for MapR-DB and imported into the cluster configuration.

Mapping Run-time Properties

This section lists changes to mapping run-time properties.

Execution Environment

Effective in version 10.2, you can configure Reject File Directory as a new property in the Hadoop execution environment.

Reject File Directory
  The directory for Hadoop mapping files on HDFS when you run mappings in the Hadoop environment.
  The Blaze engine can write reject files to the Hadoop environment for flat file, HDFS, and Hive targets. The Spark and Hive engines can write reject files to the Hadoop environment for flat file and HDFS targets.
  Choose one of the following options (a sketch of the resulting behavior follows this list):
  • On the Data Integration Service machine. The Data Integration Service stores the reject files based on the RejectDir system parameter.
  • On the Hadoop Cluster. The reject files are moved to the reject directory configured in the Hadoop connection. If the directory is not configured, the mapping fails.
  • Defer to the Hadoop Connection. The reject files are moved based on whether the reject directory is enabled in the Hadoop connection properties. If the reject directory is enabled, the reject files are moved to the reject directory configured in the Hadoop connection. Otherwise, the Data Integration Service stores the reject files based on the RejectDir system parameter.
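
The three options reduce to a short resolution order. The Python sketch below restates that behavior; the function name, arguments, and option strings are invented for illustration and are not Informatica code.

    # Illustrative decision logic for the Reject File Directory options above.
    # All identifiers are invented for this sketch; only the behavior of the
    # three options comes from the release guide.
    def resolve_reject_dir(option, hadoop_conn, reject_dir_system_param):
        if option == "On the Data Integration Service machine":
            return reject_dir_system_param                   # RejectDir system parameter
        if option == "On the Hadoop Cluster":
            if not hadoop_conn.get("Reject File Directory"):
                raise RuntimeError("mapping fails: no reject directory configured")
            return hadoop_conn["Reject File Directory"]      # HDFS directory
        if option == "Defer to the Hadoop Connection":
            if hadoop_conn.get("Write Reject Files to Hadoop"):
                return hadoop_conn["Reject File Directory"]
            return reject_dir_system_param
        raise ValueError(f"unknown option: {option}")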

Monitoring

Effective in version 10.2, the AllHiveSourceTables row in the Summary Statistics view in the Administrator tool includes records read from the following sources:
  • Original Hive sources in the mapping.
  • Staging Hive tables defined by the Hive engine.
  • Staging data between two linked MapReduce jobs.
If the LDTM session includes one MapReduce job, the AllHiveSourceTables statistic includes only the original Hive sources in the mapping.
For more information, see the "Monitoring Mappings in the Hadoop Environment" chapter of the Big Data Management 10.2 User Guide.

S3 Access and Secret Key Properties

Effective in version 10.2, the following properties are included in the list of sensitive properties of a cluster configuration:
  • fs.s3a.access.key
  • fs.s3a.secret.key
  • fs.s3n.awsAccessKeyId
  • fs.s3n.awsSecretAccessKey
  • fs.s3.awsAccessKeyId
  • fs.s3.awsSecretAccessKey
Sensitive properties are included but masked when you generate a cluster configuration archive file to deploy on the machine that runs the Developer tool.
Previously, you configured these properties in .xml configuration files on the machines that run the Data Integration Service and the Developer tool.
For more information about sensitive properties, see the Informatica Big Data Management 10.2 Administrator Guide.
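
To make "included but masked" concrete, the Python sketch below masks sensitive keys while exporting a property set. The function and the masking format are invented for illustration and do not reflect the actual archive file format.

    # Illustrative only: mask sensitive cluster-configuration properties
    # before writing an export. The key names come from the list above; the
    # masking format and helper are invented for this sketch.
    SENSITIVE_KEYS = {
        "fs.s3a.access.key", "fs.s3a.secret.key",
        "fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey",
        "fs.s3.awsAccessKeyId", "fs.s3.awsSecretAccessKey",
    }

    def masked_for_export(properties):
        """Replace sensitive values with a placeholder; keep everything else."""
        return {name: ("*****" if name in SENSITIVE_KEYS else value)
                for name, value in properties.items()}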

Sqoop

Effective in version 10.2, if you create a password file to access a database, Sqoop ignores the password file. Instead, Sqoop uses the value that you configure in the Password field of the JDBC connection.
Previously, you could create a password file to access a database.
For more information, see the "Mapping Objects in the Hadoop Environment" chapter in the Informatica Big Data Management 10.2 User Guide.
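
As a concrete picture of the change, the Python sketch below assembles the kind of Sqoop argument list this implies: the password comes from the JDBC connection's Password field, and any password file the user configured is ignored. The jdbc_conn dictionary and the helper are invented for illustration; --connect, --username, and --password are standard Sqoop options.

    # Illustrative sketch of the 10.2 behavior described above. The jdbc_conn
    # dict and this helper are invented; the Sqoop flags are standard.
    def build_sqoop_args(jdbc_conn):
        return [
            "sqoop", "import",
            "--connect", jdbc_conn["connection_string"],
            "--username", jdbc_conn["username"],
            # Value of the Password field of the JDBC connection; a
            # --password-file configured by the user is ignored in 10.2.
            "--password", jdbc_conn["password"],
        ]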