Big Data

This section describes the changes to big data in 10.2.

Hadoop Connection

Effective in version 10.2, the following changes affect Hadoop connection properties.

You can use the following new properties to configure your Hadoop connection:

Cluster Configuration
  The name of the cluster configuration associated with the Hadoop environment.
  Appears in General Properties.

Write Reject Files to Hadoop
  Select this property to move the reject files to the HDFS location listed in the Reject File Directory property when you run mappings.
  Appears in Reject Directory Properties.

Reject File Directory
  The directory for Hadoop mapping files on HDFS when you run mappings.
  Appears in Reject Directory Properties.

Blaze Job Monitor Address
  The host name and port number for the Blaze Job Monitor.
  Appears in Blaze Configuration.

YARN Queue Name
  The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster.
  Appears in Spark Configuration.
Effective in version 10.2, the following properties are renamed:

ImpersonationUserName
  Previous name: HiveUserName.
  Hadoop impersonation user. The user name that the Data Integration Service impersonates to run mappings in the Hadoop environment.

Hive Staging Database Name
  Previous name: Database Name.
  Namespace for Hive staging tables.
  Appears in Common Properties. Previously appeared in Hive Properties.

HiveWarehouseDirectory
  Previous name: HiveWarehouseDirectoryOnHDFS.
  The absolute HDFS file path of the default database for the warehouse that is local to the cluster.

Blaze Staging Directory
  Previous names: Temporary Working Directory on HDFS, CadiWorkingDirectory.
  The HDFS file path of the directory that the Blaze engine uses to store temporary files.
  Appears in Blaze Configuration.

Blaze User Name
  Previous names: Blaze Service User Name, CadiUserName.
  The owner of the Blaze service and Blaze service logs.
  Appears in Blaze Configuration.

YARN Queue Name
  Previous names: Yarn Queue Name, CadiAppYarnQueueName.
  The YARN scheduler queue name used by the Blaze engine that specifies available resources on a cluster.
  Appears in Blaze Configuration.

BlazeMaxPort
  Previous name: CadiMaxPort.
  The maximum value for the port number range for the Blaze engine.

BlazeMinPort
  Previous name: CadiMinPort.
  The minimum value for the port number range for the Blaze engine.

BlazeExecutionParameterList
  Previous name: CadiExecutionParameterList.
  An optional list of configuration parameters to apply to the Blaze engine.

SparkYarnQueueName
  Previous name: YarnQueueName.
  The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster.

Spark Staging Directory
  Previous name: Spark HDFS Staging Directory.
  The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs.
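
If you maintain scripts that set Hadoop connection options by name, the identifier-style renames above are mechanical to apply. The following Python sketch collects them in a lookup table; the mapping data comes from the rename list above, but the helper itself is hypothetical and is not part of any Informatica tooling. Renames of display labels that contain spaces, such as Blaze Staging Directory, are omitted.

    # Hypothetical helper: translate pre-10.2 Hadoop connection option names
    # to their 10.2 names. The pairs come from the rename list above.
    RENAMED_OPTIONS = {
        "HiveUserName": "ImpersonationUserName",
        "HiveWarehouseDirectoryOnHDFS": "HiveWarehouseDirectory",
        "CadiMaxPort": "BlazeMaxPort",
        "CadiMinPort": "BlazeMinPort",
        "CadiExecutionParameterList": "BlazeExecutionParameterList",
        "YarnQueueName": "SparkYarnQueueName",
    }

    def upgrade_option_names(options):
        """Return a copy of an options dict with pre-10.2 names replaced."""
        return {RENAMED_OPTIONS.get(name, name): value
                for name, value in options.items()}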
Effective in version 10.2, the following properties are removed from the connection and imported into the cluster configuration:

Resource Manager Address
  The service within Hadoop that submits requests for resources or spawns YARN applications.
  Imported into the cluster configuration as the property yarn.resourcemanager.address.
  Previously appeared in Hadoop Cluster Properties.

Default File System URI
  The URI to access the default Hadoop Distributed File System.
  Imported into the cluster configuration as the property fs.defaultFS or fs.default.name.
  Previously appeared in Hadoop Cluster Properties.
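
Both imported properties correspond to standard keys in the Hadoop site files, so you can inspect the values that a cluster configuration picks up by reading yarn-site.xml and core-site.xml directly. The Python sketch below shows that lookup; the /etc/hadoop/conf paths are typical defaults and an assumption here, and the parsing illustrates where the keys live rather than Informatica's actual import logic.

    # Illustrative sketch: read the Hadoop site files that back the imported
    # cluster-configuration properties. The /etc/hadoop/conf paths are an
    # assumption for a typical cluster node.
    import xml.etree.ElementTree as ET

    def site_properties(path):
        """Parse a Hadoop *-site.xml file into a {name: value} dict."""
        root = ET.parse(path).getroot()
        return {prop.findtext("name"): prop.findtext("value")
                for prop in root.iter("property")}

    yarn = site_properties("/etc/hadoop/conf/yarn-site.xml")
    core = site_properties("/etc/hadoop/conf/core-site.xml")

    # The keys named in the list above:
    print(yarn.get("yarn.resourcemanager.address"))                # Resource Manager Address
    print(core.get("fs.defaultFS", core.get("fs.default.name")))   # Default File System URI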
Effective in version 10.2, the following properties are deprecated and are removed from the connection:

Type
  The connection type.
  Previously appeared in General Properties.

Metastore Execution Mode*
  Controls whether to connect to a remote metastore or a local metastore.
  Previously appeared in Hive Configuration.

Metastore Database URI*
  The JDBC connection URI used to access the data store in a local metastore setup.
  Previously appeared in Hive Configuration.

Metastore Database Driver*
  Driver class name for the JDBC data store.
  Previously appeared in Hive Configuration.

Metastore Database User Name*
  The metastore database user name.
  Previously appeared in Hive Configuration.

Metastore Database Password*
  The password for the metastore user name.
  Previously appeared in Hive Configuration.

Remote Metastore URI*
  The metastore URI used to access metadata in a remote metastore setup.
  This property is imported into the cluster configuration as the property hive.metastore.uris.
  Previously appeared in Hive Configuration.

Job Monitoring URL
  The URL for the MapReduce JobHistory server.
  Previously appeared in Hive Configuration.

* These properties are deprecated in 10.2. When you upgrade to 10.2, the property values that you set in a previous release are saved in the repository, but they do not appear in the connection properties.

HBase Connection Properties

Effective in version 10.2, the following properties are removed from the connection and imported into the cluster configuration:

ZooKeeper Host(s)
  Name of the machine that hosts the ZooKeeper server.

ZooKeeper Port
  Port number of the machine that hosts the ZooKeeper server.

Enable Kerberos Connection
  Enables the Informatica domain to communicate with the HBase master server or region server that uses Kerberos authentication.

HBase Master Principal
  Service Principal Name (SPN) of the HBase master server.

HBase Region Server Principal
  Service Principal Name (SPN) of the HBase region server.
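
For orientation, the removed fields line up with standard HBase configuration keys that a cluster configuration can carry. The pairing below is an editorial assumption rather than a statement from this guide; only the key names themselves are standard HBase properties.

    # Assumed correspondence between the removed HBase connection fields and
    # standard hbase-site.xml keys. The pairing is an assumption made for
    # illustration, not taken from the release guide.
    LIKELY_CLUSTER_CONFIG_KEYS = {
        "ZooKeeper Host(s)": "hbase.zookeeper.quorum",
        "ZooKeeper Port": "hbase.zookeeper.property.clientPort",
        "HBase Master Principal": "hbase.master.kerberos.principal",
        "HBase Region Server Principal": "hbase.regionserver.kerberos.principal",
    }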

Hive Connection Properties

Effective in version 10.2, PowerExchange for Hive has the following changes:

HBase Connection Properties for MapR-DB

Effective in version 10.2, the Enable Kerberos Connection property is removed from the HBase connection for MapR-DB and imported into the cluster configuration.

Mapping Run-time Properties

This section lists changes to mapping run-time properties.

Execution Environment

Effective in version 10.2, you can configure Reject File Directory as a new property in the Hadoop execution environment.

Reject File Directory
  The directory for Hadoop mapping files on HDFS when you run mappings in the Hadoop environment.
  The Blaze engine can write reject files to the Hadoop environment for flat file, HDFS, and Hive targets. The Spark and Hive engines can write reject files to the Hadoop environment for flat file and HDFS targets.
  Choose one of the following options (a sketch of the resulting behavior follows this list):
  • On the Data Integration Service machine. The Data Integration Service stores the reject files based on the RejectDir system parameter.
  • On the Hadoop Cluster. The reject files are moved to the reject directory configured in the Hadoop connection. If the directory is not configured, the mapping fails.
  • Defer to the Hadoop Connection. The reject files are moved based on whether the reject directory is enabled in the Hadoop connection properties. If the reject directory is enabled, the reject files are moved to the reject directory configured in the Hadoop connection. Otherwise, the Data Integration Service stores the reject files based on the RejectDir system parameter.
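
The three options reduce to a short resolution order. The Python sketch below restates that behavior; the function name, arguments, and option strings are invented for illustration and are not Informatica code.

    # Illustrative decision logic for the Reject File Directory options above.
    # All identifiers are invented for this sketch; only the behavior of the
    # three options comes from the release guide.
    def resolve_reject_dir(option, hadoop_conn, reject_dir_system_param):
        if option == "On the Data Integration Service machine":
            return reject_dir_system_param                   # RejectDir system parameter
        if option == "On the Hadoop Cluster":
            if not hadoop_conn.get("Reject File Directory"):
                raise RuntimeError("mapping fails: no reject directory configured")
            return hadoop_conn["Reject File Directory"]      # HDFS directory
        if option == "Defer to the Hadoop Connection":
            if hadoop_conn.get("Write Reject Files to Hadoop"):
                return hadoop_conn["Reject File Directory"]
            return reject_dir_system_param
        raise ValueError(f"unknown option: {option}")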

Monitoring

Effective in version 10.2, the AllHiveSourceTables row in the Summary Statistics view in the Administrator tool includes records read from the following sources:
  • Original Hive sources in the mapping.
  • Staging Hive tables defined by the Hive engine.
  • Staging data between two linked MapReduce jobs.
If the LDTM session includes one MapReduce job, the AllHiveSourceTables statistic includes only the original Hive sources in the mapping.
For more information, see the "Monitoring Mappings in the Hadoop Environment" chapter of the Big Data Management 10.2 User Guide.

S3 Access and Secret Key Properties

Effective in version 10.2, the following properties are included in the list of sensitive properties of a cluster configuration:
  • fs.s3a.access.key
  • fs.s3a.secret.key
  • fs.s3n.awsAccessKeyId
  • fs.s3n.awsSecretAccessKey
  • fs.s3.awsAccessKeyId
  • fs.s3.awsSecretAccessKey
Sensitive properties are included but masked when you generate a cluster configuration archive file to deploy on the machine that runs the Developer tool.
Previously, you configured these properties in .xml configuration files on the machines that run the Data Integration Service and the Developer tool.
For more information about sensitive properties, see the Informatica Big Data Management 10.2 Administrator Guide.
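
To make "included but masked" concrete, the Python sketch below masks sensitive keys while exporting a property set. The function and the masking format are invented for illustration and do not reflect the actual archive file format.

    # Illustrative only: mask sensitive cluster-configuration properties
    # before writing an export. The key names come from the list above; the
    # masking format and helper are invented for this sketch.
    SENSITIVE_KEYS = {
        "fs.s3a.access.key", "fs.s3a.secret.key",
        "fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey",
        "fs.s3.awsAccessKeyId", "fs.s3.awsSecretAccessKey",
    }

    def masked_for_export(properties):
        """Replace sensitive values with a placeholder; keep everything else."""
        return {name: ("*****" if name in SENSITIVE_KEYS else value)
                for name, value in properties.items()}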

Sqoop

Effective in version 10.2, if you create a password file to access a database, Sqoop ignores the password file. Instead, Sqoop uses the value that you configure in the Password field of the JDBC connection.
Previously, you could create a password file to access a database.
For more information, see the "Mapping Objects in the Hadoop Environment" chapter in the Informatica Big Data Management 10.2 User Guide.
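
As a concrete picture of the change, the Python sketch below assembles the kind of Sqoop argument list this implies: the password comes from the JDBC connection's Password field, and any password file the user configured is ignored. The jdbc_conn dictionary and the helper are invented for illustration; --connect, --username, and --password are standard Sqoop options.

    # Illustrative sketch of the 10.2 behavior described above. The jdbc_conn
    # dict and this helper are invented; the Sqoop flags are standard.
    def build_sqoop_args(jdbc_conn):
        return [
            "sqoop", "import",
            "--connect", jdbc_conn["connection_string"],
            "--username", jdbc_conn["username"],
            # Value of the Password field of the JDBC connection; a
            # --password-file configured by the user is ignored in 10.2.
            "--password", jdbc_conn["password"],
        ]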