
Big Data Management

This section describes the changes to Big Data Management in version 10.2.1.

Azure Storage Access

Effective in version 10.2.1, you must override the properties in the cluster configuration core-site.xml before you run a mapping on the Azure HDInsight cluster.
WASB
If you use a cluster with WASB as storage, get the storage account key associated with the HDInsight cluster from the administrator, or decrypt the encrypted storage account key. Then override the value in the cluster configuration core-site.xml with the decrypted key.
ADLS
If you use a cluster with ADLS as storage, you must copy the client credentials from the web application, and then override the values in the cluster configuration core-site.xml.
Previously, you copied the files from the Hadoop cluster to the machine that runs the Data Integration Service.
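As a rough illustration of the WASB override, the following Python sketch builds the name and value pair that you might enter in core-site.xml. It assumes the hadoop-azure naming convention fs.azure.account.key.<account>.blob.core.windows.net for the storage account key property, which this section does not name. The account name and key value are hypothetical placeholders.

```python
from xml.sax.saxutils import escape


def wasb_key_property(account_name: str) -> str:
    """Build the property name for a WASB storage account key.

    Assumption: the cluster follows the hadoop-azure naming convention.
    """
    return f"fs.azure.account.key.{account_name}.blob.core.windows.net"


def render_property(name: str, value: str) -> str:
    """Render one core-site.xml style <property> element as text."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Hypothetical account name and placeholder key; never hard-code a real key.
    name = wasb_key_property("examplestorage")
    print(render_property(name, "DECRYPTED_KEY_PLACEHOLDER"))
```

The rendered element is only a convenient way to review the name and value before you enter them as an override in the cluster configuration.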

Configuring the Hadoop Distribution

This section describes changes to Hadoop distribution configuration.

Hadoop Distribution Configuration

Effective in version 10.2.1, you configure the Hadoop distribution in cluster configuration properties.
The Distribution Name and Distribution Version properties are populated when you import a cluster configuration from the cluster. You can edit the distribution version after you finish the import process.
Previously, the Hadoop distribution was identified by the path to the distribution directory on the machine that hosts the Data Integration Service.
Effective in version 10.2.1, the following property is removed from the Data Integration Service properties:
For more information about the Distribution Name and Distribution Version properties, see the Big Data Management 10.2.1 Administration Guide.

MapR Configuration

Effective in version 10.2.1, it is no longer necessary to configure Data Integration Service process properties for the domain when you use Big Data Management with MapR. Big Data Management supports Kerberos authentication with no user action necessary.
Previously, you configured JVM Option properties in the Data Integration Service custom properties, as well as environment variables, to enable support for Kerberos authentication.
For more information about integrating the domain with a MapR cluster, see the Big Data Management 10.2.1 Hadoop Integration Guide.

Developer Tool Configuration

Effective in version 10.2.1, you can create a Metadata Access Service. The Metadata Access Service is an application service that allows the Developer tool to access Hadoop connection information to import and preview metadata. When you import an object from a Hadoop cluster, the following adapters use Metadata Access Service to extract the object metadata at design time:
Previously, you performed the following steps manually on each Developer tool to establish communication between the Developer tool machine and the Hadoop cluster at design time:
The Metadata Access Service eliminates the need to configure each Developer tool machine for design-time connectivity to the Hadoop cluster.
For more information, see the "Metadata Access Service" chapter in the Informatica 10.2.1 Application Service Guide.

Hadoop Connection Changes

Effective in version 10.2.1, the Hadoop connection contains new and changed properties and functionality, including several properties that you previously configured in other connections or configuration files.
This section lists changes to the Hadoop connection in version 10.2.1.

Properties Moved from hadoopEnv.properties to the Hadoop Connection

Effective in version 10.2.1, the properties that you previously configured in the hadoopEnv.properties file are now configurable in advanced properties for the Hadoop connection.
For information about Hive and Hadoop connections, see the Informatica Big Data Management 10.2.1 User Guide. For more information about configuring Big Data Management, see the Informatica Big Data Management 10.2.1 Hadoop Integration Guide.

Properties Moved from the Hive Connection to the Hadoop Connection

The following Hive connection properties to enable mappings to run on a Hadoop cluster are now in the Hadoop connection:
Previously, you configured these properties in the Hive connection.
For information about Hive and Hadoop connections, see the Informatica Big Data Management 10.2.1 Administrator Guide.

Advanced Properties for Hadoop Run-time Engines

Effective in version 10.2.1, you configure advanced properties for the Blaze, Spark, and Hive run-time engines in the Hadoop connection properties.
Informatica standardized the property names for run-time engine-related properties. The following table shows the old and new names:
| Pre-10.2.1 Property Name | 10.2.1 Hadoop Connection Properties Section | 10.2.1 Property Name |
|---|---|---|
| Blaze Service Custom Properties | Blaze Configuration | Advanced Properties |
| Spark Execution Parameters | Spark Configuration | Advanced Properties |
| Hive Custom Properties | Hive Pushdown Configuration | Advanced Properties |
Previously, you configured advanced properties for run-time engines in the hadoopRes.properties or hadoopEnv.properties files, or in the Hadoop Engine Custom Properties field under Common Properties in the Administrator tool.

Additional Properties for the Blaze Engine

Effective in version 10.2.1, you can configure an additional property in the Blaze Configuration Properties section of the Hadoop connection properties.
The following table describes the property:
Property: Blaze YARN Node Label
Description: Node label that determines the node on the Hadoop cluster where the Blaze engine runs. If you do not specify a node label, the Blaze engine runs on the nodes in the default partition. If the Hadoop cluster supports logical operators for node labels, you can specify a list of node labels. To list the node labels, use the operators && (AND), || (OR), and ! (NOT).
For more information on using node labels on the Blaze engine, see the "Mappings in the Hadoop Environment" chapter in the Informatica Big Data Management 10.2.1 User Guide.
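To show how a node label list that uses the &&, ||, and ! operators can be read, the following Python sketch evaluates such an expression against the set of labels on a single node. It is only an illustration of the Boolean logic. The operator precedence (! before && before ||), the allowed label characters, and the label names are assumptions, and the cluster itself decides how the expression is actually resolved.

```python
import re

# One token: an operator or a label (letters, digits, underscore, hyphen).
TOKEN = re.compile(r"\s*(&&|\|\||!|[A-Za-z0-9_\-]+)")


def tokenize(expr):
    """Split a node label expression into operator and label tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected text at position {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def matches(expr, node_labels):
    """Return True if a node with the given labels satisfies the expression."""
    tokens = tokenize(expr)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "||":
            pos += 1
            right = parse_and()  # always parse the right side to consume tokens
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_not()
        while pos < len(tokens) and tokens[pos] == "&&":
            pos += 1
            right = parse_not()
            result = result and right
        return result

    def parse_not():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "!":
            pos += 1
            return not parse_not()
        if pos >= len(tokens) or tokens[pos] in ("&&", "||"):
            raise ValueError("expected a node label")
        label = tokens[pos]
        pos += 1
        return label in node_labels

    value = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return value
```

For example, with the hypothetical labels gpu and spot, matches("gpu && !spot", {"gpu"}) accepts a node that is labeled gpu but not spot.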

Hive Connection Properties

Effective in version 10.2.1, properties for the Hive connection have changed.
The following Hive connection properties have been removed:
Previously, these properties were deprecated. Effective in version 10.2.1, they are obsolete.
Configure the following Hive connection properties in the Hadoop connection:
Previously, you configured these properties in the Hive connection.
For information about Hive and Hadoop connections, see the Informatica Big Data Management 10.2.1 User Guide.

Monitoring

This section describes the changes to monitoring in Big Data Management in version 10.2.1.

Spark Monitoring

Effective in version 10.2.1, changes in Spark monitoring relate to the following areas:

Event Changes

Effective in version 10.2.1, only monitoring information is checked in the Spark events in the session log.
Previously, all the Spark events were relayed as is from the Spark application to the Spark executor. When relaying the events took a long time, performance issues occurred.
For more information, see the Informatica Big Data Management 10.2.1 User Guide.

Summary Statistics View

Effective in version 10.2.1, you can view the statistics for Spark execution based on the run stages. For instance, Spark Run Stages shows the statistics of the Spark application run stages. Stage_0 shows the statistics related to the run stage with ID=0 in the Spark application. Rows and Average Rows/Sec show the number of rows written out of the stage and the corresponding throughput. Bytes and Average Bytes/Sec show the number of bytes broadcast in the stage and the corresponding throughput.
Previously, you could view only the Source and Target rows and the average rows processed per second for the Spark run.
For more information, see the Informatica Big Data Management 10.2.1 User Guide.
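The relationship between the per-stage totals and the averages can be sketched as follows. This is a minimal Python illustration that assumes each average is the total divided by the elapsed time of the stage, which this section does not state. The class and field names are hypothetical and are not part of any Informatica API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageStats:
    """Hypothetical container for the statistics of one Spark run stage."""
    stage_id: int
    rows: int                # rows written out of the stage
    bytes_broadcast: int     # bytes broadcast in the stage
    elapsed_seconds: float   # assumed elapsed time of the stage

    @property
    def label(self) -> str:
        # Mirrors the Stage_<ID> naming described above.
        return f"Stage_{self.stage_id}"

    @property
    def avg_rows_per_sec(self) -> float:
        # Assumed formula: total rows divided by elapsed seconds.
        if self.elapsed_seconds <= 0:
            return 0.0
        return self.rows / self.elapsed_seconds

    @property
    def avg_bytes_per_sec(self) -> float:
        # Assumed formula: total bytes divided by elapsed seconds.
        if self.elapsed_seconds <= 0:
            return 0.0
        return self.bytes_broadcast / self.elapsed_seconds
```

Under these assumptions, a stage that writes 1000 rows in 4 seconds reports an average of 250 rows per second.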

Precision and Scale on the Hive Engine

Effective in version 10.2.1, the output of user-defined functions that perform multiplication on the Hive engine can have a maximum scale of 6 if the following conditions are true:
Previously, the scale could be as low as 0.
For more information, see the "Mappings in the Hadoop Environment" chapter in the Informatica Big Data Management 10.2.1 User Guide.

Sqoop

Effective in version 10.2.1, the following changes apply to Sqoop:

Transformation Support on the Hive Engine

Effective in version 10.2.1, a Labeler or Parser transformation that performs probabilistic analysis requires the Java 8 Development Kit on any node on which it runs.
Previously, the transformations required the Java 7 Development Kit.
If you run a mapping that contains a Labeler or Parser transformation that you configured for probabilistic analysis, verify the Java version on the Hive nodes.
Note: On a Blaze or Spark node, the Data Integration Service uses the Java Development Kit that installs with the Informatica engine. Informatica 10.2.1 installs with version 8 of the Java Development Kit.
For more information, see the Informatica 10.2.1 Installation Guide or the Informatica 10.2.1 Upgrade Guide that applies to the Informatica version that you upgrade.
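One way to verify the Java version on a Hive node is to parse the output of java -version. The following Python sketch is a minimal example that assumes the common version string formats, such as "1.8.0_181" for Java 8 and "11.0.2" for later releases. The function names are hypothetical, and the check reports the Java runtime on the path, which might differ from the Development Kit that the node uses.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        raise ValueError("could not find a Java version string")
    first = int(match.group(1))
    # Java 8 and earlier report versions as 1.x (for example, 1.8.0_181).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def local_java_major_version() -> int:
    """Run `java -version` on this machine and return the major version."""
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

For example, you could run local_java_major_version() on each Hive node and confirm that it returns 8 before you run a mapping that performs probabilistic analysis.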