Big Data

This section describes new big data features in version 10.0.

Big Data Management Configuration Utility

Effective in version 10.0, you can use the Big Data Management Configuration Utility to automate part of the configuration process for Big Data Management.

For more information, see the Informatica 10.0 Big Data Management Installation and Configuration Guide.

Hadoop Connection

Effective in version 10.0, you must configure a Hadoop connection when you run a mapping in the Hadoop environment. You can edit the Hadoop connection to configure run-time properties for the Hadoop environment. The run-time properties include properties for the Hive and Blaze engines.

The following image shows the Hadoop connection as a cluster type connection:

The image shows the Preferences screen. Connections is selected under Informatica on the left-hand side. The list of available connections appears on the right-hand side. Hadoop is selected under Clusters.

For more information, see the "Connections" chapter in the Informatica 10.0 Big Data Management User Guide.

Hadoop Ecosystem

Effective in version 10.0, Informatica supports the following big data features and enhancements for the Hadoop ecosystem:

Hadoop clusters on Amazon EC2: You can read data from and write data to Hortonworks HDP clusters that are deployed on Amazon EC2.

Hadoop clusters on Microsoft Azure: You can read data from and write data to Cloudera CDH or Hortonworks HDP clusters that are deployed on Microsoft Azure.

Hadoop distributions: You can connect to Hadoop clusters that run the following Hadoop distributions:

Hive on Tez: You can use Hive on Tez as the execution engine for Hadoop clusters that run Hortonworks HDP.

Kerberos authentication: You can use Microsoft Active Directory as the key distribution center for Cloudera CDH and Hortonworks HDP Hadoop clusters.

Parameters for Big Data

Effective in version 10.0, you can use parameters to represent the following additional properties for big data:

• Complex file sources and targets
• Complex file sources and targets on HDFS
• Flat file sources and targets on HDFS
• HBase sources and targets
• Hive sources
• Run-time environment

For more information, see the "Mappings in a Hadoop Environment" chapter in the Informatica 10.0 Big Data Management User Guide.

Run-Time and Validation Environments

Effective in version 10.0, you can select the Hadoop environment to run mappings on the Hadoop cluster. When you select the Hadoop environment, you can also select the Hive or Blaze engine to push the mapping logic to the Hadoop cluster. The Blaze engine is an Informatica proprietary engine for distributed processing on Hadoop.

When you run a mapping in the Hadoop environment, you must configure a Hadoop connection for the mapping. Validate the mapping to ensure that you can push the mapping logic to Hadoop. After you validate a mapping for the Hadoop environment, you can run the mapping.
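The configure, validate, and run sequence can be pictured with a minimal Python sketch. This is an illustrative model only, not Informatica code: the `Mapping` class, the `validate` and `run` functions, and the names `m_orders` and `hadoop_conn` are hypothetical and do not correspond to any Informatica API. The sketch also simplifies engine selection to a single engine per mapping.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model for illustration; none of these names come from Informatica.
ENGINES = {"hive", "blaze"}


@dataclass
class Mapping:
    name: str
    environment: str = "native"
    engine: Optional[str] = None
    hadoop_connection: Optional[str] = None
    validated: bool = False


def validate(mapping: Mapping) -> List[str]:
    """Return a list of problems; an empty list means validation passed."""
    problems = []
    if mapping.environment == "hadoop":
        # A mapping that runs in the Hadoop environment needs a Hadoop connection.
        if mapping.hadoop_connection is None:
            problems.append("Hadoop connection is required")
        # The mapping logic is pushed to the cluster through the Hive or Blaze engine.
        if mapping.engine not in ENGINES:
            problems.append("select the Hive or Blaze engine")
    mapping.validated = not problems
    return problems


def run(mapping: Mapping) -> str:
    """Refuse to run a mapping that has not passed validation."""
    if not mapping.validated:
        raise RuntimeError("validate the mapping before running it")
    return f"{mapping.name} submitted to {mapping.environment}"


if __name__ == "__main__":
    m = Mapping("m_orders", environment="hadoop", engine="blaze")
    print(validate(m))          # reports the missing Hadoop connection
    m.hadoop_connection = "hadoop_conn"
    print(validate(m))          # no problems remain
    print(run(m))
```

In this model, validation fails until a Hadoop connection is assigned, which mirrors the requirement described above. In the product, these steps are performed in the Developer tool rather than in code.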

The following image shows the Hadoop run-time and validation environments:

The image shows the mapping Run-time tab. Under Validation Environment, Hadoop is selected. The Hive on MapReduce and Blaze engines are selected by default. Under Execution Environment, Hadoop is selected.

For more information, see the "Mappings in a Hadoop Environment" chapter in the Informatica 10.0 Big Data Management User Guide.