Big Data
This section describes new big data features in version 10.0.
Big Data Management Configuration Utility
Effective in version 10.0, you can use the Big Data Management Configuration Utility to automate part of the configuration process for Big Data Management.
For more information, see the Informatica 10.0 Big Data Management Installation and Configuration Guide.
Hadoop Connection
Effective in version 10.0, you must configure a Hadoop connection when you run a mapping in the Hadoop environment. You can edit the Hadoop connection to configure run-time properties for the Hadoop environment. The run-time properties include properties for the Hive and Blaze engines.
The following image shows the Hadoop connection as a cluster type connection:
For more information, see the "Connections" chapter in the Informatica 10.0 Big Data Management User Guide.
Hadoop Ecosystem
Effective in version 10.0, Informatica supports the following big data features and enhancements for the Hadoop ecosystem:
- Hadoop clusters on Amazon EC2: You can read data from and write data to Hortonworks HDP clusters that are deployed on Amazon EC2.
- Hadoop clusters on Microsoft Azure: You can read data from and write data to Cloudera CDH or Hortonworks HDP clusters that are deployed on Microsoft Azure.
- Hadoop distributions: You can connect to Hadoop clusters that run the following Hadoop distributions:
- - Cloudera CDH 5.4
- - MapR 4.0.2 with MapReduce 1 and MapReduce 2
- - Hortonworks HDP 2.3
- Hive on Tez: You can use Hive on Tez as the execution engine for Hadoop clusters that run Hortonworks HDP.
- Kerberos authentication: You can use Microsoft Active Directory as the key distribution center for Cloudera CDH and Hortonworks HDP Hadoop clusters.
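The support list above can be read as a small lookup table. The following Python sketch only restates that list as data. It is not part of any Informatica API, and the names `SUPPORTED_DISTRIBUTIONS`, `FEATURE_SUPPORT`, `is_supported`, and `features_for` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

# Hypothetical lookup tables that restate the list above.
# These are illustration only and are not an Informatica API.
SUPPORTED_DISTRIBUTIONS = {
    "Cloudera CDH": "5.4",
    "MapR": "4.0.2",          # with MapReduce 1 and MapReduce 2
    "Hortonworks HDP": "2.3",
}

# Feature -> distributions that the list above names for that feature.
FEATURE_SUPPORT = {
    "Amazon EC2": {"Hortonworks HDP"},
    "Microsoft Azure": {"Cloudera CDH", "Hortonworks HDP"},
    "Hive on Tez": {"Hortonworks HDP"},
    "Active Directory KDC": {"Cloudera CDH", "Hortonworks HDP"},
}


def is_supported(feature: str, distribution: str) -> bool:
    """Return True if the list above names `distribution` for `feature`."""
    return distribution in FEATURE_SUPPORT.get(feature, set())


def features_for(distribution: str) -> list[str]:
    """Return the features listed for `distribution`, sorted by name."""
    return sorted(
        feature
        for feature, distributions in FEATURE_SUPPORT.items()
        if distribution in distributions
    )
```

For example, `is_supported("Hive on Tez", "Hortonworks HDP")` is true, while `is_supported("Amazon EC2", "MapR")` is false because the list names only Hortonworks HDP for Amazon EC2 deployments.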
Parameters for Big Data
Effective in version 10.0, you can use parameters to represent the following additional properties for big data:
- Complex file sources and targets
- Complex file sources and targets on HDFS
- Flat file sources and targets on HDFS
- HBase sources and targets
- Hive sources
- Run-time environment
For more information, see the "Mappings in a Hadoop Environment" chapter in the Informatica 10.0 Big Data Management User Guide.
Run-Time and Validation Environments
Effective in version 10.0, you can select the Hadoop environment to run mappings on the Hadoop cluster. When you select the Hadoop environment, you can also select the Hive or Blaze engine to push the mapping logic to the Hadoop cluster. The Blaze engine is an Informatica proprietary engine for distributed processing on Hadoop.
When you run a mapping in the Hadoop environment, you must configure a Hadoop connection for the mapping. Validate the mapping to ensure that you can push the mapping logic to Hadoop. After you validate a mapping for the Hadoop environment, you can run the mapping.
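The order of steps described above (configure a Hadoop connection, validate, then run) can be modeled with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Mapping` class, its fields, and the `validate` and `run` functions are hypothetical and do not correspond to an Informatica API or to the actual validation rules of the Developer tool.

```python
from dataclasses import dataclass
from typing import List, Optional

# Engines named in this section for the Hadoop environment.
ENGINES = ("Hive", "Blaze")


@dataclass
class Mapping:
    """Hypothetical stand-in for a mapping that targets the Hadoop environment."""
    name: str
    hadoop_connection: Optional[str] = None  # required before the mapping can run
    engine: Optional[str] = None             # optional: "Hive" or "Blaze"
    validated: bool = False


def validate(mapping: Mapping) -> List[str]:
    """Check the mapping and return a list of problems (empty if valid)."""
    problems = []
    if mapping.hadoop_connection is None:
        problems.append("a Hadoop connection is required")
    if mapping.engine is not None and mapping.engine not in ENGINES:
        problems.append("engine must be Hive or Blaze")
    mapping.validated = not problems
    return problems


def run(mapping: Mapping) -> str:
    """Refuse to run a mapping that has not passed validation."""
    if not mapping.validated:
        raise RuntimeError("validate the mapping before you run it")
    return f"running {mapping.name} in the Hadoop environment"
```

In this sketch, calling `run` on a mapping that has no Hadoop connection raises an error, because `validate` never marks the mapping as valid until a connection is set.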
The following image shows the Hadoop run-time and validation environments:
For more information, see the "Mappings in a Hadoop Environment" chapter in the Informatica 10.0 Big Data Management User Guide.