Big Data
This section describes new big data features in version 10.1.
Hadoop Ecosystem Support in Big Data Management 10.1
Effective in version 10.1, Informatica supports the following updated versions of Hadoop distributions:
- Azure HDInsight 3.3
- Cloudera CDH 5.5
- MapR 5.1
For the full list of Hadoop distributions that Big Data Management 10.1 supports, see the Informatica Big Data Management 10.1 Installation and Configuration Guide.
Hadoop Security Systems
Effective in version 10.1, Informatica supports the following security systems on the Hadoop ecosystem:
- Apache Knox
- Apache Ranger
- Apache Sentry
- HDFS Transparent Encryption
Limitations apply to some combinations of security system and Hadoop distribution platform. For more information on Informatica support for these technologies, see the Informatica Big Data Management 10.1 Security Guide.
Spark Runtime Engine
Effective in version 10.1, you can push mappings to the Apache Spark engine in the Hadoop environment.
Spark is an Apache project with a run-time engine that can run mappings on the Hadoop cluster. Configure the Hadoop connection properties that are specific to the Spark engine. After you create the mapping, you can validate it and view the execution plan in the same way as for the Blaze and Hive engines.
When you push mapping logic to the Spark engine, the Data Integration Service generates a Scala program and packages it into an application. It sends the application to the Spark executor, which submits it to the Resource Manager on the Hadoop cluster. The Resource Manager identifies resources to run the application. You can monitor the job in the Administrator tool.
For more information about using Spark to run mappings, see the Informatica Big Data Management 10.1 User Guide.
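The submission flow that the Data Integration Service automates resembles a standard Spark job submission to YARN. As a rough illustration only, a hand-run equivalent might look like the following sketch; the class name, JAR name, and executor count are placeholders, not values the product uses:

```shell
# Hypothetical sketch: submitting a packaged Spark application to YARN,
# analogous to what the Data Integration Service does on your behalf.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.GeneratedMapping \
  --num-executors 4 \
  generated-mapping.jar
# The YARN Resource Manager then allocates containers to run the application.
```

This is a conceptual aid; in Big Data Management you do not invoke spark-submit yourself.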
Sqoop Connectivity for Relational Sources and Targets
Effective in version 10.1, you can use Sqoop to process data between relational databases and HDFS through MapReduce programs. You can use Sqoop to import and export data. When you use Sqoop, you do not need to install the relational database client and software on any node in the Hadoop cluster.
To use Sqoop, you must configure Sqoop properties in a JDBC connection and run the mapping in the Hadoop environment. You can configure Sqoop connectivity for relational data objects, customized data objects, and logical data objects that are based on a JDBC-compliant database. For example, you can configure Sqoop connectivity for the following databases:
- Aurora
- IBM DB2
- IBM DB2 for z/OS
- Greenplum
- Microsoft SQL Server
- Netezza
- Oracle
- Teradata
You can also run a profile on data objects that use Sqoop in the Hive run-time environment.
For more information, see the Informatica Big Data Management 10.1 User Guide.
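Behind the JDBC connection properties, Sqoop moves data through generated MapReduce jobs. For orientation, a minimal standalone Sqoop import looks like the following sketch; the JDBC URL, credentials, table, and target directory are placeholder assumptions:

```shell
# Hypothetical sketch: importing an Oracle table into HDFS with Sqoop.
# The connection string, user, table, and paths are placeholders.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMERS \
  --target-dir /data/customers \
  --num-mappers 4
# --num-mappers controls how many parallel MapReduce map tasks extract the data.
```

In Big Data Management you supply the equivalent settings as Sqoop properties on the JDBC connection rather than running the command yourself.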
Transformation Support on the Blaze Engine
Effective in version 10.1, the following transformations are supported on the Blaze engine:
- Address Validator
- Case Converter
- Comparison
- Consolidation
- Data Processor
- Decision
- Key Generator
- Labeler
- Match
- Merge
- Normalizer
- Parser
- Sequence Generator
- Standardizer
- Weighted Average
The Address Validator, Consolidation, Data Processor, Match, and Sequence Generator transformations are supported with restrictions.
Effective in version 10.1, the following transformations have additional support on the Blaze engine:
- Aggregator. Supports pass-through ports.
- Lookup. Supports the unconnected Lookup transformation.
For more information, see the "Mapping Objects in a Hadoop Environment" chapter in the Informatica Big Data Management 10.1 User Guide.