Before You Begin
Before you begin the installation, install the Informatica components and PowerExchange® adapters, and perform the pre-installation tasks.
Install and Configure the Informatica Domain and Clients
Before you install Big Data Management, install and configure the Informatica domain and clients.
Run the Informatica services installation to configure the Informatica domain and create the Informatica services. Run the Informatica client installation to install the Informatica client tools.
Install and Configure PowerExchange Adapters
Based on your business needs, install and configure Informatica adapters. Big Data Management uses the Informatica adapters to access sources and targets.
To run Informatica mappings in a Hadoop environment, you must install and configure the required Informatica adapters.
You can use the following Informatica adapters as part of Big Data Management:
- PowerExchange for DataSift
- PowerExchange for Facebook
- PowerExchange for HBase
- PowerExchange for HDFS
- PowerExchange for Hive
- PowerExchange for LinkedIn
- PowerExchange for Teradata Parallel Transporter API
- PowerExchange for Twitter
- PowerExchange for Web Content-Kapow Katalyst
For more information, see the PowerExchange adapter documentation.
Install and Configure Data Replication
To migrate data with minimal downtime and perform auditing and operational reporting functions, install and configure Data Replication. For information, see the Informatica Data Replication User Guide.
Pre-Installation Tasks for a Single Node Environment
Before you begin the Big Data Management installation in a single node environment, perform the following pre-installation tasks.
- Verify that Hadoop is installed with the Hadoop Distributed File System (HDFS) and MapReduce. The Hadoop installation must include a Hive data warehouse that is configured to use a non-embedded database as the MetaStore. For more information, see the Apache website at http://hadoop.apache.org.
- To perform both read and write operations in native mode, install the required third-party client software. For example, install the Oracle client to connect to the Oracle database.
- Verify that the Big Data Management administrator user can run sudo commands or has root user privileges.
- Verify that the temporary folder on the local node has at least 2 GB of disk space.
- Verify that the destination directory for Informatica binary files is empty. Files left over from a previous installation can cause file conflicts that lead to mapping run failures. A verification sketch for the last three checks follows this list.
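You might script the privilege, disk space, and empty-directory checks before you launch the installer. The following Python sketch is illustrative only and is not part of the Big Data Management installer; the destination directory shown is a hypothetical example, and the sudo check is only a rough proxy for the real permission.

```python
# Pre-flight checks for a single node installation: a minimal sketch,
# assuming a Unix host. Adjust the paths for your environment.
import os
import shutil

INSTALL_DIR = "/opt/informatica"  # hypothetical destination directory
TEMP_DIR = "/tmp"

def check_privileges():
    # The administrator must be root or able to run sudo commands.
    # Presence of the sudo binary is a rough proxy, not a guarantee.
    return os.geteuid() == 0 or shutil.which("sudo") is not None

def check_temp_space(path=TEMP_DIR, required_gb=2):
    # The temporary folder needs at least 2 GB of free disk space.
    return shutil.disk_usage(path).free >= required_gb * 1024**3

def check_destination_empty(path=INSTALL_DIR):
    # Leftover files from a previous install can cause mapping run failures.
    return not os.path.exists(path) or not os.listdir(path)

if __name__ == "__main__":
    checks = [("privileges", check_privileges()),
              ("temp space", check_temp_space()),
              ("empty destination", check_destination_empty())]
    for name, ok in checks:
        print(f"{name}: {'OK' if ok else 'FAILED'}")
```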
Pre-Installation Tasks for a Cluster Environment
Before you begin the Big Data Management installation in a cluster environment, perform the following tasks:
- Install third-party software.
- Verify system requirements.
- Verify connection requirements.
Install Third-Party Software
Verify that the following third-party software is installed:
- Hadoop with the Hadoop Distributed File System (HDFS) and MapReduce. Hadoop must be installed on every node in the cluster. The Hadoop installation must include a Hive data warehouse that is configured to use a MySQL database as the MetaStore. You can configure Hive to use a local or remote MetaStore server. For more information, see the Apache website at http://hadoop.apache.org/.
Note: Informatica does not support embedded MetaStore server setups.
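To confirm that Hive is not using an embedded MetaStore, you can inspect hive-site.xml. The following Python sketch is an informal check, not an Informatica utility; the configuration file path is an assumption and varies by Hadoop distribution.

```python
# A minimal sketch that inspects hive-site.xml for an embedded Derby
# MetaStore, which Informatica does not support.
import xml.etree.ElementTree as ET

HIVE_SITE = "/etc/hive/conf/hive-site.xml"  # assumed location; varies by distribution

def metastore_properties(path=HIVE_SITE):
    # Collect the two properties that determine the MetaStore setup.
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name in ("javax.jdo.option.ConnectionURL", "hive.metastore.uris"):
            props[name] = prop.findtext("value")
    return props

props = metastore_properties()
url = props.get("javax.jdo.option.ConnectionURL", "")
if "jdbc:derby" in url:
    print("Embedded Derby MetaStore detected: not supported.")
elif "jdbc:mysql" in url or props.get("hive.metastore.uris"):
    print("Non-embedded MetaStore configured:", props)
else:
    print("Could not determine MetaStore configuration:", props)
```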
- Database client software to perform read and write operations in native mode. Install the client software for the database. Informatica requires the client software to run MapReduce jobs. For example, install the Oracle client to connect to the Oracle database.
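A quick way to confirm that client software is available is to check that its binaries are on the PATH. The following Python sketch uses the Oracle client from the example above; the binary names you need depend on the databases your mappings access.

```python
# A minimal sketch that checks for database client binaries on the PATH.
import shutil

clients = ["sqlplus"]  # Oracle client example; extend for your databases
for binary in clients:
    location = shutil.which(binary)
    print(f"{binary}: {location or 'NOT FOUND'}")
```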
Verify System Requirements
Verify the following system requirements:
- The Big Data Management administrator can run sudo commands or has root user privileges.
- The temporary folder on each node where you will install Big Data Management has at least 2 GB of disk space.
- The destination directory for Informatica binary files is empty. Files left over from a previous installation can cause file conflicts that lead to mapping run failures. A sketch that runs these checks on each node follows this list.
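Because these requirements apply to every node, you might repeat the disk space and empty-directory checks remotely. The following Python sketch is illustrative; the hostnames and destination directory are hypothetical, and it assumes that passwordless SSH for the root user (described in the next topic) is already configured.

```python
# A minimal sketch that runs per-node checks over SSH.
import subprocess

NODES = ["hadoop-node1", "hadoop-node2"]  # hypothetical hostnames
INSTALL_DIR = "/opt/informatica"          # hypothetical destination directory

# Each remote command exits non-zero when the check fails.
# df -Pk reports 1 KB blocks; 2 GB is 2097152 KB.
CHECKS = {
    "temp space >= 2 GB": "test $(df -Pk /tmp | awk 'NR==2 {print $4}') -ge 2097152",
    "destination empty": f"test ! -e {INSTALL_DIR} || test -z \"$(ls -A {INSTALL_DIR})\"",
}

for node in NODES:
    for label, command in CHECKS.items():
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", f"root@{node}", command],
            capture_output=True)
        status = "OK" if result.returncode == 0 else "FAILED"
        print(f"{node}: {label}: {status}")
```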
Verify Connection Requirements
Verify the connection to the Hadoop cluster nodes.
Big Data Management requires a Secure Shell (SSH) connection without a password between the machine where you want to run the Big Data Management installation and all the nodes in the Hadoop cluster. Configure passwordless SSH for the root user.
Note: For security reasons, consider removing the passwordless SSH configuration for the root user when Big Data Management installation and configuration are complete.
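You can verify the passwordless SSH configuration before you start the installer. The following Python sketch is illustrative and the hostnames are hypothetical; the BatchMode=yes option makes ssh fail immediately instead of prompting for a password.

```python
# A minimal sketch that verifies passwordless SSH from the installation
# machine to every node in the Hadoop cluster.
import subprocess

NODES = ["hadoop-node1", "hadoop-node2"]  # hypothetical hostnames

for node in NODES:
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
         f"root@{node}", "true"],
        capture_output=True)
    status = "passwordless SSH OK" if result.returncode == 0 else "FAILED"
    print(f"{node}: {status}")
```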