Verify System Requirements
Amazon EMR, Azure HDInsight, Cloudera CDH, Hortonworks HDP, IBM BigInsights, MapR
Verify that your environment meets the minimum requirements for the installation process, disk space, port availability, and third-party software.
For more information about product requirements and supported platforms, see the Product Availability Matrix on Informatica Network:
https://network.informatica.com/community/informatica-network/product-availability-matrices/overview
Verify Product Installations
Amazon EMR, Azure HDInsight, Cloudera CDH, Hortonworks HDP, IBM BigInsights, MapR
Before you install Big Data Management, verify that Informatica and third-party products are installed.
You must install the following products:
- Informatica domain and clients
  Verify that the Informatica domain and the Developer tool are 10.1.1 HotFix 1. The Informatica domain must have a Model Repository Service and a Data Integration Service.
- Hadoop File System and MapReduce
  Verify that Hadoop is installed with Hadoop File System (HDFS) and MapReduce on each node. Install Hadoop in a single node environment or in a cluster. The Hadoop installation must include a Hive data warehouse with a non-embedded database for the Hive metastore. For more information, see the Apache website: http://hadoop.apache.org.
- Database client software
  Install the database client software to perform database read and write operations in native mode. Informatica requires the client software to run MapReduce jobs. For example, install the Oracle client to connect to an Oracle database.
Verify the Hadoop Distribution and Installation Package
Amazon EMR, Azure HDInsight, Cloudera CDH, Hortonworks HDP, IBM BigInsights, MapR
Informatica distributes the installation package to the Hadoop cluster through a download package. Identify the distribution that you use and the required download package.
The following table lists the distribution versions and download packages for Big Data Management:
Distribution | Version | Package |
---|---|---|
Amazon EMR | 5.0, 5.4. Note: To enable support for Amazon EMR 5.4, apply EBF-9585. When you apply the EBF, you disable support for Amazon EMR 5.0. | Red Hat Package Manager (RPM) |
Azure HDInsight | 3.5 | Debian |
Cloudera CDH | 5.8, 5.9, 5.10, 5.11 | Cloudera Parcel |
Hortonworks HDP | 2.3, 2.4, 2.5, 2.6 | Ambari stack |
IBM BigInsights | 4.2 | Red Hat Package Manager (RPM) |
MapR | 5.2 (MEP 1.0) | Red Hat Package Manager (RPM) |
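The table above can be treated as a simple lookup. The following is a minimal sketch in Python, not part of the Informatica tooling; the names `PACKAGES` and `package_for` are hypothetical and the data is copied from the table.

```python
# Distribution -> (supported versions, package type), copied from the table above.
PACKAGES = {
    "Amazon EMR": (("5.0", "5.4"), "RPM"),
    "Azure HDInsight": (("3.5",), "Debian"),
    "Cloudera CDH": (("5.8", "5.9", "5.10", "5.11"), "Cloudera Parcel"),
    "Hortonworks HDP": (("2.3", "2.4", "2.5", "2.6"), "Ambari stack"),
    "IBM BigInsights": (("4.2",), "RPM"),
    "MapR": (("5.2",), "RPM"),
}


def package_for(distribution, version):
    """Return the package type, or raise ValueError if the pair is not in the table."""
    versions, package = PACKAGES.get(distribution, ((), None))
    if version not in versions:
        raise ValueError(f"{distribution} {version} is not listed as supported")
    return package
```

For example, `package_for("Cloudera CDH", "5.10")` returns `"Cloudera Parcel"`. The sketch does not model the EBF-9585 condition for Amazon EMR 5.4, which you still need to check separately.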
Verify Installer Requirements
Amazon EMR, Azure HDInsight, Cloudera CDH, Hortonworks HDP, IBM BigInsights, MapR
Verify the following installer requirements before performing an installation:
- Verify that the user who performs the installation can run sudo commands or has root privileges.
- Verify that the temporary folder on the local node has at least 2 GB of disk space.
- If you plan to install on a cluster, verify the connections to the Hadoop cluster nodes.
- If you plan to install with an RPM or Debian package, configure a Secure Shell (SSH) connection between the machine that runs the Big Data Management installation and the nodes in the Hadoop cluster. Configure passwordless SSH for the root user.
- If you are upgrading Big Data Management, verify that the destination directory for Informatica binary files is empty. Files that remain from previous installations can cause conflicts that lead to mapping run failures.
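The disk space requirement in the list above can be checked with a short script. This is a minimal sketch that assumes the installer uses the platform's default temporary directory, which may not match your environment. The function name `has_enough_temp_space` is hypothetical.

```python
import shutil
import tempfile

REQUIRED_BYTES = 2 * 1024**3  # 2 GB, the minimum stated in the list above


def has_enough_temp_space(path=None, required=REQUIRED_BYTES):
    """Return True if the file system holding `path` has at least `required` free bytes."""
    # Assumption: fall back to the platform's default temporary directory.
    # Pass the installer's actual temporary folder if it differs.
    target = path or tempfile.gettempdir()
    return shutil.disk_usage(target).free >= required
```

Call `has_enough_temp_space("/tmp")` on the node that runs the installation, replacing `/tmp` with the temporary folder that applies to your setup.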
Verify Port Requirements
Amazon EMR, Azure HDInsight, Cloudera CDH, Hortonworks HDP, IBM BigInsights, MapR
Open a range of ports to enable the Informatica domain to communicate with the Hadoop cluster and the distribution engine. If the Hadoop cluster is behind a firewall, work with your network administrator to open the range of ports that a distribution engine uses.
The following table lists the ports to open:
Port | Description |
---|---|
8020 | NameNode RPC. Required for Amazon EMR and Azure HDInsight. |
8032 | ResourceManager. Required for Amazon EMR and Azure HDInsight. |
8080 | NameNode API. Required for Amazon EMR and Azure HDInsight. |
8088 | Debugging port. Optional for all distributions. |
9080 | Blaze monitoring console. Required for all distributions. |
9083 | Hive metastore. Required for Amazon EMR and Azure HDInsight. |
12300 to 12600 | Port range for the Blaze distribution engine. Required for all distributions. |
19888 | Debugging port. Optional for all distributions. |
50070 | Debugging port. Optional for all distributions. |
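To see which of the required ports are reachable from the Informatica domain machine, you can attempt a TCP connection to each one. The following is a minimal sketch using the Python standard library. The function names and the `REQUIRED_PORTS` list are illustrative, and the list includes the ports that the table marks as required only for Amazon EMR and Azure HDInsight.

```python
import socket

# Required ports from the table above; the 12300 to 12600 Blaze range is expanded.
REQUIRED_PORTS = [8020, 8032, 8080, 9080, 9083] + list(range(12300, 12601))


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def closed_ports(host, ports, timeout=3.0):
    """Return the subset of `ports` that cannot be reached on `host`."""
    return [p for p in ports if not port_is_open(host, p, timeout)]
```

For example, `closed_ports("namenode.example.com", REQUIRED_PORTS)` returns the ports that could not be reached, where the host name is a placeholder. A failed connection does not always mean that a firewall blocks the port: a port also appears closed when no service is listening on it, which is likely for the Blaze range while the Blaze engine is not running.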
Verify Hortonworks HDP Requirements
Hortonworks HDP
Before you install Big Data Management in the Ambari stack, verify the following prerequisites:
Verify Azure HDInsight Requirements
Azure HDInsight
To ensure that Informatica can access the HDInsight cluster, edit the /etc/hosts file on the machine that hosts the Data Integration Service to add the following information:
- Enter the IP address, DNS name, and DNS short name for each data node on the cluster. Use headnodehost to identify the host as the cluster headnode host.
For example:
10.75.169.19 hn0-rndhdi.grg2yxlb0aouniiuvfp3bet13d.ix.internal.cloudapp.net headnodehost
- If the HDInsight cluster is integrated with ADLS storage, also enter the IP addresses and DNS names for the hosts listed in the cluster property fs.azure.datalake.token.provider.service.urls.
For example:
1.2.3.67 gw1-ltsa.1320suh5npyudotcgaz0izgnhe.gx.internal.cloudapp.net
1.2.3.68 gw0-ltsa.1320suh5npyudotcgaz0izgnhe.gx.internal.cloudapp.net
Note: To get the IP addresses, run a telnet command from the cluster host using each host name found in the fs.azure.datalake.token.provider.service.urls property.
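The /etc/hosts entries described above follow a fixed layout: IP address, DNS name, then any short names. The following is a minimal sketch that formats such entries. As an alternative to the telnet command in the note, it resolves a host name with `socket.gethostbyname`, which assumes the script runs on a machine that can resolve the cluster's internal names, such as a cluster host. The function names are hypothetical.

```python
import socket


def hosts_line(ip, fqdn, *aliases):
    """Format one /etc/hosts entry: IP address, DNS name, then optional short names."""
    return " ".join([ip, fqdn, *aliases])


def resolve_and_format(fqdn, *aliases):
    """Look up the IPv4 address of `fqdn` and format it as an /etc/hosts entry."""
    # Assumption: run this where the internal DNS name resolves, for example on a cluster host.
    return hosts_line(socket.gethostbyname(fqdn), fqdn, *aliases)
```

For example, `hosts_line("10.75.169.19", "hn0-rndhdi.grg2yxlb0aouniiuvfp3bet13d.ix.internal.cloudapp.net", "headnodehost")` reproduces the headnode entry shown earlier. Review the generated lines before you add them to /etc/hosts on the machine that hosts the Data Integration Service.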