Big Data Management Installation from an RPM Package
To install Big Data Management on Amazon EMR or IBM BigInsights, download the tar.gz file that includes an RPM package and the binary files that you need.
You can install Big Data Management in a single node environment. You can also install Big Data Management in a cluster environment from the primary name node or from any machine.
Choose one of the following modes to install Big Data Management on Amazon EMR or IBM BigInsights:
- Install in a single node environment.
- Install in a cluster environment from the primary name node using SCP protocol.
- Install in a cluster environment from the primary name node using NFS protocol.
- Install in a cluster environment from a non-name node machine.
- Create a cluster on Amazon EMR and install Big Data Management.
Download the Distribution Package
1. Download the following file to a temporary folder: InformaticaHadoop-<version>.<platform>-x64.tar.gz.
Note: The distribution package must be stored on a local disk and not on HDFS.
2. Extract the file on the machine from which you want to distribute the package and run the Big Data Management installation.
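The extraction step can be sketched as a small shell helper. This is only an illustration: the function name extract_distribution and the staging directory are hypothetical, and the check for an hdfs:// prefix is a simple way to enforce the note that the package must be on a local disk.

```shell
# Hypothetical helper: extract the distribution archive into a local staging directory.
extract_distribution() {
    local archive="$1" dest="$2"
    # The package must be stored on a local disk, so reject HDFS URIs.
    case "$archive" in
        hdfs://*) echo "archive must be on a local disk, not HDFS" >&2; return 1 ;;
    esac
    [ -f "$archive" ] || { echo "not found: $archive" >&2; return 1; }
    # Create the staging directory if needed, then unpack the tar.gz file into it.
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}
```

For example, you might run extract_distribution with the path of the downloaded tar.gz file and a directory such as /tmp/bdm-staging, then run the installer from the extracted directory.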
Installing in a Single Node Environment
You can install Big Data Management in a single node environment.
1. Log in to the machine.
2. Run the following command from the Big Data Management root directory to start the installation in console mode:
bash InformaticaHadoopInstall.sh
3. Press y to accept the Big Data Management terms of agreement.
4. Press Enter.
5. Press 1 to install Big Data Management in a single node environment.
6. Press Enter.
7. Type the absolute path for the Big Data Management installation directory.
Start the path with a slash. The directory names in the path must not contain spaces or the following special characters: { } ! @ # $ % ^ & * ( ) : ; | ' ` < > , ? + [ ] \
If you type a directory path that does not exist, the installer creates the entire directory path on the node during the installation. Default is /opt.
8. Press Enter.
The installer creates the /<BigDataManagementInstallationDirectory>/Informatica directory and populates all of the file systems with the contents of the RPM package.
To get more information about the tasks performed by the installer, you can view the informatica-hadoop-install.<DateTimeStamp>.log installation log file.
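Before you start the installer, you can check a candidate installation path against the rules in step 7. The following sketch is not part of the installer. The function name is_valid_install_path is hypothetical, and the check only mirrors the documented rules: the path starts with a slash and contains no spaces or listed special characters.

```shell
# Hypothetical pre-check for the installation directory path.
is_valid_install_path() {
    local path="$1" rest ch
    # Space, the listed special characters, and finally the single quote.
    local forbidden=' {}!@#$%^&*():;|`<>,?+[]\'"'"
    # The path must be absolute.
    case "$path" in
        /*) ;;
        *) return 1 ;;
    esac
    # Walk the path one character at a time and reject any forbidden character.
    rest="$path"
    while [ -n "$rest" ]; do
        ch="${rest%"${rest#?}"}"   # first character of the remaining text
        rest="${rest#?}"           # drop that character
        case "$forbidden" in
            *"$ch"*) return 1 ;;
        esac
    done
    return 0
}
```

The function returns 0 for a path such as /opt/informatica and a non-zero status for a relative path or a path such as '/opt/my dir'.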
Installing in a Cluster Environment from the Primary Name Node Using SCP Protocol
You can install Big Data Management in a cluster environment from the primary name node using SCP.
1. Log in to the primary name node.
2. Run the following command to start the Big Data Management installation in console mode:
bash InformaticaHadoopInstall.sh
3. Press y to accept the Big Data Management terms of agreement.
4. Press Enter.
5. Press 2 to install Big Data Management in a cluster environment.
6. Press Enter.
7. Type the absolute path for the Big Data Management installation directory.
Start the path with a slash. The directory names in the path must not contain spaces or the following special characters: { } ! @ # $ % ^ & * ( ) : ; | ' ` < > , ? + [ ] \
If you type a directory path that does not exist, the installer creates the entire directory path on each of the nodes during the installation. Default is /opt.
8. Press Enter.
9. Press 1 to install Big Data Management from the primary name node.
10. Press Enter.
11. Type the absolute path for the Hadoop installation directory. Start the path with a slash.
12. Press Enter.
13. Type y.
14. Press Enter.
The installer retrieves a list of DataNodes from the $HADOOP_HOME/conf/slaves file. On each of the DataNodes, the installer creates the Informatica directory and populates all of the file systems with the contents of the RPM package. The Informatica directory is located here: /<BigDataManagementInstallationDirectory>/Informatica
You can view the informatica-hadoop-install.<DateTimeStamp>.log installation log file to get more information about the tasks performed by the installer.
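To preview which DataNodes the installer is likely to target, you can list the entries in the slaves file yourself. The sketch below is an illustration, not the installer's own logic: the function name list_datanodes is hypothetical, and skipping blank lines and lines that begin with # is an assumption about how the file is written.

```shell
# Hypothetical helper: print the host entries from a Hadoop slaves file.
list_datanodes() {
    local slaves_file="${1:-$HADOOP_HOME/conf/slaves}"
    [ -r "$slaves_file" ] || { echo "cannot read $slaves_file" >&2; return 1; }
    # Drop blank lines and comment lines; print everything else unchanged.
    grep -Ev '^[[:space:]]*(#|$)' "$slaves_file"
}
```

If you call list_datanodes without an argument, it reads $HADOOP_HOME/conf/slaves.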
Installing in a Cluster Environment from the Primary Name Node Using NFS Protocol
You can install Big Data Management in a cluster environment from the primary name node using NFS protocol.
1. Log in to the primary name node.
2. Run the following command to start the Big Data Management installation in console mode:
bash InformaticaHadoopInstall.sh
3. Press y to accept the Big Data Management terms of agreement.
4. Press Enter.
5. Press 2 to install Big Data Management in a cluster environment.
6. Press Enter.
7. Type the absolute path for the Big Data Management installation directory.
Start the path with a slash. The directory names in the path must not contain spaces or the following special characters: { } ! @ # $ % ^ & * ( ) : ; | ' ` < > , ? + [ ] \
If you type a directory path that does not exist, the installer creates the entire directory path on each of the nodes during the installation. Default is /opt.
8. Press Enter.
9. Press 1 to install Big Data Management from the primary name node.
10. Press Enter.
11. Type the absolute path for the Hadoop installation directory. Start the path with a slash.
12. Press Enter.
13. Type n.
14. Press Enter.
15. Type y.
16. Press Enter.
The installer retrieves a list of DataNodes from the $HADOOP_HOME/conf/slaves file. On each of the DataNodes, the installer creates the /<BigDataManagementInstallationDirectory>/Informatica directory and populates all of the file systems with the contents of the RPM package.
You can view the informatica-hadoop-install.<DateTimeStamp>.log installation log file to get more information about the tasks performed by the installer.
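Because each run writes a log file with its own date and time stamp, it can help to locate the most recent one. The following sketch assumes that you know the directory that contains the log files, which this guide does not specify. The function name latest_install_log is hypothetical.

```shell
# Hypothetical helper: print the newest installer log in a given directory.
latest_install_log() {
    local log_dir="$1"
    # ls -t sorts by modification time, newest first; keep only the first entry.
    ls -t "$log_dir"/informatica-hadoop-install.*.log 2>/dev/null | head -n 1
}
```

You can then open the returned file with a pager such as less.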
Installing in a Cluster Environment from a Non-Name Node Machine
You can install Big Data Management in a cluster environment from any machine in the cluster that is not a name node.
1. Verify that the Big Data Management administrator has root privileges on the node from which you run the Big Data Management installation.
2. Log in to the machine as the root user.
3. In the HadoopDataNodes file, add the IP addresses or machine host names of the nodes in the Hadoop cluster on which you want to install Big Data Management. The HadoopDataNodes file is located on the node from which you launch the Big Data Management installation. Add one IP address or machine host name on each line of the file.
4. Run the following command to start the Big Data Management installation in console mode:
bash InformaticaHadoopInstall.sh
5. Press y to accept the Big Data Management terms of agreement.
6. Press Enter.
7. Press 2 to install Big Data Management in a cluster environment.
8. Press Enter.
9. Type the absolute path for the Big Data Management installation directory. Start the path with a slash. Default is /opt.
10. Press Enter.
11. Press 2 to install Big Data Management using the HadoopDataNodes file.
12. Press Enter.
The installer creates the /<BigDataManagementInstallationDirectory>/Informatica directory and populates all of the file systems with the contents of the RPM package on the first node that appears in the HadoopDataNodes file. The installer repeats the process for each node in the HadoopDataNodes file.
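Because the installer processes the HadoopDataNodes file line by line, it is worth confirming that each line holds exactly one IP address or host name before you start. The sketch below is a hypothetical check and not part of the product. The function name check_nodes_file is an assumption, as is the choice to ignore blank lines.

```shell
# Hypothetical check: flag lines that hold more than one host name or IP address.
check_nodes_file() {
    local nodes_file="$1"
    [ -r "$nodes_file" ] || { echo "cannot read $nodes_file" >&2; return 1; }
    # A valid line has a single field and no comma-separated entries.
    awk 'BEGIN { bad = 0 }
         NF > 1 || /,/ {
             printf "line %d: expected one host name or IP address: %s\n", NR, $0
             bad = 1
         }
         END { exit bad }' "$nodes_file"
}
```

The function prints each offending line and returns a non-zero status if it finds any.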
Create a Cluster on Amazon EMR and Install Big Data Management
If you choose not to use one of the standard installation procedures described above, you can create a cluster on Amazon EMR and install Big Data Management.
You upload the RPM package to an S3 bucket, and prepare and upload a bootstrap script. Use the cluster creation wizard to create an Amazon EMR cluster. The cluster creation wizard uses values in the bootstrap script to download the RPM package from the Amazon S3 bucket and extract the package. Then the wizard creates a cluster, where it installs Big Data Management.
Perform the following steps to create a cluster on Amazon EMR and install Big Data Management:
1. Upload the Big Data Management RPM package.
2. Prepare the bootstrap script.
3. Run the cluster creation wizard to create and configure the Amazon EMR cluster and execute the script.
Upload the RPM Package
The tar.gz file includes an RPM package and the binary files that you need to run the Big Data Management installation.
Upload the tar.gz file that contains the RPM package to a bucket on S3. Note the location so that you can supply it during the cluster creation steps.
Prepare the Bootstrap Script
You can use a bootstrap script to install Big Data Management on the cluster.
Use the cluster creation wizard to create an Amazon EMR cluster. The cluster creation wizard uses values in the bootstrap script to download the RPM package from the Amazon S3 bucket and extract the package. Then the wizard creates a cluster, where it installs Big Data Management.
1. Copy the following bootstrap script text to a text editor:
#!/bin/bash
echo s3 location of RPM
export S3_LOCATION_RPM=s3://<s3-bucket-name>
echo Temp location to extract the RPM
export TEMP_DIR=/tmp/<TEMP-DIR-TO-EXTRACT-RPM>
echo Default location to install Informatica RPM
#make sure that INFA_RPM_INSTALL_HOME will have enough space to install the Informatica RPM
export INFA_RPM_INSTALL_HOME=/opt/
echo Extracting the prefix part from the rpm file name
echo The rpm installer name would be InformaticaHadoop-10.1.1.Linux-x64.tar.gz
export INFA_RPM_FILE_PREFIX=InformaticaHadoop-10.1.1.Linux-x64
export INFA_RPM_FOLDER=InformaticaHadoop-10.1.1-1.<build_number>
echo S3_LOCATION_RPM = $S3_LOCATION_RPM
echo TEMP_DIR = $TEMP_DIR
echo INFA_RPM_INSTALL_HOME = $INFA_RPM_INSTALL_HOME
echo INFA_RPM_FILE_PREFIX = $INFA_RPM_FILE_PREFIX
echo Installing the RPM:
echo "Creating temporary folder for rpm extraction"
sudo mkdir -p $TEMP_DIR
cd $TEMP_DIR/
echo "current directory =" $(pwd)
echo Getting RPM installer
echo Copying the rpm installer $S3_LOCATION_RPM/$INFA_RPM_FILE_PREFIX.tar.gz to $(pwd)
sudo aws s3 cp $S3_LOCATION_RPM/$INFA_RPM_FILE_PREFIX.tar.gz .
sudo tar -zxvf $INFA_RPM_FILE_PREFIX.tar.gz
cd $INFA_RPM_FOLDER
echo Installing RPM to $INFA_RPM_INSTALL_HOME
sudo rpm -ivh --replacefiles --replacepkgs InformaticaHadoop-10.1.1-1.x86_64.rpm --prefix=$INFA_RPM_INSTALL_HOME
echo Contents of $INFA_RPM_INSTALL_HOME
echo $(ls $INFA_RPM_INSTALL_HOME)
echo chmod
cd $INFA_RPM_INSTALL_HOME
sudo mkdir Informatica/blazeLogs
sudo chmod 766 -R Informatica/blazeLogs/
echo removing temporary folder
sudo rm -rf $TEMP_DIR/
echo done
2. Edit the bootstrap script to supply values for the following variables:
- <s3-bucket-name>: Name of the Amazon S3 bucket that contains the RPM tar.gz file.
- <TEMP-DIR-TO-EXTRACT-RPM>: Temporary directory in which to extract the RPM package.
- <build_number>: RC build number.
3. Save the script file with the suffix .bash in the file name.
4. Upload the edited script file to the S3 bucket.
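If you prepare the script for several clusters, you can fill in the bucket and temporary directory values with sed instead of editing by hand. The sketch below is one possible approach, not a documented tool. The function name set_bootstrap_values is hypothetical. It rewrites the two export lines by variable name and assumes that the values contain no | or & characters.

```shell
# Hypothetical helper: print a copy of the bootstrap script with the
# S3 bucket and temporary directory values filled in.
set_bootstrap_values() {
    local script="$1" bucket="$2" temp_name="$3"
    # Replace the whole export line for each variable; other lines pass through unchanged.
    sed -e "s|^export S3_LOCATION_RPM=.*|export S3_LOCATION_RPM=s3://${bucket}|" \
        -e "s|^export TEMP_DIR=.*|export TEMP_DIR=/tmp/${temp_name}|" \
        "$script"
}
```

Redirect the output to a file with the .bash suffix, and then upload that file to the S3 bucket, for example with the same aws s3 cp command that the bootstrap script uses to download the package.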
Run the Configuration Wizard
1. Launch the cluster configuration wizard.
2. In Step 1 of the configuration wizard, under Edit software settings (optional), select Enter configuration.
3. In the text pane, paste the following set of properties and values to configure the cluster for the Blaze run-time engine:
classification=yarn-site,properties=[yarn.scheduler.minimum-allocation-mb=256,yarn.nodemanager.resource.memory-mb=14000,yarn.nodemanager.resource.cpu-vcores=15,yarn.scheduler.maximum-allocation-mb=8192,yarn.nodemanager.vmem-check-enabled=false,yarn.nodemanager.vmem-pmem-ratio=12]
Note: The values specified in the sample above are the minimum values required. You can use greater values if your cluster requires them.
4. In Step 3, General Cluster Settings, provide the S3 location of the bootstrap script.
5. Click Create Cluster.
The cluster creation wizard uses values in the bootstrap script to download the RPM package from the Amazon S3 bucket and extract the package. Then the wizard creates a cluster, where it installs Big Data Management.
Informatica Big Data Management is installed on the cluster.
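The yarn-site properties in step 3 of the wizard procedure can also be kept in a file so that the same values are reused across clusters. The following sketch writes them in the JSON configuration form that Amazon EMR uses for configuration classifications. The JSON form and the function name write_yarn_config are not part of this guide, so treat them as assumptions and verify the format against the Amazon EMR documentation for your release.

```shell
# Hypothetical helper: write the Blaze yarn-site settings to a JSON file.
write_yarn_config() {
    local out_file="$1"
    # The values match the minimum values listed in the wizard procedure.
    cat > "$out_file" <<'EOF'
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.scheduler.minimum-allocation-mb": "256",
      "yarn.nodemanager.resource.memory-mb": "14000",
      "yarn.nodemanager.resource.cpu-vcores": "15",
      "yarn.scheduler.maximum-allocation-mb": "8192",
      "yarn.nodemanager.vmem-check-enabled": "false",
      "yarn.nodemanager.vmem-pmem-ratio": "12"
    }
  }
]
EOF
}
```

Increase the values in the file if your cluster requires more resources.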