Create a Cluster and Install Big Data Management
You can run a cluster creation wizard and install Big Data Management on the created cluster.
You upload the RPM package to an S3 bucket, and prepare and upload a bootstrap script. Use the cluster creation wizard to create an Amazon EMR cluster. The cluster creation wizard uses values in the script to download the RPM package from the Amazon S3 bucket and extract the package. Then the wizard creates a cluster and installs Big Data Management.
Perform the following steps to create a cluster on Amazon EMR and install Big Data Management:
1. Create a bootstrap script that the cluster creation wizard uses to install Big Data Management.
2. Upload the Big Data Management RPM package to an S3 bucket.
3. Upload the EBF-9585 Hadoop installer, EBF9585_HadoopEBF_EBFInstaller.tar, to the same S3 bucket.
4. Run the cluster creation wizard to create the Amazon EMR cluster and run the script.
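The two upload steps can also be scripted with the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is installed and configured with write access to the bucket and that both archives are in the current directory. The helper names `run` and `upload_installers` and the `DRY_RUN` variable are hypothetical and used only for illustration. The archive names are taken from the bootstrap script in this section.

```shell
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# upload_installers <s3-bucket-name>
# Copies the RPM archive and the EBF installer to the root of the bucket,
# which is where the bootstrap script looks for them (S3_LOCATION_RPM).
upload_installers() {
  local bucket="$1"
  local file
  for file in \
    informatica_1011HF1_hadoop_linux-x64.tar.gz \
    EBF9585_HadoopEBF_EBFInstaller.tar; do
    run aws s3 cp "$file" "s3://$bucket/" || return 1
  done
}

# Example (dry run, prints the two commands without contacting AWS):
#   DRY_RUN=1 upload_installers <s3-bucket-name>
```

Running the function with DRY_RUN=1 first lets you review the exact commands before anything is copied.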
Create the Bootstrap Script
Create a bootstrap script that the cluster creation wizard uses to download the RPM package from the Amazon S3 bucket and extract the package.
1. Copy the following text to a text editor:
#!/bin/bash
echo s3 location of RPM
export S3_LOCATION_RPM=s3://<s3-bucket-name>
echo Temp location to extract the RPM
export TEMP_DIR=/tmp/<TEMP-DIR-TO-EXTRACT-RPM>
echo Default location to install Informatica RPM
#make sure that INFA_RPM_INSTALL_HOME will have enough space to install the Informatica RPM
export INFA_RPM_INSTALL_HOME=/opt/
echo Extracting the prefix part from the rpm file name
echo The rpm installer name would be informatica_1011HF1_hadoop_linux-x64.tar.gz
export INFA_RPM_FILE_PREFIX=informatica_1011HF1_hadoop_linux-x64
export INFA_RPM_FOLDER=InformaticaHadoop-10.1.101-1.55
export INFA_RPM_EBF_PREFIX=EBF9585_HadoopEBF_EBFInstaller
export INFA_RPM_EBF_FOLDER=EBF9585_HadoopEBF
echo S3_LOCATION_RPM = $S3_LOCATION_RPM
echo TEMP_DIR = $TEMP_DIR
echo INFA_RPM_INSTALL_HOME = $INFA_RPM_INSTALL_HOME
echo INFA_RPM_FILE_PREFIX = $INFA_RPM_FILE_PREFIX
echo INFA_RPM_EBF_PREFIX = $INFA_RPM_EBF_PREFIX
echo Installing the RPM:
echo "Creating temporary folder for rpm extraction"
sudo mkdir -p $TEMP_DIR
cd $TEMP_DIR/
echo "current directory =" $(pwd)
echo Getting RPM installer
echo Copying the rpm installer $S3_LOCATION_RPM/$INFA_RPM_FILE_PREFIX.tar.gz to $(pwd)
sudo aws s3 cp $S3_LOCATION_RPM/$INFA_RPM_FILE_PREFIX.tar.gz .
sudo tar -zxvf $INFA_RPM_FILE_PREFIX.tar.gz
cd $INFA_RPM_FOLDER
echo Installing RPM to $INFA_RPM_INSTALL_HOME
sudo rpm -ivh --replacefiles --replacepkgs InformaticaHadoop-10.1.101-1.x86_64.rpm --prefix=$INFA_RPM_INSTALL_HOME
echo Contents of $INFA_RPM_INSTALL_HOME
echo $(ls $INFA_RPM_INSTALL_HOME)
echo chmod
cd $INFA_RPM_INSTALL_HOME
sudo mkdir -p Informatica/blazeLogs
sudo chmod 766 -R Informatica/blazeLogs
echo removing temporary folder
sudo rm -rf $TEMP_DIR/
# Insert commands to perform additional tasks. For example, the section below applies an EBF.
echo Applying the EBF
echo Creating temporary folder
sudo mkdir -p $TEMP_DIR
cd $TEMP_DIR/
echo Getting EBF installer
sudo aws s3 cp $S3_LOCATION_RPM/"$INFA_RPM_EBF_PREFIX".tar .
echo Applying EBF installer to $INFA_RPM_INSTALL_HOME
sudo tar -xvf "$INFA_RPM_EBF_PREFIX".tar
cd $TEMP_DIR/$INFA_RPM_EBF_FOLDER
sudo chmod +x InformaticaHadoopEBFInstall.sh
sudo -S ./InformaticaHadoopEBFInstall.sh <<< $1$'\nYes\n1\n'
cd $INFA_RPM_INSTALL_HOME
echo Removing temporary folder
sudo rm -rf $TEMP_DIR/
echo done
2. Edit the bootstrap script to supply values for the following variables:
- <s3-bucket-name>: Name of the Amazon S3 bucket that contains the RPM .tar file.
- <TEMP-DIR-TO-EXTRACT-RPM>: Temporary directory location to extract the RPM package to.
- <build_number>: RC build number.
3. Optionally, add tasks, such as installing an EBF or patch release, after the following comment in the script:
# Insert commands to perform additional tasks. For example, the section below applies an EBF.
4. Save the script file with the suffix .bash in the file name.
5. Upload the edited script file to the S3 bucket that you plan to use for the cluster.
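The placeholder edits in step 2 can be made with sed instead of by hand. The following is a minimal sketch, assuming the unedited script was saved as a template file. The function name `fill_placeholders` and its argument order are hypothetical. It also assumes that the supplied values contain no `|` or `&` characters, which sed would treat specially.

```shell
#!/bin/bash
# fill_placeholders <template> <output> <s3-bucket-name> <temp-dir-name>
# Replaces the angle-bracket placeholders from step 2, then checks the result.
fill_placeholders() {
  local template="$1" output="$2" bucket="$3" tmp_name="$4"
  sed -e "s|<s3-bucket-name>|$bucket|g" \
      -e "s|<TEMP-DIR-TO-EXTRACT-RPM>|$tmp_name|g" \
      "$template" > "$output" || return 1
  # Fail if any other <placeholder> token is still present.
  if grep -E -q '<[A-Za-z][A-Za-z_-]*>' "$output"; then
    echo "unreplaced placeholder in $output" >&2
    return 1
  fi
  # bash -n parses the script and reports syntax errors without running it.
  bash -n "$output"
}

# Example:
#   fill_placeholders template.bash install_bdm.bash <s3-bucket-name> infa_rpm
```

The syntax check catches a common failure: a placeholder left in place, where bash would read the `<` as an input redirection.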
Run the Cluster Creation Wizard
Before you run the cluster creation wizard, upload the RPM package .tar file to the S3 bucket that you plan to use for the cluster. You supply this location during the cluster creation steps.
1. Launch the cluster creation wizard.
2. On the Edit software settings (optional) page, select Enter configuration.
3. In the text pane, paste the following set of properties and values to configure the cluster for the Blaze run-time engine:
classification=yarn-site,properties=[yarn.scheduler.minimum-allocation-mb=256,yarn.nodemanager.resource.memory-mb=14000,yarn.nodemanager.resource.cpu-vcores=15,yarn.scheduler.maximum-allocation-mb=8192,yarn.nodemanager.vmem-check-enabled=false,yarn.nodemanager.vmem-pmem-ratio=12]
Note: The values specified in the sample above are the minimum values required.
4. On the General Cluster Settings page, provide the S3 location of the bootstrap script.
5. Click Create Cluster.
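The property string in step 3 is easy to corrupt when it is copied, because a stray space inside a property name changes the name. One way to avoid this is to generate the string from a list, as in the sketch below. The function name `yarn_site_config` is hypothetical. The property names and values are the ones given in step 3.

```shell
#!/bin/bash
# yarn_site_config
# Prints the yarn-site configuration entry for the Blaze run-time engine
# as a single line with no embedded whitespace.
yarn_site_config() {
  # Load the properties into the function's positional parameters.
  set -- \
    "yarn.scheduler.minimum-allocation-mb=256" \
    "yarn.nodemanager.resource.memory-mb=14000" \
    "yarn.nodemanager.resource.cpu-vcores=15" \
    "yarn.scheduler.maximum-allocation-mb=8192" \
    "yarn.nodemanager.vmem-check-enabled=false" \
    "yarn.nodemanager.vmem-pmem-ratio=12"
  # Inside double quotes, $* joins the parameters with the first character of IFS.
  local IFS=,
  echo "classification=yarn-site,properties=[$*]"
}

# Example: print the string, then paste it into the wizard's text pane.
#   yarn_site_config
```

To use values above the minimums, change the entries in the list and regenerate the string.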
The cluster creation wizard uses values in the bootstrap script to download the RPM package from the Amazon S3 bucket and extract the package. Then the wizard creates a cluster, where it installs Big Data Management.
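After the cluster starts, you can run a quick check on a cluster node to see whether the bootstrap script completed. The following is a minimal sketch with a hypothetical function name, `verify_bdm_install`. It checks only for the directories that the bootstrap script creates under INFA_RPM_INSTALL_HOME, so it indicates that the script reached that point and does not validate the installed RPM contents.

```shell
#!/bin/bash
# verify_bdm_install [install-home]
# Checks for the directories that the bootstrap script creates under
# INFA_RPM_INSTALL_HOME. The default matches the script's value, /opt/.
verify_bdm_install() {
  local home="${1:-/opt}"
  local dir
  for dir in "$home/Informatica" "$home/Informatica/blazeLogs"; do
    if [ ! -d "$dir" ]; then
      echo "missing: $dir" >&2
      return 1
    fi
  done
  echo "found: $home/Informatica"
}

# Example, run on a cluster node:
#   verify_bdm_install /opt
```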