
Create a Cluster and Install Big Data Management

You can run the cluster creation wizard to create an Amazon EMR cluster and install Big Data Management on it.
You upload the RPM package to an S3 bucket, and you prepare and upload a bootstrap script. When you run the cluster creation wizard, it uses values in the bootstrap script to download the RPM package from the Amazon S3 bucket and extract it. The wizard then creates the cluster and installs Big Data Management.
Perform the following steps to create a cluster on Amazon EMR and install Big Data Management:
  1. Create a bootstrap script that the cluster creation wizard uses to install Big Data Management.
  2. Upload the Big Data Management RPM package to an S3 bucket.
  3. Upload the EBF-9585 Hadoop installer EBF9585_HadoopEBF_EBFInstaller.tar to the same S3 bucket. A sketch of both uploads appears after this list.
  4. Run the cluster creation wizard to create the Amazon EMR cluster and run the script.
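For example, if the AWS CLI is installed and configured with access to the bucket, you can copy both files with aws s3 cp. The bucket name is a placeholder; the file names are the ones used later in this chapter.
    # Upload the Big Data Management RPM package and the EBF installer to the S3 bucket.
    # Replace <s3-bucket-name> with the name of your bucket.
    aws s3 cp informatica_1011HF1_hadoop_linux-x64.tar.gz s3://<s3-bucket-name>/
    aws s3 cp EBF9585_HadoopEBF_EBFInstaller.tar s3://<s3-bucket-name>/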

Create the Bootstrap Script

Create a bootstrap script that the cluster creation wizard uses to download the RPM package from the Amazon S3 bucket and extract the package.
    1. Copy the following text to a text editor:
    #!/bin/bash

    echo s3 location of RPM
    export S3_LOCATION_RPM=s3://<s3-bucket-name>

    echo Temp location to extract the RPM
    export TEMP_DIR=/tmp/<TEMP-DIR-TO-EXTRACT-RPM>

    echo Default location to install Informatica RPM
    #make sure that INFA_RPM_INSTALL_HOME will have enough space to install the Informatica RPM
    export INFA_RPM_INSTALL_HOME=/opt/

    echo Extracting the prefix part from the rpm file name
    echo The rpm installer name would be informatica_1011HF1_hadoop_linux-x64.tar.gz
    export INFA_RPM_FILE_PREFIX=informatica_1011HF1_hadoop_linux-x64
    export INFA_RPM_FOLDER=InformaticaHadoop-10.1.101-1.55
    export INFA_RPM_EBF_PREFIX=EBF9585_HadoopEBF_EBFInstaller
    export INFA_RPM_EBF_FOLDER=EBF9585_HadoopEBF

    echo S3_LOCATION_RPM = $S3_LOCATION_RPM
    echo TEMP_DIR = $TEMP_DIR
    echo INFA_RPM_INSTALL_HOME = $INFA_RPM_INSTALL_HOME
    echo INFA_RPM_FILE_PREFIX = $INFA_RPM_FILE_PREFIX
    echo INFA_RPM_EBF_PREFIX = $INFA_RPM_EBF_PREFIX

    echo Installing the RPM:
    echo "Creating temporary folder for rpm extraction"
    sudo mkdir -p $TEMP_DIR
    cd $TEMP_DIR/
    echo "current directory =" $(pwd)

    echo Getting RPM installer
    echo Copying the rpm installer $S3_LOCATION_RPM/$INFA_RPM_FILE_PREFIX.tar.gz to $(pwd)
    sudo aws s3 cp $S3_LOCATION_RPM/$INFA_RPM_FILE_PREFIX.tar.gz .
    sudo tar -zxvf $INFA_RPM_FILE_PREFIX.tar.gz
    cd $INFA_RPM_FOLDER
    echo Installing RPM to $INFA_RPM_INSTALL_HOME
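    # The options below let rpm reinstall over an existing package (--replacepkgs,
    # --replacefiles) and relocate the installation to $INFA_RPM_INSTALL_HOME (--prefix).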

    sudo rpm -ivh --replacefiles --replacepkgs InformaticaHadoop-10.1.101-1.x86_64.rpm --prefix=$INFA_RPM_INSTALL_HOME

    echo Contents of $INFA_RPM_INSTALL_HOME
    echo $(ls $INFA_RPM_INSTALL_HOME)

    echo chmod
    cd $INFA_RPM_INSTALL_HOME
    sudo mkdir -p Informatica/blazeLogs
    sudo chmod -R 766 Informatica/blazeLogs
    echo removing temporary folder
    sudo rm -rf $TEMP_DIR/

    # Insert commands to perform additional tasks. For example, the section below applies an EBF.
    echo Applying the EBF
    echo Creating temporary folder
    sudo mkdir -p $TEMP_DIR
    cd $TEMP_DIR/
    echo Getting EBF installer
    sudo aws s3 cp $S3_LOCATION_RPM/"$INFA_RPM_EBF_PREFIX".tar .
    echo Applying EBF installer to $INFA_RPM_INSTALL_HOME
    sudo tar -xvf "$INFA_RPM_EBF_PREFIX".tar
    cd $TEMP_DIR/$INFA_RPM_EBF_FOLDER
    sudo chmod +x InformaticaHadoopEBFInstall.sh
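    # $1 is the first argument passed to the bootstrap action. The here-string on the
    # next line feeds it, followed by the remaining prompt answers, to the EBF installer.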
    sudo -S ./InformaticaHadoopEBFInstall.sh <<< $1$'\nYes\n1\n'

    cd $INFA_RPM_INSTALL_HOME
    echo Removing temporary folder
    sudo rm -rf $TEMP_DIR/

    echo done
    2. Edit the bootstrap script to supply values for the following variables:
    <s3-bucket-name>
    Name of the Amazon S3 bucket that contains the RPM .tar file.
    <TEMP-DIR-TO-EXTRACT-RPM>
    Temporary directory location to extract the RPM package to.
    <build_number>
    RC build number.
    3. Optionally, add commands for other tasks, such as applying an EBF or a patch release, at the point in the script marked by the following comment:
    # Insert commands to perform additional tasks. For example, the section below applies an EBF.
    4. Save the script file with the .bash file name extension.
    5. Upload the edited script file to an S3 bucket, as shown in the example that follows. You supply this location when you run the cluster creation wizard.
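For example, assuming the script is saved as install_bdm.bash (a hypothetical name) and the AWS CLI is configured, you can check the script for syntax errors and then upload it:
    # Verify that the bootstrap script has no bash syntax errors, without running it.
    bash -n install_bdm.bash
    # Copy the script to the S3 bucket. Replace <s3-bucket-name> with your bucket name.
    aws s3 cp install_bdm.bash s3://<s3-bucket-name>/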

Run the Cluster Creation Wizard

Before you run the cluster creation wizard, upload the RPM package .tar file to the S3 bucket that the bootstrap script references. You supply the S3 location of the bootstrap script during the cluster creation steps.
    1. Launch the cluster creation wizard.
    2. On the Edit software settings (optional) page, select Enter configuration.
    3. In the text pane, paste the following set of properties and values to configure the cluster for the Blaze run-time engine:
    classification=yarn-site,properties=[yarn.scheduler.minimum-allocation-mb=256,yarn.nodemanager.resource.memory-mb=14000,yarn.nodemanager.resource.cpu-vcores=15,yarn.scheduler.maximum-allocation-mb=8192,yarn.nodemanager.vmem-check-enabled=false,yarn.nodemanager.vmem-pmem-ratio=12]
    Note: The values specified in the sample above are the minimum required values. An equivalent JSON configuration for use with the AWS CLI appears at the end of this section.
    4. On the General Cluster Settings page, provide the S3 location of the bootstrap script.
    5. Click Create Cluster.
The cluster creation wizard uses values in the bootstrap script to download the RPM package from the Amazon S3 bucket and extract the package. The wizard then creates the cluster and installs Big Data Management on it.
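
If you prefer to script cluster creation instead of using the console wizard, you can pass the same configuration and bootstrap action to the AWS CLI. The following sketch uses assumed values for the cluster name, EMR release label, applications, instance type, instance count, and key pair; replace them with values that match your environment, and use the EMR release that your Big Data Management version supports.
    # Create the EMR cluster, apply the yarn-site settings for the Blaze engine,
    # and run the bootstrap script as a bootstrap action.
    # <s3-bucket-name>, <key-pair-name>, and install_bdm.bash are placeholders.
    aws emr create-cluster \
      --name "BDM cluster" \
      --release-label emr-5.4.0 \
      --applications Name=Hadoop Name=Hive \
      --instance-type m4.xlarge \
      --instance-count 3 \
      --use-default-roles \
      --ec2-attributes KeyName=<key-pair-name> \
      --bootstrap-actions Path=s3://<s3-bucket-name>/install_bdm.bash,Name="Install BDM" \
      --configurations '[{"Classification":"yarn-site","Properties":{
        "yarn.scheduler.minimum-allocation-mb":"256",
        "yarn.nodemanager.resource.memory-mb":"14000",
        "yarn.nodemanager.resource.cpu-vcores":"15",
        "yarn.scheduler.maximum-allocation-mb":"8192",
        "yarn.nodemanager.vmem-check-enabled":"false",
        "yarn.nodemanager.vmem-pmem-ratio":"12"}}]'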