Prerequisites - Embedded Cluster

Before you install Enterprise Data Catalog on an embedded Hadoop cluster, verify that the system environment meets the required prerequisites. The cluster validation utility that you can run as part of the installation validates the following prerequisites for deploying Enterprise Data Catalog on an embedded cluster:

Host Node Prerequisites

Prerequisites to Deploy Enterprise Data Catalog on Multiple Nodes

Cluster Node Prerequisites

Verify that the cluster nodes meet the following requirements:
Master node:
  • - 4 CPUs
  • - 12 GB of unused memory available for use
  • - 16 GB of total memory available for use
  • - 60 GB of disk space
Slave node:
  • - 4 CPUs
  • - 12 GB of unused memory available for use
  • - 16 GB of total memory available for use
  • - 60 GB of disk space
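One way to spot-check a node against these minimums from a shell (a sketch, not part of the product's validation utility; the thresholds are the ones in the table above, and the disk check assumes the installation lands on the root file system):

```shell
# Compare this node's resources against the documented minimums:
# 4 CPUs, 12 GB unused memory, 16 GB total memory, 60 GB disk space.
cpus=$(nproc)
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
total_gb=$((total_kb / 1024 / 1024))
avail_gb=$((avail_kb / 1024 / 1024))
disk_gb=$(df -BG --output=avail / | tail -1 | tr -dc '0-9')

check() {  # check <label> <actual> <minimum>
    if [ "$2" -ge "$3" ]; then
        echo "$1: $2 (minimum $3, ok)"
    else
        echo "$1: $2 (minimum $3, FAILED)"
    fi
}

check "CPUs" "$cpus" 4
check "Total memory (GB)" "$total_gb" 16
check "Unused memory (GB)" "$avail_gb" 12
check "Free disk space on / (GB)" "$disk_gb" 60
```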

Operating System Prerequisites

Red Hat Enterprise Linux versions 6 and 7
  • - For Red Hat Enterprise Linux version 7.0, make sure that you use Sudo version 1.8.16 or later.
  • - Install kernel-headers and kernel-devel.
  • - Install libtirpc-devel.
  • - Install OpenSSL version 1.0.1 build 16 or later, or version 1.0.2k.
  • - Install YUM.
  • - Make sure that the /etc/sysconfig/network file exists and that it has read permission.
  • - Make sure that /etc/sysconfig/network contains the same host name as the value returned by hostname -f.
  • - Install Python version 2.6.x or 2.7.x on Red Hat Enterprise Linux version 6, and Python version 2.7.x on Red Hat Enterprise Linux version 7.
  • - Disable SSL certificate validation.
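A few of the Red Hat items above can be spot-checked from a shell. The sketch below only reports findings and changes nothing; the required versions in the comments are the ones listed in this section:

```shell
# Report the versions that matter for the RHEL prerequisites above.
sudo_ver=$(sudo -V 2>/dev/null | head -1)
ssl_ver=$(openssl version 2>/dev/null)
py_ver=$(python -V 2>&1)
echo "Sudo:    ${sudo_ver:-not found} (need 1.8.16 or later on RHEL 7.0)"
echo "OpenSSL: ${ssl_ver:-not found} (need 1.0.1 build 16 or later, or 1.0.2k)"
echo "Python:  ${py_ver:-not found} (need 2.6.x on RHEL 6, 2.7.x on RHEL 7)"

# /etc/sysconfig/network must exist, be readable, and match hostname -f.
if [ -r /etc/sysconfig/network ] && grep -q "$(hostname -f 2>/dev/null)" /etc/sysconfig/network; then
    echo "/etc/sysconfig/network exists and matches hostname -f"
else
    echo "WARNING: check /etc/sysconfig/network (must exist, be readable, and match hostname -f)"
fi
```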
SUSE Linux Enterprise Server versions 11 and 12
  • - Install netcat-openbsd.
  • - Install kernel-default-devel.
  • - Make sure that the /etc/HOSTNAME file exists and that it has read permission.
  • - Make sure that /etc/HOSTNAME contains the same host name as the value returned by hostname -f.
  • - Install Zypper.
  • - Install the following versions of Python:
    • - 2.6.8/2.6.9/2.7.x for SUSE Linux version 11.
    • - 2.7.x for SUSE Linux version 12.
  • - For SUSE Linux Enterprise Server 11, update all the hosts to Python version 2.6.8-0.15.1.
  • - If you install Enterprise Data Catalog on SUSE Linux Enterprise Server 12, make sure that you install the following RPM packages on all the cluster nodes:
    • - openssl-1.0.1c-2.1.3.x86_64.rpm
    • - libopenssl1_0_0-1.0.1c-2.1.3.x86_64.rpm
    • - libopenssl1_0_0-32bit-1.0.1c-2.1.3.x86_64.rpm
    • - python-devel-2.6.8-0.15.1.x86_64
  • - Do not install libsnappy if you install Enterprise Data Catalog on SUSE Linux Enterprise Server.
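A quick way to confirm that the SUSE package list above is installed (a sketch; the package names are the ones listed in this section, and the check skips cleanly on systems without rpm):

```shell
# Check that each required package is installed on a SLES 12 node.
required="openssl libopenssl1_0_0 libopenssl1_0_0-32bit python-devel"

if command -v rpm >/dev/null; then
    for pkg in $required; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg: installed"
        else
            echo "WARNING: $pkg is not installed"
        fi
    done
else
    echo "rpm not found; run this check on the SUSE cluster nodes"
fi
```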
Common prerequisites for both operating systems
  • - Ensure that the operating system is 64-bit.
  • - Make sure that you install the file utility.
  • - Ensure that Bash is the default shell.
  • - If you use a user account without root privileges and you want to remove sudo access, ensure that the Defaults requiretty line is commented out in /etc/sudoers.
  • - If you use a user account without root privileges, make sure that the Hadoop user has sudo privileges.
  • - Make sure that you disable the password prompt for the Hadoop user.
  • - Make sure that you install python-devel.
  • - Make sure that you set UMASK to 022 (0022) or 027 (0027).
  • - Ensure that you set Fully Qualified Domain Name (FQDN) for hostname -f.
  • - Make sure that the /var location does not have write privilege for everyone.
  • - Ensure that you configure the Linux base repositories.
  • - Make sure that you install the netstat command line network utility tool.
  • - Verify that the root directory (/) has a minimum of 10 GB of free disk space.
  • - If you plan to install Enterprise Data Catalog on an embedded cluster and you want to mount Informatica Cluster Service on a separate mount location, verify that the mount location has a minimum of 50 GB of free disk space.
  • - Make sure that the NOEXEC flag is not set for the file system mounted on the /tmp directory.
  • - For better performance, ensure that the /tmp directory has a minimum of 20 GB of free disk space.
  • - Make sure that you install the scp, curl, unzip, wget, and tar utilities.
  • - Ensure that you configure the home directory with write permission.
  • - Make sure that the /etc/hosts file on each machine includes the FQDNs of all the machines.
  • - Ensure that the Network Time Protocol (NTP) daemon is synchronized and running.
  • - Make sure that the /tmp directory has chmod 777 permission configured for the directory.
  • - Make sure that the / and /var directories do not have chmod 777 permission configured.
  • - Make sure that the /var directory has a minimum of 2 GB of free disk space.
  • - Make sure that the /usr directory has a minimum of 2 GB of free disk space.
  • - Ensure that you disable SELinux or set SELinux to permissive mode.
  • - Make sure that the /etc/hosts file has an entry for the loopback address: 127.0.0.1 localhost localhost.domain.com
  • - Make sure that you set the core limit to unlimited for a user without root privileges.
  • - If you configure the workingDir to /, verify that the file systems mounted on the /tmp and /var directories have the EXEC flag set.
  • - If the workingDir is not configured to /, verify that the workingDir directory has read, write, and execute permissions and that the EXEC flag is set for the directory.
  • - Verify that you have the write permission on the /home directory. You can configure the permission in the /etc/default/useradd file.
  • - Make sure that each machine in the cluster includes the 127.0.0.1 localhost localhost.localdomain entry in the /etc/hosts file.
  • - Verify that the /etc/hosts file includes the fully-qualified host names for all the cluster nodes. Alternatively, make sure that reverse DNS lookup returns the fully-qualified host names for all the cluster nodes.
  • - Verify that the Linux repository includes postgresql version 8.14.18, release 1.el6_4 or later versions.
  • - Ensure that you set the soft limit for max user processes to 32000 or more.
  • - Ensure that you set the hard limit for max user processes to 32000 or more.
  • - On each host machine, verify that you have the following tools and applications available:
    • - YUM and RPM (RHEL/CentOS/Oracle Linux)
    • - Zypper
    • - scp, curl, unzip, tar, and wget
    • - awk
    • - OpenSSL version 1.0.1e-30.el6_6.5.x86_64 or later.
    • Note: Make sure that the $PATH variable points to the /usr/bin directory to use the correct version of Linux OpenSSL.
  • - For Enterprise Data Catalog installed on an embedded cluster, if you have not configured the Linux base repository or if you do not have an Internet connection, install the following packages:
    • - The following RPMs on the Ambari Server host:
      • - postgresql-libs
      • - postgresql-server
      • - postgresql
    • - The following RPMs on all cluster nodes:
      • - nc
      • - redhat-lsb
      • - psmisc
      • - python-devel
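Several of the common items above can be spot-checked with standard commands. The sketch below only reports; it assumes a Bash shell and changes nothing on the host:

```shell
# umask must be 022 (0022) or 027 (0027).
u=$(umask)
case "$u" in
    0022|022|0027|027) echo "umask: $u (ok)" ;;
    *) echo "umask: $u (expected 022 or 027)" ;;
esac

# SELinux must be disabled or in permissive mode.
echo "SELinux: $(getenforce 2>/dev/null || echo 'not installed') (expected Disabled or Permissive)"

# hostname -f must return an FQDN that appears in /etc/hosts.
fqdn=$(hostname -f 2>/dev/null)
if [ -n "$fqdn" ] && grep -q "$fqdn" /etc/hosts 2>/dev/null; then
    echo "/etc/hosts contains $fqdn (ok)"
else
    echo "WARNING: verify that /etc/hosts contains the FQDN returned by hostname -f"
fi

# Soft and hard limits for max user processes must be 32000 or more.
echo "max user processes: soft=$(ulimit -Su) hard=$(ulimit -Hu) (need 32000+)"

# Required utilities from the list above.
for tool in scp curl unzip wget tar netstat file; do
    command -v "$tool" >/dev/null || echo "WARNING: $tool is not installed"
done
```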

Apache Ambari Prerequisites

Apache Ranger Prerequisites

Before you deploy Enterprise Data Catalog on clusters where Apache Ranger is enabled, make sure that you configure the following permissions for the Informatica domain user:

File Descriptor Limit

Verify that the maximum number of open file descriptors is 10,000 or more. Use the ulimit command to verify the current value and change the value if required.
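The current limits can be checked with ulimit, as in the sketch below; the limits.conf entries in the comments are illustrative, and the user name infa is a placeholder:

```shell
# Soft and hard limits for open file descriptors; both should be
# 10,000 or more.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard (need 10000+)"

# To raise the limits persistently, add entries like these to
# /etc/security/limits.conf ("infa" is a placeholder user name):
#   infa  soft  nofile  10000
#   infa  hard  nofile  10000
```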

SSL Prerequisites

If you want to enable SSL protocol for the cluster, verify the following prerequisites:

Kerberos Prerequisites

If you want to enable Kerberos authentication for the cluster, verify the following prerequisites: