Verify System Requirements
Verify that your environment meets the minimum system requirements for the installation process, temporary disk space, port availability, databases, and application service hardware.
For more information about product requirements and supported platforms, see the Product Availability Matrix on Informatica Network:
https://network.informatica.com/community/informatica-network/product-availability-matrices
Verify Temporary Disk Space and Permissions
Verify that your environment meets the minimum requirements for temporary disk space and permissions.
- Disk space for the temporary files. The installer writes temporary files to the hard disk. Verify that you have 1 GB of disk space on the machine to support the installation. When the installation completes, the installer deletes the temporary files and releases the disk space.
- Permissions for the temporary files. Verify that you have read, write, and execute permissions on the /tmp directory.
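The two checks above can be sketched as a small script. The 1 GB figure and the /tmp path come from the requirements in this section; the script itself is an illustration, not part of the installer.

```shell
#!/bin/sh
# Check that /tmp has at least 1 GB (1048576 KB) of free space.
avail_kb=$(df -Pk /tmp | awk 'NR==2 {print $4}')
if [ "$avail_kb" -ge 1048576 ]; then
  echo "OK: /tmp has ${avail_kb} KB free"
else
  echo "WARN: /tmp has only ${avail_kb} KB free; the installer needs 1 GB"
fi

# Check that the current user has read, write, and execute permissions on /tmp.
if [ -r /tmp ] && [ -w /tmp ] && [ -x /tmp ]; then
  echo "OK: read, write, and execute permissions on /tmp"
else
  echo "ERROR: missing permissions on /tmp"
fi
```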
For more information about product requirements and supported platforms, see the Product Availability Matrix on Informatica Network:
https://network.informatica.com/community/informatica-network/product-availability-matrices
Verify the Distribution
Informatica big data products integrate with the Hadoop environment, so you must integrate the domain with the Hadoop environment. The integration varies by product, as do the requirements at installation.
The following table lists the supported Hadoop distribution versions for the big data products:
| Distribution | Version |
| --- | --- |
| Amazon EMR | 5.10 * |
| Azure HDInsight | 3.6.x |
| Cloudera CDH | 5.13. Deferred support for versions 5.11.x and 5.12.x. |
| Hortonworks HDP | 2.6.x. Deferred support for version 2.5.x. |
| MapR | 6.x MEP 4.0.x * |

* Enterprise Data Catalog does not support Amazon EMR or MapR. Enterprise Data Lake does not support MapR.
The following table lists the installer dependencies on the Hadoop environment for each product:
| Product | Installer Dependency on Hadoop Environment |
| --- | --- |
| Informatica domain services * | The Hadoop environment is not required at install time. Integrate the environments after installation. |
| Enterprise Data Catalog | If you choose to use an external cluster, the Hadoop environment is required at install time. If you choose to use an embedded cluster, the Hadoop environment is not required at install time. |
| Enterprise Data Lake | The Hadoop environment is required at install time if you want to create and enable the Data Preparation Service and the Enterprise Data Lake Service when you run the installer. You complete the environment integration after installation. |

* The Informatica domain services installation includes the following big data products: Big Data Management, Big Data Parser, Big Data Quality, and Big Data Streaming.
In each release, Informatica adds, defers, and drops support for Hadoop distribution versions. Informatica might reinstate support for deferred versions in a future release. To see a list of the latest supported versions, see the Product Availability Matrix on Informatica Network:
https://network.informatica.com/community/informatica-network/product-availability-matrices
Verify Sizing Requirements
Allocate resources for installation and deployment of services based on the expected deployment type of your environment.
Before you allocate resources, you need to identify the deployment type based on your requirements for the volume of processing and the level of concurrency. Based on the deployment type, you can allocate resources for disk space, cores, and RAM. You can also choose to tune services when you run the installer.
Determine the Installation and Service Deployment Type
The following table describes the environment for the different deployment types:
| Deployment Type | Environment Description |
| --- | --- |
| Sandbox | Used for proofs of concept or as a sandbox with minimal users. |
| Basic | Used for low volume processing with low levels of concurrency. |
| Standard | Used for high volume processing with low levels of concurrency. |
| Advanced | Used for high volume processing with high levels of concurrency. |
Identify Sizing Requirements
The following table provides the minimum sizing requirements:
| Deployment Type | Disk Space per Node | Total Cores | RAM per Node |
| --- | --- | --- | --- |
| Sandbox | 50 GB * | 16 | 32 GB |
| Basic | 100 GB | 24 | 64 GB |
| Standard | 100 GB | 48 | 64 GB |
| Advanced | 100 GB | 96 | 128 GB |

* Enterprise Data Catalog requires 100 GB of disk space for a Sandbox deployment type.
The sizing requirements account for the following factors:
- Disk space required to extract the installer
- Temporary disk space to run the installer
- Disk space required to install the services and components
- Disk space required for log directories
- Requirements to run the application services
The sizing numbers do not account for operational processing and object caching requirements.
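As a rough pre-check, you can compare a machine against the Sandbox minimums from the sizing table. This is a sketch for Linux only; the 16-core and 32 GB thresholds are taken from the table above, and disk space should still be checked per node against the table.

```shell
#!/bin/sh
# Compare this machine against the Sandbox minimums from the sizing table:
# 16 total cores and 32 GB of RAM per node.
cores=$(nproc)
ram_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
echo "Cores: ${cores} (Sandbox minimum: 16)"
echo "RAM:   ${ram_gb} GB (Sandbox minimum: 32 GB)"
[ "$cores" -ge 16 ] || echo "WARN: fewer than 16 cores"
[ "$ram_gb" -ge 32 ] || echo "WARN: less than 32 GB RAM"
```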
Tune During Installation
When you run the installer, you can choose to tune the services based on the deployment size. If you create a Model Repository Service, a Data Integration Service, or a Content Management Service during installation, the installer can tune the services based on the deployment type that you enter. The installer configures properties such as maximum heap size and execution pool size.
You can tune services at any time after you install the services by using the infacmd autotune command. When you run the command, you can tune properties for other services as well as the Hadoop run-time engine properties.
Review Patch Requirements
Before you install the Informatica services, verify that the machine has the required operating system patches and libraries.
The following table lists the patches and libraries required for installation:
| Platform | Operating System | Operating System Patch |
| --- | --- | --- |
| Linux-x64 | Red Hat Enterprise Linux 6.5 | All of the following packages, where <version> is any version of the package: e2fsprogs-libs-<version>.el6, keyutils-libs-<version>.el6, libselinux-<version>.el6, libsepol-<version>.el6 |
| Linux-x64 | Red Hat Enterprise Linux 7.0 | All of the following packages, where <version> is any version of the package: e2fsprogs-libs-<version>.el7, keyutils-libs-<version>.el7, libselinux-<version>.el7, libsepol-<version>.el7 |
| Linux-x64 | SUSE Linux Enterprise Server 11 | Service Pack 2 |
| Linux-x64 | SUSE Linux Enterprise Server 12 | Service Pack 2 |
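On Red Hat Enterprise Linux, you can confirm that the packages from the table are installed with rpm. The package names come from the table above; the script is a sketch and skips itself on non-RPM systems.

```shell
#!/bin/sh
# Verify the required Red Hat packages from the table above (any version).
if ! command -v rpm >/dev/null 2>&1; then
  echo "SKIP: rpm not found; this check applies to Red Hat Enterprise Linux only"
  exit 0
fi
for pkg in e2fsprogs-libs keyutils-libs libselinux libsepol; do
  if rpm -q "$pkg" >/dev/null 2>&1; then
    echo "OK: $(rpm -q "$pkg")"
  else
    echo "MISSING: $pkg"
  fi
done
```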
Verify Port Requirements
The installer sets up the ports for components in the Informatica domain, and it designates a range of dynamic ports to use for some application services.
You can specify the port numbers to use for the components and a range of dynamic port numbers to use for the application services. Or you can use the default port numbers provided by the installer. Verify that the port numbers are available on the machines where you install the Informatica domain services, Enterprise Data Catalog, or Enterprise Data Lake.
The following table describes the port requirements for installation:
| Port | Description |
| --- | --- |
| Node port | Port number for the node created during installation. Default is 6005. |
| Service Manager port | Port number used by the Service Manager on the node. The Service Manager listens for incoming connection requests on this port. Client applications use this port to communicate with the services in the domain. The Informatica command line programs use this port to communicate to the domain. This is also the port for the SQL data service JDBC/ODBC driver. Default is 6006. |
| Service Manager Shutdown port | Port number that controls server shutdown for the domain Service Manager. The Service Manager listens for shutdown commands on this port. Default is 6007. |
| Informatica Administrator port | Port number used by Informatica Administrator. Default is 6008. |
| Informatica Administrator HTTPS port | No default port. Enter the required port number when you create the service. Setting this port to 0 disables an HTTPS connection to the Administrator tool. |
| Informatica Administrator shutdown port | Port number that controls server shutdown for Informatica Administrator. Informatica Administrator listens for shutdown commands on this port. Default is 6009. |
| Minimum port number | Lowest port number in the range of dynamic port numbers that can be assigned to the application service processes that run on this node. Default is 6014. |
| Maximum port number | Highest port number in the range of dynamic port numbers that can be assigned to the application service processes that run on this node. Default is 6114. |
| Range of dynamic ports for application services | Range of port numbers that can be dynamically assigned to application service processes as they start up. When you start an application service that uses a dynamic port, the Service Manager dynamically assigns the first available port in this range to the service process. The number of ports in the range must be at least twice the number of application service processes that run on the node. The Service Manager dynamically assigns port numbers from this range to the Model Repository Service. Default is 6014 to 6114. |
| HTTPS port for Hadoop distributions | If you deploy Enterprise Data Catalog in an HTTPS-enabled Hadoop distribution, the default port numbers are: Cloudera, 7183; Hortonworks, 8443; Azure HDInsight, 8443. Required only if you install Enterprise Data Catalog. |
| Static ports for application services | Static ports have dedicated port numbers assigned that do not change. When you create the application service, you can accept the default port number, or you can manually assign the port number. The following services use static port numbers: Catalog Service, 9085 (HTTP); Content Management Service, 8105 (HTTP); Data Integration Service, 8095 (HTTP); Data Preparation Service, 8099 (HTTP); Enterprise Data Lake Service, 9045 (HTTP); Informatica Cluster Service, 9075 (HTTP); Mass Ingestion Service, 9050 (HTTP); Metadata Access Service, 7080 (HTTP). |
Note: Services and nodes can fail to start if there is a port conflict.
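Before you run the installer, you can check whether the default domain ports from the table are already bound on the machine. This sketch assumes the ss utility from iproute2 is available; substitute your own port numbers if you do not use the defaults.

```shell
#!/bin/sh
# Check whether the default domain ports from the table above are already
# bound by a listening TCP socket on this machine.
for port in 6005 6006 6007 6008 6009; do
  if ss -ltn 2>/dev/null | grep -q ":${port} "; then
    echo "IN USE: ${port}"
  else
    echo "FREE:   ${port}"
  fi
done
```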
Guidelines for Port Configuration
The installer validates the port numbers that you specify to ensure that there will be no port conflicts in the domain.
Use the following guidelines to determine the port numbers:
- The port number you specify for the domain and for each component in the domain must be unique.
- The port number for the domain and domain components cannot be within the range of the port numbers that you specify for the application service processes.
- The highest number in the range of port numbers that you specify for the application service processes must be at least three numbers higher than the lowest port number. For example, if the minimum port number in the range is 6400, the maximum port number must be at least 6403.
- The port numbers that you specify cannot be lower than 1025 or higher than 65535.
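The guideline arithmetic can be sketched for the default dynamic range (6014 to 6114 from the port table). The capacity figure follows from the rule that the range must hold at least twice as many ports as there are service processes on the node.

```shell
#!/bin/sh
# Validate a dynamic port range against the guidelines above, using the
# installer defaults (6014-6114) as the example values.
min=6014
max=6114
[ "$min" -ge 1025 ] && [ "$max" -le 65535 ] || echo "ERROR: ports must be between 1025 and 65535"
[ "$max" -ge $((min + 3)) ] || echo "ERROR: maximum must be at least 3 higher than minimum"
ports=$((max - min + 1))
echo "Range ${min}-${max} holds ${ports} ports (enough for up to $((ports / 2)) service processes)"
```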
Verify the File Descriptor Limit
Verify that the operating system meets the file descriptor requirement.
Informatica service processes can use a large number of files. To prevent errors that result from the large number of files and processes, you can change system settings with the limit command if you use a C shell, or the ulimit command if you use a Bash shell.
To get a list of the operating system settings, including the file descriptor limit, run the following command:
- C Shell: limit
- Bash Shell: ulimit -a
Set the file descriptor limit per process to 16,000 or higher. The recommended limit is 32,000 file descriptors per process.
To change system settings, run the limit or ulimit command with the pertinent flag and value. For example, to set the file descriptor limit, run the following command:
- C Shell: limit descriptors <value>
- Bash Shell: ulimit -n <value>
Informatica services use a large number of user processes. Use the ulimit -u command to adjust the max user processes setting to a level that is high enough to account for all the processes required by Blaze. Depending on the number of mappings and transformations that might run concurrently, set the max user processes limit to 16,000 or higher.
Run the following command to set the max user processes setting:
- C Shell: limit maxproc <value>
- Bash Shell: ulimit -u <value>
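A quick way to confirm the current Bash shell limits against the recommendations above is to read them back and compare. The 16,000 and 32,000 thresholds come from this section; the script is a sketch.

```shell
#!/bin/bash
# Compare current per-process limits against the recommendations above
# (file descriptors: 16,000 minimum, 32,000 recommended).
fd_limit=$(ulimit -n)
proc_limit=$(ulimit -u)
echo "Open files limit:   ${fd_limit}"
echo "Max user processes: ${proc_limit}"
if [ "$fd_limit" != "unlimited" ] && [ "$fd_limit" -lt 16000 ]; then
  echo "WARN: raise the file descriptor limit to at least 16000 (32000 recommended)"
fi
```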