Blaze Engine Configuration
You can use the Blaze runtime engine to run mappings in the Hadoop environment.
Perform the following configuration tasks in the Big Data Management installation:
1. Configure Blaze on Kerberos-enabled clusters.
2. Configure Blaze engine log directories.
3. Reset system settings to allow more processes and files.
4. Perform administration tasks.
5. Allocate cluster resources for Blaze.
Depending on the Hadoop environment, you perform additional steps in the Hadoop cluster to allow Big Data Management to use the Blaze engine to run mappings. See "Configuring Big Data Management to Run Mappings in Hadoop Environments."
Configure Blaze Engine Log and Work Directories
The hadoopEnv.properties file lists the log and work directories that the Blaze engine uses on the node and on HDFS. You must grant write permission on these directories for the user account that starts the Blaze engine.
The following cluster properties specify the directories on which that user account needs write permission:
- infagrid.node.local.root.log.dir
- infacal.hadoop.logs.directory
For more information about user accounts for the Blaze engine, see the Informatica Big Data Management Security Guide.
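As a rough sketch of how the write permission might be granted from a Bash shell, the following script prints the commands it would run for one local directory and one HDFS directory. The user name and both paths are hypothetical placeholders, and `run` and `grant_blaze_log_dirs` are helper names invented for this illustration. Read the real directory values from hadoopEnv.properties.

```shell
#!/usr/bin/env bash
# Sketch: give the Blaze user write access to the local and HDFS log directories.
# All argument values below are hypothetical.

# Print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# grant_blaze_log_dirs <user> <local_log_dir> <hdfs_log_dir>
grant_blaze_log_dirs() {
  local user="$1" local_dir="$2" hdfs_dir="$3"

  # Local directory on the node (value of infagrid.node.local.root.log.dir).
  run mkdir -p "$local_dir"
  run chown -R "$user" "$local_dir"
  run chmod -R u+rwx "$local_dir"

  # HDFS directory (value of infacal.hadoop.logs.directory).
  run hdfs dfs -mkdir -p "$hdfs_dir"
  run hdfs dfs -chown -R "$user" "$hdfs_dir"
  run hdfs dfs -chmod -R u+rwx "$hdfs_dir"
}

# Dry run by default: prints the commands without changing anything.
grant_blaze_log_dirs blazeuser /tmp/infa/blaze/logs /tmp/blaze/logs
```

Run the script with `DRY_RUN=0` only after you confirm that the printed commands match your environment. Changing ownership typically requires root on the node and an HDFS superuser on the cluster.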
Reset System Settings to Allow More Processes and Files
Informatica service processes can use a large number of files. To prevent errors that result from the large number of files and processes when you use Blaze to run mappings on the Hadoop cluster, increase the operating system limits on the machine that hosts the Data Integration Service. Higher limits allow more user processes and open files.
You can change system settings with the limit command if you use a C shell, or the ulimit command if you use a Bash shell.
1. Review the current operating system settings.
Run the following command:
- C Shell: limit
- Bash Shell: ulimit -a
2. Optionally reset the file descriptor limit.
Informatica service processes can use a large number of files. Set the file descriptor limit per process to 16,000 or higher. The recommended limit is 32,000 file descriptors per process.
To change system settings, run the limit or ulimit command with the appropriate resource name or flag and a value. For example, to set the file descriptor limit, run the following command:
- C Shell: limit -h descriptors <value>
- Bash Shell: ulimit -n <value>
3. Optionally adjust the max user processes.
Informatica services use a large number of user processes. Use the ulimit -u command to adjust the max user processes setting to a level that is high enough to account for all the processes required by Blaze. Depending on the number of mappings and transformations that might run concurrently, adjust the setting from the default value of 1024 to at least 4096.
Run the following command to set the max user processes setting:
- C Shell: limit maxproc <value>
- Bash Shell: ulimit -u <value>
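The review step can be sketched as a small Bash check that compares the limits of the current shell with the minimums recommended above (16,000 file descriptors and 4096 user processes). The function name `check_limit` is invented for this illustration, and the check covers only the soft limits of the shell that runs it.

```shell
#!/usr/bin/env bash
# Sketch: compare the current Bash limits with the recommended minimums.

# check_limit <label> <current> <minimum>
# Succeeds when <current> is "unlimited" or numerically >= <minimum>.
check_limit() {
  local label="$1" current="$2" minimum="$3"
  if [ "$current" = "unlimited" ] || [ "$current" -ge "$minimum" ]; then
    echo "OK:  $label = $current (minimum $minimum)"
    return 0
  fi
  echo "LOW: $label = $current (minimum $minimum)"
  return 1
}

# Report only; "|| true" keeps a low value from aborting the script.
check_limit "open files (ulimit -n)" "$(ulimit -n)" 16000 || true
check_limit "max user processes (ulimit -u)" "$(ulimit -u)" 4096 || true
```

Run the check as the user account that starts the Informatica services, because limits are applied per user and per session.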
Open the Required Ports for the Blaze Engine
When you create the Hadoop connection, specify the minimum and maximum port range that the Blaze engine can use. Then open the ports on the cluster for the Blaze engine to use to communicate with the Informatica domain.
Note: If the Hadoop cluster is behind a firewall, work with your network administrator to open the range of ports that the Blaze engine uses.
Blaze Engine Console
You can run mappings using the native, Blaze, or Spark runtime engines.
The Blaze engine console is enabled by default.
If you never use the Blaze engine to run mappings, you must disable the Blaze engine console.
Disable the Blaze Engine Console
1. Browse to the following location: <InformaticaInstallationDir>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf
2. Find the file named hadoopEnv.properties.
3. Back up the file before you modify it, then open the file for editing.
4. Locate the property infagrid.blaze.console.enabled.
5. If necessary, remove the # (hash) character to uncomment the line, and then change the value of the infagrid.blaze.console.enabled property to FALSE.
6. Save and close the hadoopEnv.properties file.
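The steps above can be sketched as a Bash function that backs up the file and rewrites the property line. This is an illustration under assumptions, not a supported tool: the function name `disable_blaze_console` is invented here, and the sketch assumes the property appears on a single line in `key=value` form, optionally commented out with #.

```shell
#!/usr/bin/env bash
# Sketch: back up a properties file, then uncomment the console property
# and set it to FALSE.

# disable_blaze_console <path-to-hadoopEnv.properties>
disable_blaze_console() {
  local file="$1"
  local key='infagrid\.blaze\.console\.enabled'

  # Keep a copy of the original file next to it.
  cp -p "$file" "$file.bak" || return 1

  # Read from the backup and rewrite the original in place.
  # The pattern matches the property line with or without a leading '#'.
  sed -E "s/^[#[:space:]]*${key}[[:space:]]*=.*$/infagrid.blaze.console.enabled=FALSE/" \
    "$file.bak" > "$file"
}

# Example call (replace the placeholders from step 1 with real values):
# disable_blaze_console "<InformaticaInstallationDir>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf/hadoopEnv.properties"
```

Writing the result back into the original file, rather than moving a new file over it, keeps the existing file ownership and permissions.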
Grant Permission on the Source Database
When you use the Blaze engine to run mappings that read from a Hive source, certain conditions require the Blaze impersonation user to have CREATE TABLE privileges on the Hive database.
The requirement applies when a mapping reads from a Hive source and one of the following conditions is true:
- The Hive source table uses SQL standards-based authorization.
- The mapping contains a Lookup transformation with an SQL override configured.
In either case, the Blaze engine stages query results in a temporary table, and the Blaze impersonation user requires CREATE TABLE permissions on the source database.
Allocate Cluster Resources for Blaze
When you use Blaze to run mappings, verify that the cluster allocates sufficient memory and resources to management and runtime services.
Allocate the following types of resource for each container on the cluster:
- Memory. Random Access Memory (RAM) available for each container. This setting is also known as the container size. You can set the minimum and maximum memory per container. On each of the data nodes on the cluster:
  - Set the minimum container memory to allow the VM to spawn sufficient containers.
  - Set maximum memory on the cluster to increase resource memory available to Blaze services.
- Vcore. A vcore is a virtual core. The number of virtual cores per container may correspond to the number of physical cores on the cluster, but you can increase the number to allow for more processing. You can set the minimum and maximum number of vcores per container.
The following table contains resource allocation guidelines:
Node Type | Resources Required Per Container
---|---
Runtime node (runs mappings only) | Minimum memory: Set to no less than 4 GB less than the maximum memory; at least 10 GB maximum memory; 6 vcores
Management node (a single node that runs mappings and management services) | Minimum memory: Set to no less than 4 GB less than the maximum memory; at least 13 GB maximum memory; 9 vcores
Set the resources in the configuration console for the cluster, or edit the file yarn-site.xml.
To edit resource settings in yarn-site.xml:
1. Use yarn.nodemanager.resource.memory-mb to set the maximum memory setting.
2. Use yarn.scheduler.minimum-allocation-mb to set the minimum memory setting.
3. Use yarn.nodemanager.resource.cpu-vcores to set the number of vcores.
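To illustrate how the three properties might look in yarn-site.xml, the following Bash sketch prints one `<property>` element for each. The helper name `yarn_property` is invented for this example, and the sizes are assumptions: 10 GB and 6 vcores follow the runtime node guideline, while the 6 GB minimum is only one reading of the minimum memory rule (maximum memory minus 4 GB). Choose values that fit your cluster.

```shell
#!/usr/bin/env bash
# Sketch: print yarn-site.xml entries for the three resource properties.

# yarn_property <name> <value>: print one <property> element.
yarn_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Hypothetical sizing in GB; the YARN memory properties take values in MB.
max_gb=10
min_gb=6
vcores=6

yarn_property yarn.nodemanager.resource.memory-mb "$((max_gb * 1024))"
yarn_property yarn.scheduler.minimum-allocation-mb "$((min_gb * 1024))"
yarn_property yarn.nodemanager.resource.cpu-vcores "$vcores"
```

Paste the printed elements inside the `<configuration>` element of yarn-site.xml, or enter the same values in the configuration console for the cluster.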
Configure Virtual Memory Limits
Configure the virtual memory limits in yarn-site.xml for every node in the Hadoop cluster. After you configure virtual memory limits, you must restart the Hadoop cluster.
yarn-site.xml is located at the following path on every node in the Hadoop cluster:
/etc/hadoop/conf/yarn-site.xml
In yarn-site.xml, configure the following property:
- yarn.nodemanager.vmem-check-enabled: Determines whether virtual memory limits are enforced for containers.
The following example shows the property as you can configure it in yarn-site.xml:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Enforces virtual memory limits for containers.</description>
</property>