Create an advanced configuration to define the properties for an advanced cluster. The properties describe where you want to start the cluster on your cloud platform and the infrastructure that you want to use.
Basic configuration
The following table describes the basic properties:
Property
Description
Name
Name of the advanced configuration.
Description
Description of the advanced configuration.
Runtime Environment
Runtime environment to associate with the advanced configuration. The runtime environment can contain only one Secure Agent. A runtime environment cannot be associated with more than one configuration.
If you don't select a runtime environment, the validation process can't verify the communication link to the Secure Agent or confirm that the Secure Agent meets the minimum runtime requirements to start a cluster.
Cloud Platform
Cloud platform that hosts the cluster.
Select Amazon Web Services (AWS).
Private Cluster
Creates an advanced cluster in which cluster resources have only private IP addresses.
When you choose to create a private cluster, you must specify the VPC and subnet in the advanced properties.
CLAIRE-powered configuration
Enable a CLAIRE-powered configuration to allow CLAIRE to configure the cluster to stay within cost boundaries and to make recommendations that improve cluster performance and reduce infrastructure costs. You can use a CLAIRE-powered configuration if CLAIRE recommendations are enabled in your organization.
The following table describes the CLAIRE-powered configuration properties:
Property
Description
Optimization Preference
Cost or performance preference that CLAIRE uses to balance infrastructure costs with cluster performance.
Target Average Cost per Hour (USD)
Target average cost per hour in USD to run the advanced cluster.
Maximum Cost per Hour (USD)
Maximum cost per hour in USD to run the advanced cluster.
If you enable a CLAIRE-powered configuration, you configure fewer platform properties.
Platform configuration
The following table describes the platform properties:
Property
Description
Region
Region in which to create the cluster. Use the drop-down menu to view the regions that you can use.
Master Instance Type
Instance type to host the master node. Use the drop-down menu to view the instance types that you can use in your region.
To verify that the instance type that you select from the drop-down menu is supported in the selected availability zones and in your AWS account, refer to the AWS documentation.
Not applicable in a CLAIRE-powered configuration.
Master Instance Profile
Instance profile to be attached to the master node. The name must consist of alphanumeric characters with no spaces. You can also include any of the following characters: _+=,.@-
If you specify the master instance profile, you must also specify the worker instance profile.
Worker Instance Type
Instance type to host the worker nodes. Use the drop-down menu to view the instance types that you can use in your region.
To verify that the instance type that you select from the drop-down menu is supported in the selected availability zones and in your AWS account, refer to the AWS documentation.
Not applicable in a CLAIRE-powered configuration.
Worker Instance Profile
Instance profile to be attached to the worker nodes. The name must consist of alphanumeric characters with no spaces. You can also include any of the following characters: _+=,.@-
If you specify the worker instance profile, you must also specify the master instance profile.
Number of Worker Nodes
Number of worker nodes in the cluster. Specify the minimum and maximum number of worker nodes.
Not applicable in a CLAIRE-powered configuration.
Enable Spot Instances
Indicates whether to use Spot Instances for worker nodes.
Not applicable in a CLAIRE-powered configuration.
Spot Instance Price Ratio
Maximum percentage of On-Demand Instance price to pay for Spot Instances. Specify an integer value between 1 and 100.
Required if you enable Spot Instances. If you do not enable Spot Instances, this property is ignored.
Not applicable in a CLAIRE-powered configuration.
Enable High Availability
Indicates whether the cluster is highly available. An odd number of master nodes will be created based on the number of availability zones or subnets that you provide. You must provide at least three availability zones or subnets.
For example, if you provide six availability zones, five master nodes are created with each master node in a different availability zone.
Note: When you provide multiple availability zones or subnets, worker nodes are highly available. Worker nodes are created across the availability zones or subnets regardless of whether high availability is enabled.
For more information about high availability, refer to the Kubernetes documentation.
Not applicable in a CLAIRE-powered configuration.
Availability Zones
List of AWS availability zones where cluster nodes are created. The master node is created in the first availability zone in the list. If multiple zones are specified, the cluster nodes are created across the specified zones.
If you specify availability zones, the zones must be unique and be within the specified region.
The availability zones that you can use depend on your AWS account. To check which zones are available for your account, refer to the AWS documentation.
Required if you do not specify a VPC. If you specify a VPC, you cannot provide availability zones. You must provide subnets instead of availability zones.
EBS Volume Type
Type of Amazon EBS volumes to attach to Amazon EC2 instances as local storage. You can use only EBS General Purpose SSD (gp2).
Not applicable in a CLAIRE-powered configuration.
EBS Volume Size
Size of the EBS volume to attach to a worker node for temporary storage during data processing. The volume size scales between the minimum and maximum based on job requirements. The range must be between 50 GB and 16 TB.
By default, the minimum and maximum volume sizes are 100 GB.
This configuration property does not apply to Graviton-enabled clusters, as Graviton does not support storage scaling.
Note: When the volume size scales down, the jobs that are currently running on the cluster might take longer to complete.
Not applicable in a CLAIRE-powered configuration.
Cluster Shutdown
Cluster shutdown method. You can select one of the following cluster shutdown methods:
- Smart shutdown. The Secure Agent stops the cluster when no job is expected during the defined idle timeout, based on historical data.
- Idle timeout. The Secure Agent stops the cluster after the amount of idle time that you define.
Not applicable in a CLAIRE-powered configuration.
Mapping Task Timeout
Amount of time to wait for a mapping task to complete before it is terminated. By default, a mapping task does not have a timeout.
If you specify a timeout, a value of at least 10 minutes is recommended. The timeout begins when the mapping task is submitted to the Secure Agent.
Staging Location
Location on Amazon S3 for staging data.
You can use a path that includes the folders in the bucket, such as <bucket name>/<folder name>. Specify an S3 bucket in the same region as the cluster to improve latency.
Log Location
Location on Amazon S3 to store logs that are generated when you run an advanced job.
You can use a path that includes the folders in the bucket, such as <bucket name>/<folder name>. Specify an S3 bucket in the same region as the cluster to improve latency.
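Because staging and log access is faster when the buckets are in the cluster's region, it can help to check bucket regions before you save the configuration. The following minimal Python sketch uses boto3 with a hypothetical region and hypothetical bucket names; it is an illustration, not a product feature:

import boto3

# Hypothetical values: the region selected in the advanced configuration
# and the buckets named in Staging Location and Log Location.
CLUSTER_REGION = "us-west-2"
BUCKETS = ["my-staging-bucket", "my-log-bucket"]

s3 = boto3.client("s3")
for bucket in BUCKETS:
    # AWS returns None as the location constraint for us-east-1.
    location = s3.get_bucket_location(Bucket=bucket)["LocationConstraint"] or "us-east-1"
    status = "same region" if location == CLUSTER_REGION else "cross-region, higher latency"
    print(f"{bucket}: {location} ({status})")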
Advanced configuration
The following table describes the advanced properties:
Property
Description
VPC
Amazon Virtual Private Cloud (VPC) in which to create the cluster. The VPC must be in the specified region.
If you choose not to create a private cluster, you do not need to specify a VPC. In this case, the agent creates a VPC in your AWS account based on the region and the availability zones that you select.
Note: If you plan to use the Sequence Generator transformation, you must specify a VPC and subnets.
Subnets
Subnets in which to create cluster nodes. Use a comma-separated list to specify the subnets.
Required if a VPC is specified. Each subnet must be in a different availability zone within the specified VPC.
If you do not specify a VPC, you cannot specify subnets. You must provide availability zones instead of subnets.
Note: If you plan to use the Sequence Generator transformation, you must specify a VPC and subnets.
Initialization Script Path
Amazon S3 file path of the initialization script to run on each cluster node when the node is created. Use the format: <bucket name>/<folder name>. The script can reference other init scripts in the same folder or in a subfolder.
The script must be a bash script. A pre-flight check of the script path appears as a sketch after this table.
ELB Security Group
Defines the inbound rules between the Kubernetes API server and clients that are external to the advanced cluster. Also defines the outbound rules between the Kubernetes API server and the cluster nodes. This security group attaches to the load balancer that the Secure Agent provisions for the advanced cluster.
When you specify a security group, VPC and subnet information is required.
Master Security Group
Defines the inbound rules between the master nodes and the worker nodes, the ELB security group, and the Secure Agent, as well as the outbound rules to other nodes. This security group attaches to all master nodes of the cluster.
When you specify a security group, VPC and subnet information is required.
Worker Security Group
Defines the inbound and outbound rules between worker nodes in the advanced cluster and other nodes. This security group attaches to all worker nodes of the cluster.
When you specify a security group, VPC and subnet information is required.
AWS Tags
AWS tags to apply to cluster nodes. Each tag has a key and a value. The key can be up to 127 characters long. The value can be up to 256 characters long.
You can list a maximum of 30 tags. The Secure Agent also assigns default tags to cloud resources. The default tags do not contribute to the limit of 30 tags.
Note the following restrictions:
- The key cannot start with "aws:" because AWS reserves this prefix for its own use.
- Tags cannot include the UTF-8 characters \u241e and \u241f, which correspond to the record and unit separators represented by ASCII control characters 30 and 31.
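The tag restrictions above can be checked before you validate the configuration. The following minimal Python sketch encodes the limits described for this property; it is an illustration only:

# Minimal sketch: validate user-defined tags against the limits above
# (at most 30 tags, keys up to 127 characters, values up to 256
# characters, no "aws:" key prefix, no \u241e or \u241f characters).
FORBIDDEN = {"\u241e", "\u241f"}  # record and unit separator symbols

def validate_tags(tags: dict[str, str]) -> list[str]:
    errors = []
    if len(tags) > 30:
        errors.append(f"{len(tags)} tags specified; the maximum is 30")
    for key, value in tags.items():
        if len(key) > 127:
            errors.append(f"key '{key}' exceeds 127 characters")
        if len(value) > 256:
            errors.append(f"value of '{key}' exceeds 256 characters")
        if key.startswith("aws:"):
            errors.append(f"key '{key}' starts with the reserved 'aws:' prefix")
        if FORBIDDEN & set(key + value):
            errors.append(f"tag '{key}' contains a forbidden separator character")
    return errors

print(validate_tags({"team": "data-eng", "env": "dev"}))  # prints [] when valid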
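Separately, for the Initialization Script Path property, the pre-flight check mentioned in that row can be as simple as confirming that the script exists at the configured path. The following minimal boto3 sketch uses a hypothetical bucket and script name:

import boto3
from botocore.exceptions import ClientError

# Hypothetical path in the <bucket name>/<folder name> format described above.
bucket = "my-cluster-bucket"
key = "init-scripts/bootstrap.sh"

s3 = boto3.client("s3")
try:
    s3.head_object(Bucket=bucket, Key=key)
    print(f"Init script found at s3://{bucket}/{key}")
except ClientError as err:
    print(f"Init script missing or inaccessible: {err}")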
Runtime configuration
The following table describes the runtime properties:
Property
Description
Encrypt Data
Indicates whether temporary data on the cluster is encrypted.
Note: Encrypting temporary data might slow down job performance.
Runtime Properties
Custom properties to customize the cluster and the jobs that run on the cluster.
Validating the configuration
You can validate the information needed to create or update an advanced configuration before you save the configuration properties.
The validation process performs the following validations:
•You have provided the necessary information on the configuration page.
•The information that you provided is valid and in the correct format. For example, the runtime environment must not be associated with another advanced configuration.
If you encounter errors related to context keys when you validate an advanced configuration, add the key ccs.k8s.policy.context.key to the runtime properties in the advanced configuration, with the required context keys as the value.
For more information about context keys, contact Informatica Global Customer Support.
Amazon Linux 2 images
To create a fully managed cluster on AWS using Amazon Linux 2 (AL2) images, specify an initialization script in the advanced configuration and update the domains in the outbound allowlists for your security groups.
GPU-enabled clusters
When you configure the worker instance type for the advanced configuration, you can select a GPU-enabled instance type. Selecting a GPU-enabled instance type creates a GPU-enabled cluster. GPUs use a massively parallel architecture to accelerate concurrent processing, which offers performance benefits in many cases.
You can select a worker instance type in the g4 and p3 instance families. For more information about these instance types, refer to the AWS documentation.
If your organization uses an outgoing proxy server, allow traffic from the Secure Agent machine to the following domains:
•.docker.io
•.docker.com
•.nvidia.com
•.nvidia.github.io
When you create a GPU-enabled cluster, the Spark executors each use one GPU and four Spark executor cores by default. You can change the number of Spark executor cores using the Spark session property spark.executor.cores.
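For example, to lower the executor core count from the default of four, you might set the following Spark session property on the task (the value 2 is a hypothetical illustration):
spark.executor.cores=2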
All mappings submitted to the cluster that can run on GPU will run on GPU. Spark tasks in the mapping that cannot run on GPU run on CPU instead. To see which Spark jobs run on GPU and which jobs run on CPU, check the Spark event log after the job completes.
Note: The output of a task that runs on GPU might be different than the output if the task ran on CPU. For example, floating-point values might be rounded differently. For more information about processing differences, refer to Spark RAPIDS documentation.
For rules and guidelines for mappings that run on GPU-enabled clusters, see the Data Integration help.
Graviton worker instance type
You can select an AWS Graviton2 instance as the worker instance type to run mappings. Graviton is a CPU-based instance family that uses Arm (Advanced RISC Machine) Neoverse N1 cores.
You can select one of the following worker instance types:
•T4g
•M6g
•M6gd
•C6g
•C6gd
•C6gn
•R6g
•R6gd
For more information about these instance types, refer to the AWS documentation.
Graviton guidelines and limitations
The following guidelines and limitations apply to Graviton worker instance types:
•Graviton worker instance types do not support some expression functions, for example numeric functions such as rand and special functions such as is_date.
•EBS volume size configuration on the advanced configuration page does not apply to Graviton worker instance types, as Graviton does not support storage scaling.
•You cannot use the Java transformation or Python transformation with a Graviton worker instance type.
•You cannot run mappings that contain flat files with escape characters, multiple column delimiters, multi-character quotation marks, line breakers other than \n, or the number of initial rows to skip set to more than one.
•You cannot use a parquet source with snappy compression with a Graviton worker instance type.
•Depending on the complexity of your mapping, you might encounter library incompatibility errors. To confirm the root cause, check the Spark driver logs for java.lang.UnsatisfiedLinkError, as in the sketch that follows this list.
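The following minimal Python sketch scans a local copy of the Spark driver log for that error signature. The log file name is a hypothetical example:

# Hypothetical local copy of the Spark driver log.
log_path = "spark-driver.log"

with open(log_path, encoding="utf-8", errors="replace") as log:
    for line_number, line in enumerate(log, start=1):
        if "java.lang.UnsatisfiedLinkError" in line:
            print(f"line {line_number}: {line.rstrip()}")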
Spot Instances
You can configure an advanced cluster to use Spot Instances to host worker nodes.
Spot Instances are spare compute capacity that cloud providers offer at a lower price than On-Demand Instances. This can result in significant cost savings when performing internal tests and debugging in development or QA environments. The performance of On-Demand and Spot Instances of the same instance type is similar.
Note: Spot Instances are not always available, and your cloud provider can interrupt running Spot Instances to reclaim the capacity. Therefore, you shouldn't use Spot Instances on strict SLA-bound jobs.
Spot Instances are most beneficial when the frequency of interruptions is under 5%. Use the AWS Spot Instance Advisor to see the frequency of interruption for each instance type.
When the frequency of interruption is below 5%, Spot Instances can save you nearly 50% on the total cost compared to On-Demand Instances. When the frequency of interruption exceeds 20%, the savings drop to about 36%.
When you use Spot Instances, you set a Spot Instance price ratio. The Spot Instance price ratio is the maximum price you will pay for Spot Instances as a percentage of the On-Demand Instance price. For example, if On-Demand Instances cost $0.68 an hour and you set the Spot Instance price ratio to 50, you will pay the current Spot Instance price as long as the price is $0.34 an hour or less.
The Secure Agent always creates a number of On-Demand worker nodes equal to the minimum number of worker nodes that you configure. When you enable Spot Instances and the cluster scales up, the agent tries to create additional worker nodes on Spot Instances up to the maximum number of worker nodes. If Spot Instances are not available or cost more than the maximum price you set, the cluster uses On-Demand Instances for the worker nodes.
For example, if you set the minimum number of worker nodes to 5 and the maximum to 8, the agent creates 5 nodes on On-Demand Instances and tries to create 3 nodes on Spot Instances. If you set the maximum number of worker nodes equal to the minimum, the cluster uses only On-Demand Instances.
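A minimal Python sketch of this arithmetic, with all input values hypothetical:

# Hypothetical inputs: On-Demand price, configured price ratio, node counts.
on_demand_price = 0.68        # USD per hour for the worker instance type
spot_price_ratio = 50         # percent of the On-Demand price (1-100)
min_workers, max_workers = 5, 8

max_spot_price = on_demand_price * spot_price_ratio / 100    # $0.34
on_demand_nodes = min_workers                     # always On-Demand
spot_node_attempts = max_workers - min_workers    # requested as Spot on scale-up

print(f"Use Spot capacity while it costs <= ${max_spot_price:.2f}/hour")
print(f"{on_demand_nodes} On-Demand nodes, up to {spot_node_attempts} Spot nodes")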
If your cloud provider interrupts a Spot node that is running an advanced job, the agent uses On-Demand nodes to complete the job.
High availability
An advanced cluster can become highly available to eliminate a single point of failure when the master node goes down. If you enable high availability and one master node goes down, other master nodes will be available and jobs on the cluster can continue running.
When a cluster is highly available, watch out for job failures in the following scenarios:
•If all master nodes go down, jobs will fail.
•If too many master nodes go down, the Kubernetes API server becomes unavailable. The failure threshold is (n+1)/2, where n is the number of master nodes. For example, if the cluster has 3 master nodes and 2 master nodes go down, the Kubernetes API server becomes unavailable and jobs fail on the cluster. The sketch after this list illustrates the threshold.
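A minimal Python sketch of that threshold:

# With n master nodes, the Kubernetes API server becomes unavailable
# once (n + 1) / 2 of them go down.
def failures_until_outage(master_nodes: int) -> int:
    return (master_nodes + 1) // 2

for n in (3, 5, 7):
    print(f"{n} master nodes: API server unavailable after {failures_until_outage(n)} failures")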
Accessing a new staging location
If you plan to use a new staging location, you must first change the staging location in the advanced configuration and then change the permissions to the staging location on AWS.
If you use role-based security, you must also change the permissions to the staging location on the Secure Agent machine.
If you change the permissions before changing the staging location in the configuration, advanced jobs fail with the following error:
Error while executing mapping. ExecutionId '<execution ID>'. Cause: [Failed to start cluster for [01000D25000000000005]. Error reported while starting cluster [Cannot apply cluster operation START because the cluster is in an error state.].].
To fix the error, perform the following tasks:
1. Revert the changes to the permissions for the staging location.
2. Edit the advanced configuration to revert the staging location.
3. Stop the cluster when you save the configuration.
4. Update the staging location in the configuration, and then change the permissions to the staging location on AWS.
Propagating tags to cloud resources
The Secure Agent propagates tags to cloud resources based on the AWS tags that you specify in an advanced configuration.
The agent propagates tags to the following resources:
•Auto Scaling group
•EBS volume
•EC2 instance
•IAM role*
•Launch template
•Load balancer*
•Public key
•Security group
•Subnet
•VPC
*If the key or value of a tag contains special characters, the agent does not propagate the tag to these resources.
Note: The Secure Agent propagates tags only to cloud resources that the agent creates. If you create a VPC and subnets and specify the resources in an advanced configuration, the agent does not propagate AWS tags to the VPC and subnets.
If your enterprise follows a tagging policy, make sure to manually assign tags to the following resources:
•Internet gateway
•Network ACL
•Route table
Default tags for cloud resources
In addition to the cloud platform tags that you specify in an advanced configuration, the Secure Agent assigns several default tags to resources. The default tags assist the cluster operator, services on the cloud platform, and data governance. Do not override the default tags.
The following table describes tags that the agent also assigns to cluster nodes to report information about the cluster:
Cloud platform tag
Description
infa:ccs:hostname
The host name of the Secure Agent machine that started the cluster.
If the Secure Agent machine stops unexpectedly and the Secure Agent restarts on a different machine, the host name remains that of the original Secure Agent machine.
infa:k8scluster:configname
Name of the advanced configuration that is used to create the cluster.
infa:k8scluster:workdir
Staging directory that the cluster uses.
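As an illustration, you could use these default tags to find the nodes that a configuration created. The following minimal boto3 sketch, with a hypothetical configuration name and region, filters EC2 instances by the infa:k8scluster:configname tag:

import boto3

# Hypothetical values: advanced configuration name and cluster region.
config_name = "my-advanced-config"
ec2 = boto3.client("ec2", region_name="us-west-2")

reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:infa:k8scluster:configname", "Values": [config_name]}]
)["Reservations"]

for reservation in reservations:
    for instance in reservation["Instances"]:
        tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
        print(instance["InstanceId"], tags)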
Some default tags do not have a namespace and can conflict with the user-defined tags that you specify in an advanced configuration. For example, the cluster operator automatically adds the Name and KubernetesCluster tags to all resources, but the tags do not have a namespace. If you specify a user-defined tag with the same name, such as KubernetesCluster, the cluster operator overrides the user-defined tag with the default tag.
Note: Issues can occur when you override default tags. Do not override them.
Data encryption
For information about encrypting source and target data, see the help for the appropriate connector in the Data Integration help.
Note: If you configure an encryption-related custom property in an Amazon S3 V2 connection, the Spark engine uses the same custom property to read and write staging data.
Temporary data
Temporary data includes cache data and shuffle data that cluster nodes generate.
To encrypt temporary data, enable encryption in the advanced configuration. If you enable encryption, temporary data is encrypted using the HMAC-SHA1 algorithm by default. To use a different algorithm, contact Informatica Global Customer Support.
Data in transit
By default, data in transit to and from Amazon S3, including staging data and log files, is encrypted using the Transport Layer Security (TLS) protocol.