Step 1. Complete prerequisites

Before you set up your environment, verify the requirements for your environment and your cloud platform.
Complete the following tasks:

Verify privileges in your organization

Verify that you are assigned the correct privileges for advanced configurations in your organization.
Privileges for advanced configurations give you varying levels of access to the Advanced Clusters page in Administrator and in Monitor.
You must have at least the read privilege to view the advanced configurations and to monitor the advanced clusters.

Verify AWS subscriptions

Verify that you have the necessary AWS subscriptions to create an advanced cluster in an AWS environment.
You must have the following services on AWS:
Amazon Elastic Block Store (Amazon EBS)
Amazon EBS volumes are attached to Amazon EC2 instances as local storage. The local storage is used to store information that the Serverless Spark engine needs to run advanced jobs. For example, local storage is used to store the content of the Spark image. The Spark engine also requires local storage to process data logic and to persist data during processing.
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon EC2 instances are launched to host an advanced cluster. One Amazon EC2 instance hosts the master node, and additional instances host the worker nodes.
Amazon EC2 Auto Scaling
Amazon EC2 Auto Scaling automatically adds or removes cluster nodes in the advanced cluster based on job-processing requirements.
Amazon Elastic Load Balancing (Amazon ELB)
A load balancer accepts incoming advanced jobs from a Secure Agent and provides an entry point for the jobs to an advanced cluster.
AWS Identity and Access Management (IAM)
AWS IAM provides access control that you can use to specify which services and resources an advanced cluster can access in your AWS environment.
Amazon Route 53
Nodes in an advanced cluster communicate information with other nodes in the same cluster using Route 53.
Amazon Simple Storage Service (Amazon S3)
An advanced cluster is staged in Amazon S3 buckets. Amazon S3 is also used to store logs that are generated for advanced jobs.

Learn about roles and policies in the AWS environment

The Secure Agent and the advanced cluster use IAM roles, and the IAM policies that you attach to those roles, to access and process data in an AWS environment. For example, the agent and the cluster use the roles to manage cloud resources such as EC2 instances and to access files on S3 such as staging, log, and initialization script files.

Roles

An AWS environment uses the following IAM roles:
Cluster operator role
The cluster operator role is an IAM role that has elevated permissions to manage the cloud resources that host an advanced cluster.
Secure Agent role
The Secure Agent role is an IAM role for the Secure Agent. This IAM role is attached to the Secure Agent machine, which is the Amazon EC2 instance where the Secure Agent runs.
The Secure Agent uses the Secure Agent role to assume the cluster operator role to manage an advanced cluster. The Secure Agent also uses the Secure Agent role to process jobs and access some resources on the cloud.
Master role
The master role is an IAM role that defines the permissions for the master nodes in an advanced cluster.
Worker role
The worker role is an IAM role that defines the permissions for the worker nodes in an advanced cluster.
For more information about the roles, see Step 7. Create IAM roles.

Policies

Each IAM role uses one or more IAM policies.
The following table describes the policies and the roles that use each policy:
| Policy | Used by role | Description |
| --- | --- | --- |
| cluster_operator_policy | Cluster operator role | Required. Provides the minimal access permissions to create and manage cloud resources for an advanced cluster. |
| assume_role_agent_policy | Secure Agent role | Required. Allows the Secure Agent to use the Secure Agent role to assume the cluster operator role. |
| data_source_access_policy | Secure Agent role, Worker role | Required if you use role-based security for Amazon data sources and want to create a unique policy. Provides access to the Amazon data sources in an advanced job. |
| log_access_agent_policy | Secure Agent role | Required if you do not configure a trust relationship between the Secure Agent role and worker role. Provides access to the log location to upload the agent job log at the end of an advanced job. |
| minimal_master_policy | Master role | Required. Provides the minimal access permissions for the master role. |
| staging_log_access_master_policy | Master role | Required. Provides access to the staging and log locations. |
| init_script_master_policy | Master role | Required only if you use an initialization script. Provides access to the initialization script path and the location that stores init script and cloud-init logs. |
| minimal_worker_policy | Worker role | Required. Provides the minimal access permissions for the worker role. |
| ebs_autoscaling_worker_policy | Worker role | Required only if EBS volumes auto-scale. Provides permissions to auto-scale the EBS volumes. |
| staging_log_access_worker_policy | Worker role | Required. Provides access to the staging and log locations. |
| init_script_worker_policy | Worker role | Required only if you use an initialization script. Provides access to the initialization script path and the location that stores init script and cloud-init logs. |
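
The conditions in the policy table can be read as a small decision procedure. The following Python sketch is illustrative only: the `Setup` flags and the role keys are hypothetical names chosen for this example, not part of any Informatica or AWS API. The policy names are the ones listed in the table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """Hypothetical flags that describe an environment (names are assumptions)."""
    uses_init_script: bool = False
    ebs_volumes_auto_scale: bool = False
    # Role-based security for Amazon data sources with a unique policy.
    unique_data_source_policy: bool = False
    # Trust relationship between the Secure Agent role and the worker role.
    agent_worker_trust: bool = False


def policies_by_role(setup: Setup) -> dict[str, list[str]]:
    """Return the policy names from the table that each role needs."""
    roles = {
        "cluster_operator": ["cluster_operator_policy"],
        "secure_agent": ["assume_role_agent_policy"],
        "master": ["minimal_master_policy", "staging_log_access_master_policy"],
        "worker": ["minimal_worker_policy", "staging_log_access_worker_policy"],
    }
    if setup.unique_data_source_policy:
        roles["secure_agent"].append("data_source_access_policy")
        roles["worker"].append("data_source_access_policy")
    if not setup.agent_worker_trust:
        # Without the trust relationship, the agent needs its own log access.
        roles["secure_agent"].append("log_access_agent_policy")
    if setup.uses_init_script:
        roles["master"].append("init_script_master_policy")
        roles["worker"].append("init_script_worker_policy")
    if setup.ebs_volumes_auto_scale:
        roles["worker"].append("ebs_autoscaling_worker_policy")
    return roles


if __name__ == "__main__":
    for role, policies in policies_by_role(Setup(uses_init_script=True)).items():
        print(role, policies)
```

A checklist like this can help confirm that no required policy is missing before you create the roles. The actual policy contents are described in the IAM policy reference.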

Learn about resource access

To process data, the Secure Agent and the advanced cluster access the resources that are part of an advanced job, including resources on the cloud platform, source and target data, and staging and log locations.
Resources are accessed to perform the following tasks:

Designing a mapping

When you design a mapping, the Secure Agent accesses sources and targets so that you can read and write data.
For example, when you add a Source transformation to a mapping, the Secure Agent accesses the source to display the fields that you can use in the rest of the mapping. The Secure Agent also accesses the source when you preview data.
The Secure Agent accesses sources and targets based on the type of connectors that the job uses:
Connectors with direct access to Amazon data sources
If the mapping uses a connector with direct access to Amazon data sources, the Secure Agent uses role-based security or credential-based security to access the source or target. For role-based security, the Secure Agent uses the Secure Agent role to access data sources. If you specify an IAM role at the connection level, the agent assumes the connection-level role to access the data sources at run time. For credential-based security, the Secure Agent accesses the source or target through connection-level AWS credentials.
Connectors without direct access to Amazon data sources
If the mapping does not use a connector with direct access to Amazon data sources, the Secure Agent uses the connection properties to access the source or target. For example, the Secure Agent might use the user name and password that you provide in the connection properties to access a database.

Creating an advanced cluster

To create an advanced cluster, the Secure Agent uses the cluster operator role to store cluster details in the staging location and to create the cluster. The master and worker nodes use the master and worker roles to access cloud resources.
The following image shows the process that the Secure Agent uses to create a cluster:
The following steps describe the process that the Secure Agent uses to create a cluster:
  1. You run a job.
  2. The Secure Agent assumes the cluster operator role to gain elevated privileges on AWS. The cluster operator role allows the Secure Agent to assume the master and worker roles.
  3. If you create a user-defined worker role, the Secure Agent uses the worker role and verifies that the cluster can access staging and log locations.
  4. The Secure Agent uses the cluster operator role to store cluster details in the staging location.
  5. The Secure Agent uses the cluster operator role to create the cluster.
  6. The Secure Agent uses the cluster operator role to create cluster resources for the master node.
  7. The master node uses the master role to access AWS services such as Amazon EC2, AWS Auto Scaling, and Elastic Load Balancing to manage node elasticity and resource optimization.
  8. The master node uses the master role to access the initialization script.
  9. The Secure Agent uses the cluster operator role to create cluster resources for the worker nodes and creates an Auto Scaling group with the minimum number of worker nodes.
  10. The worker nodes use the worker role to access AWS services such as Amazon EC2 and AWS Networking for compute and networking capabilities.
  11. The worker nodes use the worker role to access the initialization script.
For more information about how the cluster operator role, the master role, and the worker role access cloud resources in an advanced cluster, see IAM policy reference.
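
Step 2 depends on the Secure Agent role being allowed to assume the cluster operator role. As a rough illustration of what such a permission looks like in IAM, the following Python sketch builds a generic identity policy document that allows `sts:AssumeRole` on a single role. It is not the actual content of assume_role_agent_policy. The function name, account ID, and role name are placeholders. In AWS, the role being assumed must also have a trust policy that names the calling role as a principal.

```python
import json


def assume_role_policy(cluster_operator_role_arn: str) -> str:
    """Build a generic IAM identity policy that allows assuming one role."""
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Restrict the permission to the one role the agent needs.
                "Resource": cluster_operator_role_arn,
            }
        ],
    }
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    # Placeholder account ID and role name, for illustration only.
    print(assume_role_policy("arn:aws:iam::123456789012:role/cluster_operator_role"))
```

Scoping `Resource` to a single role ARN, rather than a wildcard, keeps the Secure Agent role from assuming unrelated roles in the account.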

Running a job with direct access to Amazon data sources

To run a job that uses a connector with direct access to Amazon data sources, the cluster accesses Amazon resources using role-based security or credential-based security.
The following image shows the process that the Secure Agent and cluster nodes use to run the job:
The following steps describe the process that the Secure Agent and cluster nodes use to run the job:
  1. The Secure Agent assumes the cluster operator role to store job dependencies in the staging location.
  2. The worker nodes use the connection-level role, the worker role, or connection-level AWS credentials to access source data based on the job security type. If you use role-based security, the worker nodes use the connection-level role or the worker role. If you use credential-based security, the worker nodes use the connection-level credentials. The authentication configured at the connection level takes precedence.
  3. The worker nodes use the connection-level role, worker role, or connection-level credentials to access the staging location to get job dependencies and stage temporary data.
  4. The worker nodes use the worker role to auto-scale EBS volumes if the job requires more storage space.
  5. The master node uses the master role to scale cluster nodes based on resource requirements.
  6. The worker nodes use the worker role to store logs in the log location.
  7. The master node uses the master role to store logs in the log location.
  8. The Secure Agent uses the Secure Agent role to upload the agent job log to the log location.

Security types

Worker nodes access Amazon resources in the following ways based on the security type:
Credential-based security
If you set up credential-based security, worker nodes use connection-level AWS credentials to access Amazon resources, including Amazon data sources and the staging location. The worker nodes use the worker role to access the log location.
Credential-based security overrides role-based security. If any source or target in the job provides AWS credentials, the worker nodes reuse the credentials to access the staging location. For example, if a job uses a JDBC V2 source and an Amazon S3 V2 target, the worker nodes use the AWS credentials that access the S3 target to access the staging location for the job.
Role-based security
If you set up role-based security, worker nodes use either the connection-level role or the worker role to access Amazon resources, including Amazon data sources, the staging location, and the log location. The role configured at the connection level takes precedence over the worker role.
Note: If you use default master and worker roles, the policies that are attached to the Secure Agent role are passed to the worker role. The policies that are passed to the worker role can grant the worker role access to Amazon resources.
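
The precedence rules for the two security types can be summarized in a few lines of code. The following Python sketch is a simplified model of the rules described in this section, not product behavior taken from an API. The function name, parameter names, and returned labels are hypothetical.

```python
def worker_access(resource: str,
                  has_connection_credentials: bool,
                  has_connection_role: bool = False) -> str:
    """Return which identity worker nodes use to reach a resource.

    resource is one of "data_source", "staging", or "log".
    """
    if resource not in {"data_source", "staging", "log"}:
        raise ValueError(f"unknown resource: {resource}")
    if has_connection_credentials:
        # Credential-based security overrides role-based security, but the
        # log location is still accessed through the worker role.
        if resource == "log":
            return "worker role"
        return "connection-level AWS credentials"
    # Role-based security: a connection-level role takes precedence
    # over the worker role.
    if has_connection_role:
        return "connection-level role"
    return "worker role"


if __name__ == "__main__":
    for resource in ("data_source", "staging", "log"):
        print(resource, "->", worker_access(resource, has_connection_credentials=True))
```

In this model, `has_connection_credentials` is true when any source or target in the job provides AWS credentials, which matches the rule that the worker nodes reuse those credentials for the staging location.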

Running a job without direct access to Amazon data sources

To run a job that doesn't use a connector with direct access to Amazon data sources, the cluster accesses Amazon resources using the connection properties and the worker role.
For example, JDBC V2 Connector doesn't have direct access to Amazon data sources. To run a job that uses JDBC V2 Connector, the cluster uses the connection properties to read and temporarily stage the data before processing and writing the data to the target.
The following image shows the process that the Secure Agent and cluster nodes use to run the job:
The following steps describe the process that the Secure Agent and cluster nodes use to run the job:
  1. The Secure Agent assumes the cluster operator role to store job dependencies in the staging location.
  2. The worker nodes use connection properties to access source data.
  3. The worker nodes use the worker role to access the staging location to get job dependencies and stage temporary data.
  4. The worker nodes use the worker role to auto-scale EBS volumes if the job requires more storage space.
  5. The master node uses the master role to scale cluster nodes based on resource requirements.
  6. The worker nodes use the worker role to store logs in the log location.
  7. The master node uses the master role to store logs in the log location.
  8. The Secure Agent uses the Secure Agent role to upload the agent job log to the log location.
Note: If any connector in the job uses AWS credentials to directly access a source or target, the connection-level AWS credentials override the worker role to gain access to the staging location.

Polling logs

When you use Monitor, the Secure Agent accesses the log location to poll logs.
The Secure Agent polls logs based on the type of connectors that the job uses:
Connectors with direct access to Amazon data sources
If the job uses a connector with direct access to Amazon data sources, the Secure Agent uses either credential-based security or role-based security to access the log location. For credential-based security, the Secure Agent polls logs through the connection-level AWS credentials. For role-based security, the Secure Agent polls logs through the permissions in the Secure Agent role.
Connectors without direct access to Amazon data sources
If the job does not use a connector with direct access to Amazon data sources, the Secure Agent polls logs through the permissions in the Secure Agent role.
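
The log-polling rules reduce to a single condition. The following Python sketch models them under the assumptions stated in this section. The function and parameter names are hypothetical and do not correspond to a product API.

```python
def log_polling_identity(direct_amazon_access: bool,
                         credential_based: bool = False) -> str:
    """Return how the Secure Agent reaches the log location to poll logs."""
    # Connection-level credentials apply only when the job uses a connector
    # with direct access to Amazon data sources and credential-based security.
    if direct_amazon_access and credential_based:
        return "connection-level AWS credentials"
    # In every other case the agent relies on the Secure Agent role.
    return "Secure Agent role"


if __name__ == "__main__":
    print(log_polling_identity(direct_amazon_access=True, credential_based=True))
    print(log_polling_identity(direct_amazon_access=False))
```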

Learn about the AWS cluster

When you create an advanced cluster in an AWS environment, the cluster uses an OS image that Informatica manages and publishes.
The OS image includes certain prebuilt packages and the following additional yum packages:
device-mapper-persistent-data
docker-ce
gnupg2
gzip
kernel-devel
kernel-headers
kubeadm
kubelet
lvm2
tar
unzip
yum-utils
The OS image also includes the following Docker images:
calico/kube-controllers
calico/node
calico/cni
calico/pod2daemon-flexvol
coreos/flannel
coreos/flannel-cni
imega/jq
kube-scheduler