Create the cluster operator, Secure Agent, master, and worker roles, and create the appropriate policies for each role to perform cluster operations in the AWS environment.
To create the IAM roles, complete the following tasks:
1. Create the cluster operator role.
2. Create the cluster operator policy.
3. Attach the cluster operator policy to the cluster operator role.
4. Configure the maximum CLI/API session duration for the cluster operator role.
5. Create or reuse the Secure Agent role.
6. Add the AssumeRole permission to the Secure Agent role.
7. Configure the trust relationship for the cluster operator role to include the Secure Agent role.
8. Create user-defined master and worker roles.
9. Optionally, encrypt staging data and log files at rest.
10. Optionally, create role-based security policies for Amazon data sources.
11. Create or reuse a cluster storage access policy for the Secure Agent role.
Note: To minimize the Secure Agent's permissions in your environment, avoid attaching the cluster operator role to the Secure Agent machine.
Create the cluster operator role
In AWS, create an IAM role for the cluster operator. Name the role cluster_operator_role.
The following image shows how the cluster operator role might appear in the AWS Management Console:
For instructions about creating an IAM role, refer to the AWS documentation. AWS provides several ways to create an IAM role, such as using the AWS Management Console or the AWS CLI.
Create the cluster operator policy
Create an IAM policy for the cluster operator role. Name the policy cluster_operator_policy. The cluster operator policy contains the permissions that the cluster operator role needs to create and manage cloud resources for an advanced cluster. The cluster operator role is sometimes known as the kubeadm role.
The following image shows how the cluster operator policy might appear in the AWS Management Console:
The JSON document below is a template for the cluster operator role policy. Permissions that are not mandatory are flagged as OPTIONAL.
Tip: Be sure to remove the 'OPTIONAL' text from any lines that you are keeping.
The actions on Amazon S3 must be specified for all staging, log, and initialization script locations that you provide in advanced configurations.
For example, if you use staging location dev/Staging/, log location dev/Logging/, and initialization script location dev/InitScript/, the policy must list the following resources for actions on Amazon S3:
If you use a different set of staging, log, and initialization script locations in another advanced configuration, you must add those locations as resources to the same policy.
To accommodate S3 locations that change frequently, you can use wildcards. For more information, refer to the AWS documentation.
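To make the resource list concrete, the following sketch derives S3 resource ARNs from a set of locations. It assumes that each location is written as `<bucket>/<prefix>/` and that the policy should cover every object under the prefix by ending each ARN with `/*`. The helper name `s3_resource_arns` is hypothetical and is not part of the product.

```python
import json


def s3_resource_arns(locations):
    """Turn 'bucket/prefix/' style locations into S3 object ARNs with a wildcard suffix."""
    arns = []
    for location in locations:
        path = location.strip("/")      # drop leading and trailing slashes
        arn = f"arn:aws:s3:::{path}/*"
        if arn not in arns:             # keep the list free of duplicates
            arns.append(arn)
    return arns


# Locations taken from the example above.
locations = ["dev/Staging/", "dev/Logging/", "dev/InitScript/"]
print(json.dumps({"Resource": s3_resource_arns(locations)}, indent=2))
```

If another advanced configuration uses different locations, you could pass the combined list to the same helper and paste the result into the Resource element of the existing policy.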
Attach the cluster operator policy
In AWS, attach the IAM policy cluster_operator_policy to the IAM role cluster_operator_role.
The following image shows how the AWS Management Console might appear when you attach the cluster operator policy to the cluster operator role:
Configure the maximum CLI/API session duration for the cluster operator role
In the IAM role cluster_operator_role, set the maximum CLI/API session duration to at least 30 minutes.
When you increase the duration, the Secure Agent has longer access to cloud resources within a single session, and you can run longer jobs on an advanced cluster.
For more information, refer to the AWS documentation.
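As an illustration, the following sketch builds the AWS CLI call that sets the maximum session duration. It relies on the `aws iam update-role` command and its `--max-session-duration` option, which takes a value in seconds. IAM accepts values from 3600 to 43200 seconds (1 to 12 hours), so any value that IAM accepts already exceeds 30 minutes. The helper name and the two-hour value are assumptions chosen for the example.

```python
import shlex

MIN_SECONDS = 3600    # IAM lower bound for the maximum session duration (1 hour)
MAX_SECONDS = 43200   # IAM upper bound for the maximum session duration (12 hours)


def update_role_command(role_name, duration_seconds):
    """Return the AWS CLI argument list that sets a role's maximum session duration."""
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        raise ValueError(
            f"duration must be between {MIN_SECONDS} and {MAX_SECONDS} seconds"
        )
    return [
        "aws", "iam", "update-role",
        "--role-name", role_name,
        "--max-session-duration", str(duration_seconds),
    ]


command = update_role_command("cluster_operator_role", 2 * 60 * 60)
print(shlex.join(command))   # print the command instead of running it
```

The sketch only prints the command so that you can review it before running it with credentials that are allowed to modify the role.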
Create or reuse the Secure Agent role
The Secure Agent requires an IAM role to access certain cloud resources while a job is running. This IAM role is attached to the Amazon EC2 instance where the Secure Agent is installed.
You can either create or reuse the Secure Agent role. Name this IAM role agent_role.
Create the Secure Agent role
To create the Secure Agent role, complete the following tasks in AWS:
1. Create an IAM role named agent_role.
2. Attach the IAM role agent_role to the Amazon EC2 instance where the Secure Agent is installed.
Reuse the Secure Agent role
If you already created an IAM role that is attached to the Amazon EC2 instance where the Secure Agent is installed, you can designate the IAM role to be the Secure Agent role.
Add the AssumeRole permission to the Secure Agent role
The Secure Agent needs to assume the cluster operator role to gain elevated permissions to manage an advanced cluster. For the Secure Agent to assume the cluster operator role, the Secure Agent role needs to have the AssumeRole permission.
To configure the AssumeRole permission, complete the following tasks in AWS:
1. Create the following IAM policy called assume_role_agent_policy:
Note: The value in the Principal element is the ARN of the Secure Agent role.
Optionally, you can configure an external ID to limit the entities that can assume the cluster operator role. Every time that the Secure Agent attempts to assume the cluster operator role, it must specify the external ID.
For example, you can configure the external ID "123" using the following policy:
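As a rough illustration of the shape such a trust policy can take, the following sketch assembles one with an external ID condition. It follows the standard IAM trust policy grammar, including the `sts:ExternalId` condition key, and is not the vendor's template. The account ID in the role ARN and the helper name are hypothetical.

```python
import json


def trust_policy_with_external_id(agent_role_arn, external_id):
    """Build a trust policy that lets one principal assume the role only with the given external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The Principal is the ARN of the Secure Agent role.
                "Principal": {"AWS": agent_role_arn},
                "Action": "sts:AssumeRole",
                # AssumeRole calls that omit or mismatch the external ID are denied.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Hypothetical account ID; replace with the ARN of your Secure Agent role.
agent_arn = "arn:aws:iam::123456789012:role/agent_role"
print(json.dumps(trust_policy_with_external_id(agent_arn, "123"), indent=2))
```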
Create user-defined master and worker roles
Create user-defined master and worker roles to fine-tune permissions for the master and worker nodes in an advanced cluster. The nodes use the permissions to run the Spark applications in an advanced job. After you complete these tasks, you can specify the master and worker instance profiles in an advanced configuration.
To create user-defined roles, complete the following tasks:
1. Create the master and worker roles.
2. Create master policies.
3. Create worker policies.
4. Attach the policies to the master and worker roles.
5. Allow the cluster operator role to assume the worker role.
6. Allow the cluster operator role to assume the master role.
The master and worker roles, the instance profiles, and the cluster operator role must be defined under the same AWS account.
When the Secure Agent starts the advanced cluster, the agent uses the cluster operator role to validate that the instance profiles exist and that the master and worker roles can access the required cluster directories, such as the staging, log, and initialization script locations. If validation fails, cluster creation fails.
Create the master and worker roles
In AWS, create IAM roles for the master and worker nodes. Name the roles master_role and worker_role, respectively.
When you create the master and worker roles, AWS automatically generates an instance profile for each role.
If the policy content provides access to staging, log, and initialization script locations for multiple advanced clusters, you can reuse the same instance profiles across different advanced configurations.
Create master policies
Create IAM policies for the master role. You can define each policy as an inline policy or a managed policy.
The following table describes each IAM policy:
Policy | Description
minimal_master_policy | Required. Provides the minimal access permissions for the master role.
staging_log_access_master_policy | Required. Provides access to the staging and log locations.
init_script_master_policy | Required only if you use an initialization script. Provides access to the initialization script path and the location that stores init script and cloud-init logs.
Note: You can also generate the policy content by running the generate-policies-for-userdefined-roles.sh command. For more information about the command, see generate-policies-for-userdefined-roles.sh. The command creates the output file my-userdefined-master-worker-role-policies.json.
minimal_master_policy
The IAM policy minimal_master_policy lists the minimal requirements for the user-defined master role.
You can use the following JSON document as a template for the minimal_master_policy:
init_script_master_policy
The IAM policy init_script_master_policy is required by the Cluster Computing System to allow the master node to access the initialization script and init script logging directories for the cluster.
You can use the following JSON document as a template for the init_script_master_policy:
Create worker policies
Create IAM policies for the worker role. You can define each policy as an inline policy or a managed policy.
The following table describes each IAM policy:
Policy | Description
minimal_worker_policy | Required. Provides the minimal access permissions for the worker role.
ebs_autoscaling_worker_policy | Required only if EBS volumes auto-scale.
staging_log_access_worker_policy | Required. Provides access to the staging and log locations.
init_script_worker_policy | Required only if you use an initialization script. Provides access to the initialization script path and the location that stores init script and cloud-init logs.
Note: You can also generate the policy content by running the generate-policies-for-userdefined-roles.sh command. For more information about the command, see generate-policies-for-userdefined-roles.sh. The command creates the output file my-userdefined-master-worker-role-policies.json.
minimal_worker_policy
The IAM policy minimal_worker_policy lists the minimal requirements for the user-defined worker role.
You can use the following JSON document as a template for the minimal_worker_policy:
staging_log_access_worker_policy
The IAM policy staging_log_access_worker_policy is required by the Cluster Computing System to permit worker nodes to access staging and logging directories.
You can use the following JSON document as a template for the staging_log_access_worker_policy:
init_script_worker_policy
The IAM policy init_script_worker_policy is required by the Cluster Computing System to allow worker nodes to access the initialization script and init script logging directories.
You can use the following JSON document as a template for the init_script_worker_policy:
Use default master and worker roles (alternative)
For a quick setup, you can use default master and worker roles. In this case, the Secure Agent automatically creates the roles when the agent starts an advanced cluster.
The agent attaches policies to the roles based on the permissions that are required by Kubernetes services. If you use role-based security and jobs have direct access to Amazon data sources, the agent also identifies the policies that are attached to the Secure Agent role and passes the policies to the worker role.
To use default roles, add the following policy to the IAM role cluster_operator_role:
Encrypt staging data and log files at rest (optional)
Optionally, set up Amazon S3 default encryption for S3 buckets to automatically encrypt staging data and log files that are stored on Amazon S3.
You can set up Amazon S3 default encryption for S3 buckets using one of the following encryption options:
Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
Use SSE-S3 to encrypt individual staging and log files or to encrypt the S3 buckets that contain the staging and log locations.
Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)
Use SSE-KMS to encrypt individual staging and log files. If you create user-defined master and worker roles, you can also encrypt the S3 buckets that contain the staging and log locations.
For more information about the encryption options, refer to the AWS documentation.
If you use SSE-KMS and create user-defined master and worker roles, you can restrict the customer master key (CMK) IDs that the master and worker roles can access to encrypt and decrypt data.
Specify the key IDs in the policies that are attached to the master and worker roles. In each policy, edit the Resource element in the following statement that determines actions on AWS Key Management Service (KMS):
Note: If you use SSE-KMS, you must use the default AWS-managed CMK on your Amazon account. You cannot create a custom CMK.
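The following sketch shows one way to narrow the Resource element of a KMS statement to specific keys. The statement is a stand-in, because the actions in your own policy may differ, and the region, account ID, and key ID are hypothetical. Only the key ARN layout `arn:aws:kms:<region>:<account-id>:key/<key-id>` is standard.

```python
import copy
import json


def restrict_kms_statement(statement, key_arns):
    """Return a copy of a KMS policy statement whose Resource lists only the given key ARNs."""
    for arn in key_arns:
        if not arn.startswith("arn:") or ":key/" not in arn:
            raise ValueError(f"not a KMS key ARN: {arn}")
    restricted = copy.deepcopy(statement)   # leave the caller's statement untouched
    restricted["Resource"] = list(key_arns)
    return restricted


# Stand-in statement; the actions in your policy may differ.
statement = {
    "Effect": "Allow",
    "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
    "Resource": ["*"],
}
# Hypothetical region, account ID, and key ID.
key_arns = [
    "arn:aws:kms:us-west-2:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"
]
print(json.dumps(restrict_kms_statement(statement, key_arns), indent=2))
```

Apply the same change to the policies of both the master role and the worker role so that the two roles can access the same keys.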
Create role-based security policies for Amazon data sources (optional)
Role-based security uses IAM roles to access data sources. If a connector directly accesses AWS, such as Amazon S3 V2 Connector or Amazon Redshift V2 Connector, create policies to allow the Secure Agent and worker roles to have access to data sources and fine-tune their permissions in your AWS environment.
You can skip this step if you use connectors that don't have direct access to AWS. For example, JDBC V2 Connector uses a driver to query data on Amazon Aurora and does not directly access the underlying data.
1. Create policies for the Secure Agent and worker roles.
2. Optionally, configure cross-account access.
By default, the agent and worker roles access data sources, but you can specify an IAM role at the connection level to access the data sources instead of using the agent and worker roles.
If you use default master and worker roles, consider the following guidelines:
• If you edit the Secure Agent role, you must restart the agent to update the master and worker roles.
• The default worker role doesn't honor the permission boundaries for the Secure Agent role.
• The staging location, log location, and cluster operator role must be in the same AWS account.
Step 10.1. Create policies for the Secure Agent and worker roles
Create policies to allow the Secure Agent and worker roles to access Amazon data sources in an advanced job. Create and distribute the policies based on the worker role type.
User-defined worker role
If you create a user-defined worker role, you can provide access to the data sources in one of the following ways:
Create a new managed policy
To create a new managed policy, complete the following tasks:
1. Create the policy that the connector requires. Name the policy data_source_access_policy. For information about connector requirements, see the help for the appropriate connector.
2. Attach the policy data_source_access_policy to both the Secure Agent role and worker role.
Reuse the IAM policy staging_log_access_worker_policy
To reuse the IAM policy staging_log_access_worker_policy that is attached to the worker role, complete the following tasks:
1. Specify the data sources in the Resource elements.
For example, the Resource element in the following statement specifies the staging and log locations:
Below "arn:aws:s3:::<cluster-logging-dir1>/*", add the data sources.
2. Add the Secure Agent role to the trust relationship of the worker role.
3. Add the worker role to the trust relationship of the Secure Agent role.
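The first task in the list above, adding data sources to a Resource element, can be sketched as follows. The statement is a placeholder: the `<cluster-staging-dir1>` entry, the action, and the data source bucket name are assumptions for illustration, so keep the actions and locations from your own policy.

```python
import json


def add_data_sources(statement, data_source_arns):
    """Return a copy of the statement with data source ARNs appended to its Resource list."""
    resources = statement["Resource"]
    if isinstance(resources, str):      # IAM allows a single string instead of a list
        resources = [resources]
    merged = list(resources)
    for arn in data_source_arns:
        if arn not in merged:           # skip ARNs that are already listed
            merged.append(arn)
    return {**statement, "Resource": merged}


statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject"],         # placeholder; keep the actions from your own policy
    "Resource": [
        "arn:aws:s3:::<cluster-staging-dir1>/*",
        "arn:aws:s3:::<cluster-logging-dir1>/*",
    ],
}
# Hypothetical data source bucket.
updated = add_data_sources(statement, ["arn:aws:s3:::my-source-bucket/*"])
print(json.dumps(updated, indent=2))
```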
Default worker role
If you use the default worker role, complete the following tasks:
1. Create the policy that the connector requires. Name the policy data_source_access_policy. For information about connector requirements, see the help for the appropriate connector.
2. Attach the policy data_source_access_policy to the Secure Agent role. The Secure Agent will automatically pass the policy to the worker role.
Step 10.2. Configure cross-account access
If you require cross-account access to S3 buckets in multiple Amazon accounts and you use user-defined master and worker roles, set up cross-account IAM roles in AWS.
When you set up cross-account IAM roles in AWS, complete the following tasks:
• Edit the policies in the user-defined worker role to access the S3 resources in each account.
• Add a bucket policy to the S3 buckets in each account that permits the user-defined worker role to access the bucket.
Note: You cannot combine cross-account access with default master and worker roles and role-based security. If your organization requires cross-account access, consider one of the following options:
For information about how to set up cross-account IAM roles, refer to the AWS documentation.
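As a sketch of the bucket policy task described above, the following code assembles a bucket policy that names the user-defined worker role as the principal. The bucket name, account ID, and list of S3 actions are assumptions for illustration; choose the actions that your jobs actually need.

```python
import json


def cross_account_bucket_policy(bucket, worker_role_arn, actions):
    """Build a bucket policy that lets a role from another account use the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": worker_role_arn},
                "Action": list(actions),
                "Resource": [
                    f"arn:aws:s3:::{bucket}",     # the bucket itself, for listing
                    f"arn:aws:s3:::{bucket}/*",   # every object in the bucket
                ],
            }
        ],
    }


# Hypothetical bucket, account ID, and actions.
policy = cross_account_bucket_policy(
    "shared-data-bucket",
    "arn:aws:iam::123456789012:role/worker_role",
    ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
)
print(json.dumps(policy, indent=2))
```

The bucket policy alone is not sufficient: the policies attached to the worker role must also allow the same actions on the same bucket.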
Use credential-based security (alternative)
For a quick setup, you can reuse the AWS credentials that you configure in a data source's connection properties instead of configuring IAM roles. Cluster nodes use the connection-level credentials to access the staging and log locations only when the same S3 bucket stores the data sources, staging files, and log files.
For example, if a job uses a JDBC V2 source and an Amazon S3 V2 target, cluster nodes use the Amazon S3 V2 credentials to access the staging location for the job.
Note: The AWS credentials in the connection must be able to access the Amazon S3 staging location that the job uses, and credentials override IAM roles. If you configure AWS credentials for a connector and the same credentials cannot access both the data sources and the staging location in an advanced job, the job fails.
If you require cross-account access to S3 buckets in multiple Amazon accounts, provide credentials for each Amazon account at the connection level.
Create or reuse a log access policy for the Secure Agent role
The Secure Agent needs permissions to access the log location to upload the agent job log at the end of an advanced job.
You can either create or reuse an IAM policy for log access.
Create a log access policy
To create an IAM policy for log access, complete the following tasks in AWS:
1. Create the following IAM policy named log_access_agent_policy:
Specify the log location in the Resource elements.
2. Attach the IAM policy log_access_agent_policy to the IAM role agent_role.
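Before you attach a log access policy, it can help to confirm that its Resource patterns cover the log location. The following sketch is a rough check only: it ignores Deny statements, conditions, and elements such as NotAction, and it approximates IAM wildcard matching with Python's `fnmatchcase`. The sample policy, its action, and the `dev/Logging/` location are placeholders, not the vendor's template.

```python
from fnmatch import fnmatchcase


def allows_resource(policy, action, resource_arn):
    """Rough check: does any Allow statement name this action and match this resource ARN?"""
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        # Action names are compared case-insensitively; resource ARNs are not.
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in actions)
        resource_ok = any(fnmatchcase(resource_arn, r) for r in resources)
        if action_ok and resource_ok:
            return True
    return False


# Placeholder policy and log location.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": ["arn:aws:s3:::dev/Logging/*"],
        }
    ],
}
print(allows_resource(sample_policy, "s3:PutObject", "arn:aws:s3:::dev/Logging/agent/job.log"))
```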
Reuse a log access policy
If you create user-defined master and worker roles, you can reuse the policy content that is generated for the Cluster Computing System (CCS) and required for the worker role.
The policy content includes access to the log location that the Secure Agent needs. For more information about user-defined master and worker roles, see Create user-defined master and worker roles.
To reuse the policy, complete the following tasks:
1. Edit the trust relationship of the worker role and specify the following policy to trust the IAM role agent_role:
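As a minimal sketch of such a trust relationship edit, the following code adds agent_role as a trusted principal to an existing trust policy. It assumes that the worker role already trusts the EC2 service, as is usual for a role attached to an instance profile. The account ID and the helper name are hypothetical.

```python
import copy
import json


def add_trusted_principal(trust_policy, principal_arn):
    """Return a copy of a trust policy that also lets principal_arn call sts:AssumeRole."""
    updated = copy.deepcopy(trust_policy)
    statements = updated.setdefault("Statement", [])
    for statement in statements:
        principal = statement.get("Principal")
        if isinstance(principal, dict) and principal.get("AWS") == principal_arn:
            return updated              # the principal is already trusted
    statements.append(
        {
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": "sts:AssumeRole",
        }
    )
    return updated


# Assumed starting point: a worker role that trusts only the EC2 service.
worker_trust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
# Hypothetical account ID.
agent_arn = "arn:aws:iam::123456789012:role/agent_role"
print(json.dumps(add_trusted_principal(worker_trust, agent_arn), indent=2))
```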