•Local cluster properties. Properties to change if you move the staging and log locations to the cloud, and other advanced properties for local clusters.
•Configure cloud permissions. Permissions to configure if you move the staging and log locations from local storage to the cloud.
•Data encryption. Learn how encryption protects data at rest, temporary data, and data in transit.
Change staging and log locations (optional)
When you run jobs on a local cluster, you can store staging and log files either in directories on the local file system of the Secure Agent machine or in a cloud location. By default, the local cluster uses the local file system.
To change the staging or log location to a cloud location, complete the following tasks:
1. Refer to the following table to create the locations in your cloud environment:
Cloud environment | Locations to create
AWS | Create two Amazon S3 locations: one that the cluster uses to store staging files at run time, and one that the cluster uses to store log files for the advanced jobs that run on the cluster.
Microsoft Azure | Create a storage account in Azure Data Lake Storage Gen2 with locations for staging and log files. Enable the hierarchical namespace.
Google Cloud | Create locations for staging and log files in Google Cloud Storage.
2. Specify the locations in the advanced configuration for the local cluster. For more information about the format of the staging and log locations, see Local cluster properties.
Local cluster properties
Create an advanced configuration to configure properties for an advanced cluster. The properties describe where you want to start the cluster on your cloud platform and the infrastructure that you want to use.
The basic properties describe the advanced configuration. To configure the cluster, configure the platform and runtime properties.
Basic configuration
The following table describes the basic properties:
Property | Description
Name | Name of the advanced configuration.
Description | Description of the advanced configuration.
Runtime Environment | Runtime environment to associate with the advanced configuration. The runtime environment can contain only one Secure Agent. A runtime environment cannot be associated with more than one configuration.
Cloud Platform | Cloud platform that hosts the cluster. Select Local.
Platform configuration
The following table describes the platform properties:
Property
Description
Mapping Task Timeout
Amount of time to wait for a mapping task to complete before it is terminated. By default, a mapping task does not have a timeout.
If you specify a timeout, a value of at least 10 minutes is recommended. The timeout begins when the mapping task is submitted to the Secure Agent.
Staging Location
Location of the staging data.
For staging data on a local file system, specify the location in the following format:
file://<absolute path to the Secure Agent location>
For example, to use /home/devbld/staging as the staging location, enter:
file:///home/devbld/staging
Data Integration creates the directory if it does not already exist. Note the three slashes after file: two come from the file:// prefix, and the third is the first character of the absolute path.
For staging locations on the cloud, specify the path in one of the following formats:
- Amazon S3. s3://<bucket name>/<folder path>
- Google Cloud Storage. gs://<bucket name>/<folder path>&:<project ID>/<region>
- Microsoft Azure Data Lake Storage Gen2. abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>&:<resource group>/<region>
The region is optional. For a list of valid regions, refer to your cloud provider's documentation.
The following examples show how region names differ across cloud platforms:
- On AWS, use us-west-2 for US West (Oregon).
- On Google Cloud, use us-west2 for Los Angeles.
- On Microsoft Azure, use westus2 for West US 2.
When the Secure Agent creates a local cluster on Oracle Cloud Infrastructure, the staging location must be on the local file system.
Log Location
Location of the logs.
For logs on the local file system, specify the location in the following format:
file://<absolute path to the Secure Agent location>
For example, to use /home/devbld/logging as the log location, enter:
file:///home/devbld/logging
Data Integration creates the directory if it does not already exist. Note the three slashes after file: two come from the file:// prefix, and the third is the first character of the absolute path.
For log locations on the cloud, specify the path in one of the following formats:
- Amazon S3. s3://<bucket name>/<folder path>
- Google Cloud Storage. gs://<bucket name>/<folder path>&:<project ID>/<region>
- Microsoft Azure Data Lake Storage Gen2. abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>&:<resource group>/<region>
The region is optional. For a list of valid regions, refer to your cloud provider's documentation.
The following examples show how region names differ across cloud platforms:
- On AWS, use us-west-2 for US West (Oregon).
- On Google Cloud, use us-west2 for Los Angeles.
- On Microsoft Azure, use westus2 for West US 2.
When the Secure Agent creates a local cluster on Oracle Cloud Infrastructure, the log location must be on the local file system.
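Because the staging and log location formats are easy to mistype, the following Python sketch assembles and sanity-checks location strings. It follows only the formats listed above. The function names and the sample bucket, project, file system, account, and resource group names are hypothetical, and dropping the "/<region>" suffix when no region is given is an assumption about how the optional region is written. The secure flag selects abfss:// instead of abfs://; which one is valid depends on the encryption setting described under Data encryption. The check does not verify that a location exists or that the cluster can access it.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit

# Schemes that appear in the documented staging and log location formats.
SUPPORTED_SCHEMES = {"file", "s3", "gs", "abfs", "abfss"}


def local_location(absolute_path: str) -> str:
    """file://<absolute path>; the path's leading '/' supplies the third slash."""
    path = PurePosixPath(absolute_path)
    if not path.is_absolute():
        raise ValueError(f"Expected an absolute path, got {absolute_path!r}")
    return f"file://{path}"


def _with_region(prefix: str, region: Optional[str]) -> str:
    # Assumption: when the optional region is omitted, "/<region>" is dropped.
    return prefix if region is None else f"{prefix}/{region}"


def s3_location(bucket: str, folder: str) -> str:
    """s3://<bucket name>/<folder path>"""
    return f"s3://{bucket}/{folder.strip('/')}"


def gcs_location(bucket: str, folder: str, project_id: str,
                 region: Optional[str] = None) -> str:
    """gs://<bucket name>/<folder path>&:<project ID>/<region>"""
    return f"gs://{bucket}/{folder.strip('/')}&:{_with_region(project_id, region)}"


def adls_location(file_system: str, account: str, folder: str,
                  resource_group: str, region: Optional[str] = None,
                  secure: bool = True) -> str:
    """abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>&:<resource group>/<region>"""
    scheme = "abfss" if secure else "abfs"
    host = f"{file_system}@{account}.dfs.core.windows.net"
    return f"{scheme}://{host}/{folder.strip('/')}&:{_with_region(resource_group, region)}"


def check_location(location: str, on_oci: bool = False) -> str:
    """Return the location's scheme, or raise ValueError if it cannot be valid."""
    scheme = urlsplit(location).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        # Catches, for example, a bare path that is missing the file:// prefix.
        raise ValueError(f"Unsupported or missing scheme in {location!r}")
    if on_oci and scheme != "file":
        # On Oracle Cloud Infrastructure, staging and logs must be on the local file system.
        raise ValueError("Use a file:// location on Oracle Cloud Infrastructure")
    return scheme


print(local_location("/home/devbld/staging"))  # file:///home/devbld/staging
print(gcs_location("my-bucket", "staging", "my-project", "us-west2"))  # gs://my-bucket/staging&:my-project/us-west2
print(check_location("s3://my-bucket/logs"))  # s3
```

The builders only concatenate strings, so they are a convenience for avoiding typos such as a missing slash in file:/// or a missing &: separator, not a substitute for testing the configuration.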
Runtime configuration
The following table describes the runtime properties:
Property | Description
Encrypt Data | Indicates whether temporary data on the cluster is encrypted.
Runtime Properties | Custom properties to customize the cluster and the jobs that run on the cluster.
Configure cloud permissions
Local clusters require simpler cloud permissions than standard cloud deployments do. Follow the configuration steps for your cloud platform.
Note: You don't need to configure cloud permissions when the staging and log locations are on the local file system (default).
Configure permissions for AWS
In an AWS environment, configure IAM roles for the Secure Agent and cluster operator.
Complete the following steps:
1. In AWS, create an IAM role named agent_role and attach it to the Amazon EC2 instance where the Secure Agent is installed. Alternatively, you can designate an existing IAM role as the Secure Agent role.
Tip: For instructions about creating an IAM role, refer to the AWS documentation. AWS provides several ways to create an IAM role, such as using the AWS Management Console or the AWS CLI.
2. In AWS, create an IAM role for the cluster operator named cluster_operator_role.
3. Create the following IAM policy with the name cluster_operator_policy:
Replace <cluster-staging-dir1> and <cluster-logging-dir1> with your staging and log locations, respectively. To accommodate S3 locations that change frequently, you can use wildcard characters. For more information, refer to the AWS documentation.
4. Attach the IAM policy cluster_operator_policy to the IAM role cluster_operator_role.
5. Configure the trust relationship for the cluster operator role to include the Secure Agent role. Because the Secure Agent must assume the cluster operator role, the cluster operator role must trust the Secure Agent role.
Edit the trust relationship of the IAM role cluster_operator_role and specify the following IAM policy:
Configure permissions for Google Cloud
When you create the Google Cloud VM, specify a service account that has the required roles.
Configure permissions for Microsoft Azure
In a Microsoft Azure environment, create a managed identity and a custom role.
Complete the following steps:
1. Disable the firewall on the Secure Agent machine.
2. In Azure, create a managed identity named agent_identity. You can use an existing system-assigned managed identity or create a user-assigned managed identity. If you create a user-assigned managed identity, disable the system-assigned managed identity.
For instructions about creating a managed identity, refer to the Microsoft Azure documentation.
3. Create a custom role named agent_role with the following role definition:
4. Assign the custom role agent_role to the managed identity agent_identity.
5. Assign the managed identity agent_identity to the VM where the Secure Agent is installed.
Data encryption
Encryption protects the data that is used to process jobs. You can use encryption to protect data at rest, temporary data, and data in transit.
Encryption is available for the following types of data:
Data at rest
By default, each cloud platform encrypts the staging and log files that are stored in its cloud storage. For more information, refer to the cloud provider's documentation.
For information about encrypting source and target data, see the help for the appropriate connector.
Note: If you configure an encryption-related custom property in an Amazon S3 V2 connection, the cluster uses the same custom property to read and write staging data.
Temporary data
Temporary data includes cache data and shuffle data that cluster nodes generate.
To encrypt temporary data, enable encryption in the advanced configuration. If you enable encryption, temporary data is encrypted using the HMAC-SHA1 algorithm by default. To use a different algorithm, contact Informatica Global Customer Support.
Data in transit
By default, cloud providers use the Transport Layer Security (TLS) protocol to encrypt data in transit to and from cloud storage, including staging data and log files.
Note: When encryption is enabled on Microsoft Azure, you can specify the ABFSS protocol when you configure the staging and log locations in an advanced configuration. If encryption is not enabled, you must use the ABFS protocol.
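The protocol rule in the note above can be expressed as a small pre-flight check. The following Python sketch is a hypothetical helper, not part of Data Integration. The encryption_enabled parameter stands for whether encryption is enabled on Microsoft Azure, locations with other schemes are ignored, and accepting ABFS regardless of the setting is this sketch's reading of the note.

```python
from urllib.parse import urlsplit


def check_azure_protocol(location: str, encryption_enabled: bool) -> None:
    """Raise ValueError if an ADLS Gen2 location uses ABFSS without encryption.

    Reading of the note: ABFSS is allowed only when encryption is enabled;
    ABFS is accepted in either case.
    """
    scheme = urlsplit(location).scheme.lower()
    if scheme not in {"abfs", "abfss"}:
        return  # Not an Azure Data Lake Storage Gen2 location; nothing to check.
    if scheme == "abfss" and not encryption_enabled:
        raise ValueError(
            f"Encryption is not enabled, so use abfs:// instead of abfss:// in {location!r}"
        )


# Both calls pass: ABFS without encryption, and ABFSS with encryption enabled.
check_azure_protocol("abfs://myfs@myaccount.dfs.core.windows.net/logs", encryption_enabled=False)
check_azure_protocol("abfss://myfs@myaccount.dfs.core.windows.net/logs", encryption_enabled=True)
```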