Google Cloud properties

Create an advanced configuration to define the properties of an advanced cluster. The properties specify where to start the cluster on your cloud platform and which infrastructure to use.
The basic properties describe the advanced configuration and define the cloud platform that hosts the advanced cluster. To configure the cluster itself, set the platform, advanced, and runtime properties.

Basic configuration

The following table describes the basic properties:
Property | Description
Name | Name of the advanced configuration.
Description | Description of the advanced configuration.
Runtime Environment | Runtime environment to associate with the advanced configuration. The runtime environment can contain only one Secure Agent. A runtime environment cannot be associated with more than one configuration.
Cloud Platform | Cloud platform that hosts the cluster. Select Google Cloud Platform (GCP).
Private Cluster | Creates an advanced cluster in which cluster resources have only private IP addresses. When you choose to create a private cluster, you must specify the VPC and subnet in the advanced properties. The Secure Agent must be in the same VPC network or a VPC network that can connect to the VPC that you specify in the advanced properties.

Platform configuration

The following table describes the platform properties:
Property | Description
Region | Region in which to create the cluster. Use the drop-down menu to view the regions that you can use.
Master Instance Type | Instance type to host the master node. Use the drop-down menu to view the instance types that you can use.
Master Service Account | Service account to attach to the master node.
Worker Instance Type | Instance type to host the worker nodes. Use the drop-down menu to view the instance types that you can use.
Number of Worker Nodes | Number of worker nodes in the cluster. Specify the minimum and maximum number of worker nodes.
Worker Service Account | Service account to attach to the worker nodes.
Availability Zones | List of availability zones where cluster nodes are created. The master node is created in the first availability zone in the list. If multiple zones are specified, the cluster nodes are created across the specified zones. The zones must be unique and be within the specified region.
Disk Size | Size of the persistent disk to attach to a worker node for temporary storage during data processing. The disk size must be between 50 GB and 16 TB.
Cluster Shutdown | Cluster shutdown method. You can select one of the following cluster shutdown methods: (1) Smart shutdown. The Secure Agent stops the cluster when no job is expected during the defined idle timeout, based on historical data. (2) Idle timeout. The Secure Agent stops the cluster after the amount of idle time that you define.
Mapping Task Timeout | Amount of time to wait for a mapping task to complete before it is terminated. By default, a mapping task does not have a timeout. If you specify a timeout, a value of at least 10 minutes is recommended. The timeout begins when the mapping task is submitted to the Secure Agent.
Staging Location | Location on Google Cloud Storage for staging data. The location name must start with gs://.
Log Location | Location on Google Cloud Storage to store logs that are generated when you run an advanced job. The location name must start with gs://.
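
Several of the platform properties above carry simple constraints that can be checked before you enter the values. The following Python sketch is only an illustration of those rules, not part of the product: the PlatformProperties class, the check_platform function, and the sample bucket name are hypothetical. It also assumes that a zone name begins with its region name followed by a hyphen (for example, us-central1-a) and that 16 TB is counted as 16 * 1024 GB.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_DISK_GB = 50
MAX_DISK_GB = 16 * 1024  # assumption: 16 TB counted as 16 * 1024 GB
RECOMMENDED_MIN_TIMEOUT_MIN = 10


@dataclass
class PlatformProperties:
    """Hypothetical container for a few of the platform properties."""
    region: str
    availability_zones: List[str]
    disk_size_gb: int
    staging_location: str
    log_location: str
    mapping_task_timeout_min: Optional[int] = None  # None means no timeout


def check_platform(props: PlatformProperties) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the rules stated in the table."""
    errors: List[str] = []
    warnings: List[str] = []

    zones = props.availability_zones
    if len(set(zones)) != len(zones):
        errors.append("availability zones must be unique")
    for zone in zones:
        # Assumption: a zone name is its region name plus a suffix.
        if not zone.startswith(props.region + "-"):
            errors.append(f"zone {zone} is not in region {props.region}")

    if not MIN_DISK_GB <= props.disk_size_gb <= MAX_DISK_GB:
        errors.append("disk size must be between 50 GB and 16 TB")

    for name, location in (("staging", props.staging_location),
                           ("log", props.log_location)):
        if not location.startswith("gs://"):
            errors.append(f"{name} location must start with gs://")

    # The 10-minute minimum is a recommendation, so report it as a warning.
    timeout = props.mapping_task_timeout_min
    if timeout is not None and timeout < RECOMMENDED_MIN_TIMEOUT_MIN:
        warnings.append("a mapping task timeout of at least 10 minutes is recommended")

    return errors, warnings


props = PlatformProperties(
    region="us-central1",
    availability_zones=["us-central1-a", "us-central1-b"],
    disk_size_gb=100,
    staging_location="gs://example-bucket/staging",  # hypothetical bucket
    log_location="gs://example-bucket/logs",
    mapping_task_timeout_min=5,
)
errors, warnings = check_platform(props)
print(errors)    # rule violations, if any
print(warnings)  # recommendations that were not followed
```

The sketch separates hard rules from recommendations because the timeout guidance is advisory, whereas the zone, disk size, and gs:// rules are stated as requirements.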

Advanced configuration

The following table describes the advanced properties:
Property | Description
VPC | Google Cloud Virtual Private Cloud (VPC) in which to create the cluster. If you choose to not create a private cluster, you do not need to specify a VPC. In this case, the agent creates a VPC on your Google Cloud account based on the region and the availability zones that you select.
Subnet | Subnets in which to create cluster nodes. Use a comma-separated list to specify the subnets. Required if a VPC is specified. Each subnet must be in a different availability zone within the specified VPC. If you do not specify a VPC, you cannot specify subnets. You must provide availability zones instead of subnets.
IP Address Range | CIDR block that specifies the IP address range that the cluster can use. For example: 10.0.0.0/24
Initialization Script Path | Google Cloud Storage file path of the initialization script to run on each cluster node when the node is created. Use the format: <bucket name>/<folder name>. The script can reference other init scripts in the same bucket or in a subdirectory. The script must be a bash script.
Cluster Labels | Labels to apply to cluster nodes. Each label has a key and a value. The key can be up to 63 characters long. You can list a maximum of 55 labels. The Secure Agent also assigns default labels to cloud resources. The default labels do not count toward the limit of 55 labels. Labels cannot include the characters \u241e and \u241f, which correspond to the record separator and unit separator (ASCII control characters 30 and 31).
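
The VPC, subnet, IP address range, and label rules in the table above can be expressed as a short pre-check. The following Python sketch is a hedged illustration only: the function names check_network and check_labels and the sample values are hypothetical, and the CIDR check relies on the standard ipaddress module, which by default also rejects a block whose host bits are set (an assumption about strictness that the table does not state). The rule that each subnet must be in a different availability zone is not covered because it requires looking up each subnet's zone.

```python
import ipaddress
from typing import Dict, List, Optional

MAX_LABELS = 55
MAX_KEY_LENGTH = 63
FORBIDDEN_CHARS = ("\u241e", "\u241f")  # record and unit separator symbols


def check_network(private_cluster: bool, vpc: Optional[str],
                  subnets: List[str], ip_range: Optional[str]) -> List[str]:
    """Check the VPC, subnet, and IP address range rules."""
    errors: List[str] = []
    if private_cluster and not (vpc and subnets):
        errors.append("a private cluster requires a VPC and subnet")
    if vpc and not subnets:
        errors.append("subnets are required when a VPC is specified")
    if subnets and not vpc:
        errors.append("subnets cannot be specified without a VPC")
    if ip_range is not None:
        try:
            # Default strict parsing: 10.0.0.1/24 is rejected, 10.0.0.0/24 passes.
            ipaddress.ip_network(ip_range)
        except ValueError:
            errors.append(f"{ip_range} is not a valid CIDR block")
    return errors


def check_labels(labels: Dict[str, str]) -> List[str]:
    """Check the label count, key length, and forbidden characters."""
    errors: List[str] = []
    if len(labels) > MAX_LABELS:
        errors.append("a maximum of 55 labels is allowed")
    for key, value in labels.items():
        if len(key) > MAX_KEY_LENGTH:
            errors.append(f"label key {key[:10]}... exceeds 63 characters")
        if any(ch in key or ch in value for ch in FORBIDDEN_CHARS):
            errors.append(f"label {key} contains a forbidden separator character")
    return errors


# Hypothetical values for a private cluster.
print(check_network(True, "my-vpc", ["subnet-a", "subnet-b"], "10.0.0.0/24"))
print(check_labels({"team": "data-eng", "env": "dev"}))
```

Only the labels that you list are counted here, which matches the statement that the default labels assigned by the Secure Agent fall outside the limit of 55.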

Runtime configuration

The following table describes the runtime properties:
Property | Description
Encrypt Data | Indicates whether temporary data on the cluster is encrypted.
Runtime Properties | Custom properties to customize the cluster and the jobs that run on the cluster.

Validating the configuration

You can validate the information needed to create or update an advanced configuration before you save the configuration properties.
The validation process performs the following validations:
Note: The validation process does not check whether cloud resources are configured correctly, such as whether cloud roles have all the necessary permissions.

Propagating labels to cloud resources

The Secure Agent propagates labels to cloud resources based on the cluster labels that you specify in an advanced configuration.
The agent propagates labels to the following resources:
If your enterprise follows a tagging policy, make sure to manually assign labels to other cloud resources.
Note: The Secure Agent propagates labels only to cloud resources that the agent creates. For example, if you create a network and specify the network in an advanced configuration, the agent does not propagate cluster labels to the network.

Data encryption

Encryption protects the data that is used to process jobs. You can use encryption to protect data at rest, temporary data, and data in transit.
Encryption is available for the following types of data:
Data at rest
By default, Google Cloud Storage encrypts staging data and log files. For more information, refer to the Google Cloud documentation.
For information about encrypting source and target data, see the help for the appropriate connector in the Data Integration help.
Temporary data
Temporary data includes cache data and shuffle data that the Spark engine generates on cluster nodes.
To encrypt temporary data, enable encryption in the advanced configuration. If you enable encryption, temporary data is encrypted using the HMAC-SHA1 algorithm by default. To use a different algorithm, contact Informatica Global Customer Support.
Data in transit
By default, Google Cloud Storage uses the Transport Layer Security (TLS) protocol to encrypt data in transit to and from Google Cloud Storage, including staging data and log files.