Step 1. Complete prerequisites

Before you set up your environment, verify the requirements for your environment and your cloud platform.
Complete the following tasks:

Verify privileges in your organization

Verify that you are assigned the correct privileges for advanced configurations in your organization.
Privileges for advanced configurations provide varying levels of access to the Advanced Clusters page in Administrator and in Monitor.
You must have at least the read privilege to view the advanced configurations and to monitor the advanced clusters.

Verify Google Cloud services

Verify that you have the necessary services to create an advanced cluster on Google Cloud.
You must have the following services on your Google account:
Google Cloud Storage
Staging data and log files for an advanced cluster and advanced jobs are stored on Google Cloud Storage.
Google Compute Engine
A virtual machine hosts the Secure Agent.
VPC Network
A VPC network and subnet to host the advanced cluster.
Network Service
A network service to provide load balancing and Cloud NAT.
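If you want to confirm that these services are reachable before you continue, you can run a quick check from the machine that hosts the Secure Agent. The following Python sketch uses the google-cloud-storage and google-cloud-compute client libraries to check the staging bucket and Compute Engine access. The project ID, bucket name, and zone are placeholder values that you replace with your own.

# Sketch: confirm the Google Cloud services listed above are reachable from the
# Secure Agent machine. Bucket, project, and zone names are placeholders.
from google.cloud import compute_v1, storage

PROJECT_ID = "my-project"             # hypothetical project
STAGING_BUCKET = "my-staging-bucket"  # hypothetical Cloud Storage staging bucket
ZONE = "us-central1-a"                # hypothetical zone for the cluster

def check_cloud_storage() -> bool:
    """Verify that the staging bucket exists and is readable."""
    client = storage.Client(project=PROJECT_ID)
    return client.bucket(STAGING_BUCKET).exists()

def check_compute_engine() -> bool:
    """Verify Compute Engine access by listing instances in the zone."""
    client = compute_v1.InstancesClient()
    try:
        list(client.list(project=PROJECT_ID, zone=ZONE))
        return True
    except Exception:
        return False

if __name__ == "__main__":
    print("Cloud Storage staging bucket reachable:", check_cloud_storage())
    print("Compute Engine reachable:", check_compute_engine())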

Learn about resource access

To process data, the Secure Agent and the advanced cluster access the resources that are part of an advanced job, including resources on the cloud platform, source and target data, and staging and log locations.
Resources are accessed to perform the following tasks:

Designing a mapping

When you design a mapping, the Secure Agent accesses sources and targets so that you can read and write data.
For example, when you add a Source transformation to a mapping, the Secure Agent accesses the source to display the fields that you can use in the rest of the mapping. The Secure Agent also accesses the source when you preview data.
To access a source or target, the Secure Agent uses the permissions in the Secure Agent service account.
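If you want to confirm that the Secure Agent service account holds the permissions it needs on a Google Cloud Storage source or target, you can test the permissions directly. The following Python sketch is an illustration only; it assumes the agent's credentials are available through GOOGLE_APPLICATION_CREDENTIALS, and the bucket name and permission list are placeholder values.

# Sketch: check that the caller's service account holds the object-read
# permissions it needs on a source bucket. Bucket name is a placeholder.
from google.cloud import storage

SOURCE_BUCKET = "my-source-bucket"   # hypothetical source bucket
REQUIRED = ["storage.objects.get", "storage.objects.list"]

client = storage.Client()
bucket = client.bucket(SOURCE_BUCKET)

# test_iam_permissions returns the subset of permissions the caller holds.
granted = bucket.test_iam_permissions(REQUIRED)
missing = set(REQUIRED) - set(granted)
if missing:
    print("Service account is missing:", sorted(missing))
else:
    print("Service account can read the source bucket.")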

Creating an advanced cluster

To create an advanced cluster, the Secure Agent uses the Secure Agent role to store cluster details in the staging location and to create the cluster. The master and worker nodes use either the master and worker roles or the Secure Agent role to access cloud resources.
The following steps describe the process that the Secure Agent uses to create a cluster:
  1. You run a job.
  2. The Secure Agent uses the Secure Agent role to store cluster details in the staging location.
  3. The Secure Agent uses the Secure Agent role to create the cluster.
  4. If you create master and worker roles and service accounts, the Secure Agent attaches the service accounts to the cluster nodes.
  5. The Secure Agent uses the Secure Agent role to create cluster resources for the master node.
  6. The master node uses the master role to access cloud resources on Google Cloud services such as Google Compute Engine to manage node elasticity and resource optimization.
  7. The master node uses the master role to access the initialization script. If you didn't create master and worker roles and service accounts, the master node uses the Secure Agent role.
  8. The Secure Agent uses the Secure Agent role to create cluster resources for the worker nodes and creates a managed instance group with the minimum number of worker nodes.
  9. The worker nodes use the worker role to access cloud resources on Google Cloud services such as Google Compute Engine and Google Cloud Networking for compute and networking capabilities. If you didn't create master and worker roles and service accounts, the worker nodes use the Secure Agent role.
  10. The worker nodes use the worker role to access the initialization script. If you didn't create master and worker roles and service accounts, the worker nodes use the Secure Agent role.
For more information about how the master and worker roles access cloud resources in an advanced cluster, see Step 7. Create roles and service accounts.
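As a quick check after the cluster starts, you can list the managed instance groups in the cluster's zone and verify that the worker group exists and reports the expected minimum size. The following Python sketch uses the google-cloud-compute client library; the project ID, zone, and group-name prefix are placeholder values, not names that the cluster is guaranteed to use.

# Sketch: confirm that a managed instance group for the worker nodes exists.
# Project, zone, and group-name prefix are placeholders.
from google.cloud import compute_v1

PROJECT_ID = "my-project"     # hypothetical project
ZONE = "us-central1-a"        # hypothetical zone that hosts the cluster nodes
GROUP_PREFIX = "k8s-worker"   # hypothetical prefix of the worker instance group

client = compute_v1.InstanceGroupManagersClient()
for mig in client.list(project=PROJECT_ID, zone=ZONE):
    if mig.name.startswith(GROUP_PREFIX):
        print(f"{mig.name}: target size {mig.target_size}")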

Running a job

To run a job, the Secure Agent, master node, and worker nodes access sources and targets, as well as the staging, log, and initialization script locations.
The following steps describe the process that the Secure Agent and cluster nodes use to run the job:
  1. The worker nodes use the worker role to access source and target data.
  2. The Secure Agent uses the Secure Agent role to store job dependencies in the staging location.
  3. The worker nodes use the worker role to get job dependencies and stage temporary data in the staging location.
  4. The master node uses the master role to orchestrate processes on the cluster.
  5. The master node uses the master role to access and run the initialization script on the master node and to scale up the worker nodes. Each added worker node uses the worker role to access the initialization script and run it on that node.
  6. The worker nodes use the worker role to store logs in the log location.
  7. The Secure Agent uses the Secure Agent role to upload the agent job log to the log location.
If you create master and worker roles and service accounts, the master and worker nodes use their respective roles. Otherwise, the master and worker nodes use the Secure Agent role.
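To verify that the staging location is writable and readable with the credentials that the Secure Agent uses, you can stage a small marker object and read it back. The following Python sketch is an illustration; the bucket and prefix names are placeholders and do not reflect the object layout that jobs actually create.

# Sketch: write a marker object to the staging location, read it back, clean up.
# Bucket and prefix names are placeholders.
from google.cloud import storage

STAGING_BUCKET = "my-staging-bucket"     # hypothetical staging bucket
STAGING_PREFIX = "cluster-staging/test"  # hypothetical staging prefix

client = storage.Client()
bucket = client.bucket(STAGING_BUCKET)

blob = bucket.blob(f"{STAGING_PREFIX}/write-check.txt")
blob.upload_from_string("staging write check")
print("read back:", blob.download_as_text())
blob.delete()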

Polling logs

When you use Monitor, the Secure Agent accesses the log location to poll logs.
To poll logs from the log location, the Secure Agent uses the permissions in the Secure Agent service account.
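The following Python sketch approximates this kind of polling by listing the newest objects under a log prefix in Google Cloud Storage. The bucket and prefix names are placeholders, and the sketch does not reflect the Secure Agent's internal implementation.

# Sketch: list the most recently updated objects under a log prefix.
# Bucket and prefix names are placeholders.
from google.cloud import storage

LOG_BUCKET = "my-log-bucket"   # hypothetical log bucket
LOG_PREFIX = "cluster-logs/"   # hypothetical log prefix

client = storage.Client()
blobs = client.list_blobs(LOG_BUCKET, prefix=LOG_PREFIX)

# Show the five most recently updated log objects.
latest = sorted(blobs, key=lambda b: b.updated, reverse=True)[:5]
for blob in latest:
    print(blob.name, blob.updated)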

Learn about the Google Cloud cluster

The Google Cloud cluster uses the Google Cloud CentOS 7 OS image published by Informatica.
The OS image includes certain prebuilt packages and the following additional yum packages:
cloud-init
device-mapper-persistent-data
docker-ce
gnupg2
gzip
kernel-devel
kernel-headers
kubeadm
kubelet
libxml2-python
lvm2
tar
unzip
wget
yum-utils
The OS image also includes the following Docker images:
calico/kube-controllers
calico/node
calico/cni
calico/pod2daemon-flexvol
coreos/flannel
coreos/flannel-cni
imega/jq
kube-scheduler