Before you set up your environment, verify the requirements for your environment and your cloud platform.
Complete the following tasks:
• Verify that you have the correct privileges in your organization.
• Verify that you have the necessary Microsoft Azure products.
• Learn how the Secure Agent and the advanced cluster access resources on your cloud platform.
Verify privileges in your organization
Verify that you are assigned the correct privileges for advanced configurations in your organization.
Privileges for advanced configurations give you varying levels of access to the Advanced Clusters page in Administrator and to Monitor.
You must have at least the read privilege to view the advanced configurations and to monitor the advanced clusters.
Verify Microsoft Azure products
Verify that you have the necessary Microsoft Azure products to create an advanced cluster in an Azure environment.
You must have the following products on your Azure account:
Azure Data Lake Storage Gen2
Staging data and log files for an advanced cluster and jobs are stored on the Azure cloud.
Linux Virtual Machines
A Linux virtual machine hosts the Secure Agent.
Virtual Network (VNet)
An advanced cluster is created in a VNet. You can specify an existing VNet, or the Secure Agent can create a VNet based on the region that you provide.
Key Vault
If you create a service principal to perform cluster operations, a key vault stores the service principal credentials. The Secure Agent accesses the key vault to retrieve the credentials.
Load Balancer
A load balancer accepts incoming jobs from a Secure Agent and provides an entry point for the jobs to an advanced cluster.
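The product list above can be kept as a small checklist while you prepare the Azure account. The following is a minimal sketch: the product names and purposes come from the list above, while the `missing_products` helper and the sample input are hypothetical. The sketch does not call any Azure API.

```python
# Required Azure products and their purpose, taken from the list above.
# Key Vault is needed when a service principal performs cluster operations.
REQUIRED_PRODUCTS = {
    "Azure Data Lake Storage Gen2": "stores staging data and log files",
    "Linux Virtual Machines": "hosts the Secure Agent",
    "Virtual Network (VNet)": "network in which the advanced cluster is created",
    "Key Vault": "stores the service principal credentials",
    "Load Balancer": "entry point for jobs to the advanced cluster",
}


def missing_products(available):
    """Return the required products that are not in `available`, sorted by name."""
    return sorted(set(REQUIRED_PRODUCTS) - set(available))


if __name__ == "__main__":
    # Hypothetical input: products that you have already confirmed by hand.
    confirmed = ["Linux Virtual Machines", "Key Vault"]
    for name in missing_products(confirmed):
        print(f"Missing: {name} ({REQUIRED_PRODUCTS[name]})")
```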
Learn about resource access
To process data, the Secure Agent and the advanced cluster access the resources that are part of an advanced job, including resources on the cloud platform, source and target data, and staging and log locations.
Resources are accessed to perform the following tasks:
• Design a mapping
• Create an advanced cluster
• Run a job, including data preview
• Poll logs
Designing a mapping
When you design a mapping, the Secure Agent accesses sources and targets so that you can read and write data.
For example, when you add a Source transformation to a mapping, the Secure Agent accesses the source to display the fields that you can use in the rest of the mapping. The Secure Agent also accesses the source when you preview data.
To access a source or target, the Secure Agent uses the connection properties. For example, the Secure Agent might use the user name and password that you provide in the connection properties to access a database.
Creating an advanced cluster
To create an advanced cluster, the Secure Agent authenticates with the managed identity to store cluster details in the staging location and to create the cluster. The master and worker nodes use the service principal to access cloud resources.
The following image shows the process that the Secure Agent uses to create a cluster:
The following steps describe the process that the Secure Agent uses to create a cluster:
1. You run a job.
2. The Secure Agent authenticates with the managed identity to store cluster details in the staging location.
3. The Secure Agent authenticates with the managed identity to create prerequisite resources that the cluster needs, such as a network security group and load balancer.
4. The Secure Agent authenticates with the managed identity to get the access keys to the storage accounts.
5. The Secure Agent authenticates with the managed identity to get the service principal credentials.
6. The Secure Agent makes the access keys to the storage accounts and the service principal credentials available to the cluster.
7. The Secure Agent authenticates with the managed identity to create cluster resources and a Virtual Machine Scale Set for the master node.
8. The master node uses the service principal to access Microsoft Azure services, such as Azure Compute, to manage node elasticity and resource optimization.
9. The master node accesses the initialization script using the storage account key that the Secure Agent fetched through the managed identity.
10. The Secure Agent authenticates with the managed identity to create cluster resources for the worker nodes and creates a Virtual Machine Scale Set with the minimum number of worker nodes.
11. The worker nodes use the service principal to access Microsoft Azure services, such as Azure Compute, for compute and networking capabilities.
12. The worker nodes access the initialization script using the storage account key that the Secure Agent fetched through the managed identity.
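The cluster creation steps use three kinds of credentials: the managed identity, the service principal, and the storage account key. The following sketch records which actor uses which credential at each step, which can help when you review permissions. The `Step` type and the `credentials_by_actor` helper are hypothetical names for illustration only. Steps 1 and 6 are omitted because the text does not describe an authentication in them.

```python
from typing import NamedTuple


class Step(NamedTuple):
    number: int      # step number in the list above
    actor: str       # who performs the step
    credential: str  # what the actor authenticates with
    action: str      # short summary of the step


AGENT, MASTER, WORKERS = "Secure Agent", "master node", "worker nodes"
MI, SP, KEY = "managed identity", "service principal", "storage account key"

CLUSTER_CREATION_STEPS = [
    Step(2, AGENT, MI, "store cluster details in the staging location"),
    Step(3, AGENT, MI, "create prerequisite resources"),
    Step(4, AGENT, MI, "get the storage account access keys"),
    Step(5, AGENT, MI, "get the service principal credentials"),
    Step(7, AGENT, MI, "create master node resources and scale set"),
    Step(8, MASTER, SP, "access Azure services such as Azure Compute"),
    Step(9, MASTER, KEY, "access the initialization script"),
    Step(10, AGENT, MI, "create worker node resources and scale set"),
    Step(11, WORKERS, SP, "access Azure services such as Azure Compute"),
    Step(12, WORKERS, KEY, "access the initialization script"),
]


def credentials_by_actor(steps):
    """Group the credentials that each actor uses across all steps."""
    grouped = {}
    for step in steps:
        grouped.setdefault(step.actor, set()).add(step.credential)
    return grouped


if __name__ == "__main__":
    for actor, creds in credentials_by_actor(CLUSTER_CREATION_STEPS).items():
        print(f"{actor}: {', '.join(sorted(creds))}")
```

In this summary, only the Secure Agent uses the managed identity directly. The cluster nodes rely on the service principal and on the storage account key that the Secure Agent makes available in step 6.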
Running a job
To run a job, the Secure Agent and the worker nodes access sources and targets, as well as the staging and log locations. The worker nodes and the Azure disks auto-scale according to resource requirements.
The following image shows the process that the Secure Agent and worker nodes use to run the job:
The following steps describe the process that the Secure Agent and worker nodes use to run the job:
1. The worker nodes use the connection properties to access source and target data.
The connection accesses the data using either a storage account key or a managed identity. To use a managed identity, the identity must be assigned to the Secure Agent. The agent role must also have permissions to detect all user-assigned managed identities that are assigned to the Secure Agent machine and to assign those identities to all cluster nodes.
2. The Secure Agent authenticates with the managed identity to store job dependencies in the staging location.
3. The worker nodes get job dependencies and stage temporary data in the staging location using the storage account key that the Secure Agent fetched through the managed identity. The Secure Agent also passes the key to the Spark job so that the Spark driver and Spark executors can use the same key to access the staging location.
4. The worker nodes and the Azure disks auto-scale using the service principal.
5. The worker nodes store logs in the log location after fetching the storage account key through the managed identity.
6. The Secure Agent authenticates with the managed identity to upload the agent job log to the log location.
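Step 3 says that the Secure Agent passes the storage account key to the Spark job, but this section does not describe the mechanism. As an illustration only, the following sketch shows one common way that a Spark job can receive an Azure Data Lake Storage Gen2 account key: through the Hadoop ABFS shared-key property, prefixed with `spark.hadoop.`. That the product uses this property is an assumption, and the function names, the account name, and the key value are hypothetical placeholders.

```python
def staging_spark_conf(storage_account: str, account_key: str) -> dict:
    """Build Spark properties that expose one storage account key to a job.

    Assumption: the key is passed through the Hadoop ABFS shared-key
    property. The "spark.hadoop." prefix forwards the property to the
    Hadoop configuration that the Spark driver and executors share.
    """
    prop = (
        "spark.hadoop.fs.azure.account.key."
        f"{storage_account}.dfs.core.windows.net"
    )
    return {prop: account_key}


def as_submit_args(conf: dict) -> list:
    """Render the properties as `--conf name=value` argument pairs."""
    args = []
    for name, value in sorted(conf.items()):
        args += ["--conf", f"{name}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder values; a real key is a secret and should not be printed.
    conf = staging_spark_conf("examplestaging", "<account-key>")
    print(as_submit_args(conf))
```

Because the driver and the executors read the same configuration, a single key that the Secure Agent fetches can serve every process in the job.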
Polling logs
When you use Monitor, the Secure Agent accesses the log location to poll logs.
To poll logs from the log location, the Secure Agent uses the permissions in the managed identity that is assigned to the Secure Agent machine.
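Polling means that the log location is read repeatedly for new content. The following is a minimal, generic sketch of that pattern and is not the Secure Agent's implementation. The `poll_logs` function and its `read_from` and `is_done` callbacks are hypothetical; in a real setup, `read_from` would read from the log location with the permissions of the managed identity.

```python
import time
from typing import Callable, List


def poll_logs(
    read_from: Callable[[int], List[str]],
    is_done: Callable[[], bool],
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Collect log lines until the job is done.

    `read_from(offset)` returns the lines available from `offset` onward.
    `is_done()` reports whether the job has finished.
    """
    collected: List[str] = []
    while True:
        # Check completion before reading, so that one last read still
        # happens after the job finishes and no trailing lines are lost.
        done = is_done()
        collected.extend(read_from(len(collected)))
        if done:
            return collected
        sleep(interval_s)


if __name__ == "__main__":
    # Fake log source: one more line becomes visible on every read.
    log = ["job started", "stage 1 complete", "job finished"]
    reads = {"count": 0}

    def read_from(offset: int) -> List[str]:
        reads["count"] += 1
        return log[offset:min(len(log), reads["count"])]

    lines = poll_logs(read_from, lambda: reads["count"] >= 3, interval_s=0.0)
    print("\n".join(lines))
```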