  

SQL warehouse

Configure either the AWS or the Azure staging environment for the SQL warehouse, depending on the environment where Databricks is deployed. You must also configure the Spark parameters for the SQL warehouse to use AWS or Azure staging.
You can use a SQL warehouse on the Windows and Linux operating systems.
For more information on the types of SQL warehouses that you can connect to, see the Databricks SQL warehouses Knowledge Base article.

Configure AWS staging

Configure IAM AssumeRole authentication to use AWS staging for the SQL warehouse.

IAM AssumeRole authentication

You can enable IAM AssumeRole authentication in Databricks for secure and controlled access to the Amazon S3 staging bucket when you run mappings and mapping tasks.
You can configure IAM authentication when the Secure Agent runs on an Amazon Elastic Compute Cloud (EC2) system. When you use a serverless runtime environment, you cannot configure IAM authentication.
Note: Data Ingestion and Replication does not support IAM authentication for access to Amazon S3 staging.
Perform the following steps to configure IAM authentication on EC2:
  1. Create a minimal Amazon IAM policy.
  2. Create the Amazon EC2 role. The Amazon EC2 role is used when you create an EC2 system.
     For more information about creating the Amazon EC2 role, see the AWS documentation.
  3. Link the minimal Amazon IAM policy with the Amazon EC2 role.
  4. Create an EC2 instance. Assign the Amazon EC2 role that you created to the EC2 instance.
  5. Install the Secure Agent on the EC2 system.

Temporary security credentials using AssumeRole

You can use temporary security credentials obtained through AssumeRole to access AWS resources in the same AWS account or in a different AWS account.
Note: Data Ingestion and Replication does not support using temporary security credentials for IAM users.
Ensure that you have the sts:AssumeRole permission and that a trust relationship is established within the AWS accounts before you use temporary security credentials. The trust relationship is defined in the trust policy of the IAM role when you create the role. The trust policy adds the IAM user as a trusted entity, which allows the IAM user to use temporary security credentials and access the AWS accounts.
For more information about how to establish the trust relationship, see the AWS documentation.
When the trusted IAM user requests temporary security credentials, the AWS Security Token Service (AWS STS) dynamically generates credentials that are valid for a specified period and provides them to the trusted IAM user. The temporary security credentials consist of an access key ID, a secret access key, and a session token.
To use the dynamically generated temporary security credentials, provide a value for the IAM Role ARN connection property when you create a Databricks connection. The IAM Role ARN uniquely identifies the AWS resources. Then, in the Temporary Credential Duration advanced source and target properties, specify the duration in seconds for which the temporary security credentials remain valid.
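To make the relationship between the IAM Role ARN and the credential duration concrete, the following minimal Python sketch builds the parameters of an STS AssumeRole request. It is an illustration only and does not describe how the connector calls AWS internally. The function name, session name, account ID, and role name are hypothetical. The 900 to 43200 second range is the general AWS STS limit; the maximum session duration configured on the role may be lower.

```python
# Bounds that AWS STS accepts for DurationSeconds: 15 minutes to 12 hours.
# The role's own maximum session duration can further restrict the upper bound.
MIN_DURATION_SECONDS = 900
MAX_DURATION_SECONDS = 43200


def build_assume_role_request(role_arn, session_name, duration_seconds=3600):
    """Return keyword arguments for an STS AssumeRole call (hypothetical helper)."""
    # Loose sanity check that the value looks like an IAM role ARN.
    if not (role_arn.startswith("arn:") and ":iam::" in role_arn and ":role/" in role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    if not MIN_DURATION_SECONDS <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between {MIN_DURATION_SECONDS} and "
            f"{MAX_DURATION_SECONDS} seconds, got {duration_seconds}"
        )
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,
    }


# Placeholder account ID and role name, for illustration only.
request = build_assume_role_request(
    "arn:aws:iam::123456789012:role/example-staging-role",
    "example-staging-session",
    duration_seconds=1800,
)
print(request)
```

If you want to try the request outside the product, a dictionary of this shape can be passed to the boto3 STS client as `sts_client.assume_role(**request)`. The `Credentials` element of the response carries the access key ID, secret access key, and session token.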

External ID

You can specify an external ID for more secure cross-account access when the Amazon S3 bucket is in a different AWS account.
Optionally, you can specify the external ID in the AssumeRole request to the AWS Security Token Service (STS).
The external ID must be a string.
The following sample shows an external ID condition in the assumed IAM role trust policy:
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::AWS_Account_ID:user/user_name"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "dummy_external_id"
}
}
}
]
Note: Data Ingestion and Replication does not support External ID.
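As an illustration of how the external ID condition fits into a complete trust policy document, the following Python sketch generates one. It is a sketch under stated assumptions, not a required procedure: the function name, account ID, and user name are hypothetical, and the external ID value is the placeholder from the sample above.

```python
import json


def build_trust_policy(principal_arn, external_id=None):
    """Build a trust policy document; add the external ID condition if given."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": principal_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # The external ID must be a string; reject empty or non-string values.
        if not isinstance(external_id, str) or not external_id:
            raise ValueError("external ID must be a non-empty string")
        statement["Condition"] = {
            "StringEquals": {"sts:ExternalId": external_id}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder principal and external ID, for illustration only.
policy = build_trust_policy(
    "arn:aws:iam::123456789012:user/example_user",
    external_id="dummy_external_id",
)
print(json.dumps(policy, indent=2))
```

Calling the function without an external ID produces the same policy with no Condition element, which matches a trust policy for same-account access.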

Temporary security credentials policy

To use temporary security credentials to access AWS resources, both the IAM user and IAM role require policies.
Amazon S3 permission policy
Attach the following S3 permission policy to allow access to the Amazon S3 bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:DeleteObject",
"s3:GetObject",
"s3:ListBucket",
"s3:PutObject",
"s3:PutObjectTagging",
"s3:GetBucketAcl"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::com.amk"
},
{
"Effect": "Allow",
"Action": [
"s3:DeleteObject",
"s3:GetObject",
"s3:ListBucket",
"s3:PutObject",
"s3:PutObjectTagging",
"s3:GetBucketAcl"
],
"Resource": "arn:aws:s3:::com.amk/*"
}
]
}
The following section lists the policies required for the IAM user and the IAM role:
IAM user
An IAM user must have the sts:AssumeRole policy to use temporary security credentials in the same AWS account or in a different AWS account.
The following sample policy allows an IAM user to use the temporary security credentials in an AWS account:
{
"Version":"2012-10-17",
"Statement":{
"Effect":"Allow",
"Action":"sts:AssumeRole",
"Resource":"arn:aws:iam::<ACCOUNT-HYPHENS>:role/<ROLE-NAME>" }
}
IAM role
An IAM role must have the sts:AssumeRole policy and a trust policy attached to it so that the IAM user can access the AWS resource with temporary security credentials. The policy specifies the AWS resource that the IAM user can access and the actions that the IAM user can perform. The trust policy specifies the IAM user from the AWS account that can access the AWS resource.
The following policy is a sample trust policy:
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Principal":{ "AWS":"arn:aws:iam::AWS-account-ID:root" },
"Action":"sts:AssumeRole"
}
]
}
To restrict access further, you can instead provide in the Principal attribute the ARN of the IAM user who can use the dynamically generated temporary security credentials. For example:
"Principal": { "AWS": "arn:aws:iam::AWS-account-ID:user/user-name" }

Temporary security credentials using AssumeRole for EC2

You can use temporary security credentials obtained through AssumeRole for an Amazon EC2 role to access AWS resources in the same AWS account or in a different AWS account.
The Amazon EC2 role can assume another IAM role from the same or a different AWS account without requiring a permanent access key and secret key.
Consider the following prerequisites when you use temporary security credentials using AssumeRole for EC2:
To configure an EC2 role to assume the IAM role provided in the IAM Role ARN connection property, select the Use EC2 Role to Assume Role check box in the connection properties.

Create a minimal Amazon IAM policy

To stage the data in Amazon S3, the IAM policy must grant a minimum set of permissions.
You can use the following sample Amazon IAM policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Resource": [
"arn:aws:s3:::<bucket_name>/*",
"arn:aws:s3:::<bucket_name>"
]
}
]
}
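If you maintain several staging buckets, a policy of this shape can be generated rather than edited by hand. The following Python sketch is one way to do that, assuming the action list from the sample policy above. The function name and bucket name are hypothetical.

```python
import json

# Actions taken from the sample minimal policy above.
STAGING_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:DeleteObject",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
]


def build_staging_policy(bucket_name):
    """Return a minimal IAM policy document for one S3 staging bucket."""
    if not bucket_name or "/" in bucket_name:
        raise ValueError(f"expected a bare bucket name, got {bucket_name!r}")
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(STAGING_ACTIONS),
                # Object-level actions apply to the "/*" resource,
                # bucket-level actions to the bucket ARN itself.
                "Resource": [f"{bucket_arn}/*", bucket_arn],
            }
        ],
    }


# Placeholder bucket name, for illustration only.
print(json.dumps(build_staging_policy("example-staging-bucket"), indent=2))
```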
For mappings in advanced mode, you can use different AWS accounts within the same AWS region. Make sure that the Amazon IAM policy grants access to the AWS accounts used in these mappings.
Note: Test Connection does not validate the IAM policy assigned to users. You can specify the Amazon S3 bucket name in the source and target advanced properties.
This information does not apply to Data Ingestion and Replication.

Configure Spark parameters for AWS staging

Before you use the Databricks SQL warehouse to run mappings, configure the Spark parameters for SQL warehouse on the Databricks SQL Admin console.
On the Databricks SQL Admin console, navigate to SQL Warehouse Settings > Data Security, and then configure the Spark parameters for AWS under Data access configuration.
Add the following Spark configuration parameters and restart the SQL warehouse:
For example, the S3 staging bucket warehouse value is s3.ap-south-1.amazonaws.com.
Ensure that the configured access key and secret key have access to the S3 buckets where you store the data for Databricks tables.

Configure Azure staging

Before you use Microsoft Azure Data Lake Storage Gen2 to stage files, perform the following tasks:

Configure Spark parameters for Azure staging

Before you use the Databricks SQL warehouse to run mappings, configure the Spark parameters for SQL warehouse on the Databricks SQL Admin console.
On the Databricks SQL Admin console, navigate to SQL Warehouse Settings > Data Security, and then configure the Spark parameters for Azure under Data access configuration.
Add the following Spark configuration parameters and restart the SQL warehouse:
Ensure that the configured client ID and client secret have access to the file systems where you store the data for Databricks tables.