You work for a retail company that offers more than 50,000 products through stores distributed across the globe. The company ingests a large volume of customer engagement data from its transactional CRM system into Amazon S3.
The sales team wants to improve customer engagement and satisfaction at every touch point. To create a seamless customer experience and deliver personalized service across its outlets, the company plans to load the data stored in the Amazon S3 bucket into Databricks.
You can create a mapping that runs on an advanced cluster to improve performance when you read data from the Amazon S3 bucket and write data to the Databricks target.
You can choose to add transformations to process the raw data that you read from the Amazon S3 bucket and then write the curated data to Databricks.
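The read, transform, and write flow that such a mapping performs can be pictured with the minimal Python sketch below. It is not how Data Integration executes mappings. It only models each record as a dictionary that passes through an ordered list of transformations before it is inserted into an insert-only target. The function `run_mapping`, the rule `drop_missing_customer_id`, and the field names are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A row is modeled as a plain dictionary of field name -> value.
Row = Dict[str, object]
# A transformation takes one row and returns the processed row, or None to drop it.
Transformation = Callable[[Row], Optional[Row]]


def run_mapping(
    source_rows: Iterable[Row],
    transformations: List[Transformation],
    target: List[Row],
) -> int:
    """Pass each source row through the transformations in order, then insert it into the target.

    Returns the number of rows inserted. The target is treated as insert-only:
    rows are appended and existing rows are never updated.
    """
    inserted = 0
    for row in source_rows:
        current: Optional[Row] = dict(row)  # copy so the source rows are not modified
        for transform in transformations:
            current = transform(current)
            if current is None:  # the transformation filtered out this row
                break
        if current is not None:
            target.append(current)
            inserted += 1
    return inserted


# Usage with a hypothetical curation rule: skip records that have no customer ID.
def drop_missing_customer_id(row: Row) -> Optional[Row]:
    return row if row.get("customer_id") else None


raw_rows = [
    {"customer_id": "C1", "channel": "web"},
    {"customer_id": None, "channel": "store"},
]
curated_rows: List[Row] = []
count = run_mapping(raw_rows, [drop_missing_customer_id], curated_rows)
print(count, curated_rows)  # 1 [{'customer_id': 'C1', 'channel': 'web'}]
```

An empty transformation list simply copies every source row to the target, which corresponds to a mapping with only a Source and a Target transformation.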
The following example shows how to create a mapping in advanced mode that reads from an Amazon S3 source and writes to a Databricks target:
1. In Data Integration, click New > Mappings > Mapping.
2. In the Mapping Designer, click Switch to Advanced.
   The Mapping Designer updates the mapping canvas to display the transformations and functions that are available in advanced mode.
3. Enter a name, location, and description for the mapping.
4. Add a Source transformation, and specify a name and description in the general properties.
5. On the Source tab, perform the following steps to read data from the Amazon S3 source:
   a. In the Connection field, select the Amazon S3 V2 connection.
   b. In the Source Type field, select single object.
   c. In the Object field, select the Parquet file object that contains the customer details.
   d. In the Advanced Properties section, specify the required parameters.
6. Add an Expression transformation. On the Expression tab, define an expression that converts the file name port of the customer Parquet file to uppercase, as your business requirements dictate, before you write data to the Databricks target.
7. Add a Target transformation, and specify a name and description in the general properties.
8. On the Target tab, specify the details to write data to Databricks:
   a. In the Connection field, select the Databricks target connection.
   b. In the Target Type field, select single object.
   c. In the Object field, select the Databricks object to which you want to write the curated customer engagement data.
   d. In the Operation field, select the insert operation.
   e. In the Advanced Properties section, specify the required advanced target properties.
9. Click Save, and then click Run to validate and run the mapping.
After you run the mapping, you can monitor the job status and view the logs in Monitor.
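The uppercase conversion in step 6 can be approximated with the Python sketch below. It is an analogue of an Informatica expression such as UPPER(FileName), not the product's implementation. The port name "FileName" and the function `uppercase_file_name` are assumptions made for illustration, so use the field name that your Amazon S3 source object actually exposes.

```python
from typing import Dict

Row = Dict[str, object]


def uppercase_file_name(row: Row, port: str = "FileName") -> Row:
    """Return a copy of the row with the file name port converted to uppercase.

    Python analogue of an expression such as UPPER(FileName).
    The port name "FileName" is an assumption; use the name that your source exposes.
    Null (None) values pass through unchanged, and non-string values are left as-is.
    A missing port is ignored rather than added.
    """
    out = dict(row)  # copy so the incoming row is not modified
    value = out.get(port)
    if isinstance(value, str):
        out[port] = value.upper()
    return out


record = {"customer_id": "C1", "FileName": "customers_2024.parquet"}
print(uppercase_file_name(record))
# {'customer_id': 'C1', 'FileName': 'CUSTOMERS_2024.PARQUET'}
```

Returning a new dictionary keeps the original record intact, which mirrors the idea that the Expression transformation produces output values from its input fields rather than changing the source data.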