You're a marketing manager for an online retailer, and you want to merge order data with product and customer data from different Amazon S3 sources to understand what customers are purchasing. Use Joiner transformations to join the data from your sources.
You have three source data tables in an Amazon S3 bucket: orders, order_items, and customer_data. The following image shows a mapping that joins the data from these sources:
The mapping contains the following elements:
Source transformation for orders
The orders data table includes fields for the order number, date, price, and ID of the customer for each online order.
The following table shows a portion of orders:
order_id
order_date
customer_id
order_price
1005
2023-01-20
789
78.25
1006
2023-01-24
268
150.09
1007
2023-02-07
268
30.20
Source transformation for order_items
The order_items data table includes details about the items in each order, including the quantity and price.
The following table shows a portion of order_items:
order_id
item_id
qty
price
1005
5063
2
34.99
1006
2389
3
19.99
1006
5063
1
34.99
1007
9871
2
10.99
In the Source transformation, you rename the field order_id to items_order_id to avoid a field name conflict when you join order_items with orders.
Source transformation for customer_data
The customer_data table includes fields for information that the customers provide, including their name, date of birth, and phone number.
The following table shows a portion of customer_data:
c_id
c_name
c_dob
789
Kelcy Almeida
1969-07-20
268
Chidi Donalds
1972-12-07
Joiner transformation for orders and order_items
The first Joiner transformation performs a normal join between orders and order_items. The orders Source transformation is the master group and the order_items Source transformation is the detail group so that order information is added to each item ordered.
The Joiner transformation uses the following join condition to match the data by the order ID: order_id = items_order_id.
Joiner transformation for customer_data
The second Joiner transformation performs a detail outer join between customer_data and the output from the first Joiner transformation. The transformation uses the customer_data Source transformation as the master group since it is the smaller data set.
The second Joiner transformation uses the following join condition to match the data by the customer ID: customer_id = c_id.
Target transformation
The Target transformation writes the data to a new file in Amazon S3. You can configure the incoming fields to exclude the duplicate fields that result from the joins.
The following table shows a portion of the output data: