Suppose I have a table with 3 Columns for the month of January
Name | Start Date | Items Purchased |
---|---|---|
A | 01/01 | 2 |
B | 01/01 | 4 |
C | 01/01 | 5 |
D | 01/01 | 7 |
E | 01/01 | 1 |
I will receive two similar datasets for the month of Feb and March
FEB
Name | Start Date | Items Purchased |
---|---|---|
A | 01/01 | 1 |
C | 01/01 | 4 |
D | 01/01 | 2 |
G | 01/02 | 1 |
H | 01/02 | 6 |
MARCH
Name | Start Date | Items Purchased |
---|---|---|
B | 01/01 | 6 |
C | 01/01 | 4 |
G | 01/02 | 2 |
H | 01/02 | 3 |
J | 01/03 | 1 |
Now, I will receive similar datasets for subsequent months and hence I want to automate the whole process. In my final Dataset I want to keep one column each for Name and Start Date. But I want three different columns of Items Purchased for each month.
Name | Start Date | Items Purchased Jan | Items Purchased Feb | Items Purchased Mar |
---|---|---|---|---|
A | 01/01 | 2 | 1 | NA |
B | 01/01 | 4 | NA | 6 |
C | 01/01 | 5 | 4 | 4 |
D | 01/01 | 7 | 2 | NA |
E | 01/01 | 1 | NA | NA |
G | 01/02 | NA | 1 | 2 |
H | 01/02 | NA | 6 | 3 |
J | 01/03 | NA | NA | 1 |
Is it possible to achieve the result(with automation) as shown in the above table using AWS Databrew or any other AWS services?
Not sure if you have got a solution already. Within AWS DataBrew, you can schive this in two steps. Once new columns are created, you can populate the data as required based on your logic.