I am trying to set up an Azure Data Factory transformation. I have a SQL Server database with three tables: Students, StudentClasses and Classes. I would like to use Azure Data Factory to read these tables and create a JSON structure. The structure is an array of Students. A Student should have a derived field which is an array of Classes. StudentClasses is a mapping that tells which Classes were taken by a given student. It has two fields StudentId and ClassId. I think I should be able to add a derived column to Student that is an array of Classes created using the join transformation. I have gotten as far as setting up the data flows for each of the tables. However, I can't figure out how to embed the array of classes in student. I have tried adding a derived array field to Student. However, it is not clear how to populate it with the content I desire.
I can add a derived field called class to Class with the following expression:
array(classID, className)
The end result I am looking for is:
{
"students": [
{
"studentID": 1,
"studentName": "John Doe",
"classes": [
{
"classID": "CS101-01",
"className": "Introduction to Computer Science"
},
{
"classID": "CS102-01",
"className": "Mathematics for Computer Science"
}
]
},
{
"studentID": 2,
"studentName": "Jane Smith",
"classes": [
// ... (similar structure for Jane Smith)
]
}
// ... (additional students with their respective classes)
]
}
Array(classID, className)
expression in derived column transformation will create an array ofClassID
,ClassName
. But this will not create as proper key name in key:value format created. Also, this will not be in the nested format under each student. To solve this, you can give expression likecollect(@(ClassId=ClassId,ClassName=ClassName))
in the aggregate transformation. Below is the detailed approach.students
andstudentsClasses
table based on thestudentid
field. Then Join the resultant table withclasses
table based onclassid
field. Then remove the duplicate columns which came as a result of join transformations usingselect transformation
.Data Preview of select transformation after joining all three tables:
StudentId
andStudentName
in group by fields and in aggregates, create a new column namedClasses
and give the expression ascollect(@(ClassId=ClassId,ClassName=ClassName))
Data preview of aggregate transformation:
Students
. Give the expression ascollect(@(StudentId=StudentId,StudentName=StudentName,Classes=Classes))
Data preview of Aggragate2:
Output Json: