I am trying to set up an Azure Data Factory transformation. I have a SQL Server database with three tables: Students, StudentClasses and Classes. I would like to use Azure Data Factory to read these tables and create a JSON structure. The structure is an array of Students. A Student should have a derived field which is an array of Classes. StudentClasses is a mapping that tells which Classes were taken by a given student. It has two fields StudentId and ClassId. I think I should be able to add a derived column to Student that is an array of Classes created using the join transformation. I have gotten as far as setting up the data flows for each of the tables. However, I can't figure out how to embed the array of classes in student. I have tried adding a derived array field to Student. However, it is not clear how to populate it with the content I desire.
I can add a derived field called class to Class with the following expression:
array(classID, className)
The end result I am looking for is:
{
"students": [
{
"studentID": 1,
"studentName": "John Doe",
"classes": [
{
"classID": "CS101-01",
"className": "Introduction to Computer Science"
},
{
"classID": "CS102-01",
"className": "Mathematics for Computer Science"
}
]
},
{
"studentID": 2,
"studentName": "Jane Smith",
"classes": [
// ... (similar structure for Jane Smith)
]
}
// ... (additional students with their respective classes)
]
}
Array(classID, className)expression in derived column transformation will create an array ofClassID,ClassName. But this will not create as proper key name in key:value format created. Also, this will not be in the nested format under each student. To solve this, you can give expression likecollect(@(ClassId=ClassId,ClassName=ClassName))in the aggregate transformation. Below is the detailed approach.studentsandstudentsClassestable based on thestudentidfield. Then Join the resultant table withclassestable based onclassidfield. Then remove the duplicate columns which came as a result of join transformations usingselect transformation.Data Preview of select transformation after joining all three tables:
StudentIdandStudentNamein group by fields and in aggregates, create a new column namedClassesand give the expression ascollect(@(ClassId=ClassId,ClassName=ClassName))Data preview of aggregate transformation:
Students. Give the expression ascollect(@(StudentId=StudentId,StudentName=StudentName,Classes=Classes))Data preview of Aggragate2:
Output Json: