I need to extract certain fields from a large JSON document.
The JSON I have is as follows:
{
  "id": "abc",
  "doc1": {
    "documentName": "xyz",
    "documentUrl": "xyz",
    "documentType": ".xlsx",
    "contents": [
      {
        "Date": "2022-06-10T00:00:00",
        "Type": "Interaction",
        "Subject": "ABC",
        "Description": "My name is ABC."
      },
      {
        "Date": "2022-12-01T00:00:00",
        "Type": "Interaction",
        "Subject": "DEF",
        "Description": "I live in a town named DEF."
      },
      {
        "Date": "2023-03-15T00:00:00",
        "Type": "Interaction",
        "Subject": "IJK",
        "Description": "He is known as IJK."
      }
    ]
  },
  "doc2": {
    "documentName": "wyc",
    "documentUrl": "wyc",
    "documentType": ".xlsx",
    "contents": [
      {
        "Date": "2023-12-05T00:00:00",
        "Type": "Task",
        "Subject": "KLM",
        "Description": "She has a friend who is called as KLM.",
        "Status": "Completed"
      },
      {
        "Date": "2023-03-15T00:00:00",
        "Type": "Task",
        "Subject": "ROQ",
        "Description": "The dessert is named as ROQ.",
        "Status": "Completed"
      },
      {
        "Date": "2023-07-15T00:00:00",
        "Type": "Task",
        "Subject": "VDI",
        "Description": "We need to know the name of the school that VDI goes to.",
        "Status": "Open"
      }
    ]
  },
  "doc3": {
    "documentName": "ckl",
    "documentUrl": "ckl",
    "documentType": ".pdf",
    "contents": [
      {
        "pageNo": 1,
        "pageText": "Hi this place is known to have awesome desserts."
      },
      {
        "pageNo": 2,
        "pageText": "Hello World."
      },
      {
        "pageNo": 3,
        "pageText": "It is a beautiful day."
      },
      {
        "pageNo": 4,
        "pageText": "Sorry I think you have reached the wrong number."
      }
    ]
  }
}
I am trying to extract "Date", "Subject" and "Description" from doc1; "Date", "Subject", "Description" and "Status" from doc2; and "documentUrl" and "pageText" (only for "pageNo" 2 and 3) from doc3.
I am expecting an output like this (for doc1, all the dates, subjects and descriptions):
[
  {
    "contents": [
      {
        "Date": "2022-06-10T00:00:00",
        "Subject": "ABC",
        "Description": "My name is ABC."
      },
      {
        "Date": "2022-12-01T00:00:00",
        "Subject": "DEF",
        "Description": "I live in a town named DEF."
      }
    ]
  }
]
I want to extract the same kind of result for the other documents, doc2 and doc3; for doc3, only pageNo 2 and 3.
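As a rough illustration of the extraction described above, here is a minimal Python sketch. It is not taken from any answer in this thread; it assumes the JSON has already been loaded into a dict with the standard `json` module. The helper names `pick` and `extract`, the `pages` parameter, and the shape of the doc2 and doc3 entries in the result are hypothetical choices for this example, and the embedded sample is a trimmed copy of the data from the question.

```python
import json

# Trimmed copy of the JSON from the question; in practice, load the full file.
raw = """
{
  "id": "abc",
  "doc1": {"documentUrl": "xyz", "contents": [
    {"Date": "2022-06-10T00:00:00", "Type": "Interaction",
     "Subject": "ABC", "Description": "My name is ABC."}
  ]},
  "doc2": {"documentUrl": "wyc", "contents": [
    {"Date": "2023-07-15T00:00:00", "Type": "Task", "Subject": "VDI",
     "Description": "We need to know the name of the school that VDI goes to.",
     "Status": "Open"}
  ]},
  "doc3": {"documentUrl": "ckl", "contents": [
    {"pageNo": 1, "pageText": "Hi this place is known to have awesome desserts."},
    {"pageNo": 2, "pageText": "Hello World."},
    {"pageNo": 3, "pageText": "It is a beautiful day."},
    {"pageNo": 4, "pageText": "Sorry I think you have reached the wrong number."}
  ]}
}
"""


def pick(rows, keys):
    """Keep only the requested keys from each row (missing keys are skipped)."""
    return [{k: row[k] for k in keys if k in row} for row in rows]


def extract(data, pages=(2, 3)):
    """Build one result entry per document (hypothetical output layout)."""
    doc1, doc2, doc3 = data["doc1"], data["doc2"], data["doc3"]
    # Filter doc3 pages first, then project down to pageText only.
    wanted_pages = [p for p in doc3["contents"] if p.get("pageNo") in pages]
    return [
        {"contents": pick(doc1["contents"], ["Date", "Subject", "Description"])},
        {"contents": pick(doc2["contents"],
                          ["Date", "Subject", "Description", "Status"])},
        {"documentUrl": doc3["documentUrl"],
         "contents": pick(wanted_pages, ["pageText"])},
    ]


result = extract(json.loads(raw))
print(json.dumps(result, indent=2))
```

The list of keys per document is kept explicit so that fields such as "Type" are dropped simply by not naming them. If the data lives in a database rather than a file, the same projection and page filter would normally be expressed in that system's query language instead.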
You can see that I've set the array filter for pageNo 2 and 3 only. You can of course alter this to have the data in their own fields instead of arrays, like I have done below:
It will give a response like below.