How can I extract data from a large JSON document in Cosmos DB using a query?

I need to extract certain information from a large JSON document.

The JSON I have is as follows:

{
    "id": "abc",
    "doc1": {
        "documentName": "xyz",
        "documentUrl": "xyz",
        "documentType": ".xlsx",
        "contents": [
            {
                "Date": "2022-06-10T00:00:00",
                "Type": "Interaction",
                "Subject": "ABC",
                "Description": "My name is ABC."
            },
            {
                "Date": "2022-12-01T00:00:00",
                "Type": "Interaction",
                "Subject": "DEF",
                "Description": "I live in a town named DEF."
            },
            {
                "Date": "2023-03-15T00:00:00",
                "Type": "Interaction",
                "Subject": "IJK",
                "Description": "He is known as IJK."
            }
        ]
    },
    "doc2": {
        "documentName": "wyc",
        "documentUrl": "wyc",
        "documentType": ".xlsx",
        "contents": [
            {
                "Date": "2023-12-05T00:00:00",
                "Type": "Task",
                "Subject": "KLM",
                "Description": "She has a friend who is called as KLM.",
                "Status": "Completed"
            },
            {
                "Date": "2023-03-15T00:00:00",
                "Type": "Task",
                "Subject": "ROQ",
                "Description": "The dessert is named as ROQ.",
                "Status": "Completed"
            },
            {
                "Date": "2023-07-15T00:00:00",
                "Type": "Task",
                "Subject": "VDI",
                "Description": "We need to know the name of the school that VDI goes to.",
                "Status": "Open"
            }
        ]
    },
    "doc3": {
        "documentName": "ckl",
        "documentUrl": "ckl",
        "documentType": ".pdf",
        "contents": [
            {
                "pageNo": 1,
                "pageText": "Hi this place is known to have awesome desserts."
            },
            {
                "pageNo": 2,
                "pageText": "Hello World."
            },
            {
                "pageNo": 3,
                "pageText": "It is a beautiful day."
            },
            {
                "pageNo": 4,
                "pageText": "Sorry I think you have reached the wrong number."
            }
        ]
    }
}

I am trying to extract "Date", "Subject", "Description" from doc1, "Date", "Subject", "Description" and "Status" from doc2 and "documentUrl" and "pageText" (only from "pageNo" 2 and 3) from doc3.

I am expecting output like this (for doc1, all the dates, subjects and descriptions):

[
  {
   "contents":[
      {
        "Date": "2022-06-10T00:00:00",
        "Subject": "ABC",
        "Description": "My name is ABC."
      },
      {
        "Date": "2022-12-01T00:00:00",
        "Subject": "DEF",
        "Description": "I live in a town named DEF."
      }
    ]
  }
]

I am trying to extract the same kind of output for the other documents, doc2 and doc3. For doc3, I only need pageNo 2 and 3.

There is 1 best solution below

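The query below builds one array per field with ARRAY subqueries, and the array filter on doc3 is set to pageNo 2 and 3 only. I have returned each field as an array; you can of course alter this to put the data in their own fields instead. Note that it does not select Subject and Description for doc2, but they can be added following the same pattern.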

SELECT
    c.id,
    ARRAY(SELECT VALUE t.Date FROM t IN c.doc1.contents) AS doc1Date,
    ARRAY(SELECT VALUE t.Subject FROM t IN c.doc1.contents) AS doc1Subject,
    ARRAY(SELECT VALUE t.Description FROM t IN c.doc1.contents) AS doc1Description,
    ARRAY(SELECT VALUE t.Date FROM t IN c.doc2.contents) AS doc2Date,
    ARRAY(SELECT VALUE t.Status FROM t IN c.doc2.contents) AS doc2Status,
    c.doc3.documentUrl AS doc3url,
    ARRAY(SELECT VALUE t.pageText FROM t IN c.doc3.contents WHERE t["pageNo"] IN (2, 3)) AS doc3pageText
FROM c
WHERE c.id = "abc"

It returns a response like the one below.

[
    {
        "id": "abc",
        "doc1Date": [
            "2022-06-10T00:00:00",
            "2022-12-01T00:00:00",
            "2023-03-15T00:00:00"
        ],
        "doc1Subject": [
            "ABC",
            "DEF",
            "IJK"
        ],
        "doc1Description": [
            "My name is ABC.",
            "I live in a town named DEF.",
            "He is known as IJK."
        ],
        "doc2Date": [
            "2023-12-05T00:00:00",
            "2023-03-15T00:00:00",
            "2023-07-15T00:00:00"
        ],
        "doc2Status": [
            "Completed",
            "Completed",
            "Open"
        ],
        "doc3url": "ckl",
        "doc3pageText": [
            "Hello World.",
            "It is a beautiful day."
        ]
    }
]
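
If you would rather keep each entry's Date, Subject and Description together in one object, as in the expected output in the question, a variation along these lines should work. It is only a sketch: dropping VALUE and projecting several properties inside the ARRAY subquery should give an array of objects rather than an array of scalars. The aliases doc1Contents, doc2Contents, doc3Url and doc3PageText are names I made up for illustration, so rename them as you like.

```sql
-- Sketch only: the aliases below are illustrative, not part of the original data.
SELECT
    c.id,
    ARRAY(
        SELECT t.Date, t.Subject, t.Description
        FROM t IN c.doc1.contents
    ) AS doc1Contents,
    ARRAY(
        SELECT t.Date, t.Subject, t.Description, t.Status
        FROM t IN c.doc2.contents
    ) AS doc2Contents,
    c.doc3.documentUrl AS doc3Url,
    ARRAY(
        SELECT VALUE t.pageText
        FROM t IN c.doc3.contents
        WHERE t.pageNo IN (2, 3)
    ) AS doc3PageText
FROM c
WHERE c.id = "abc"
```

With this shape, each element of doc1Contents and doc2Contents is an object holding the selected properties for one entry, while doc3PageText stays a plain array of strings for pages 2 and 3.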