How to load a JSON file into Postgres with all of its keys?

I have an XML file, which has been broken down into smaller tables. I can load them into Navicat fine, except for one table. Here's the XML structure:

<Food> 
  <Id> 100 </Id>
  <Type> Meat </Type>
  <Expiry Date>
    <Chicken>
      2020/12/20
    </Chicken>
    <Beef>
      2020/12/25
    </Beef>
  </Expiry Date>
</Food>

<Food>
  <Id> 200 </Id>
  <Type> Vegetables </Type>
  <Nutrition> B1 </Nutrition>
</Food>

I have turned it into JSON, using xmltodict in Python:

[{
"Id": "100",
"Type": "Meat",
"Expiry Date": {
  "Chicken": "2020/12/20",
  "Beef": "2020/12/25"
  }
},

{
"Id": "200",
"Type": "Vegetables",
"Nutrition": "B1"
}]
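
Roughly, the conversion looked like this (file names and the root tag "Root" are placeholders; xmltodict needs the <Food> elements wrapped in a single root element with valid, space-free tag names):

import json
import xmltodict

# Assumption: food.xml wraps the <Food> elements in one root element
# (raw XML must have exactly one root) and uses valid tag names,
# e.g. <ExpiryDate> rather than <Expiry Date>.
with open("food.xml") as f:
    doc = xmltodict.parse(f.read())

# Repeated <Food> elements parse as a list under the root.
foods = doc["Root"]["Food"]

with open("food.json", "w") as f:
    json.dump(foods, f, indent=2)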

However, when I load this JSON file into Navicat (PostgreSQL connection), the resulting table schema only has Id, Type, and Expiry Date. As you can see, some keys are missing from one object but appear in others. How can I create a SQL table that has all fields from the JSON file (Id, Type, Expiry Date, AND Nutrition)?

There are 2 best solutions below

If you do not have a special reason to first turn it into JSON, you may use XMLTABLE with the 'original' embedded XML like this:

select * 
from xmltable ( '//Food' passing 
    xmlparse (document '<dummyRoot>
    <Food> 
      <Id> 100 </Id>
      <Type> Meat </Type>
      <ExpiryDate>
        <Chicken>
          2020/12/20
        </Chicken>
        <Beef>
          2020/12/25
        </Beef>
      </ExpiryDate>
    </Food>
    <Food>
      <Id> 200 </Id>
      <Type> Vegetables </Type>
      <Nutrition> B1 </Nutrition>
    </Food>
    </dummyRoot>')
  columns 
   "Id" integer,
   "Type" text, 
   "ExpiryDate.Chicken" date path 'ExpiryDate/Chicken',
   "ExpiryDate.Beef" date path 'ExpiryDate/Beef',
   "Nutrition" text
);

<Expiry Date> and </Expiry Date> need to be changed to <ExpiryDate> and </ExpiryDate>, respectively, since tag names cannot contain spaces. This is the result:

Id |Type        |ExpiryDate.Chicken|ExpiryDate.Beef|Nutrition|
---|------------|------------------|---------------|---------|
100| Meat       |        2020-12-20|     2020-12-25|         |
200| Vegetables |                  |               | B1      |

Edit: Simplified the XML query.

If JSON is needed, then, as suggested by Laurenz Albe, you can unnest the array with jsonb_array_elements:

select 
    (j->>'Id')::integer id, 
    j->>'Type' "type", 
    (j->'Expiry Date'->>'Chicken')::date xdate_chicken, 
    (j->'Expiry Date'->>'Beef')::date xdate_beef,
    j->>'Nutrition' nutrition
from jsonb_array_elements
('[{
"Id": "100",
"Type": "Meat",
"Expiry Date": {
  "Chicken": "2020/12/20",
  "Beef": "2020/12/25"
  }
},
{
"Id": "200",
"Type": "Vegetables",
"Nutrition": "B1"
}]') j;

I have found a solution for this. Using Python, I read through the XML file to collect all possible column names; then, for each column, I read its subelements and write them as jsonb into Postgres.
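
A rough sketch of that approach, assuming the XML has a single root element and valid, space-free tag names; the file name, table name, and connection string below are hypothetical:

import xml.etree.ElementTree as ET

import psycopg2
from psycopg2.extras import Json

# Assumption: food.xml has a single root element and valid tag names
# (e.g. <ExpiryDate> instead of <Expiry Date>).
tree = ET.parse("food.xml")
root = tree.getroot()

# First pass: collect every child tag that appears under any <Food>.
columns = []
for food in root.iter("Food"):
    for child in food:
        if child.tag not in columns:
            columns.append(child.tag)

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

# One jsonb column per collected tag name.
col_defs = ", ".join(f'"{c}" jsonb' for c in columns)
cur.execute(f"CREATE TABLE IF NOT EXISTS food ({col_defs})")

# Second pass: turn each subelement into a scalar or a nested dict
# and insert the row, leaving absent keys as SQL NULL.
for food in root.iter("Food"):
    row = {}
    for child in food:
        if len(child):  # element with subelements, e.g. <ExpiryDate>
            row[child.tag] = {sub.tag: sub.text.strip() for sub in child}
        else:
            row[child.tag] = child.text.strip()
    col_names = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    cur.execute(
        f"INSERT INTO food ({col_names}) VALUES ({placeholders})",
        [Json(row[c]) if c in row else None for c in columns],
    )

conn.commit()

This way, every key found anywhere in the file becomes a column, and objects that lack a key simply get NULL in that column.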