I'm trying to find how I can search for AND update a value inside an array of JSON objects, where the key and value I'm looking for, can be at any different levels.
I've created a dbfiddle with some sample data. To make it easier to read the JSON data, here's a gist with a pretty format.
So, let's take this in parts:
What am I trying to achieve?
I'm trying to replace the value ofurlin themedia_itemstable, and the values ofurlandoriginalUrlin thedocuments_revisionstable with new values, for a server migration.urlwill point to my server andoriginal_urlwill point to an AWS S3 bucket.
Let's look at this media_item entry example:
// media_item simplified example...
{
"key": "2021/1/29/04aaf6f1-bf13-4918-a062-355bdf192d2f.jpeg",
"url": "https://my.old.url/api/v1/images/4166289-f-542-d-445-b-a-0-d-1-ec-7916-c-0748-e.jpeg?id=MQkvMjAyMi8wMS8yMC80MTY2Mjg5Zi01NDJkLTQ0NWItYTBkMS1lYzc5MTZjMDc0OGUuanBlZwk4OTY%3D",
}
This key is the actual path/URL to the resource on my server and s3 bucket, and url allows me to get the "original" value, which is then used in the documents/revisions in document_revisions. The "original" url also contains a base64 encoded id param, which is just the key as a base64 string.
How am I trying to do it?
The current approach I'm taking, has me:
read the values of
urlandkeyfrom themedia_itemstable (in the fiddle) into aitem recordvariable (which then get iterated upon).I then load the
urlvalue into aold_urlvariable (to re-use it during "searches")
old_url := "https://my.old.url/api/v1/images/4166289-f-542-d-445-b-a-0-d-1-ec-7916-c-0748-e.jpeg?id=MQkvMjAyMi8wMS8yMC80MTY2Mjg5Zi01NDJkLTQ0NWItYTBkMS1lYzc5MTZjMDc0OGUuanBlZwk4OTY%3D"
- create a
new_urlvariable with the new value, which is something like:
new_url := CONCAT('https://my.new.url', '/', '"' , item.key, '"')
-- output: https://my.new.url/2021/1/29/04aaf6f1-bf13-4918-a062-355bdf192d2f.jpeg
- With these variables, I'll update the
media_items.asset->'url'entry tonew_urlAND look for the value ofold_urlin ALL revisions indocument_revisions:
do $$
-- (...)
-- Update the media_items asset->'url'
UPDATE media_items mle
SET asset = REPLACE(
(mle.asset)::TEXT,
CONCAT('"url": ', old_url
CONCAT('"url": ', '"', new_url, '"')
)::JSONB;
-- Update the document_revisions
UPDATE document_revisions dr
SET data = REPLACE(
(dr.data)::TEXT,
CONCAT('"url": ', old_url),
CONCAT('"url": ', '"', new_url, '"')
)::JSONB;
UPDATE document_revisions dr
SET data = REPLACE(
(dr.data)::TEXT,
CONCAT('"originalUrl": ', old_url),
CONCAT('"originalUrl": ', '"', new_url, '"')
)::JSONB;
-- (...)
end; $$;
Now, as you can see, this is not a very good/performant approach as I'm relying on parsing JSON to text, finding a certain substring (which represents a JSON key: value) and then replacing it with the new value.
It is, however, fairly simple to target the url key in the media_items table, so I could simply change to a more directed approach:
-- A much better solution to the media_items update above
UPDATE
media_items mle
SET
asset = JSONB_SET(asset, '{url}', CONCAT('"', new_url, '"')::JSONB)
WHERE
item.id = mle.id;
On the other hand, the same is NOT true for the revisions in document_revisions, as the url and originalUrl can be at any depth level of json objects in the document_revisions.data->'data'->'content' item array (look at the sample data in the gist, it's pretty complete even though text components have been severely shrunk of text for the sake of readability). I'm not sure how to get the array index of the json object to update and update it's property/key
I did find a couple possible solutions that used jsonb operators, but I'm not sure how to use them to achieve my intended result:
-- This will return all the revisions which contain an object, at any depth, whose key is `url`.
SELECT dr.*
FROM document_revisions dr
WHERE (dr.data->'content')::jsonb @? 'strict $.** ? (exists (@."url"))';
-- This will return all the revisions that contain an json value equal to the defined url
SELECT dr.*
FROM document_revisions dr
WHERE (dr.data->'content')::JSONB @? 'strict $.** ? (@ == "https://my.old.url/api/v1/images/covid-19-m-rna-vaccines.png?id=MQkvMjAyMS82LzEvZWI2YzQwZjItMjYzMS00MTY4LWFhZGQtNjZjMmVkOTg5ZDQ2LnBuZwk4MTQ=")';
I tried combining the two options (meaning, find by key AND value) with something to the likes of the next example, but that doesn't work, nor have I found an iteration of this that works:
SELECT dr.*
FROM document_revisions dr
WHERE (dr.data->'content')::jsonb @? 'strict $.** ? (exists (@."url") && @ == "https://server.livingdocs.io/api/v1/images/a.jpg?id=MQkvMjAyMC8xMS8yNS9mNjgyN2E5ZC0xZjViLTQzZGUtOGNiNy1iY2RhNjgyYTFhMjMuanBlZwk4OTY=")';
The only simple way I've found so far to do this is by doing string replacements, but this is slow and ineffective. It took me over 2 hours to update 150 media assets, which can then correspond to any amount of revisions between 1 and N. I currently have 6k assets and 23k revisions, where each revision can use N assets, so it's not being good...
Now, my questions are:
- Is there a way to update in-place, a given json key/property at any index and level of an array?
- Is there a way to speed up the string replacement?
- How would you do this? I've tried creating temporary tables, unlogged tables, creating unindexed copies of the tables, and nothing works... Help please?
Thanks
EDIT:
Based on @Ajax1234 answer (thank you) the final solution ended up being a set of functions that are available in this dbfiddle
A possible method for handling arbitrarily nested data is to find all the JSON paths in
document_revisions.datathat point to the value ofurlororiginalUrlkeys and then use a function with aloopto repeatedly updatedocument_revisionsin place.First, update
media_items:Then, create a
ctethat will return all paths for every value in eachdata:Lastly, create a function (this can also be done in a procedure) to perform the updates:
Explanation of
updatequery:See fiddle
A more memory efficient approach can be to find all
urlandoriginalUrls fromdocument_revisionsand then perform an in-place series of updates in a function. In this way,datais not being repeatedly loaded into memory all at once, as it is for the solution above.First, update
media_items:Then, using a procedure to find all
urls indocument_revisionsand then performing the replacements:See fiddle
As with the previous solutions, a replacement of
urlororiginalUrlwithmedia_items.new_urlonly occurs when the latter matches as a substring of the former: