Is it safe to select multiple columns with jsonb_array_elements in a single SELECT statement? Is there a guarantee that the order of the expanded elements in one column is the same as the order in the second column?
Example: My table contains a column data that contains a json array. Each element of the array is an object with two properties (id and name):
data
--------------------------------------------------------------------
[{"id": "11", "name": "entry11"}, {"id": "12", "name": "entry12"}]
[{"id": "21", "name": "entry21"}, {"id": "22", "name": "entry22"}]
If I run
with my_table(data) as (values
   ('[{"id": "11", "name": "entry11"}, {"id": "12", "name": "entry12"}]'::jsonb),
   ('[{"id": "21", "name": "entry21"}, {"id": "22", "name": "entry22"}]'::jsonb)
)
select
   jsonb_array_elements(data)->'id' as id,
   jsonb_array_elements(data)->'name' as name
from my_table;
I get the expected result:
  id  |   name
------+-----------
 "11" | "entry11"
 "12" | "entry12"
 "21" | "entry21"
 "22" | "entry22"
My Question: Is there a risk that the name entry22 could end up in the row with id 21 as the two invocations of jsonb_array_elements are handled independently by the database? My experiments (also with larger tables) suggest that it always works. But as relational databases usually don't have a stable ordering of rows I wonder if I can rely on that result.
Yes, there is a guarantee. But it's a weak one.
id and name will stay in sync (come from the same nested object) because the two set-returning functions are evaluated in "lockstep", as the manual describes.
Since both your calls are guaranteed to return the same number of rows, this even works reliably before Postgres 10, where the behavior of multiple set-returning functions in the SELECT list was reformed (sanitized).
Unlike json, jsonb has a deterministic sort order of nested elements. But the only relevant aspect here is the order of array items, and that is always significant and preserved in json and jsonb alike.
Optimize query
Elaborating on what we just learned about the internal workings, your query is equivalent to putting both function calls into a single LATERAL ROWS FROM ( ... ) item in the FROM clause.
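A sketch of what that equivalent form could look like, reusing the sample data from the question. The aliases d, d1 and d2 are placeholders of my choosing, and the manual only promises behavior "similar to" this form, so treat it as an illustration rather than the literal rewrite Postgres performs:

```sql
WITH my_table(data) AS (
   VALUES
      ('[{"id": "11", "name": "entry11"}, {"id": "12", "name": "entry12"}]'::jsonb)
    , ('[{"id": "21", "name": "entry21"}, {"id": "22", "name": "entry22"}]'::jsonb)
)
SELECT d.d1->'id'   AS id
     , d.d2->'name' AS name
FROM   my_table
CROSS  JOIN LATERAL ROWS FROM (
          jsonb_array_elements(data)   -- feeds column d1
        , jsonb_array_elements(data)   -- feeds column d2, advanced in lockstep with d1
       ) AS d(d1, d2);
```

Both function calls expand the same array, so d1 and d2 always hold the same element in any given output row.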
That makes it even more obvious that the query should be simplified to a single function call in a subquery. Then fields cannot get out of sync to begin with:
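One way to write the single-call variant, as a sketch against the sample data from the question (the alias d is my own placeholder):

```sql
WITH my_table(data) AS (
   VALUES
      ('[{"id": "11", "name": "entry11"}, {"id": "12", "name": "entry12"}]'::jsonb)
    , ('[{"id": "21", "name": "entry21"}, {"id": "22", "name": "entry22"}]'::jsonb)
)
SELECT d->'id'   AS id
     , d->'name' AS name
FROM   my_table
CROSS  JOIN LATERAL jsonb_array_elements(data) AS d;  -- each array element is expanded exactly once
```

Since id and name are now extracted from the same value d, there is no second function call that could drift.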
Or, with minimal syntax:
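A shorter spelling, again only a sketch: a comma in the FROM list followed by a function call is an implicit CROSS JOIN LATERAL, and the AS keyword for the alias is optional.

```sql
WITH my_table(data) AS (
   VALUES
      ('[{"id": "11", "name": "entry11"}, {"id": "12", "name": "entry12"}]'::jsonb)
    , ('[{"id": "21", "name": "entry21"}, {"id": "22", "name": "entry22"}]'::jsonb)
)
SELECT d->'id' AS id, d->'name' AS name
FROM   my_table, jsonb_array_elements(data) d;  -- comma = implicit CROSS JOIN LATERAL
```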
When returning multiple, separate fields from the same function call, move the function call to a subquery. Otherwise you risk (costly) repeated evaluation.
True issue with (multiple) SRF in the SELECT list
All of the above harbor a trap. If a SRF in the SELECT list does not return a row (like when jsonb_array_elements() finds an empty array), the row is removed from the result. When there are multiple SRF in the SELECT list, the row is removed when all of them come up empty. This can lead to surprising results. From my experience, few are aware of the subtle implications.
If that side effect is intended, it's much clearer spelled out as an explicit CROSS JOIN.
If that side effect is not intended, and you'd rather preserve all rows, use LEFT JOIN LATERAL ... ON true instead:
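A sketch of the row-preserving variant. The third input row with an empty array is my addition to the question's sample data, included only to make the difference visible; the alias d is a placeholder as well:

```sql
WITH my_table(data) AS (
   VALUES
      ('[{"id": "11", "name": "entry11"}, {"id": "12", "name": "entry12"}]'::jsonb)
    , ('[{"id": "21", "name": "entry21"}, {"id": "22", "name": "entry22"}]'::jsonb)
    , ('[]'::jsonb)   -- hypothetical extra row: empty array
)
SELECT d->'id'   AS id
     , d->'name' AS name
FROM   my_table
LEFT   JOIN LATERAL jsonb_array_elements(data) AS d ON true;
-- The row with the empty array is kept, with NULL for both id and name.
-- With CROSS JOIN LATERAL (or the SRF in the SELECT list) it would be dropped.
```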
Use the LEFT JOIN LATERAL form, unless you know better.
Aside: jsonb_populate_recordset() for this particular query
There is a more efficient way for your particular query with jsonb_populate_recordset(): faster, and with implicit type casts.
Create a fitting composite type once if you don't have one:
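For illustration, such a type might look like this. The name my_rowtype and the column types are assumptions on my part; pick whatever matches your data. The field names must match the JSON keys:

```sql
-- Hypothetical composite type; field names mirror the keys "id" and "name"
CREATE TYPE my_rowtype AS (
   id   int
 , name text
);
```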
Then:
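A sketch of the call, assuming a composite type named my_rowtype with fields (id int, name text) already exists; that name and those column types are hypothetical:

```sql
-- Assumes: CREATE TYPE my_rowtype AS (id int, name text);
WITH my_table(data) AS (
   VALUES
      ('[{"id": "11", "name": "entry11"}, {"id": "12", "name": "entry12"}]'::jsonb)
    , ('[{"id": "21", "name": "entry21"}, {"id": "22", "name": "entry22"}]'::jsonb)
)
SELECT d.*   -- one typed column per field of my_rowtype
FROM   my_table, jsonb_populate_recordset(null::my_rowtype, data) d;
```

The JSON string "11" is converted to the declared type of id, so the result carries typed columns instead of jsonb values. Like the CROSS JOIN variants, this form drops rows whose array is empty.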