Postgres, searching with ILIKE operator from json array works different, than from regular table

40 Views Asked by At

In our DB we have tables, that have 6-8 M records, this data is static and in order of performance optimization we decided to store some data from this tables in separate table, and for predefined filters use the values from this table. For example we want to store distinct states, cities, tags in this table.

The example of data could be represented in such test table:

CREATE TABLE test_data_table (
                                 id    BIGSERIAL PRIMARY KEY,
                                 state  text,
                                 city text
);

INSERT INTO test_data_table (state, city)
    values  ('MD', 'Union Bridge'),
            ('IL', 'Bourbonnais'),
            ('NC', 'Rdelk'),
            ('FL', 'Bonita Springs'),
            ('FL', 'Unit Ave Maria'),
            ('AZ', 'N Placita Chula Vista Tucson'),
            ('TX', 'Sienna'),
            ('LA', 'Lornager'),
            ('CA', 'Empire'),
            ('MA', 'Middleton'),
            ('CO', 'Yampa'),
            ('GA', 'Spr Project Name Unit'),
            ('IN', 'Greencastle'),
            ('NV', 'Flora'),
            ('RI', 'Coventry'),
            ('WA', 'Rice'),
            ('FL', 'Orange City'),
            ('TX', 'Haltom City'),
            ('CA', 'Moraga'),
            ('LA', 'Belle Chasse'),
            ('AZ', 'E Rincon Creek Ranch Rd Tucson'),
            ('CA', 'Acampo'),
            ('AZ', 'S Avra Rd Tucson'),
            ('GA', 'Folkston'),
            ('FL', 'Biscayne Park'),
            ('FL', 'All Units Oxford'),
            ('IL', 'Volo'),
            ('MN', 'Peterson'),
            ('LA', 'Rayville'),
            ('KY', 'South Park View'),
            ('AZ', 'E Camino Emmanuel Sahuarita'),
            ('CA', 'Wrighwood'),
            ('CA', 'Los Angeles'),
            ('AZ', 'N Teal Blue Tr Tucson'),
            ('NY', 'Clifton Park'),
            ('IN', 'Frankl'),
            ('KY', 'Anchorage'),
            ('LA', 'Crown Point'),
            ('CA', 'Los BanosStruct CalcsSpan T'),
            ('IL', 'Chebanse');

And here the example of separate table for filters and the way to fill it with values:

    CREATE TABLE test_filter_values
(
    id    BIGSERIAL PRIMARY KEY,
    type  text,
    value jsonb
);

INSERT INTO test_filter_values (type, value)
VALUES ('CITY', (SELECT json_agg(DISTINCT city)::jsonb FROM test_data_table));

INSERT INTO test_filter_values (type, value)
VALUES ('STATE', (SELECT json_agg(DISTINCT state)::jsonb FROM test_data_table));

So, the main issue happens when we try to query data from this table, here is the example:

SELECT DISTINCT city from test_data_table
WHERE city ILIKE 'b%';

This query returns 4 results; And this one:

SELECT city FROM (SELECT jsonb_array_elements(value) AS city
                  FROM test_filter_values WHERE type = 'CITY') cities
WHERE city::text ILIKE 'b%';

Returns no results, LIKE operator without % at the begining or in the end returs zero results from test_filter_values table, but if using LIKE '%b%' with such case the behaviour is the same. The same issue happens with the exact match:

SELECT state FROM (SELECT jsonb_array_elements(value) AS state
                   FROM test_filter_values WHERE type = 'STATE') states
WHERE state::text = 'NC';

It returns no results either.

So how could we deal in this case?

2

There are 2 best solutions below

1
On BEST ANSWER

That's because jsonb_array_elements returns jsonb not text. A JSON array might have other arrays, objects, arrays of objects etc inside it.

That means the text values will be quoted as JSON and you will be trying to match against "Anchorage" not Anchorage when that gets converted to text.

Maybe just store an array of text if that's what you want to use?

3
On

If you're using jsonb, you get JSONPath expressions, and those offer like_regex filter: demo

SELECT jsonb_path_query( value
                        ,'$[*]?(@ like_regex "^b.*" flag "i")'
                       )#>>'{}' AS city
FROM test_filter_values 
WHERE type = 'CITY';
city
Belle Chasse
Biscayne Park
Bonita Springs
Bourbonnais

The #>>'{}' is there to "open up" the resulting jsonb strings and take the text-type value out. Plain ::text cast would produce double quotes around the values.

It's more flexible than comparing to every element from jsonb_array_elements_text() and you can apply other JSONPath-related logic in one go. If you prefer that for some other reason and all you do is this type of comparison, there's no need to nest a subquery:

SELECT state
FROM test_filter_values
CROSS JOIN jsonb_array_elements_text(value)as t(state)
WHERE type = 'STATE'
AND state = 'NC';

And in that case it should perform comparably.