Multilevel list natural sort with regexp

370 Views Asked by At

Here is an example of data:

'1.'    'Numeric types'
'1.1.'  'Integer'
'1.2.'  'Float'
...
'1.10'  'Double'

To naturally sort it we can use string_to_array with '.' as separator, then cast text[] to int[] and sort by integer array, but since the field itself is of type text and there might be cases where user decides to use non-numeric symbols, e.g. 1.1.3a, thus causing cast error. To address that I decided to use regexp:

select regexp_matches('1.2.3.4.', E'(?:(\\d+)\.?)+')

Expected result is array: {'1', '2', '3', '4'} but instead i get only the last element of the said array, however, if I use following regexp:

select regexp_matches('1.2.3.4.', E'((?:\\d+)\.?)+')

The result is {'1.2.3.4.'}.

Using global-flag 'g' is not an option, because regexp_matches returns a column.

Is there any way to convert '1.2.3.4a.'::text to {1, 2, 3 ,4}::int[] using only one regexp_matches?

Fiddle.

1

There are 1 best solutions below

6
On

You can use the global 'g' flag with regexp_matches, but needs to aggregate values to an array (most simple with the array() constructor):

select array(select m[1] from regexp_matches(dt_code, '(\d+)', 'g') m)::int[] nums, *
from data_types
order by 1;

Or, you can split your string to array with string_to_array(), but you still need to use regexp to remove any non-numeric characters:

select string_to_array(trim(regexp_replace(dt_code, '[^\d\.]+', ''), '.'), '.')::int[] nums, *
from data_types
order by 1;

For a more advanced natural-like sorting, you need to split your text to tokens yourself. See more info at the related SO question.

I could come up with a simplified, reusable function:

create or replace function natural_order_tokens(text)
  returns table (
    txt text,
    num int,
    num_rep text
  )
  language sql
  strict
  immutable
as $func$
  select m[1], (case m[2] when '' then '0' else m[2] end)::int, m[2]
    from regexp_matches($1, '(\D*)(\d*)', 'g') m
   where m[1] != '' or m[2] != ''
$func$;

With this function, natural sorting will be this easy:

select *
from data_types
order by array(select t from natural_order_tokens(dt_code) t);

SQLFiddle