I've got a table with entries that are similar to "something XXX010101 somethingelse XXX010102".
I'd like to extract the XXX01... parts from them - possibly using some separator if needed - right now I'm able to easily take out the first one using:
select distinct substring(content from 'XXX[0-9]+'), content from data where content ~ 'XXX[0-9]+'
The one idea I have is to create some monster regex to replace everything that's not an XXX substring... but to be true I hoped (and failed to find) for a simpler solution (like a g flag to substring).
When looking around I found that 8.3 introduced a regex_matches function that seems to be what I'd need - are there any alternatives in 8.2? Or an easy way to get it in 8.2?
How would you go about solving this? Or is an upgrade my best bet? ;) (it's a production system so the downtime and some risk of problems after migration is a source of hesitation ).
Thanks in advance.
-- expected output added --
for
"something XXX010101 something else XXX010102"
I'd like to get:
XXX010101
XXX010102
OR (less preferable)
XXX010101,XXX010102
I am somewhat hesitant to even post my answer. You really must upgrade. Version 8.2 reaches end of live right now. Follow the link @a_horse_with_no_name posted.
However, the problem got my attention. The following demo should work with PostgreSQL 8.2:
Result:
Some explanation:
regex_matches()
in version 8.2 as OP already statedregexp_replace()
which can use ag
flag (replaceg
lobally),
, after removing all other occurances of,
Could be any character not part of the wanted strings, but,
can serve as separator in the result.(XXX\\d+?,)*?
to capture any number of leading occurances of wanted string.final
rtrim()
removes trailing,
In PsotgreSQL 8.3+ you could use
regexp_split_to_table()
to split the wanted strings into single rows. In 8.2 you'll have to come up with something of your own. I would write a plgpsql function ...This makes heavy use of a features of PostgreSQL's POSIX Regular Expressions (link to version 8.2!)