REGEXP_SUBSTR with pipe delimiter giving unexpected results

55 Views Asked by At

For this data in a packed field: ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|ELEVENTHEBAND|TWELVE

I have this regexp: REGEXP_SUBSTR(the_packed_field_column , '((|)([[:alpha:] ]*)(|))' , 1 , 1) <--gives the expected result of TWO

but this regexp gives a result of |FOUR| REGEXP_SUBSTR(the_packed_field_column , '((|)([[:alpha:] ]*)(|))' , 1 , 2)

What SQL will work to split out specific values with no surprises?

UPDATE - with a great assist from blhsing and Gary_W I have this solved. Again, the challenge is to parse specific values out of a packed field with pipe delimiter:

ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|ELEVEN_THE_BAND|TWELVE

but can also be: ONE|TWO||||SIX||EIGHT||TEN|ELEVEN_THE_BAND|TWELVE

might even be: ONE|TWO||||||||||TWELVE

For anyone who needs the REGEXP to parse a value out of a packed field with pipe delimiter, this is the basic pattern for all but the last:

'([[:alpha:]_]*)(|)' <--- in English this is: zero or more alpha characters plus also a _, then a literal pipe (wow - annoying - StackOverflow took out the backslash that makes it a literal pipe)

giving you this REGEXP_SUBSTR for the first eleven REGEXP_SUBSTR(the_column , '([[:alpha:]_]*)(|)' , 1 , n)

might also be REGEXP_SUBSTR(the_column , '([[:alpha:]_[:digit:]]*)(|)' , 1 , n)

the last one, TWELVE in this example, gets a slightly different pattern: '(backslash|)([[:alpha:]_]*)' , 1 , 11) <--- typed "backslash" because StackOverflow took out the \ that I typed (what??)

huge credit to anybody who knows how to exclude the pipe delimiters in the regexp, I had to do a REPLACE '|', ''

2

There are 2 best solutions below

2
blhsing On

Your regex includes both a leading pipe and a trailing pipe, so each match consumes two pipes, the latter of which should belong to the next token.

For the purpose of selecting the n-th word from a given string, matching a sequence of alphabets should be enough:

REGEXP_SUBSTR(the_packed_field_column, '[[:alpha:]]+' , 1 , 3)

which gives you:

THREE

Demo: https://www.mycompiler.io/view/JmZrhY2ugP8

0
Gary_W On

If you are using Oracle, use this form of REGEXP_SUBSTR() which handles NULL elements. For more information, see this post

REGEXP_SUBSTR(the_packed_field_column, '(.*?)(\||$)', 1 , 3, NULL, 1)

The regex returns the 3rd instance of an optional set of characters followed by a literal pipe or the end of the line. Then, returns the first group of that 3rd set. If a previous element is NULL, you will still get the 3rd element.