How to remove delimited sections from text in PostgreSQL?

999 Views Asked by At

I want to eliminate some text pattern from a string, my string has a pipe delimiter, and the parameters do not always follow each other.

This is my string

TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3

I want to eliminate TType=SEND and URL=min://j?_a=3&ver=1.1

Therefore my final result should be

Status=OK|day=3

What i have tried. Not working in postgresql .

select REGEXP_REPLACE('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3', 
'(TType=.*?(\||$))|(URL=.*?(\||$))', '')
5

There are 5 best solutions below

0
On BEST ANSWER

Answer:

SELECT 
REGEXP_REPLACE(
 REGEXP_REPLACE('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3',
  '(TType|URL)=[^|]*\|?', '','g'),
'\|$', '');

Explanation:

  1. The .*? part in your pattern, although not greedy, consumes colons as well, so doesn't behave as intended. This is fixed by [^|]* that consumes any non colon character, zero or more times.

  2. Then you would also need to add the global flag 'g', in order to replace all occurences of pattern, as described in the documentation.

  3. Finally, in case a parameter you need to eliminate occurs last (since the parameters can appear in any order), you need to add an extra replacement step to eliminate a residual colon at the end of the string.

For example without the extra step, the following

SELECT
REGEXP_REPLACE('Status=OK|URL=min://j?_a=3&ver=1.1|day=3|TType=SEND',
  '(TType|URL)=[^|]*\|?', '','g');

produces

Status=OK|day=3|

while, addding the extra step, the following

SELECT 
REGEXP_REPLACE(
 REGEXP_REPLACE('Status=OK|URL=min://j?_a=3&ver=1.1|day=3|TType=SEND',
  '(TType|URL)=[^|]*\|?', '','g'),
'\|$', '');

produces the desired

Status=OK|day=3
9
On

step-by-step demo:db<>fiddle

SELECT
    string_agg(elements,'|')                                                 -- 3
FROM mytable,
    regexp_split_to_table(mystring, '\|') as elements                        -- 1
WHERE split_part(elements, '=', 1) = ANY(ARRAY['TType', 'URL']) IS NOT TRUE  -- 2
  1. Split the string into params like A=B. Move every into a separate record
  2. Split these elements at the = character and filter for elements without key = TType or URL
  3. Finally aggregate all these first splits to a string list.
0
On

The S-Man's answer is a working one

Sure upvoted, solution is okay however it does not fully satisfy my question. since i would want the solution to be within select and from

If this is a "mandatory" requirement then I see the following options:

  1. create a function
  2. use LATERAL JOIN to enclose all the logic into one place, related PostgreSQL: using a calculated column in the same query

The final query may look like:

SELECT t.*, s.result
FROM t
LEFT JOIN LATERAL (
   SELECT string_agg(elements,'|') AS result
   FROM regexp_split_to_table(t.col, '\|') as elements
   WHERE split_part(elements, '=', 1) = ANY(ARRAY['TType', 'URL']) IS NOT TRUE) s ON TRUE

db<>fiddle demo

Alternatively by using subquery in SELECT list:

SELECT t.*, 
(
   SELECT string_agg(elements,'|') AS result
   FROM regexp_split_to_table(t.col, '\|') as elements
   WHERE split_part(elements, '=', 1) = ANY(ARRAY['TType', 'URL']) IS NOT TRUE
) AS result
FROM t

db<>fiddle demo 2

2
On

The following regex based solution should do the trick:

SELECT TRIM(REGEXP_REPLACE(
         'TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3', 
         '(TType|URL)=[^|]*(\||$)', '', 'g'), '|')
-- outputs:
-- Status=OK|day=3

How the pattern works:

(TType|URL)=[^|]*(\||$)
|-----------|----|-----
1           2    3
  1. the pattern starts consuming if any substring starts with either TType or URL followed by =
  2. the pattern consumes any character that is not |
  3. the pattern consumes either the | or the end of the string

The g flag is described in the documentation as

flag g specifies replacement of each matching substring rather than only the first one.

It is necessary here as we want to replace all substrings that match our pattern.

Finally, sometimes a single | character might remain at the end of the string. Any trailing | character is trimmed from the result using TRIM

2
On

There were a few issues with your attempted regular expression:

  1. Even though a non-greedy .*? match was used, this could still include pipe symbols. This can be rectified by using a matcher that allows anything except a pipe symbol (this can be greedy): [^|]*
  2. It should use the 'g' flag to replace all occurrences, not just the first.
  3. It only looks for the pipe at the end, not at the beginning. This means it will leave the last pipe intact at the end if it matches the string after the last pipe (i.e. URL=... in your example).

From addressing the points above, here is a working version:

SELECT REGEXP_REPLACE('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3', '((Status|TType)=[^|]*[|]|[|](Status|TType)=[^|]*)', '', 'g')

Rextester demo: https://rextester.com/CYBP40923