I am pretty noob in regexp concepts, I have the below scenario where few words are inside the square brackets and it might be there in multiple times, I need to retain only one of those words and should remove the square brackets.
Existing format:
| TXT |
| -------------------------------------------------------------------------------------- |
| This sentence has [num] [num] [num] [num] and there are [num] [num] in previous string |
I have this above table with txt column, and i need to apply an regexp regular expression function to achieve the below output.
Desired Output:
| TXT |
| ------------------------------------------------------------------- |
| This sentence has num and there are num in previous string |
Could you please help me on this, since this needs to be done on an sql query.. don't want any answers using UDFs
Thanks in Advance.
I have tried the below query and could get the first num to be retained but couldn't achieve what i am expecting
Query:
select
regexp_replace(regexp_replace(regexp_replace(txt,'\\[\\w+\\]','REGEX_WORD',1,1,'c'),'\\[\\w+\\]',''),'REGEX_WORD',regexp_replace(regexp_substr(txt,'\\[\\w+\\]'),'\\[|\\]','')) working_model from cte;
Output:
This sentence has num and there are in previous string
Code provides more flexibility than regular expressions, so there are some possibilities here regular expressions alone may not cover.
You can put your pattern in parenthesis (These things
()for those who call them "brackets") and then tell it the pattern can be repeated 1 or more times{1,}. Remember to double your backslashes\\in Snowflake if you use single quotes'to terminate your strings. If you use$$to terminate your strings you do not need to double your backslashes. You can use a back-reference to the first capture group using \1 as the replacement. Unfortunately, this means it keeps the surrounding square brackets:[]You can of course remove all square brackets, but this approach will remove them anywhere, not only on the repeated words:
One other problem this approach has. It's not mentioned in the question, but what if the repeated word in square brackets changes?
What should happen then, or is this not really a problem? These kind of capture groups are only going to detect a word inside square braces, not that they're changing. In this case the regexp_replace as shown above will return the first capture group, which will return
[num].