I've been trying to convert a unicode regex to POSIX regex to remove \p{So} , \p{Cs}, \p{Cn} and \x1A type of characters from a column.
In Informatica I was using reg_replace (col_name,'[\p{So}\p{Cs}\p{Cn}\x1A]',' ') function to filter out these kind of characters and replace them with a single space.
However, when migrating to RedShift the same regex is not working with REGEXP_REPLACE() as it does not support the Unicode blocks.
For e.g. this is the string :-
'INT¡®L n°1 di KONGRESSE Ä Ê Í EUROREGIONE. V.Ž?#'
From above string, I need to keep the Latin / French letters, alphanumeric & punctuations and just remove the other symbols.
You can use
It matches any single char other than letters from
ÀtoŽ, punctuation and alphanumerics. If[:alnum:]is Unicode-aware in your environment, you may removeÀ-Ž.It also appears that you still can use a PCRE regex with Amazon Redshift
REGEXP_REPLACEprovided you pass thepparameters option:The backslashes need doubling,
1param is the position in string to start matching from and thepoption tellsREGEXP_REPLACEto use PCRE regex syntax.