Using regex to wrap XML element values in CDATA


I have to edit a stored procedure that builds XML strings so that all element values are wrapped in CDATA. Some of the values are already wrapped in CDATA, so I need to leave those alone.

I figured this is a good opportunity to learn some regex.

From: <element>~DATA_04</element> 
to:   <element><![CDATA[~DATA_04]]></element>

What are my options for doing this? I can write simple regexes, but this is well beyond what I know.

NOTE: The <element> is generic for illustration purposes, in reality, it could be anything and is unknown.

Sample text:

    declare @sql   nvarchar(max) =
'    <data>
    <header></header>
    <docInfo>Blah</docInfo>
    <someelement>~DATA_04</someelement>
    <anotherelement><![CDATA[~DATA_05]]></anotherelement>
</data>
'

Using the sample XML, the regex would need to find someelement and wrap its value in CDATA, giving <someelement><![CDATA[~DATA_04]]></someelement>, and leave the other elements alone.

Bear in mind, I did not write this horrible SQL code, I just have to edit it.


There are 2 best solutions below


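Try (<[^>]+>)(~DATA_([^<]+))(<[^>]+>)

and replace with \1<![CDATA[\2]]>\4

Note that DATA_ must be uppercase to match your sample, unless you turn on case-insensitive matching. This gives you <element><![CDATA[~DATA_04]]></element>, where element can be any tag name. Values that are already wrapped are skipped, because the text right after their opening tag starts with <, not ~.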
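To see the pattern in action outside the stored procedure, here is a minimal sketch in Python (my choice for illustration only; the question is about T-SQL, which has no built-in regex replace). The function name wrap_in_cdata is hypothetical, and the sketch assumes every value to wrap starts with ~DATA_ as in the sample.

```python
import re

# Group 1: opening tag, group 2: the whole "~DATA_..." value,
# group 3: the part after "~DATA_" (unused), group 4: the following tag.
# [^<]+ cannot cross a "<", so the match stops at the closing tag.
PATTERN = re.compile(r"(<[^>]+>)(~DATA_([^<]+))(<[^>]+>)")


def wrap_in_cdata(xml_text: str) -> str:
    """Wrap every bare ~DATA_ value in a CDATA section (hypothetical helper)."""
    return PATTERN.sub(r"\1<![CDATA[\2]]>\4", xml_text)


sample = """<data>
    <header></header>
    <docInfo>Blah</docInfo>
    <someelement>~DATA_04</someelement>
    <anotherelement><![CDATA[~DATA_05]]></anotherelement>
</data>"""

print(wrap_in_cdata(sample))
```

Only the someelement line changes. The anotherelement line is untouched because a ~ must come directly after a closing >, and there it comes after <![CDATA[ instead. For the same reason, running the function a second time on its own output changes nothing.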

Good luck


In C#:

string text = Regex.Replace(inputString, @"<element>~(.+?)</element>", "<element><![CDATA[~$1]]></element>", RegexOptions.None);

The find is:

<element>~(.+?)</element>

The replace is:

<element><![CDATA[~$1]]></element>

The match is lazy (.+?) so that it stops at the first </element> when several elements sit on one line. I'm assuming the element value always starts with ~.

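Also watch out for whitespace around the value, if that can occur in your data. In that case add

\s*

(any whitespace characters, zero or more matches) where the whitespace may appear, for example right after <element> and right before </element>.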
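As a rough sketch of the whitespace-tolerant version, here is the same find/replace in Python (used only for illustration; the pattern syntax carries over to C# with $1 in place of \1). The function name wrap_element is hypothetical, and the sketch assumes the tag is literally <element> and that surrounding whitespace can be dropped.

```python
import re

# \s* absorbs whitespace on either side of the value; (.+?) is lazy, so the
# match ends at the first closing tag instead of running on to the last one.
PATTERN = re.compile(r"<element>\s*~(.+?)\s*</element>")


def wrap_element(text: str) -> str:
    """Wrap bare ~values of <element> in CDATA, trimming whitespace (hypothetical helper)."""
    return PATTERN.sub(r"<element><![CDATA[~\1]]></element>", text)


print(wrap_element("<element> ~DATA_04 </element>"))
print(wrap_element("<element>~A</element><element>~B</element>"))
print(wrap_element("<element><![CDATA[~DATA_05]]></element>"))
```

The third call returns its input unchanged, since the pattern requires a ~ (after optional whitespace) directly following <element>. If the whitespace is significant in your data, capture it in extra groups and put it back in the replacement instead of discarding it.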