How can I escape all escape-worthy characters in one line of code?

Based on what I see here (accepted answer), it would seem that I could escape strings by doing this:

string s = "Woolworth's";
string t = Regex.Escape(s);
MessageBox.Show(t);

...but stepping through that, I see no difference between s and t (I hoped I'd see "Woolworth\'s" as the value of t instead of "Woolworth's" for both vars).

I could, I guess, do something like this:

    string s = "Woolworth's";
    s = s.Replace("'", "\'");

...etc., also escaping the following: [, ^, $, ., |, ?, *, +, (, ), and \

...but a "one stop shopping" solution would be preferable.

To be more specific, I need a string entered by a user to be something that is acceptable as a string value in an Android arrays.xml file.

For example, it chokes on this:

<item>Woolworth's</item>

...which needs to be this:

<item>Woolworth\'s</item>

There are 4 best solutions below

0
On BEST ANSWER

Regex.Escape() only escapes regex reserved characters:

Escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes. This instructs the regular expression engine to interpret these characters literally rather than as metacharacters.


Match and capture a character class containing the characters you want to escape (note that some characters, such as \ and -, have special meaning inside a character class and must themselves be escaped):

(['^$.|?*+()\\])

And then replace it with a backslash and a reference to the character you want to escape:

\\1


In C#:

string s = "Woolworth's";
Regex rgx = new Regex("(['^$.|?*+()\\\\])");

string t = rgx.Replace(s, "\\$1");
// Woolworth\'s
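
Since the target here is an Android resource file, the same capture-and-replace idea can also be sketched in Java. This is only a minimal sketch: the class name BackslashEscapeSketch and the method name escape are hypothetical, and the character class is copied unchanged from the pattern above.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class BackslashEscapeSketch {
    // Same character class as above. In Java source the backslash inside
    // the class has to be written as "\\\\" (one regex-escaped backslash).
    private static final String SPECIALS = "(['^$.|?*+()\\\\])";

    static String escape(String input) {
        // In a Java replacement string, "\\\\" yields one literal backslash
        // and "$1" re-inserts the captured character.
        return input.replaceAll(SPECIALS, "\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escape("Woolworth's")); // Woolworth\'s
    }
}
```

Note that Java's replacement syntax treats the backslash as special, which is why it is doubled there, whereas the .NET replacement string above only needs the single literal backslash.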

0
On

There are different kinds of character escaping. In the question you linked to, they're talking about escaping for Regular Expressions, which have their own set of special characters.

If you're specifically looking to escape text for XML, you might want to check out the XmlConvert Class in the System.Xml namespace. With it, you can escape characters using XmlConvert.EncodeName and retrieve characters using XmlConvert.DecodeName:

    string s = "Woolworth's";
    string encoded = XmlConvert.EncodeName(s); // Value here is Woolworth_x0027_s
    string decoded = XmlConvert.DecodeName(encoded); // Value here is Woolworth's
0
On

Regex.Escape is not suitable for this context.

It is designed strictly for regular expressions and will escape both too much and too little for this context - trying to shoe-horn it into the model will likely break other values. (It doesn't escape ' or " because those characters have no special meaning in a .NET regular expression.)

What matters here is that the text of an item element in an Android string resource file gets some special parsing (related to formatting) after it is read from XML:

If you have an apostrophe or a quote in your string, you must either escape it or enclose the whole string in the other type of enclosing quotes.

As such, a transformation appropriate in this context is simply

s.Replace("'", "\\'").Replace("\"", "\\\"")

or

Regex.Replace(s, "['\"]", "\\$&")

(And then, assuming the XML is being properly built via a DOM or LINQ to XML, the XML encoding is taken care of elsewhere - although the rules are more complicated when using formatting vs mixed content styling.)
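
For anyone doing this step on the Java side instead, a minimal sketch of the same transformation follows. The class name AndroidResourceEscaper is hypothetical. Escaping the backslash itself is my own addition, on the assumption that a literal backslash in the user's text should survive the resource parser. XML encoding (&, <, >) is still assumed to happen elsewhere.

```java
// Hypothetical helper; not part of any Android or JDK API.
final class AndroidResourceEscaper {
    static String escape(String text) {
        return text
            // Assumption: backslashes are escaped too, and done first so the
            // backslashes added by the next two steps are not doubled.
            .replace("\\", "\\\\")
            // Apostrophe and double quote, as in the transformation above.
            .replace("'", "\\'")
            .replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(escape("Woolworth's")); // Woolworth\'s
    }
}
```

String.replace works on literal text rather than regular expressions, so no regex escaping is involved here.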

0
On

The best way to achieve something "in one line of code" is to write a method somewhere which gets the job done properly and then from that moment on, each time you invoke that method, think of yourself as doing it "in one line of code".

The accepted answer might seem to do the trick, but it will fail with control characters such as line feeds, and with any other Unicode characters that are unprintable for one reason or another.

The following Java method does roughly the equivalent of Apache Commons Lang's StringEscapeUtils.escapeJava().

I am primarily posting it for the sake of people stumbling upon this question in the future, looking for an answer to this very common problem.

public static String escapeForJava( String value, boolean quote )
{
    StringBuilder builder = new StringBuilder();
    if( quote )
        builder.append( "\"" );
    for( char c : value.toCharArray() )
    {
        if( c == '\\' )
            builder.append( "\\\\" ); // escape the escape character itself
        else if( c == '\'' )
            builder.append( "\\'" );
        else if ( c == '\"' )
            builder.append( "\\\"" );
        else if( c == '\r' )
            builder.append( "\\r" );
        else if( c == '\n' )
            builder.append( "\\n" );
        else if( c == '\t' )
            builder.append( "\\t" );
        else if( c < 32 || c >= 127 )
            builder.append( String.format( "\\u%04x", (int)c ) );
        else
            builder.append( c );
    }
    if( quote )
        builder.append( "\"" );
    return builder.toString();
}