String.StartsWith returns true for the byte order mark regardless of the input string

I blindly copied one of the Stack Overflow answers, but it didn't work the way I expected. I need to remove a UTF-8 byte order mark from a string, but inputText.StartsWith(byteOrderMark) returns true for any string at all:

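using System;
using System.Text;

internal class Program
{
    static void Main(string[] args)
    {
        var inputText = "hello";
        // Decode the UTF-8 preamble (bytes EF BB BF) into a string holding U+FEFF.
        string byteOrderMark = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
        // Culture-sensitive overload: unexpectedly true even though "hello" has no BOM.
        if (inputText.StartsWith(byteOrderMark))
            inputText = inputText.Remove(0, byteOrderMark.Length);
        Console.WriteLine(inputText); // ello
        Console.WriteLine(inputText[0] == byteOrderMark[0]); // False
    }
}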

I can check it character by character without any problem. What I'd like to know is why StartsWith returns true even when the string doesn't start with the UTF-8 preamble.

dbc

You need to use StringComparison.Ordinal:

if (inputText.StartsWith(byteOrderMark, StringComparison.Ordinal))
    inputText = inputText.Remove(0, byteOrderMark.Length);

As the docs for String.StartsWith(String) explain:

This method performs a word (case-sensitive and culture-sensitive) comparison using the current culture.

The BOM (U+FEFF) is a non-printing, zero-width character, and apparently the culture-sensitive comparison in your culture ignores it. That makes your call equivalent to inputText.StartsWith(string.Empty), which is always true.
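To see the difference side by side, here is a minimal sketch that runs both comparison modes against the same inputs. The class and method names (BomComparisonDemo, StartsWithBomOrdinal, StartsWithBomCurrentCulture) are made up for illustration. The ordinal results are deterministic, while the culture-sensitive result depends on your runtime and culture settings, so no expected value is shown for it:

```csharp
using System;
using System.Text;

internal static class BomComparisonDemo
{
    // Decodes the UTF-8 preamble (EF BB BF) into the single character U+FEFF.
    public static readonly string ByteOrderMark =
        Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());

    // Ordinal: compares UTF-16 code units one by one, nothing is ignored.
    public static bool StartsWithBomOrdinal(string text) =>
        text.StartsWith(ByteOrderMark, StringComparison.Ordinal);

    // Culture-sensitive: linguistic rules apply and may skip ignorable characters.
    public static bool StartsWithBomCurrentCulture(string text) =>
        text.StartsWith(ByteOrderMark, StringComparison.CurrentCulture);

    private static void Main()
    {
        Console.WriteLine(StartsWithBomOrdinal("hello"));        // False
        Console.WriteLine(StartsWithBomOrdinal("\uFEFFhello"));  // True

        // Result depends on the runtime and current culture (true in the question).
        Console.WriteLine(StartsWithBomCurrentCulture("hello"));
    }
}
```

If the last line prints True on your machine, you are seeing the same behavior as in the question.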

The docs further note:

As explained in Best Practices for Using Strings, we recommend that you avoid calling string comparison methods that substitute default values and instead call methods that require parameters to be explicitly specified. To determine whether a string begins with a particular substring by using the string comparison rules of the current culture, signal your intention explicitly by calling the StartsWith(String, StringComparison) method overload with a value of CurrentCulture for its comparisonType parameter.

Seems like you may have gotten bitten by this.
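Since you mention that checking character by character works, another option is to skip string comparison entirely and test the first char directly, which is always an ordinal check. A rough sketch, where BomHelper and StripLeadingBom are hypothetical names rather than anything from the framework:

```csharp
using System;

internal static class BomHelper
{
    // U+FEFF is the character the UTF-8 preamble decodes to.
    private const char ByteOrderMark = '\uFEFF';

    // Returns the text without a single leading BOM; other input is returned unchanged.
    public static string StripLeadingBom(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        // A char-to-char comparison is ordinal, so culture rules never apply here.
        return text[0] == ByteOrderMark ? text.Substring(1) : text;
    }

    private static void Main()
    {
        Console.WriteLine(StripLeadingBom("hello"));        // hello
        Console.WriteLine(StripLeadingBom("\uFEFFhello"));  // hello
    }
}
```

This only removes one BOM at the very start of the string and leaves any U+FEFF that appears later in the text alone.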