Existing list of all punctuation/whitespace chars in C#


I'm splitting a string on all punctuation and whitespace characters. Rather than build a complicated (?) regex to match what C# considers "punctuation" and "whitespace" characters, I'm using the char.IsPunctuation and char.IsWhiteSpace methods to get the characters from the string that are punctuation/whitespace.

Basically, this is what I'm doing - building an array of punctuation and whitespace characters, which I later use to split the string.

return text.Where(c => char.IsPunctuation(c) || char.IsWhiteSpace(c))
    .Distinct()
    .ToArray();
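
For context, a minimal self-contained sketch of that approach, including the later split step, might look like the following. The class and method names are hypothetical, and passing StringSplitOptions.RemoveEmptyEntries is an assumption; the snippet above only shows how the separator array is built.

```csharp
using System;
using System.Linq;

public static class SeparatorSplitDemo
{
    // Hypothetical helper: collects the separators found in the input,
    // then splits the same input on them.
    public static string[] SplitOnSeparators(string text)
    {
        // First pass: every distinct punctuation/whitespace char that occurs in the text.
        char[] separators = text
            .Where(c => char.IsPunctuation(c) || char.IsWhiteSpace(c))
            .Distinct()
            .ToArray();

        // Second pass: split on those characters (assumption: empty entries are unwanted).
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string[] words = SplitOnSeparators("Hello, world! How are you?");
        Console.WriteLine(string.Join("|", words)); // Hello|world|How|are|you
    }
}
```

Note that the input is enumerated twice here: once to find the separators and once to split.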

I did it this way originally because I couldn't find a static list/array of the chars that C# considers punctuation or whitespace. The MSDN documentation for char.IsPunctuation lists the Unicode code points it considers punctuation, but my question is: does that list exist anywhere in the .NET code, so that I could reference it instead of building it from the input string every time?


There is 1 best solution below

Tim Schmelter On

Instead of using String.Split with a potentially long list of separator characters that you first have to collect with LINQ, which means enumerating the string twice, you could take a different approach: use a StringBuilder and enumerate the characters just once. For example:

public static string[] SplitWhiteSpacesAndPunctuations(string text, StringSplitOptions options = StringSplitOptions.None)
{
    List<string> list = new(Math.Max(text.Length/10, 4));
    StringBuilder sb = new(10);
    foreach (char c in text)
    {
        if (char.IsWhiteSpace(c) || char.IsPunctuation(c))
        {
            AddStringAndClearStringBuilder();
        }
        else
        {
            sb.Append(c);
        }
    }

    AddStringAndClearStringBuilder();

    return list.ToArray();

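    // Flushes the current token into the list and resets the buffer.
    void AddStringAndClearStringBuilder()
    {
        // Test the flag rather than equality, so that combined options still drop empty tokens.
        if (sb.Length == 0 && options.HasFlag(StringSplitOptions.RemoveEmptyEntries)) return;
        list.Add(sb.ToString());
        sb.Clear(); // reuse the buffer instead of allocating a new StringBuilder
    }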
}

Demo: https://dotnetfiddle.net/3eLI9d
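
If a reusable list of separator characters is still wanted, one option is to build it once by testing every char value and caching the result in a static field. This is only a sketch: SeparatorChars and All are hypothetical names, not something .NET provides, and the list covers only single UTF-16 code units, because that is what the char overloads of IsPunctuation and IsWhiteSpace inspect.

```csharp
using System;
using System.Linq;

public static class SeparatorChars
{
    // Hypothetical cache: computed once, when the type is first used,
    // by checking every possible char value (0 through char.MaxValue).
    public static readonly char[] All = Enumerable
        .Range(0, char.MaxValue + 1)
        .Select(i => (char)i)
        .Where(c => char.IsPunctuation(c) || char.IsWhiteSpace(c))
        .ToArray();
}

public static class SeparatorCharsDemo
{
    public static void Main()
    {
        // The cached array can be passed straight to String.Split.
        string[] words = "one, two;three four"
            .Split(SeparatorChars.All, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(string.Join("|", words)); // one|two|three|four
    }
}
```

Whether this is faster than the single-pass StringBuilder version above depends on the input and would need to be measured.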