Determine if HashSet<String> contains a different cased string


I have a large set of strings, including many duplicates. It is important that all of the duplicates have the same casing. So this set would fail the test:

String[] strings = new String[] { "a", "A", "b", "C", "b" };

...but this set would pass:

String[] strings = new String[] { "A", "A", "b", "C", "b" };

As I iterate through each string in strings, how can my program see that A is a case-insensitive duplicate of a (and thus fail), but allow the duplicate b through?


There are 3 best solutions below


And another option using LINQ.

bool doesListPass = strings
    // Group strings without considering case
    .GroupBy(s => s.ToUpper())
    // Check that all strings in each group have the same casing
    .All(group => group.All(s => group.First() == s));

IEnumerable<string> cleanedList = strings
    // Group strings without considering case
    .GroupBy(s => s.ToUpper())
    // Keep only the groups in which all strings have the same casing
    .Where(group => group.All(s => group.First() == s))
    // Flatten the "passing" groups into a single sequence of strings
    .SelectMany(g => g);

Note: You can use ToUpper() (culture-sensitive) or ToUpperInvariant() (culture-independent), depending on your needs.
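If you would rather not pick between the two ToUpper variants, one possible alternative is to pass a case-insensitive comparer to GroupBy. The sketch below assumes ordinal comparison is acceptable; the class name CasingCheck and the method name are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CasingCheck
{
    // Hypothetical helper: groups with an ordinal, case-insensitive comparer
    // instead of calling ToUpper() on every string.
    public static bool AllDuplicatesSameCase(IEnumerable<string> strings)
    {
        return strings
            .GroupBy(s => s, StringComparer.OrdinalIgnoreCase)
            // Distinct() uses the default (ordinal, case-sensitive) string equality,
            // so a group passes only if it contains exactly one spelling.
            .All(group => group.Distinct().Count() == 1);
    }
}
```

Swapping StringComparer.OrdinalIgnoreCase for another StringComparer should be enough to change the grouping rule.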


One simple approach would be to create two sets: one using a case-insensitive string comparer, and one using a case-sensitive one. (It's not clear to me whether you want a culture-sensitive comparison or not, or in which culture.)

After construction, if the two sets have different sizes (Count), then there must be some elements which are equal by case-insensitive comparison but not equal by case-sensitive comparison.

So something like:

public static bool AllDuplicatesSameCase(IEnumerable<string> input)
{
    var sensitive = new HashSet<String>(input, StringComparer.InvariantCulture);
    var insensitive = new HashSet<String>(input, 
          StringComparer.InvariantCultureIgnoreCase);
    return sensitive.Count == insensitive.Count;
}
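
To see why the counts differ, it can help to look at the two sizes directly. This is only a sketch: it swaps in the ordinal comparers rather than the invariant-culture ones used above, and the class TwoSetDemo and method DistinctCounts are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class TwoSetDemo
{
    // Hypothetical helper: returns the number of distinct strings under a
    // case-sensitive comparer and under a case-insensitive comparer.
    public static (int Sensitive, int Insensitive) DistinctCounts(IEnumerable<string> input)
    {
        var sensitive = new HashSet<string>(input, StringComparer.Ordinal);
        var insensitive = new HashSet<string>(input, StringComparer.OrdinalIgnoreCase);
        return (sensitive.Count, insensitive.Count);
    }
}
```

For the failing set from the question, { "a", "A", "b", "C", "b" }, this gives (4, 3): "a" and "A" are separate entries in the first set but collapse into one in the second. For the passing set, { "A", "A", "b", "C", "b" }, it gives (3, 3).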

You could check each entry explicitly.

static bool DuplicatesHaveSameCasing(string[] strings)
{
  for (int i = 0; i < strings.Length; ++i)
  {
    for (int j = i + 1; j < strings.Length; ++j)
    {
      if (string.Equals(strings[i], strings[j], StringComparison.OrdinalIgnoreCase)
        && strings[i] != strings[j])
      {
        return false;
      }
    }
  }
  return true;
}

Comment: I chose to use ordinal comparison. Note that the != operator performs an ordinal, case-sensitive comparison. It is fairly easy to change this to a culture-dependent comparison.
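
As a rough sketch of that change, the two comparisons could be passed in as parameters, with the != operator replaced by a call to string.Equals so that the case-sensitive check can be culture-dependent too. The class name PairwiseCasingCheck and the parameter names are assumptions made for this example.

```csharp
using System;

public static class PairwiseCasingCheck
{
    // Same pairwise check as above, but the caller chooses both comparisons.
    // The two arguments should be a matching pair, for example
    // CurrentCultureIgnoreCase together with CurrentCulture.
    public static bool DuplicatesHaveSameCasing(
        string[] strings,
        StringComparison ignoreCase,
        StringComparison caseSensitive)
    {
        for (int i = 0; i < strings.Length; ++i)
        {
            for (int j = i + 1; j < strings.Length; ++j)
            {
                // Fail when two entries match ignoring case but differ exactly.
                if (string.Equals(strings[i], strings[j], ignoreCase)
                    && !string.Equals(strings[i], strings[j], caseSensitive))
                {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A call such as DuplicatesHaveSameCasing(strings, StringComparison.CurrentCultureIgnoreCase, StringComparison.CurrentCulture) would then use the current culture's rules, while passing OrdinalIgnoreCase and Ordinal reproduces the original behavior.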