C# compare lists in a resource efficient way

167 Views Asked by At

this code works if string in list2 matches string in list1 and it is time efficient when it comes to larger lists

var firstNotSecond = list1.Except(list2).ToList();

however I'm trying to tweak it so that it would work if string in list2 contains a string in list1. So say if:

--list2 strings:

"john,sally,michael"
"tim,sally,andrew"
"stuart,bill,tom"

--list1 strings:

"sally"
"joe"

--final list would contain only last string from list2 as first two .Contains

"stuart,bill,tom"

I'm looking for the most time/resources efficient way to do this - the straight forward implementation with essentially two nested loops (with or without LINQ) is O(len(list1) * len(list2) ) - ideally I'd like same O(max(len(list1), len(list2))) as with .Except, but anything better than essentially n^2 is welcome.

4

There are 4 best solutions below

0
On BEST ANSWER

Here's a two liner, along the lines of what Optional Option was thinking:

var hash = new HashSet<string>(list1);
var reducedList2 = list2.Where(s => s.Split(',').All(e => !hash.Contains(e))).ToList();
  • Put sally and joe in the lookup

  • Enumerate the list of csv strings, splitting them and rejecting them if any element in the csv is in the lookup (implemented here as "accept them if all elements are not in the lookup")

Few unavoidable(unless you want to change your data storage) nasties though, I think. If you don't truly need a List out at the end and could work with an enumerable, remove the ToList to avoid an expensive copy operation

It could be a lot better by being a linked list of node that have string arrays instead of a list of strings that are csv- no splitting and efficient removal of nodes. Internally there will be a lot of list resizing and data shuffling with your current storage solution

0
On

The process essentially has to be O(nm) as each item has to be compared to the others. There is a way to build a trie and traverse if you really need to be efficient for extremely larger string sets.

Without building a trie, you can use a more simpler dictionary for lookups.

This will cut down a little on the process in a simpler approach by using O(1) to lookup matches, making this more of O(list2.length*list1.length) instead of having to compare each value. This will only improve performance for situations where the lists are longer and the O(1) lookup helps.

var list2Dict= list2.Select(l => l.ToDictionary( k => k, v => 0) ).ToList();
var result = list2.Where((_,i) => !list1.Any(l => list2Dict[i].ContainsKey(l))).ToList();

See it here: https://dotnetfiddle.net/19gOCZ

0
On

You can find the items in list1 Where All of the items in list2 are not contained by the list1 item using System.Linq extension methods:

var final = list1.Where(l1 => list2.All(l2 => !l1.Contains(l2))).ToList();

If you want to do a case-insensitive comparison, you can use the IndexOf method, along with StringComparison.OrdinalIgnoreCase (which returns -1 if the item is not found):

var final = list1.Where(l1 => list2.All(l2 =>
    l1.IndexOf(l2, StringComparison.OrdinalIgnoreCase) < 0))
    .ToList();
0
On

You can split l2 and convert it to a HashSets, so the .Contains will have complexity O(1), and the whole algorithm will just have O(n)

public static List<string> ListExcept(this List<string> l1, List<string> l2)
{
    IEnumerable<HashSet<string>> listOfHashes = l1.Select(x => x.Split(',').ToHashSet());
    IEnumerable<HashSet<string>> finalLists = listOfHashes.Where(hash => l2.All(x => !hash.Contains(x)));
    IEnumerable<string> joined = finalLists.Select(x => x.Aggregate("", (s, s1) => s + s1 + ",")[..^1]);

    return joined.ToList();
}

Obviously you can join the calls, I split them and specified the types explicitly to make it clearer.

And then you can just call it as:

l2.ListExcept(l1);