Distinct(): how to find unique elements in a list of objects


There is a very simple class:

public class LinkInformation
{
    public LinkInformation(string link, string text, string group)
    {
        this.Link = link;
        this.Text = text;
        this.Group = group;
    }

    public string Link { get; set; }
    public string Text { get; set; }
    public string Group { get; set; }

    public override string ToString()
    {
        return Link.PadRight(70) + Text.PadRight(40) + Group;
    }
}

I then create a list of objects of this class, and the list contains multiple duplicates.

I tried using Distinct() to get a list of unique values.

That did not remove the duplicates, so I implemented

IComparable<LinkInformation>

    int IComparable<LinkInformation>.CompareTo(LinkInformation other)
    {
        return this.ToString().CompareTo(other.ToString());
    }

and then...

IEqualityComparer<LinkInformation>

    public bool Equals(LinkInformation x, LinkInformation y)
    {
        return x.ToString().CompareTo(y.ToString()) == 0;
    }

    public int GetHashCode(LinkInformation obj)
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + obj.Link.GetHashCode();
        hash = hash * 23 + obj.Text.GetHashCode();
        hash = hash * 23 + obj.Group.GetHashCode();
        return hash;
    }

The code using the Distinct is:

    static void Main(string[] args)
    {
        string[] filePath = {   @"C:\temp\html\1.html",
                                @"C:\temp\html\2.html",
                                @"C:\temp\html\3.html",
                                @"C:\temp\html\4.html",
                                @"C:\temp\html\5.html"};

        int index = 0;

        foreach (var path in filePath)
        {
            var parser = new HtmlParser();

            var list = parser.Parse(path);

            var unique = list.Distinct();

            foreach (var elem in unique)
            {
                var full = new FileInfo(path).Name;
                var file = full.Substring(0, full.Length - 5);
                Console.WriteLine((++index).ToString().PadRight(5) + file.PadRight(20) + elem);
            }
        }

        Console.ReadKey();
    }

What has to be done to get Distinct() working?


There are 4 best solutions below

5
On BEST ANSWER

You need to actually pass the IEqualityComparer that you've created to Distinct when you call it. Distinct has two overloads: one takes no parameters and the other takes an IEqualityComparer<T>. If you don't provide a comparer, the default one is used, and for a class that doesn't override Equals and GetHashCode the default comparer falls back to reference equality, so it doesn't compare the objects the way you want them to be compared.
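
As a minimal sketch of that fix, assume the comparer methods from the question are moved into their own class. The names LinkInformationComparer and Dedup.Unique are made up for illustration, and the comparison is done property by property with ordinal string equality rather than through ToString(), which is a swapped-in choice and not the asker's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LinkInformation
{
    public LinkInformation(string link, string text, string group)
    {
        Link = link;
        Text = text;
        Group = group;
    }

    public string Link { get; set; }
    public string Text { get; set; }
    public string Group { get; set; }
}

// Hypothetical comparer class; the name is an assumption, not from the question.
public sealed class LinkInformationComparer : IEqualityComparer<LinkInformation>
{
    public bool Equals(LinkInformation x, LinkInformation y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Link, y.Link, StringComparison.Ordinal)
            && string.Equals(x.Text, y.Text, StringComparison.Ordinal)
            && string.Equals(x.Group, y.Group, StringComparison.Ordinal);
    }

    public int GetHashCode(LinkInformation obj)
    {
        if (obj is null) return 0;
        unchecked // let the multiplication wrap instead of throwing
        {
            int hash = 17;
            hash = hash * 23 + (obj.Link?.GetHashCode() ?? 0);
            hash = hash * 23 + (obj.Text?.GetHashCode() ?? 0);
            hash = hash * 23 + (obj.Group?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

public static class Dedup
{
    // Hypothetical helper: the comparer is passed explicitly to Distinct.
    public static List<LinkInformation> Unique(IEnumerable<LinkInformation> list)
    {
        return list.Distinct(new LinkInformationComparer()).ToList();
    }
}
```

In the question's Main method, the equivalent change would be calling list.Distinct(new LinkInformationComparer()) instead of list.Distinct().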

5
On

Without using Distinct nor the comparer, how about:

list.GroupBy(x => x.ToString()).Select(x => x.First())

I know this doesn't answer the exact question, but I think it's worth staying open to other solutions.
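
One caveat with grouping on ToString() is that the key depends on how the string happens to be formatted. A possible variation, sketched here under the assumption that all three properties should count toward equality, groups on a value tuple instead. The LinkDedup.ByAllProperties name is illustrative only, and value tuples need C# 7 or later:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LinkInformation
{
    public LinkInformation(string link, string text, string group)
    {
        Link = link;
        Text = text;
        Group = group;
    }

    public string Link { get; set; }
    public string Text { get; set; }
    public string Group { get; set; }
}

public static class LinkDedup
{
    // Hypothetical helper: keeps the first item seen for each (Link, Text, Group) combination.
    public static List<LinkInformation> ByAllProperties(IEnumerable<LinkInformation> list)
    {
        return list
            .GroupBy(x => (x.Link, x.Text, x.Group)) // value tuples compare member by member
            .Select(g => g.First())
            .ToList();
    }
}
```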

3
On

If you want Distinct() to return distinct elements from a sequence of objects of a custom type without passing a comparer, implement the generic IEquatable<T> interface in the class and override GetHashCode() to match.

Here is a sample implementation:

public class Product : IEquatable<Product>
{
    public string Name { get; set; }
    public int Code { get; set; }

    public bool Equals(Product other)
    {

        //Check whether the compared object is null.
        if (Object.ReferenceEquals(other, null)) return false;

        //Check whether the compared object references the same data.
        if (Object.ReferenceEquals(this, other)) return true;

        //Check whether the products' properties are equal (string.Equals tolerates a null Name).
        return Code == other.Code && string.Equals(Name, other.Name);
    }

    // If Equals() returns true for a pair of objects 
    // then GetHashCode() must return the same value for these objects.

    public override int GetHashCode()
    {

        //Get hash code for the Name field if it is not null.
        int hashProductName = Name == null ? 0 : Name.GetHashCode();

        //Get hash code for the Code field.
        int hashProductCode = Code.GetHashCode();

        //Calculate the hash code for the product.
        return hashProductName ^ hashProductCode;
    }
}

And this is how you do the actual distinct:

Product[] products = { new Product { Name = "apple", Code = 9 }, 
                       new Product { Name = "orange", Code = 4 }, 
                       new Product { Name = "apple", Code = 9 }, 
                       new Product { Name = "lemon", Code = 12 } };

//Exclude duplicates.

IEnumerable<Product> noduplicates =
    products.Distinct();
6
On

If you are happy with defining the "distinctness" by a single property, you can do

list
    .GroupBy(x => x.Text)
    .Select(x => x.First())

to get a sequence of "unique" items (the first item seen for each Text value).

No need to mess around with IEqualityComparer et al.
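
On .NET 6 or later, the same single-property idea can probably be written more directly with Enumerable.DistinctBy. This is a small sketch assuming that framework version is available; the TextDedup.ByText helper is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LinkInformation
{
    public LinkInformation(string link, string text, string group)
    {
        Link = link;
        Text = text;
        Group = group;
    }

    public string Link { get; set; }
    public string Text { get; set; }
    public string Group { get; set; }
}

public static class TextDedup
{
    // Hypothetical helper: keeps the first item seen for each distinct Text value.
    // Requires .NET 6+, where Enumerable.DistinctBy was introduced.
    public static List<LinkInformation> ByText(IEnumerable<LinkInformation> list)
    {
        return list.DistinctBy(x => x.Text).ToList();
    }
}
```

As with the GroupBy version, items that share a Text value but differ in Link or Group are collapsed into one, so this only fits when a single property really defines uniqueness.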