Comparing two lists of class objects similar to a Diff Tool

My requirement is to write a program that mimics a diff tool. Yes, there are quite a few libraries and open source projects that accomplish this, but I would like to write my own comparer.

Here's the starting point. I have a class called DataItem which looks like this:

public class DataItem
{
    public DataItem() { }
    public DataItem(string d, string v) { Data = d; Value = v; }

    public string Data { get; set; }
    public string Value { get; set; }
}

I have two lists of these objects; let's call them PRE and POST, with the example values below. The 'Data' value is unique within each list.

preList: (Data,Value)
AAA,0
BBB,1
CCC,3
DDD,4
FFF,0
GGG,3

postList: (Data,Value)
AAA,0
BBB,2
DDD,4
EEE,9
FFF,3

Think of PRE as the original list and POST as the list after some changes have been made. I would like to compare the two and sort the differences into three categories:

  1. Added Items - An item with a new 'Data' added to the list.
  2. Removed Items - An item was removed from the list.
  3. Diff Items - 'Data' is found in both PRE and POST lists, but their corresponding 'Value' is different.

So when categorized they should look like this:

Added Items:
EEE,9

Removed Items:
CCC,3
GGG,3

Diff Items:
BBB
FFF

I have another class, DiffItem, whose objects should hold the final results. DiffItem looks like this:

public class DiffItem
{
    public DiffItem() { }
    public DiffItem(string data, string type, string pre, string post) { Data = data; DiffType = type; PreVal = pre; PostVal = post; }

    public string Data { get; set; }
    public string DiffType { get; set; } // DiffType = Add/Remove/Diff
    public string PreVal { get; set; } // preList value corresponding to Data item
    public string PostVal { get; set; } // postList value corresponding to Data item
}

To accomplish this, I first implemented IEqualityComparer<DataItem> and wrote a couple of comparers:

public class DataItemComparer : IEqualityComparer<DataItem>
{
    public bool Equals(DataItem x, DataItem y)
    {
        return (string.Equals(x.Data, y.Data) && string.Equals(x.Value, y.Value));
    }

    public int GetHashCode(DataItem obj)
    {
        return obj.Data.GetHashCode();
    }
}

public class DataItemDataComparer : IEqualityComparer<DataItem>
{
    public bool Equals(DataItem x, DataItem y)
    {
        return string.Equals(x.Data, y.Data);
    }

    public int GetHashCode(DataItem obj)
    {
        return obj.Data.GetHashCode();
    }
}

Then I used the Except() and Intersect() methods as follows:

    static void DoDiff()
    {
        diffList = new List<DiffItem>();

        IEnumerable<DataItem> preOnly = preList.Except(postList, new DataItemComparer());
        IEnumerable<DataItem> postOnly = postList.Except(preList, new DataItemComparer());
        IEnumerable<DataItem> common = postList.Intersect(preList, new DataItemComparer());

        IEnumerable<DataItem> added = postOnly.Except(preOnly, new DataItemDataComparer());
        IEnumerable<DataItem> removed = preOnly.Except(postOnly, new DataItemDataComparer());
        IEnumerable<DataItem> diffPre = preOnly.Intersect(postOnly, new DataItemDataComparer());
        IEnumerable<DataItem> diffPost = postOnly.Intersect(preOnly, new DataItemDataComparer());

        foreach (DataItem add in added)
        {
            diffList.Add(new DiffItem(add.Data, "Add", null, add.Value));
        }
        foreach (DataItem rem in removed)
        {
            diffList.Add(new DiffItem(rem.Data, "Remove", rem.Value, null));
        }
        foreach (DataItem pre in diffPre)
        {
            DataItem post = diffPost.First(x => x.Data == pre.Data);
            diffList.Add(new DiffItem(pre.Data, "Diff", pre.Value, post.Value));
        }
    }

This does work and gets the job done, but I'm wondering if there's a 'better' way to do it. I put quotes around 'better' because I don't have a proper definition of what would make it better. Perhaps there is a way to do this with fewer 'foreach' loops and fewer calls to Except() and Intersect(), since I would imagine that quite a bit of iteration is going on behind the LINQ calls.

Simply put, is there cleaner code that I can write for this? I'm asking mostly out of academic interest and to expand my knowledge.

There is 1 answer below.

I don't think you need your IEqualityComparer:

var added = from a in postList
            where !preList.Any(b => b.Data == a.Data)
            select new DiffItem(a.Data, "Add", null, a.Value);
var removed = from b in preList
              where !postList.Any(a => a.Data == b.Data)
              select new DiffItem(b.Data, "Remove", b.Value, null);
var diff = from b in preList
           join a in postList on b.Data equals a.Data
           where b.Value != a.Value
           select new DiffItem(b.Data, "Diff", b.Value, a.Value);
var diffList = added.ToList();
diffList.AddRange(removed);
diffList.AddRange(diff);
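
If the repeated scans are the concern (each Any() call above walks the other list again), one possible alternative is to index PRE in a dictionary and make a single pass over POST. The following is only a sketch under some assumptions: DictionaryDiff and Compute are hypothetical names that are not part of the question, DataItem and DiffItem are repeated from the question (without the parameterless constructors) only so the block compiles on its own, 'Data' is assumed to be unique and non-null within each list, and the `out string` declaration needs C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DataItem
{
    public DataItem(string d, string v) { Data = d; Value = v; }
    public string Data { get; set; }
    public string Value { get; set; }
}

public class DiffItem
{
    public DiffItem(string data, string type, string pre, string post)
    { Data = data; DiffType = type; PreVal = pre; PostVal = post; }
    public string Data { get; set; }
    public string DiffType { get; set; } // Add/Remove/Diff
    public string PreVal { get; set; }
    public string PostVal { get; set; }
}

// Hypothetical helper, not from the question: one dictionary, one pass per list.
public static class DictionaryDiff
{
    public static List<DiffItem> Compute(IEnumerable<DataItem> preList,
                                         IEnumerable<DataItem> postList)
    {
        // Index PRE by Data. ToDictionary throws on a duplicate key,
        // so the "Data is unique within a list" assumption is checked for PRE.
        Dictionary<string, string> preByData =
            preList.ToDictionary(p => p.Data, p => p.Value);
        var result = new List<DiffItem>();

        foreach (DataItem post in postList)
        {
            if (preByData.TryGetValue(post.Data, out string preValue))
            {
                // Present in both lists: report only when the value changed.
                if (preValue != post.Value)
                    result.Add(new DiffItem(post.Data, "Diff", preValue, post.Value));

                // Mark as seen so it is not reported as removed later.
                preByData.Remove(post.Data);
            }
            else
            {
                // Not in PRE at all, so it was added.
                result.Add(new DiffItem(post.Data, "Add", null, post.Value));
            }
        }

        // Whatever is still in the dictionary never appeared in POST.
        foreach (KeyValuePair<string, string> left in preByData)
            result.Add(new DiffItem(left.Key, "Remove", left.Value, null));

        return result;
    }
}
```

Calling `DictionaryDiff.Compute(preList, postList)` visits each item once, so the work grows with the combined size of the two lists rather than with their product. The trade-off is that the results are no longer grouped as Add, Remove, Diff: "Add" and "Diff" entries come out interleaved in POST order, and the order of the "Remove" entries should not be relied on. Note also that a duplicate 'Data' in POST is not detected; it would simply show up as an "Add".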