My requirement is to write a program that mimics a diff tool. Yes, there are quite a few libraries and open-source projects that accomplish this, but I would like to write my own comparer.
Here's the starting point. I have a class called DataItem which looks like this:
public class DataItem
{
    public DataItem() { }
    public DataItem(string d, string v) { Data = d; Value = v; }
    public string Data { get; set; }
    public string Value { get; set; }
}
I have two lists of these objects; let's call them PRE and POST, with example values as follows. The 'Data' part is unique within each list.
preList: (Data,Value)
AAA,0
BBB,1
CCC,3
DDD,4
FFF,0
GGG,3
postList: (Data,Value)
AAA,0
BBB,2
DDD,4
EEE,9
FFF,3
Think of PRE as the original list and POST as the list after some changes have been made. I would like to compare the two and sort the items into three categories:
- Added Items - An item with a new 'Data' added to the list.
- Removed Items - An item was removed from the list.
- Diff Items - 'Data' is found in both PRE and POST lists, but their corresponding 'Value' is different.
So when categorized they should look like this:
Added Items:
EEE,9
Removed Items:
CCC,3
GGG,3
Diff Items:
BBB
FFF
I have another class, DiffItem, whose objects should hold the final results. DiffItem looks like this:
public class DiffItem
{
    public DiffItem() { }
    public DiffItem(string data, string type, string pre, string post) { Data = data; DiffType = type; PreVal = pre; PostVal = post; }
    public string Data { get; set; }
    public string DiffType { get; set; } // DiffType = Add/Remove/Diff
    public string PreVal { get; set; } // preList value corresponding to Data item
    public string PostVal { get; set; } // postList value corresponding to Data item
}
To accomplish this, I first implemented IEqualityComparer<DataItem> and wrote a couple of comparers:
public class DataItemComparer : IEqualityComparer<DataItem>
{
    public bool Equals(DataItem x, DataItem y)
    {
        return (string.Equals(x.Data, y.Data) && string.Equals(x.Value, y.Value));
    }
    public int GetHashCode(DataItem obj)
    {
        return obj.Data.GetHashCode();
    }
}
public class DataItemDataComparer : IEqualityComparer<DataItem>
{
    public bool Equals(DataItem x, DataItem y)
    {
        return string.Equals(x.Data, y.Data);
    }
    public int GetHashCode(DataItem obj)
    {
        return obj.Data.GetHashCode();
    }
}
Then I used the Except() and Intersect() methods as follows:
static void DoDiff()
{
    diffList = new List<DiffItem>();
    // Items with no exact (Data, Value) match in the other list.
    IEnumerable<DataItem> preOnly = preList.Except(postList, new DataItemComparer());
    IEnumerable<DataItem> postOnly = postList.Except(preList, new DataItemComparer());
    // Unchanged items (same Data and Value); not used below.
    IEnumerable<DataItem> common = postList.Intersect(preList, new DataItemComparer());
    // Compare by Data only: what is left on one side was added or removed.
    IEnumerable<DataItem> added = postOnly.Except(preOnly, new DataItemDataComparer());
    IEnumerable<DataItem> removed = preOnly.Except(postOnly, new DataItemDataComparer());
    // Same Data on both sides but a different Value.
    IEnumerable<DataItem> diffPre = preOnly.Intersect(postOnly, new DataItemDataComparer());
    IEnumerable<DataItem> diffPost = postOnly.Intersect(preOnly, new DataItemDataComparer());
    foreach (DataItem add in added)
    {
        diffList.Add(new DiffItem(add.Data, "Add", null, add.Value));
    }
    foreach (DataItem rem in removed)
    {
        diffList.Add(new DiffItem(rem.Data, "Remove", rem.Value, null));
    }
    foreach (DataItem pre in diffPre)
    {
        // diffPost is a deferred query, so it is re-evaluated on every iteration.
        DataItem post = diffPost.First(x => x.Data == pre.Data);
        diffList.Add(new DiffItem(pre.Data, "Diff", pre.Value, post.Value));
    }
}
This works and gets the job done, but I'm wondering if there's a 'better' way to do it. I put quotes around 'better' because I don't have a proper definition of what would make it better. Perhaps there is a way to do this with fewer 'foreach' loops and fewer calls to Except() and Intersect(), since I imagine there is quite a bit of iteration going on behind the LINQ calls.
Simply put, is there cleaner code that I can write for this? I'm asking mostly out of academic interest and to expand my knowledge.
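For comparison, here is a minimal sketch of one alternative that avoids the custom comparers by indexing one list in a dictionary keyed on 'Data'. It is only an illustration under a few assumptions: the class name DictionaryDiff and the method Compare are hypothetical, plain tuples stand in for DataItem and DiffItem so the block is self-contained, and 'Data' is assumed to be unique within each list, as stated above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original code.
public static class DictionaryDiff
{
    // Tuples stand in for DataItem (Data, Value) and DiffItem
    // (Data, DiffType, PreVal, PostVal) to keep the sketch self-contained.
    public static List<(string Data, string DiffType, string PreVal, string PostVal)> Compare(
        IEnumerable<(string Data, string Value)> pre,
        IEnumerable<(string Data, string Value)> post)
    {
        // Index POST by Data. ToDictionary throws on duplicate keys,
        // so this relies on Data being unique within the list.
        Dictionary<string, string> postByData = post.ToDictionary(p => p.Data, p => p.Value);
        var seenInPre = new HashSet<string>();
        var result = new List<(string Data, string DiffType, string PreVal, string PostVal)>();

        // One pass over PRE finds removed and changed items.
        foreach (var (data, preVal) in pre)
        {
            seenInPre.Add(data);
            if (!postByData.TryGetValue(data, out string postVal))
                result.Add((data, "Remove", preVal, null));
            else if (!string.Equals(preVal, postVal))
                result.Add((data, "Diff", preVal, postVal));
        }

        // One pass over POST finds added items.
        foreach (var (data, postVal) in post)
        {
            if (!seenInPre.Contains(data))
                result.Add((data, "Add", null, postVal));
        }

        return result;
    }
}
```

Each list is walked once, and every lookup goes through a hash-based collection, so the work grows roughly linearly with the list sizes. The results come out in PRE order (removed and changed items interleaved), followed by the added items in POST order.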
I don't think you need your IEqualityComparer: