How to increase the performance of a for loop using C#


I compare task data from Microsoft Project using a nested for loop. Because the project has many records (more than 1000), the comparison is very slow.

How do I improve the performance?

for (int n = 1; n < thisProject.Tasks.Count; n++) 
{
    string abc = thisProject.Tasks[n].Name;
    string def = thisProject.Tasks[n].ResourceNames;
    for (int l = thisProject.Tasks.Count; l > n; l--) 
    {
        // MessageBox.Show(thisProject.Tasks[l].Name);
        if (abc == thisProject.Tasks[l].Name && def == thisProject.Tasks[l].ResourceNames) 
        {
            thisProject.Tasks[l].Delete();
        }
    }
}

As you can see, I compare the Name and ResourceNames of each Task, and when I find a duplicate, I call Task.Delete to get rid of it.


There are 4 best solutions below

1
On BEST ANSWER

A hash-based check should be a lot faster than nested looping in this case: O(n) versus O(n^2).

First, provide an equality comparer of your own:

class TaskComparer : IEqualityComparer<Task> {
    public bool Equals(Task x, Task y) {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null)) return false;
        if (ReferenceEquals(y, null)) return false;
        if (x.GetType() != y.GetType()) return false;
        return string.Equals(x.Name, y.Name) && string.Equals(x.ResourceNames, y.ResourceNames);
    }

    public int GetHashCode(Task task) {
        unchecked {
            return 
                ((task?.Name?.GetHashCode()         ?? 0) * 397) ^ 
                 (task?.ResourceNames?.GetHashCode() ?? 0);
        }
    }
}

Don't worry too much about the GetHashCode implementation; it is just boilerplate code that combines the hash codes of the two properties into one. The result does not have to be unique, because Equals settles any collisions.

Now that you have this class for comparison and hashing, you can use the code below to remove your dupes:

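var set = new HashSet<Task>(new TaskComparer());
// The Tasks collection is 1-based (as in the question's loop), so scan from Count down to 1.
// Iterating backwards keeps the indexes still to be visited valid after a Delete();
// note that this keeps the last occurrence of each duplicate, not the first.
for (int i = thisProject.Tasks.Count; i >= 1; --i) {
    if (!set.Add(thisProject.Tasks[i]))
        thisProject.Tasks[i].Delete();
}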

As you can see, you simply scan all the elements while storing them in a HashSet. The HashSet checks, based on our equality comparer, whether each element is a duplicate.

Since you want to delete the duplicates, the detected dupes are deleted. To extract the unique items instead, reverse the condition to if (set.Add(thisProject.Tasks[i])) and do the processing inside that if.
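As a variation on the same hash-based idea, the comparison key can be a value tuple, which already has structural equality and hashing, so no custom comparer is needed. This is only a minimal sketch: TaskRow, TupleKeyDedup and FindDuplicates are hypothetical names standing in for the real MS Project objects, and value tuples assume C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an MS Project task; only the two compared fields are modelled.
public sealed class TaskRow
{
    public string Name { get; set; }
    public string ResourceNames { get; set; }
}

public static class TupleKeyDedup
{
    // Returns every row that repeats an earlier (Name, ResourceNames) pair.
    public static List<TaskRow> FindDuplicates(IEnumerable<TaskRow> rows)
    {
        var seen = new HashSet<(string, string)>();
        var duplicates = new List<TaskRow>();
        foreach (var row in rows)
        {
            // Add returns false when an equal key is already in the set.
            if (!seen.Add((row.Name, row.ResourceNames)))
                duplicates.Add(row);
        }
        return duplicates;
    }

    public static void Main()
    {
        var rows = new List<TaskRow>
        {
            new TaskRow { Name = "a", ResourceNames = "ra" },
            new TaskRow { Name = "b", ResourceNames = "rb" },
            new TaskRow { Name = "a", ResourceNames = "ra" }, // repeats row 1
            new TaskRow { Name = "a", ResourceNames = "rx" }, // same name, different resources
            new TaskRow { Name = "b", ResourceNames = "rb" }, // repeats row 2
        };
        Console.WriteLine(FindDuplicates(rows).Count); // prints 2
    }
}
```

Collecting the duplicates first and deleting them afterwards keeps the first occurrence of each task and avoids modifying the collection while it is being scanned.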

5
On

Microsoft Project has a Sort method which makes simple work of this problem. Sort the tasks by Name, Resource Names, and Unique ID and then loop through comparing adjacent tasks and delete duplicates. By using Unique ID as the third sort key you can be sure to delete the duplicate that was added later. Alternatively, you can use the task ID to remove tasks that are lower down in the schedule. Here's a VBA example of how to do this:

Sub RemoveDuplicateTasks()

    Dim proj As Project
    Set proj = ActiveProject

    Application.Sort Key1:="Name", Ascending1:=True, Key2:="Resource Names", Ascending2:=True, Key3:="Unique ID", Ascending3:=True, Renumber:=False, Outline:=False
    Application.SelectAll
    Dim tsks As Tasks
    Set tsks = Application.ActiveSelection.Tasks

    Dim i As Long
    i = 1 ' the Tasks collection is 1-based, so start at the first task
    Do While i < tsks.Count
        If tsks(i).Name = tsks(i + 1).Name And tsks(i).ResourceNames = tsks(i + 1).ResourceNames Then
            tsks(i + 1).Delete
        Else
            i = i + 1
        End If
    Loop

    Application.Sort Key1:="ID", Renumber:=False, Outline:=False
    Application.SelectBeginning

End Sub

Note: This question is about the algorithm, not the syntax; the VBA is easy to translate to C#.
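The sort-then-compare-adjacent idea can be sketched in C# on an in-memory list. This does not call the MS Project Sort method; ScheduledTask, SortedDedup and FindLaterDuplicates are hypothetical names used only for illustration, and the sort is done with LINQ instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an MS Project task.
public sealed class ScheduledTask
{
    public int UniqueId { get; set; }
    public string Name { get; set; }
    public string ResourceNames { get; set; }
}

public static class SortedDedup
{
    // Sorts by Name, ResourceNames and UniqueId, then reports every task
    // that matches the task directly before it in the sorted order.
    public static List<ScheduledTask> FindLaterDuplicates(IEnumerable<ScheduledTask> tasks)
    {
        var sorted = tasks
            .OrderBy(t => t.Name, StringComparer.Ordinal)
            .ThenBy(t => t.ResourceNames, StringComparer.Ordinal)
            .ThenBy(t => t.UniqueId)
            .ToList();

        var duplicates = new List<ScheduledTask>();
        for (int i = 1; i < sorted.Count; i++)
        {
            // Equal tasks are adjacent after sorting, so one comparison per task is enough.
            if (sorted[i].Name == sorted[i - 1].Name &&
                sorted[i].ResourceNames == sorted[i - 1].ResourceNames)
            {
                duplicates.Add(sorted[i]);
            }
        }
        return duplicates;
    }

    public static void Main()
    {
        var tasks = new List<ScheduledTask>
        {
            new ScheduledTask { UniqueId = 1, Name = "a", ResourceNames = "ra" },
            new ScheduledTask { UniqueId = 2, Name = "b", ResourceNames = "rb" },
            new ScheduledTask { UniqueId = 3, Name = "a", ResourceNames = "ra" },
            new ScheduledTask { UniqueId = 4, Name = "a", ResourceNames = "ra" },
        };
        var ids = FindLaterDuplicates(tasks).Select(t => t.UniqueId);
        Console.WriteLine(string.Join(", ", ids)); // prints 3, 4
    }
}
```

Because UniqueId is the last sort key, the task with the lowest UniqueId in each group is the one that survives, which matches the intent of deleting the copies that were added later. The cost is dominated by the sort, O(n log n), followed by a single linear pass.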

2
On

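This should give you all the items that have duplicates, so you can delete the extra copies from your original list. The Cast<Task>() call is needed if the Tasks collection only exposes the non-generic IEnumerable, as COM interop collections typically do.

thisProject.Tasks.Cast<Task>().GroupBy(x => new { x.Name, x.ResourceNames }).Where(g => g.Count() > 1).SelectMany(g => g);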

Note that you probably do not want to remove all of them, only the duplicate versions, so be careful how you loop through this list.
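One way to get only the extra copies, leaving the first task of each group alone, is to skip the first element of every group. The sketch below runs on an in-memory list; TaskInfo, GroupDedup and ExtraCopies are hypothetical names, not part of the MS Project API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an MS Project task.
public sealed class TaskInfo
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string ResourceNames { get; set; }
}

public static class GroupDedup
{
    // Groups by (Name, ResourceNames) and returns everything except
    // the first element of each group, i.e. only the redundant copies.
    public static List<TaskInfo> ExtraCopies(IEnumerable<TaskInfo> tasks)
    {
        return tasks
            .GroupBy(t => new { t.Name, t.ResourceNames })
            .SelectMany(g => g.Skip(1))
            .ToList();
    }

    public static void Main()
    {
        var tasks = new List<TaskInfo>
        {
            new TaskInfo { Id = 1, Name = "a", ResourceNames = "ra" },
            new TaskInfo { Id = 2, Name = "b", ResourceNames = "rb" },
            new TaskInfo { Id = 3, Name = "a", ResourceNames = "ra" },
            new TaskInfo { Id = 4, Name = "c", ResourceNames = "rc" },
            new TaskInfo { Id = 5, Name = "b", ResourceNames = "rb" },
        };
        var ids = ExtraCopies(tasks).Select(t => t.Id);
        Console.WriteLine(string.Join(", ", ids)); // prints 3, 5
    }
}
```

Groups with a single element contribute nothing after Skip(1), so the Where(g => g.Count() > 1) filter is not needed in this form. The ToList() call materializes the result before any deletion, so the source is not being enumerated while it changes.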

0
On

A LINQ way of getting the distinct elements from your Tasks list:

using System;
using System.Collections.Generic;
using System.Linq;

public class Task
{
    public string Name {get;set;}
    public string ResourceName {get;set;}
}

public class Program
{
    public static void Main()
    {
        List<Task> Tasks = new List<Task>();
        Tasks.Add(new Task(){Name = "a",ResourceName = "ra"});
        Tasks.Add(new Task(){Name = "b",ResourceName = "rb"});
        Tasks.Add(new Task(){Name = "c",ResourceName = "rc"});
        Tasks.Add(new Task(){Name = "a",ResourceName = "ra"});
        Tasks.Add(new Task(){Name = "b",ResourceName = "rb"});
        Tasks.Add(new Task(){Name = "c",ResourceName = "rc"});

        Console.WriteLine("Initial List :");
        foreach(var t in Tasks){
            Console.WriteLine(t.Name);  
        }

        // Here comes the interesting part
        List<Task> Tasks2 = Tasks.GroupBy(x => new {x.Name, x.ResourceName})
                                 .Select(g => g.First()).ToList();

        Console.WriteLine("Final List :");
        foreach(Task t in Tasks2){
            Console.WriteLine(t.Name);  
        }
    }
}

This selects the first element of each group of tasks that share the same Name and ResourceName.
