We have a two-dimensional array of Objects. Generally every item is a common value type like Int32 or Decimal, and one column of the array contains values of the same type. Our array can contain about a million rows, and we need to sort it.

To make it fast, we use a special algorithm based on so called row map, when every item of the row map array contains the index of the array row to display in the specific position. We do not swap array rows while sorting - we just sort the row map array using our custom comparer.

The main part of the sorting algorithm looks like this:

RowNavigatorMapComparer myRowNavigatorMapComparer = new RowNavigatorMapComparer(this, myGroupAndSortData, isGrouping);
RowNavigatorMapItem[] map = RowNavigatorMapItem.FromRowNavigatorArray(fRowsMap, rowIndex, rowCount);
Array.Sort(map, myRowNavigatorMapComparer);

RowNavigatorMapComparer implements IComparer<RowNavigatorMapItem>, and two array array values are compared inside its int Compare(RowNavigatorMapItem x, RowNavigatorMapItem y) implementation like this:

return ManagerCompare.CompareObjects(cellX.Value, cellY.Value);

, where ManagerCompare is implemented this way:

public class ManagerCompare
{
    private ManagerCompare(){}

    public static int CompareObjects(object valueX, object valueY)
    {
        IComparable myValueX = valueX as IComparable;
        IComparable myValueY = valueY as IComparable;
        if(myValueX == null)
        {
            if(myValueY == null)
                return 0;
            return -1;
        }
        if(myValueY == null)
            return 1;

        Type myTypeX = myValueX.GetType();
        Type myTypeY = myValueY.GetType();
        if(myTypeX != myTypeY)
            return string.CompareOrdinal(myTypeX.Name, myTypeY.Name); 

        return  myValueX.CompareTo(myValueY);
    }
}

We do not like the performance of the whole construction described above and would like to speed up it dramatically. We know that calling Array.Sort(map, myRowNavigatorMapComparer) takes 98% of time for the test random contents of the array. And if we return -1 instead of ManagerCompare.CompareObjects(cellX.Value, cellY.Value) just to estimate the speed if all items have been already sorted, the total time reduces up to ten times. So, it seems, the main problem in the Array.Sort() implementation, i.e. how it moves the data in the array. Is there any way to replace it with something more effective?

Another point to consider could be using hard-coded comparison algorithms for every column of our target array if we know the type of the values stored in the column. For instance, if a column contains integer values, we could use the following call instead of ManagerCompare.CompareObjects:

myResult = ((int)cellX.Value).CompareTo((int)cellY.Value);

But the best performance gain in this case is just 5%...

Sure I understand that our code uses unboxing because of storing value types as objects, but we can't avoid that. We need to provide the user with an array of objects that is populated dynamically from different sources.

The original version of this code was written in the era of .NET 1.x, but now we can rewrite it using more powerful tools from .NET 2.0 or higher, so any advices are welcome.

1

There are 1 best solutions below

4
On

The original version of this code was written in the era of .NET 1.x, but now we can rewrite it using >more powerful tools from .NET 2.0 or higher, so any advices are welcome.

One thing to keep in mind here (although it might not impact your particular scenario) is that in .NET 4.5 the implementation for Array.Sort has changed.

Coming back to your problem now. You say that you can't avoid storing the values as objects but that shouldn't prevent you from doing a bit or preprocessing. You could try the following:

  • before sorting the array, loop through it once and cache all the information that you need. For instance you could create a new type which stores an IComparable reference + the results of GetType. Because you are using a comparison based sorting algorithm (at best O(n logn) ) this means that you only pay these costs once (which will at least remove some casts, method calls and comparisons in your compare method).
  • the above should continue to increase your performance based on the lifetime of your array