How to perform stable sort over multiple columns?

571 Views Asked by At

Imagine i have a data set that contains:

Date            Id
--------------  ----
11/1/2017       null
11/4/2017       3
11/5/2017       null
11/12/2017      10
null            1
null            2
null            7
null            8
null            9

I want the rows ordered so that both columns are increasing.

Using a naïve ORDER BY Date, ID does not do it:

enter image description here

There is an ordering

There is an ordering that satisfies the results of my desired sort order:

  • the date column is always increasing
  • the id column value is always increasing

 

enter image description here

Or course, that's not a unique ordering:

Date            Id
--------------  ---------------
null            1
11/1/2017       null
null            2
11/4/2017       3
null            7
null            8
null            9
11/5/2017       null
11/12/2017      10

A programming language could do it

I know i can accomplish this on the client side. In a functional functional programming language: use a stable sorting algorithm:

A stable sort is one which preserves the original order of the input set, where the comparison algorithm does not distinguish between two or more items.

Consider a sorting algorithm that sorts cards by rank, but not by suit. The stable sort will guarantee that the original order of cards having the same rank is preserved; the unstable sort will not.

enter image description here

Unfortunately i have

  • 9.1 million rows
  • 1.8 GB

of monotonically increasing rows to put in best possible chronological sort order. Obviously i'd prefer to do this on the server - which is well suited to handling large amounts of data.

How can i perform a stable-sort in SQL Server?


Example Data

 

CREATE TABLE #SortDemo (Date datetime NULL, Id int NULL)

INSERT INTO #SortDemo (Date, Id)
VALUES 
    ('20171101', null),
    ('20171104',    3),
    ('20171105', null),
    ('20171112',   10),
    (null,          1),
    (null,          2),
    (null,          7),
    (null,          8),
    (null,          9)


SELECT * FROM #SortDemo
ORDER BY Date, Id
1

There are 1 best solutions below

0
On BEST ANSWER

It is a solvable problem. You don't have to throw your hands up and say computers cannot be used to solve problems.

Start with the data, and add a new surrogate "Rank" column.

Date            Id    Rank
--------------  ----  ----
null            7     null
null            1     null
null            9     null
null            2     null  
null            8     null
11/1/2017       null  null
11/4/2017       3     null
11/5/2017       null  null
11/12/2017      10    null
11/13/2017      null  null

Then seed Rank to the Id:

UPDATE SortDemo SET Rank = Id
WHERE Id IS NOT NULL

Date            Id    Rank
--------------  ----  ----
null            7     7
null            1     1 
null            9     9 
null            2     2 
null            8     8 
11/1/2017       null  null
11/4/2017       3     3
11/5/2017       null  null 
11/12/2017      10    10
11/13/2017      null  null

Then for the remaining items items with a Date, we need to assign it to the "next" rank:

Date            Id    Rank
--------------  ----  ----
null            7     7
null            1     1 
null            9     9 
null            2     2 
null            8     8 
11/1/2017       null  null  <-- 3
11/4/2017       3     3     
11/5/2017       null  null  <-- 10
11/12/2017      10    10    
11/13/2017      null  null  <-- ?

with

UPDATE SortDemo SET Rank = (
      SELECT MIN(Rank) FROM SortDemo s2 
      WHERE s2.Date >= SortDemo.Date)
WHERE SortDemo.Date IS NOT NULL

Date            Id    Rank
--------------  ----  ----
null            7     7
null            1     1 
null            9     9 
null            2     2 
null            8     8 
11/1/2017       null  3    <--
11/4/2017       3     3     
11/5/2017       null  10   <--
11/12/2017      10    10    
11/13/2017      null  null <-- ?

There's also the edge case where there are no items "after" us; which we can fix by going backwards to one the highest previous rank:

UPDATE SortDemo SET Rank = (
      SELECT MAX(Rank) FROM SortDemo s2 
      WHERE s2.Date <= SortDemo.Date)
WHERE SortDemo.Date IS NOT NULL

Date            Id    Rank
--------------  ----  ----
null            7     7
null            1     1 
null            9     9 
null            2     2 
null            8     8 
11/1/2017       null  3
11/4/2017       3     3     
11/5/2017       null  10
11/12/2017      10    10    
11/13/2017      null  10 <--10

And we're done

We can now sort by surrogate Rank, and break ties by sorting by Date:

SELECT * FROM SortDemo
ORDER BY Rank, Date

Date            Id    Rank
--------------  ----  ----
null            1     1 
null            2     2 
11/1/2017       null  3
11/4/2017       3     3     
null            7     7
null            8     8 
null            9     9 
11/5/2017       null  10
11/12/2017      10    10    
11/13/2017      null  10

Solution complete. Sql Fiddle

The answer is in escrow until Monday, so i can give people the opportunity to earn reputation for solving a unique problem.