Numeric Range Optimization

301 Views Asked by At

I have an set numeric ranges that I would like to optimize.

Here's a simple example of initial values:

Start    End
9        12
1        2
60       88
10       11
79       80

What I'd expect as output after optimization:

Start    End
1        2
9        12
60       88

These are the left and right values from Modified Preorder Tree Traversal (Nested Set) data stored in a MySQL database. I use them to exclude inactive branches from the result, and am not currently optimizing the ranges at all. I thought I might get a performance gain from optimizing the ranges before use.


MORE INFO

The values are passed into a query for exclusion of the inactive branches in the tree using a NOT BETWEEN clause. I thought that I could optimize the performance of that query by using a minimal set of ranges.

4

There are 4 best solutions below

3
On BEST ANSWER

Here is an SQL that will return what you want

mysql> CREATE TABLE sample (Start INT, End INT);

mysql> INSERT sample VALUES (9,12),(1,2),(60,88),(10,11),(79,80);

mysql> SELECT * 
    -> FROM sample s 
    -> WHERE NOT EXISTS (SELECT 1 
    ->                   FROM sample 
    ->                   WHERE s.Start > Start AND s.Start < End);
+-------+------+
| Start | End  |
+-------+------+
|     9 |   12 |
|     1 |    2 |
|    60 |   88 |
+-------+------+

You can, of course, create VIEW, move the data to another table or delete rows using the above SQL.

NOTE: I am not really sure why are you doing this 'optimization'.

EDIT:
The query can be rewritten as

SELECT s.* 
FROM sample s LEFT JOIN 
     sample s2 ON s.Start > s2.Start AND s.Start < s2.End 
WHERE s2.start IS NULL;

Which will create different execution plan (2xsimple select vs primary/dependent subquery for EXISTS), so performance might be different. Both queries will use an index on (Start, End) if it exists.

2
On

Put them in a sorted list. Mark which elements in the sorted list represent range starts and which are range ends. Sort the list based on value first; however, make sure that range starts come before range ends. (This will probably involve a structure of some sort that can be sorted on a given key. I don't know the details in php.)

Now, traverse the list from start to end. Keep a counter, c. When you pass a range start, increment c. When you pass a range end, decrement c.

When c goes from 0 to 1, that's the start of a range in the final set. When c goes from 1 to 0, that's the end of a range.

EDIT:: If you already have the ranges in a database table somewhere, you can probably structure an SQL query to do the first step above (again, making sure that range start-points are returned before range end-points).

0
On

Here's a simple implementation:

// I picked this format because it's convenient for the solution
// and because it's very natural for a human to read/write
$ranges = array(
  9    =>    12,
  1    =>    2,
  60   =>    81,
  10   =>    11,
  79   =>    88);

ksort($ranges);
$count = count($ranges);
$prev = null; // holds the previous start-end pair

foreach($ranges as $start => $end) {
    // If this range overlaps or is adjacent to the previous one
    if ($prev !== null && $start <= $prev[1] + 1) {
        // Update the previous one (both in $prev and in $ranges)
        // to the union of its previous value and the current range
        $ranges[$prev[0]] = $prev[1] = max($end, $prev[1]);

        // Mark the current range as "deleted"
        $ranges[$start] = null;
        continue;
    }

    $prev = array($start, $end);
}

// Filter all "deleted" ranges out
$ranges = array_filter($ranges);

Limitations/notes:

  1. The range boundaries have to be small enough to fit into an int.
  2. This example will incorrectly remove any range from the final result if the ending boundary is 0. If your data can legitimately contain such a range, provide an appropriate callback to array_filter: function($item) { return $item === null; }.

See it in action.

0
On
$ranges = array(
  array(9, 12),
  array(1, 2),
  array(60, 81),
  array(10, 11),
  array(79, 88),
  );

function optimizeRangeArray($r) {
  $flagarr = array();
  foreach ($r as $range => $bounds) {
    $flagarr = array_pad($flagarr, $bounds[1], false);
    for ($i = $bounds[0]-1; $i < $bounds[1]; $i++) $flagarr[$i] = true;
    }
  $res = array(); $min = 0; $max = 0; $laststate = false;
  $ctr = 0;
  foreach ($flagarr as $state) {
    if ($state != $laststate) {
      if ($state) $min = $ctr + 1;
      else {
        $max = $ctr;
        $res[] = array($min, $max);
        }
      $laststate = $state;
      }
    $ctr++;
    }
  $max = $ctr;
  $res[] = array($min, $max);
  return($res);
  }

print_r(optimizeRangeArray($ranges));

Output:

Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 2
        )

    [1] => Array
        (
            [0] => 9
            [1] => 12
        )

    [2] => Array
        (
            [0] => 60
            [1] => 88
        )

)

Note: This doesn't work for negative integers!

Or use it like this

$rres = optimizeRangeArray($ranges);

$out = "<pre>Start    End<br />";
foreach($rres as $range=>$bounds) {
  $out .= str_pad($bounds[0], 9, ' ') . str_pad($bounds[1], 9, ' ') . "<br />";
  }
$out .= "</pre>";
echo $out;

To get this in your browser

Start    End
1        2
9        12
60       88