Generate tuples of non-negative integers closed under promotion?

140 Views Asked by At

Let n be a positive integer, and let S be a set of n-tuples of non-negative integers. (For example, if n is 3, then S could be {(1,2,0), (1,0,1)}.)

Let's define an operation called "promote" that takes an n-tuple of non-negative integers (t1, t2, …, tn), an index i such that ti > 0, and an index j that's less than i, and returns the same n-tuple, except that the value at index j has been incremented and the value at index i has been decremented. For example, if we have the tuple (1,2,3,4,5), i = 5, and j = 2, then "promote" will return (1,3,3,4,4).

I want to compute the closure of "promote" over my set S, meaning the set of all tuples that I can reach by starting with an element from S and making zero or more calls to "promote" (with some choice of i and j for any given call). For example, if S is {(1,2,0), (1,0,1)}, then I want to compute the set {(2,0,0), (1,1,0), (1,0,1), (3,0,0), (2,1,0), (1,2,0)}. (For example, I can get (2,1,0) by starting with (1,2,0) and calling "promote" with i = 2 and j = 1, so (2,1,0).)

How can I do this efficiently?

Ideas

  1. One idea is to initialize my result set with S, apply as many promote moves starting from tuples in this set as I can, add these to the set, union these new tuples with my result set so far, and repeat. However, there's a lot of overlap with this approach?

  2. Another idea is instead of applying promote moves to all the tuples all at once, to only apply a promote move to the minimal tuple with respect to some ordering. But how could one parallelize that approach?

  3. I was particularly curious if there was a clean way to code this in a functional programming way. However, I'm ultimately concerned with just time efficiency.

EDIT: One could think of this as a digraph whose vertices are tuples in the closure with an edge between two tuples if there is a promote operation from one to the other. Then the first bullet above becomes "perform BFS starting from the tuples in the original set". (I added some tags to reflect this).

Side Remark: If anyone can think of a more descriptive title please change it or let me know so I can change it!

2

There are 2 best solutions below

0
ruakh On

If you picture the integers in these tuples as representing piles of objects, then you can think of promotion as "moving an object leftward" from one pile to another.

So, given two n-tuples t and u of nonnegative integers with the same sum, you can reach u from t by a sequence of promotions if and only if, for each i from 1 to n, the sum of the last i elements of u is less than or equal to the sum of the last i elements of t.

For example, suppose t is (1,2,0). Its suffix sums form the sequence [1+2+0,2+0,0], which is [3,2,0]. So any tuple reachable from (1,2,0) will have suffix sums that form a nonincreasing sequence whose first element is 3 and whose ith element is at most equal to the ith element of [3,2,0]. So in lexicographic order, the possible suffix sum sequences are [3,0,0], [3,1,0], and [3,2,0], meaning that the possible tuples are (3,0,0), (2,1,0), and (1,2,0). (Each element of the tuple, except the last, is the difference between successive elements of the suffix-sum sequence — and the last element of the tuple is the same as the last element of the suffix-sum sequence — so each suffix-sum sequence uniquely determines a tuple.)

In your example, the elements of S have distinct sums, so they're essentially unrelated; you can compute their closures independently without worrying about overlap.

In the more general case where some elements of S might have the same sum, you can break S up into subsets with equal sums, and handle each group independently. So in what follows, I'll assume that all elements of S have the same sum.

What you can do is:

  • Compute the suffix-sum sequence for each tuple in S, and store these in a list L. I'm going to assume that L is a purely-functional singly-linked list, since you mention that you're interested in doing this using functional programming.
  • Initialize an empty set of results.
  • Generate the suffix-sum sequence for each tuple in the closure of S over promotion, in lexicographic order, as follows:
    • The first element of the sequence is known/fixed.
    • For each subsequent element of the sequence:
      • Let this element be 0 and proceed recursively to the next element.
      • If the prior element was 0, return.
      • Then let this element be 1, and check that you're within the bounds set by the first suffix-sum sequence in L.
        • If so, proceed recursively to the next element.
        • If not, drop the first suffix-sum sequence in L. Repeatedly check whether the sequence you have so far is within the bounds set by the new first suffix-sum sequence in L, until either L is empty (in which case return) or you've found a suffix-sum sequence that's never less than the sequence you have so far (in which case proceed recursively to the next).
      • Repeatedly increment this element until we either exceed the prior element or exceed the bounds set by every element of L.
    • Whenever we've recursed all the way to the end of the sequence, compute the corresponding tuple and add it to our set of results.

If you're using functional programming for this, then you'll probably want to store the suffix-sum sequences that you're building in reverse order; for example, [3,0,…] can be represented as …::0::3::nil, so that you can efficient "append" by putting new elements at the head of the list. You can then fix that order at the same time as you compute the resulting tuples. Additionally, you'll likely want to build your set of results as a linked list as well, by passing it as a list in your recursive calls, and having your recursive calls return an augmented copy of it.

I don't have experience with parallelizing FP algorithms, but if you're comfortable doing that in general, then I think you'll have no difficulty doing it for the above. One good place to split into concurrent paths might be immediately after you've validated a given value of a given element of a suffix-sum sequence: one path can proceed recursively to the next element, while another path proceeds to the next-greater value of the same element. (But I think you'll need a different data structure for your results — maybe some sort of branching tree, that you then post-process into a list?)

0
Asad Saeeduddin On

TL;DR

You can improve on the naive solution using a divide and conquer approach: for each position i, find out all the ways to divvy up t_i into the i preceding fields. Then superimpose the results for each position.

Details follow.


Let's give this a go in Haskell.


Here's some preliminaries:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE AllowAmbiguousTypes #-}

import Data.Foldable qualified as Foldable
import Data.Set qualified as Set
import Data.Tree
import Data.Function ((&))

Just to get a sense for how it behaves, let's start by implementing this approach from your question:

One idea is to initialize my result set with S, apply as many promote moves starting from tuples in this set as I can, add these to the set, union these new tuples with my result set so far, and repeat. However, there's a lot of overlap with this approach?

First, a couple of utils for computing the closure of a multi-valued function and conditionally producing a list:


-- Utils
closure :: (a -> [a]) -> a -> Tree a
closure f = unfoldTree (\x -> (x, f x))

guardList :: Bool -> [x] -> [x]
guardList c xs = if c then xs else []

Now we can implement:


-- Solution A: closure
promote :: (Int, Int, Int) -> [(Int, Int, Int)]
promote (i, j, k) =
  mconcat
    [ guardList (k > 0) $ do
        [(i + 1, j, k - 1), (i, j + 1, k - 1)]
    , guardList (j > 0) $ do
        [(i + 1, j - 1, k)]
    ]

solutionA :: [(Int, Int, Int)] -> Forest (Int, Int, Int)
solutionA = fmap (closure promote)

Note that we're using lists instead of Sets so far, which will make more explicit all the redundant work we're doing. Note also that the list of promotions produced from a single step contains no duplicates by construction.

To test this, we can write some code to print out useful information:


testSolutionA :: [(Int, Int, Int)] -> IO ()
testSolutionA input = do
  let resultForest = solutionA input
  let resultList = foldMap Foldable.toList resultForest
  let resultSet = Set.fromList resultList
  putStrLn "== Duplicated results =================================="
  print $ length resultList - length resultSet
  putStrLn "== Final result set ===================================="
  print resultSet
  putStrLn "== Evaluation forest ==================================="
  putStrLn $ drawForest $ fmap (fmap show) $ resultForest

Here's what it looks like when evaluated on [(1,2,0), (1,0,1)]:

== Duplicated results ==================================
1
== Final result set ====================================
fromList [(1,0,1),(1,1,0),(1,2,0),(2,0,0),(2,1,0),(3,0,0)]
== Evaluation forest ===================================
(1,2,0)
|
`- (2,1,0)
   |
   `- (3,0,0)

(1,0,1)
|
+- (2,0,0)
|
`- (1,1,0)
   |
   `- (2,0,0)

You can observe some minimal duplication ((2, 0, 0)), although things aren't too bad.

However things blow up rapidly as we increase the size of the numbers and their "height". E.g. for testSolution0 [(1, 2, 3)], we get:

== Duplicated results ==================================
430
== Final result set ====================================
fromList [(1,2,3),(1,3,2),(1,4,1),(1,5,0),(2,1,3),(2,2,2),(2,3,1),(2,4,0),(3,0,3),(3,1,2),(3,2,1),(3,3,0),(4,0,2),(4,1,1),(4,2,0),(5,0,1),(5,1,0),(6,0,0)]
== Evaluation forest ===================================
(1,2,3)
|
+- (2,2,2)
|  |
|  +- (3,2,1)
|  |  |
|  |  +- (4,2,0)
|  |  |  |
|  |  |  `- (5,1,0)
|  |  |     |
|  |  |     `- (6,0,0)
|  |  |
|  |  +- (3,3,0)
|  |  |  |
|  |  |  `- (4,2,0)
|  |  |     |
|  |  |     `- (5,1,0)
|  |  |        |
|  |  |        `- (6,0,0)
|  |  |
|  |  `- (4,1,1)
|  |     |
|  |     +- (5,1,0)
|  |     |  |
|  |     |  `- (6,0,0)
|  |     |
|  |     +- (4,2,0)
|  |     |  |
|  |     |  `- (5,1,0)
|  |     |     |
...

I've truncated the tree, but you can already see from the multiple duplicates that this is quite wasteful.


We won't go any further with this approach, but for completeness, here is an inductive version that works for tuples of arbitrary length:


-- Solution B: inductive

class SolutionB t where
  -- | The number of fields in the tuple
  tsize :: Int

  -- | Increment a designated field in the tuple
  tinc ::
    -- | Index of field to increment: must be a natural number strictly smaller than `tsize`
    Int ->
    t ->
    t

  -- | The possible ways a given tuple can be promoted in a single step
  tpromote :: t -> [t]

instance SolutionB () where
  tsize = 0
  tinc x = error $ "BUG! Expected a natural number strictly smaller than 0 (impossible), given : " <> show x
  tpromote () = []

instance (i ~ Int, SolutionB t) => SolutionB (i, t) where
  tsize = 1 + tsize @t
  tinc i (v, r) =
    let l = 1 + tsize @t
     in if i > l
          then error $ "BUG! Expected a natural number strictly smaller than " <> show (tsize @t + 1) <> ", given: " <> show i
          else
            if i == l
              then (v + 1, r)
              else (v, tinc i r)
  tpromote (v, r) =
    mconcat
      [ guardList (v > 0) $ do
          take (tsize @t) [0 ..] & fmap \i -> (v - 1, tinc (tsize @t - i) r)
      , tpromote r & fmap (v,)
      ]

We'll need a couple of supplementary tuple-finagling operations:


-- Tuple finagling

convert3 :: (a, b, c) -> (c, (b, (a, ())))
convert3 (a, b, c) = (c, (b, (a, ())))

unconvert3 :: (c, (b, (a, ()))) -> (a, b, c)
unconvert3 (c, (b, (a, ()))) = (a, b, c)

Now we can recover an identical solution:

solutionB :: [(Int, Int, Int)] -> Forest (Int, Int, Int)
solutionB = fmap (fmap unconvert3 . closure tpromote . convert3)

Ok, so can we do better?

At a high level, what we're doing is rearranging the "mass" of each tuple in a way that preserves some invariants (mass is conserved, and the "potential energy" of the mass decreases).

Our existing "small step" solution calculates all the rearrangements that are legal in a single "step", then iterates the step to derive all reachable arrangements.

The major inefficiency is that this process is highly convergent, so a given arrangement X in the final result set might be re-derived repeatedly from several intermediate arrangements A, B, C, ...

An alternative "big step" approach is to figure out all the ways one could completely distribute the mass in each field of the tuple, then superimpose the results.

This amounts to finding, for each field i in the tuple, a weak i-composition of t_i, then appropriately superimposing the cartesian product of the results.


This time we'll go straight for the inductive solution:


-- Solution C: superposition of partitions

class SolutionC t where
  -- | Produce all ordered tuples that sum to the given natural number
  tdistrib :: Int -> [t]

  -- | Add tuples
  superimpose :: t -> t -> t

  -- | Find _all_ the legal rearrangements of a tuple (including itself)
  tpromote' :: t -> [t]

instance SolutionC () where
  tdistrib = \case
    -- An empty tuple sums to zero
    0 -> [()]
    -- There's no way to make it sum to anything else
    _ -> []

  -- Pretty self explanatory
  superimpose () () = ()

  -- There's only one arrangement of the empty tuple
  tpromote' () = [()]

instance (Int ~ i, SolutionC t) => SolutionC (i, t) where
  tdistrib n = do
    -- Reserve anywhere from `0` to `n` for the head
    i <- [0 .. n]
    -- Distribute the rest among the tail
    r <- tdistrib (n - i)
    pure (i, r)

  superimpose (i, r) (j, s) = (i + j, superimpose r s)

  tpromote' (i, r) = do
    -- Find all the arrangements for the tail
    r' <- tpromote' r
    -- Distribute the head over the entire tuple
    (i', r'') <- tdistrib i
    pure (i', superimpose r' r'')

solutionC :: [(Int, Int, Int)] -> [(Int, Int, Int)]
solutionC = foldMap $ fmap unconvert3 . tpromote' . convert3

Here's some code to help us test it:

testSolutionC :: [(Int, Int, Int)] -> IO ()
testSolutionC input = do
  let resultList = solutionC input
  let resultSet = Set.fromList resultList
  putStrLn "== Duplicated results =================================="
  print $ length resultList - length resultSet
  putStrLn "== Final result set ===================================="
  print resultSet

Here is the result for testSolutionC [(1, 2, 0), (1, 0, 1)]:

== Duplicated results ==================================
0
== Final result set ====================================
fromList [(1,0,1),(1,1,0),(1,2,0),(2,0,0),(2,1,0),(3,0,0)]

And here is the result for the stress test: testSolutionC [(1, 2, 3)]

== Duplicated results ==================================
12
== Final result set ====================================
fromList [(1,2,3),(1,3,2),(1,4,1),(1,5,0),(2,1,3),(2,2,2),(2,3,1),(2,4,0),(3,0,3),(3,1,2),(3,2,1),(3,3,0),(4,0,2),(4,1,1),(4,2,0),(5,0,1),(5,1,0),(6,0,0)]

Recall that the previous attempt had 430 duplicated results, so this is a significant improvement!


I will admit that the benchmarking here is not very scientific. You can proceed from here by:

  1. Properly modeling the computational expense involved in each approach.
  2. Measuring the performance of each approach in your actual application and seeing if either is good enough

Finally, if solution C is still too slow, you could optimize it by reusing the work done in computing subtuple partitions (instead of redoing all of it at each inductive step).


Here is the executable code listing:

{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE LambdaCase #-}

import Data.Foldable qualified as Foldable
import Data.Function ((&))
import Data.Set qualified as Set
import Data.Tree

-- Utils

closure :: (a -> [a]) -> a -> Tree a
closure f = unfoldTree (\x -> (x, f x))

guardList :: Bool -> [x] -> [x]
guardList c xs = if c then xs else []

-- Tuple finagling

convert3 :: (a, b, c) -> (c, (b, (a, ())))
convert3 (a, b, c) = (c, (b, (a, ())))

unconvert3 :: (c, (b, (a, ()))) -> (a, b, c)
unconvert3 (c, (b, (a, ()))) = (a, b, c)

-- Solution A: closure

promote :: (Int, Int, Int) -> [(Int, Int, Int)]
promote (i, j, k) =
  mconcat
    [ guardList (k > 0) $ do
        [(i + 1, j, k - 1), (i, j + 1, k - 1)]
    , guardList (j > 0) $ do
        [(i + 1, j - 1, k)]
    ]

solutionA :: [(Int, Int, Int)] -> Forest (Int, Int, Int)
solutionA = fmap (closure promote)

testSolutionA :: [(Int, Int, Int)] -> IO ()
testSolutionA input = do
  let resultForest = solutionA input
  let resultList = foldMap Foldable.toList resultForest
  let resultSet = Set.fromList resultList
  putStrLn "== Duplicated results =================================="
  print $ length resultList - length resultSet
  putStrLn "== Final result set ===================================="
  print resultSet
  putStrLn "== Evaluation forest ==================================="
  putStrLn $ drawForest $ fmap (fmap show) $ resultForest

-- Solution B: inductive

class SolutionB t where
  -- | The number of fields in the tuple
  tsize :: Int

  -- | Increment a designated field in the tuple
  tinc ::
    -- | Index of field to increment: must be a natural number strictly smaller than `tsize`
    Int ->
    t ->
    t

  -- | The possible ways a given tuple can be promoted in a single step
  tpromote :: t -> [t]

instance SolutionB () where
  tsize = 0
  tinc x = error $ "BUG! Expected a natural number strictly smaller than 0 (impossible), given : " <> show x
  tpromote () = []

instance (i ~ Int, SolutionB t) => SolutionB (i, t) where
  tsize = 1 + tsize @t
  tinc i (v, r) =
    let l = 1 + tsize @t
     in if i > l
          then error $ "BUG! Expected a natural number strictly smaller than " <> show (tsize @t + 1) <> ", given: " <> show i
          else
            if i == l
              then (v + 1, r)
              else (v, tinc i r)
  tpromote (v, r) =
    mconcat
      [ guardList (v > 0) $ do
          take (tsize @t) [0 ..] & fmap \i -> (v - 1, tinc (tsize @t - i) r)
      , tpromote r & fmap (v,)
      ]

solutionB :: [(Int, Int, Int)] -> Forest (Int, Int, Int)
solutionB = fmap (fmap unconvert3 . closure tpromote . convert3)

testSolutionB :: [(Int, Int, Int)] -> IO ()
testSolutionB input = do
  let resultForest = solutionB input
  let resultList = foldMap Foldable.toList resultForest
  let resultSet = Set.fromList resultList
  putStrLn "== Duplicated results =================================="
  print $ length resultList - length resultSet
  putStrLn "== Final result set ===================================="
  print resultSet
  putStrLn "== Evaluation forest ==================================="
  putStrLn $ drawForest $ fmap (fmap show) $ resultForest

-- Solution C: superposition of partitions

class SolutionC t where
  -- | Produce all ordered tuples that sum to the given natural number
  tdistrib :: Int -> [t]

  -- | Add tuples
  superimpose :: t -> t -> t

  -- | Find _all_ the legal rearrangements of a tuple (including itself)
  tpromote' :: t -> [t]

instance SolutionC () where
  tdistrib = \case
    -- An empty tuple sums to zero
    0 -> [()]
    -- There's no way to make it sum to anything else
    _ -> []

  -- Pretty self explanatory
  superimpose () () = ()

  -- There's only one arrangement of the empty tuple
  tpromote' () = [()]

instance (Int ~ i, SolutionC t) => SolutionC (i, t) where
  tdistrib n = do
    -- Reserve anywhere from `0` to `n` for the head
    i <- [0 .. n]
    -- Distribute the rest among the tail
    r <- tdistrib (n - i)
    pure (i, r)

  superimpose (i, r) (j, s) = (i + j, superimpose r s)

  tpromote' (i, r) = do
    -- Find all the arrangements for the tail
    r' <- tpromote' r
    -- Distribute the head over the entire tuple
    (i', r'') <- tdistrib i
    pure (i', superimpose r' r'')

solutionC :: [(Int, Int, Int)] -> [(Int, Int, Int)]
solutionC = foldMap $ fmap unconvert3 . tpromote' . convert3

testSolutionC :: [(Int, Int, Int)] -> IO ()
testSolutionC input = do
  let resultList = solutionC input
  let resultSet = Set.fromList resultList
  putStrLn "== Duplicated results =================================="
  print $ length resultList - length resultSet
  putStrLn "== Final result set ===================================="
  print resultSet

main :: IO ()
main = do
  putStrLn "== Solution A - Simple Test ============================"
  testSolutionA [(1, 2, 0), (1, 0, 1)]
  putStrLn "== Solution A - Stress Test ============================"
  testSolutionA [(1, 2, 3)]

  putStrLn "== Solution B - Simple Test ============================"
  testSolutionB [(1, 2, 0), (1, 0, 1)]
  putStrLn "== Solution B - Stress Test ============================"
  testSolutionB [(1, 2, 3)]

  putStrLn "== Solution C - Simple Test ============================"
  testSolutionC [(1, 2, 0), (1, 0, 1)]
  putStrLn "== Solution C - Stress Test ============================"
  testSolutionC [(1, 2, 3)]