Why isn't sum == foldl1 (+)?

398 Views Asked by At

I sum a list of matrices with foldl1 (+) because I noticed that sum actually returns the sum of the top left elements as a 1x1 matrix.

$ stack exec --resolver lts-12.5 --package matrix -- ghci
GHCi, version 8.4.3: http://www.haskell.org/ghc/  :? for help
Prelude> import Data.Matrix
Prelude Data.Matrix> t = [identity 2, identity 2]  -- two 2x2 identity matrices
Prelude Data.Matrix> foldl1 (+) t
┌     ┐
│ 2 0 │
│ 0 2 │
└     ┘
Prelude Data.Matrix> sum t
┌   ┐
│ 2 │
└   ┘

This is unexpected for me, LYAH suggests sum = foldl1 (+)¹ and even hlint suggests I use sum instead of foldl1 (+) as is 'removes error on []' (side question: How do I elegantly and []-safely sum up matrices?)

Why is sum /= foldl1 (+) for Matrices, and why isn't generally always sum == foldl1 (+)?

Or, to exclude the case of the empty list:

Why isn't sum == foldl neutralElement (+)? or specifically sum == foldl (+) (zero 2 2) on [identity 2, identity 2]?

Own work:

Prelude Data.Matrix> :t sum
sum :: (Foldable t, Num a) => t a -> a
Prelude Data.Matrix> :info sum
class Foldable (t :: * -> *) where
  ...
  sum :: Num a => t a -> a
  ...
        -- Defined in ‘Data.Foldable’

Source of sum is:

sum :: Num a => t a -> a
sum = getSum #. foldMap Sum

Instantiations of Matrix are:

instance Foldable Matrix where
foldMap f = foldMap f . mvect

instance Num a => Num (Matrix a) where
 fromInteger = M 1 1 . V.singleton . fromInteger
 negate = fmap negate
 abs = fmap abs
 signum = fmap signum

 -- Addition of matrices.
 {-# SPECIALIZE (+) :: Matrix Double -> Matrix Double -> Matrix Double #-}
 {-# SPECIALIZE (+) :: Matrix Int -> Matrix Int -> Matrix Int #-}
 (M n m v) + (M n' m' v')
   -- Checking that sizes match...
   | n /= n' || m /= m' = error $ "Addition of " ++ sizeStr n m ++ " and "
                               ++ sizeStr n' m' ++ " matrices."
   -- Otherwise, trivial zip.
   | otherwise = M n m $ V.zipWith (+) v v'

 -- Substraction of matrices.
 ...
 -- Multiplication of matrices.
 ...

¹ in the context of adding integers

3

There are 3 best solutions below

0
On BEST ANSWER

Because the Data.Matrix instance for Num implements fromInteger by returning a 1x1 matrix, and + by adding elementwise which truncates the right matrix to the size of the left one (or errors if the left is larger than the right).

The sum is foldl (+) 0 which starts with a 1x1 matrix from the 0 and truncates all of the matrices in the list to that size. The foldl1 (+) does not have to make a matrix from an integer, so the result is the size of the smallest matrix in the list.

Also, sum = getSum #. foldMap Sum is the default Foldable implementation, but it is overridden by the list instance to be sum = foldl (+) 0.

The Data.Matrix implementation of + (as you show above) up to version 0.3.1.1 generates an error if the operands are not the same shape, but in version 0.3.2.0 it assumes they are the same shape without checking, and in version 0.3.3.0 it changed again (according to the comment):

-- | Perform an operation element-wise.
--   The second matrix must have at least as many rows
--   and columns as the first matrix. If it's bigger,
--   the leftover items will be ignored.
--   If it's smaller, it will cause a run-time error.
--   You may want to use 'elementwiseUnsafe' if you
--   are definitely sure that a run-time error won't
--   arise.

The implementation appears to be the same up to the current version 0.3.6.1

0
On

I think this question has been answered perfectly fine by @pat, but I will try to answer the side question:

How do I elegantly and []-safely sum up matrices?

You can't safely sum up a list of matrices, because you don't know for sure what is the size of each of the matrices in the list is, unless you have that size information at the type level. Imagine a type data Matrix (m :: Nat) (n :: Nat) e = ..., then it would be guaranteed that each of the matrices in a list have the exactly same dimensions. I am not sure if there is any work in that area for the general purpose array library, but I've seen this approach in OpenCV bindings and in mapalgebra and briefly considered using it in massiv. This approach comes with a few disadvantages, but that is not important to this question.

Another approach is to not to use a list of matrices, but instead use a 3D array, which is essentially a sequence of pages of 2D matrices of the same size. Obviously, matrix package does not have support for higher dimensions, so you can't achieve it using that library. I'll show how it can be done with massiv though. Consider we have this 3D array a:

λ> a = makeArrayR D Par (Sz3 3 4 5) $ \(i :> j :. k) -> i + j * k
λ> a
Array D Par (Sz (3 :> 4 :. 5))
  [ [ [ 0, 0, 0, 0, 0 ]
    , [ 0, 1, 2, 3, 4 ]
    , [ 0, 2, 4, 6, 8 ]
    , [ 0, 3, 6, 9, 12 ]
    ]
  , [ [ 1, 1, 1, 1, 1 ]
    , [ 1, 2, 3, 4, 5 ]
    , [ 1, 3, 5, 7, 9 ]
    , [ 1, 4, 7, 10, 13 ]
    ]
  , [ [ 2, 2, 2, 2, 2 ]
    , [ 2, 3, 4, 5, 6 ]
    , [ 2, 4, 6, 8, 10 ]
    , [ 2, 5, 8, 11, 14 ]
    ]
  ]

we can then use foldlWithin to fold across any dimension of an array, while reducing it's dimensionality:

λ> foldlWithin Dim3 (+) 0 a
Array D Par (Sz (4 :. 5))
  [ [ 3, 3, 3, 3, 3 ]
  , [ 3, 6, 9, 12, 15 ]
  , [ 3, 9, 15, 21, 27 ]
  , [ 3, 12, 21, 30, 39 ]
  ]
λ> foldlWithin Dim1 (+) 0 a
Array D Par (Sz (3 :. 4))
  [ [ 0, 10, 20, 30 ]
  , [ 5, 15, 25, 35 ]
  , [ 10, 20, 30, 40 ]
  ]

Note that there is no chance of an exception to be raised or some other unpredictable behavior like you've experienced at any point here (putting async exceptions aside, of course). Alternatively it is possible to take slices of the array, and add those individually, but that approach is a bit more involved if we want to avoid exceptions:

λ> a !?> 0
Array D Par (Sz (4 :. 5))
  [ [ 0, 0, 0, 0, 0 ]
  , [ 0, 1, 2, 3, 4 ]
  , [ 0, 2, 4, 6, 8 ]
  , [ 0, 3, 6, 9, 12 ]
  ]
λ> import Data.Maybe
λ> fromMaybe empty $ a !?> 0 >>= \acc0 -> foldlM (\acc pageIx -> (acc +) <$> (a !?> pageIx)) acc0 (1 ... 2)
Array D Par (Sz (4 :. 5))
  [ [ 3, 3, 3, 3, 3 ]
  , [ 3, 6, 9, 12, 15 ]
  , [ 3, 9, 15, 21, 27 ]
  , [ 3, 12, 21, 30, 39 ]
  ]

Above will produce Nothing if a doesn't have a page with index 0 through 2. On the other hand, if you are ok with runtime exceptions, then this will be a bit clearer, but not as safe:

λ> foldlS (\acc pageIx -> acc + (a !> pageIx)) (a !> 0) (1 ... 2)
Array D Par (Sz (4 :. 5))
  [ [ 3, 3, 3, 3, 3 ]
  , [ 3, 6, 9, 12, 15 ]
  , [ 3, 9, 15, 21, 27 ]
  , [ 3, 12, 21, 30, 39 ]
  ]

Or using lists, as it was stated in the actual question:

λ> Prelude.foldl1 (+) $ Prelude.fmap (a !>) [0 .. 2]
Array D Par (Sz (4 :. 5))
  [ [ 3, 3, 3, 3, 3 ]
  , [ 3, 6, 9, 12, 15 ]
  , [ 3, 9, 15, 21, 27 ]
  , [ 3, 12, 21, 30, 39 ]
  ]

A side note. I noticed the usage of the SECIALIZE pragma, which leads me to believe that you actually care about performance. A small, but important fact on that: matrix package uses boxed vectors underneath for Matrix datatype, which will always give you terrible performance and no pragma can help you with that.

1
On

Well to start:

why isn't generally always sum == foldl1 (+)?

foldl1 (+) []
*** Exception: Prelude.foldl1: empty list

sum []
0

That's the main difference, sum allows to the pattern matching with [] and foldl1 (+) doesn't.

foldl1 was meant to use with nonempty lists, because there are functions that has no sense with empty lists , example:

foldl1 max

what is the max of []? there is no max, because there is no elements.

foldl1 max [5,4,2,3]
5