Parallel calls to HMatrix (or FFI in general)

175 Views Asked by At

I am working with point clouds in haskell using the repa library (3 as well as 4). At least I am trying to.

There are a few operations I need to do massively where parallelism really helps a lot. Most of these are simple linear algebra operations on the (metric) neighborhood of a point. For example a principal component analysis where I need to compute a SVD on a small matrix where each row is a point.

Now I use the linear package for the vector type

type Vec3 = V3 Float

and a 1-dimensional array of these vectors for point clouds

type Cloud = Array F DIM1 Vec3

So now I have the problem of calling a function using matrix decompositions of hmatrix in parallel using computeP. I tried hmatrix itself as well as the repa-linear-algebra package for that. The problem I have is that with all these calls (no matter how I provide the data and no matter what I call (svd, eigen decompositions, qr decomp. etc.)) the application always crashes randomly with a bus error or segfault.

I also didn't find any way to get any stacktrace that would at least point me in the right direction. Stack traces usually end at pthread.

Additionally I wrote my own C code which I call like e.g.:

foreign import ccall safe "pca.hpp pca"
    c_pca :: CUInt -> Ptr Float -> Ptr Float -> Ptr Float -> IO ()

{-# INLINE foreignPCA #-}
foreignPCA :: forall r . (Source r Vec3) => Array r DIM1 Vec3 -> ([Vec3], Vec3)
foreignPCA !vs = unsafePerformIO $ do
    n <- return $ Arrays.length vs
    ps <- mallocForeignPtrArray n :: IO (ForeignPtr Vec3) -- point matrix
    computeIntoS ps (delay vs)
    as <- mallocForeignPtrArray 3 :: IO (ForeignPtr Float) -- singular values
    av <- mallocForeignPtrArray 3 :: IO (ForeignPtr Vec3) -- right singular vectors
    withForeignPtr (castForeignPtr ps) $ \pps ->
        withForeignPtr as $ \pas ->
            withForeignPtr (castForeignPtr av) $ \pav -> do
                c_pca (fromIntegral n) pps pas pav
                svalues <- peekArray 3 pas :: IO [Float]
                svecs <- peekArray 3 (castPtr pav :: Ptr Vec3) :: IO [Vec3]
                let [sx, sy, sz] = svalues in
                    return (svecs, (V3 sx sy sz))

This works perfectly fine on a massive point cloud with 20 cores in parallel. Never crashed in any way.

Now my very vague idea is that hmatrix calls C/Fortran code with "safe" thereby allowing pthread forks and without actually being thread-safe. I can't try to verify this assumption as debugging seems a foreign concept to the haskell tool chain (at least for a complete newbie that I am).

In conclusion I have three questions:

  • Is hmatrix known to have problems working in parallel
  • Is there anyone working on native implementations of these fundamental algorithms?
  • How do I prevent FFI wrapped code to spawn fork()'ed threads without having access to the import call?
  • How do I debug things like hmatrix?

The second one is of particular interest to me since I find hmatrix to be incredibly ugly (subhask looks very promising but is too incomplete to be feasible). My goal is to change to haskell, but if I have to use my own C++ code for any trivial matter like above, I can just keep coding in C++ as I do now...

0

There are 0 best solutions below