I am working with point clouds in haskell using the repa library (3 as well as 4). At least I am trying to.
There are a few operations I need to do massively where parallelism really helps a lot. Most of these are simple linear algebra operations on the (metric) neighborhood of a point. For example a principal component analysis where I need to compute a SVD on a small matrix where each row is a point.
Now I use the linear package for the vector type
type Vec3 = V3 Float
and a 1-dimensional array of these vectors for point clouds
type Cloud = Array F DIM1 Vec3
So now I have the problem of calling a function using matrix decompositions of hmatrix in parallel using computeP. I tried hmatrix itself as well as the repa-linear-algebra package for that. The problem I have is that with all these calls (no matter how I provide the data and no matter what I call (svd, eigen decompositions, qr decomp. etc.)) the application always crashes randomly with a bus error or segfault.
I also didn't find any way to get any stacktrace that would at least point me in the right direction. Stack traces usually end at pthread.
Additionally I wrote my own C code which I call like e.g.:
foreign import ccall safe "pca.hpp pca"
c_pca :: CUInt -> Ptr Float -> Ptr Float -> Ptr Float -> IO ()
{-# INLINE foreignPCA #-}
foreignPCA :: forall r . (Source r Vec3) => Array r DIM1 Vec3 -> ([Vec3], Vec3)
foreignPCA !vs = unsafePerformIO $ do
n <- return $ Arrays.length vs
ps <- mallocForeignPtrArray n :: IO (ForeignPtr Vec3) -- point matrix
computeIntoS ps (delay vs)
as <- mallocForeignPtrArray 3 :: IO (ForeignPtr Float) -- singular values
av <- mallocForeignPtrArray 3 :: IO (ForeignPtr Vec3) -- right singular vectors
withForeignPtr (castForeignPtr ps) $ \pps ->
withForeignPtr as $ \pas ->
withForeignPtr (castForeignPtr av) $ \pav -> do
c_pca (fromIntegral n) pps pas pav
svalues <- peekArray 3 pas :: IO [Float]
svecs <- peekArray 3 (castPtr pav :: Ptr Vec3) :: IO [Vec3]
let [sx, sy, sz] = svalues in
return (svecs, (V3 sx sy sz))
This works perfectly fine on a massive point cloud with 20 cores in parallel. Never crashed in any way.
Now my very vague idea is that hmatrix calls C/Fortran code with "safe" thereby allowing pthread forks and without actually being thread-safe. I can't try to verify this assumption as debugging seems a foreign concept to the haskell tool chain (at least for a complete newbie that I am).
In conclusion I have three questions:
- Is hmatrix known to have problems working in parallel
- Is there anyone working on native implementations of these fundamental algorithms?
- How do I prevent FFI wrapped code to spawn fork()'ed threads without having access to the import call?
- How do I debug things like hmatrix?
The second one is of particular interest to me since I find hmatrix to be incredibly ugly (subhask looks very promising but is too incomplete to be feasible). My goal is to change to haskell, but if I have to use my own C++ code for any trivial matter like above, I can just keep coding in C++ as I do now...