I have a simple C routine that takes four words and returns four words, and for which gcc can optimize and emit some primops that GHC doesn't support. I'm trying to benchmark various ways of calling this procedure, and am having trouble trying to adapt the technique described here to use foreign import prim.
The following is meant to just add 1 to each input word, but segfaults.
Main.hs:
{-# LANGUAGE GHCForeignImportPrim #-}
{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE UnliftedFFITypes #-}
import Foreign.C
import GHC.Prim
import GHC.Int
import GHC.Word
foreign import prim "sipRound"
  sipRound_c# :: Word# -> Word# -> Word# -> Word# -> (# Word#, Word#, Word#, Word# #)
sipRound_c :: Word64 -> Word64 -> Word64 -> Word64 -> (Word64, Word64, Word64, Word64)
sipRound_c (W64# v0) (W64# v1) (W64# v2) (W64# v3) = case sipRound_c# v0 v1 v2 v3 of
  (# v0', v1', v2', v3' #) -> (W64# v0', W64# v1', W64# v2', W64# v3')
main = do
  print $ sipRound_c 1 2 3 4
sip.c:
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
// define a function pointer type that matches the STG calling convention
typedef void (*HsCall)(int64_t*, int64_t*, int64_t*, int64_t, int64_t, int64_t, int64_t,
                       int64_t, int64_t, int64_t*, float, float, float, float, double, double);
extern void
sipRound(
    int64_t* restrict baseReg,
    int64_t* restrict sp,
    int64_t* restrict hp,
    uint64_t v0, // R1
    uint64_t v1, // R2
    uint64_t v2, // R3
    uint64_t v3, // R4
    int64_t r5,
    int64_t r6,
    int64_t* restrict spLim,
    float f1,
    float f2,
    float f3,
    float f4,
    double d1,
    double d2)
{
    v0 += 1;
    v1 += 1;
    v2 += 1;
    v3 += 1;
    // create undefined variables, clang will emit these as a llvm undef literal
    const int64_t iUndef;
    const float fUndef;
    const double dUndef;
    const HsCall fun = (HsCall)sp[0];
    return fun(
        baseReg,
        sp,
        hp,
        v0,
        v1,
        v2,
        v3,
        iUndef,
        iUndef,
        spLim,
        fUndef,
        fUndef,
        fUndef,
        fUndef,
        dUndef,
        dUndef);
}
I don't really know what I'm doing. Is there a way to adapt the technique from that blog post? And is this a bad idea?
If you're willing to hand-write assembly, you can do this on x86_64: put the routine in a file with a .s extension and provide it as an argument on the ghc command line.
The mapping between STG registers and machine registers is defined in https://github.com/ghc/ghc/blob/master/includes/stg/MachRegs.h#L159.
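As a rough sketch of what such a .s file might contain for the add-1 example from the question: this assumes x86_64 Linux (ELF symbol naming, GNU as AT&T syntax) and the usual mapping from MachRegs.h, namely R1 in %rbx, R2 in %r14, R3 in %rsi, R4 in %rdi and Sp in %rbp. Check those against the header for your GHC version before relying on them. The file name sip.s is only illustrative.

```asm
# sip.s: sketch only. Assumes x86_64 Linux and the STG register mapping
# R1=%rbx, R2=%r14, R3=%rsi, R4=%rdi, Sp=%rbp (verify in MachRegs.h).
        .text
        .globl  sipRound
sipRound:
        incq    %rbx            # R1: v0 + 1
        incq    %r14            # R2: v1 + 1
        incq    %rsi            # R3: v2 + 1
        incq    %rdi            # R4: v3 + 1
        jmp     *(%rbp)         # tail-jump to the return continuation stored at Sp[0]
```

The four results are left in R1 through R4, which is where the unboxed tuple is expected on return, and the routine ends with a jump rather than a ret because STG code returns by jumping to the address on top of the Haskell stack. Building would then be something along the lines of ghc Main.hs sip.s.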
Note that there will still be a function call involved, so it won't be as efficient as the code you are getting from LLVM.