I'm trying to calculate rolling hash values (buzzhash) for a big file using pipes.

Currently I have this, but I don't know how to write a pipe that maintains state.

    import qualified Data.ByteString.Lazy as L
    import Data.Word
    import Data.Bits(xor,rotate)
    import Data.Array
    import Pipes
    import Control.Monad.State.Strict
    import Control.Monad(forever)

    produceFromList (x:xs) = do
      yield x
      produceFromList xs

    buzzHash = do
      x <- await
      h <- lift $ get -- pull out previous value
      let h' = rotate h 1 `xor` (hashArrW8!x) -- calculate new value
      lift $ put h' -- save new value
      yield h'

    stdoutLn :: Consumer Word64 IO ()
    stdoutLn = do
      a <- await
      lift $ print a

    main = do
      bs <- L.unpack `fmap` L.getContents
      runEffect $ produceFromList bs >-> buzzHash >-> stdoutLn

    hashArrW8 :: Array Word8 Word64

How do I make buzzHash save the previous value and use it to calculate the next value? The initial state value should be 0.

You were almost there; you just need to run the state.
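
As a rough sketch of what "running the state" could look like with the question's own definitions: the contents of hashArrW8 below are a placeholder (the question does not show the table), the type signatures and the empty-list case of produceFromList are my additions, and hashList is a hypothetical helper name used only to keep main short.

```haskell
import qualified Data.ByteString.Lazy as L
import Control.Monad.State.Strict (StateT, execStateT, get, put)
import Data.Array (Array, listArray, (!))
import Data.Bits (rotate, xor)
import Data.Word (Word8, Word64)
import Pipes

-- Placeholder table (assumption): the question leaves hashArrW8 undefined.
hashArrW8 :: Array Word8 Word64
hashArrW8 = listArray (minBound, maxBound)
  [fromIntegral i * 0x9E3779B97F4A7C15 | i <- [0 .. 255 :: Int]]

-- The question's producer, with an empty-list case added (assumption).
produceFromList :: Monad m => [a] -> Producer a m ()
produceFromList []     = return ()
produceFromList (x:xs) = do
  yield x
  produceFromList xs

-- The question's single-step pipe; the state lives in StateT Word64.
buzzHash :: Monad m => Pipe Word8 Word64 (StateT Word64 m) ()
buzzHash = do
  x <- await
  h <- lift get                              -- pull out previous value
  let h' = rotate h 1 `xor` (hashArrW8 ! x)  -- calculate new value
  lift (put h')                              -- save new value
  yield h'

-- The question's consumer, unchanged: it runs in plain IO.
stdoutLn :: Consumer Word64 IO ()
stdoutLn = do
  a <- await
  lift (print a)

-- Hypothetical helper: run the pipeline with initial state 0 and
-- return the final state. hoist lift moves stdoutLn into StateT.
hashList :: [Word8] -> IO Word64
hashList bs = flip execStateT 0 $ runEffect $
  produceFromList bs >-> buzzHash >-> hoist lift stdoutLn

main :: IO ()
main = do
  bs <- L.unpack `fmap` L.getContents
  _ <- hashList bs  -- the final state is discarded here
  return ()
```

Since neither buzzHash nor stdoutLn loops yet, this processes only the first byte of the input before the pipeline finishes.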

I assume you don't want to recover the state, so I use execStateT rather than runStateT. Strictly speaking, execStateT still returns the final state and drops only the pipeline's () result; evalStateT is the variant that throws the state away.

The only curiosity here is that stdoutLn was marked as Consumer Word64 IO (), so I use hoist lift to make it Consumer Word64 (StateT Word64 IO) (). Everything in the series a >-> b >-> c must agree in the underlying monad and return type.

Here are a few further comments that might save you time. First, produceFromList is each, which Pipes already exports.

Moreover, you could have avoided the hoist lift by relabeling your stdoutLn with a more general type, such as MonadIO m => Consumer Word64 m () (using liftIO in place of lift). But here there is some trouble: you are not repeating the action. This should pretty clearly be a loop:
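
A minimal sketch of the looping version, assuming the signature is generalised to any MonadIO monad (so liftIO replaces lift and no hoist lift is needed). The small main is only there to make the snippet runnable on its own.

```haskell
import Data.Word (Word64)
import Pipes

-- Works over any monad that can do IO, including StateT Word64 IO.
stdoutLn :: MonadIO m => Consumer Word64 m ()
stdoutLn = do
  a <- await
  liftIO (print a)
  stdoutLn  -- loop: keep printing for as long as upstream yields

main :: IO ()
main = runEffect $ each [1, 2, 3 :: Word64] >-> stdoutLn
```

The recursion never returns by itself; the pipeline stops when the upstream producer runs out of values.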

In fact this is already available as P.print (with Pipes.Prelude imported qualified as P), so we can use that in place of stdoutLn.

If I understand you, buzzHash is meant to be repeated indefinitely too. This is forever buzzHash, where we use your buzzHash.

Finally, if you import Pipes.ByteString (qualified as PB), we see we don't need the lazy bytestring IO, which doesn't stream properly anyway.
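
Here is a small sketch of what that import gives you. It assumes a lens library that exports view; I use Lens.Family from lens-family-core, but Control.Lens should work the same way. The names bytesFromStdin and bytesFromChunks are made up for illustration.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import Lens.Family (view)
import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as P

-- PB.stdin yields strict ByteString chunks as they arrive;
-- view PB.unpack turns a chunk producer into a producer of bytes.
bytesFromStdin :: Producer Word8 IO ()
bytesFromStdin = view PB.unpack PB.stdin

-- The same lens applied to in-memory chunks, handy for experiments.
bytesFromChunks :: Monad m => [B.ByteString] -> Producer Word8 m ()
bytesFromChunks chunks = view PB.unpack (each chunks)

main :: IO ()
main = runEffect $ bytesFromStdin >-> P.print
```

The point of the design is that input is consumed chunk by chunk, so the whole file never has to be unpacked into one big list first.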

Pipes.ByteString already has the unpack we want, packaged as a lens, so that we use view PB.unpack where elsewhere we would use B.unpack. So in the end the whole program can be written as one short pipeline from standard input through buzzHash to P.print.

Once it is in this form we see we aren't using the underlying state of the pipeline except in buzzHash, so we can localize the state there or, if you like, you can rewrite buzzHash itself. Then you would write:
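
One way this could look, as a sketch rather than a definitive version: it assumes evalStateP from Pipes.Lift to run the StateT layer inside the one stage that needs it, view from Lens.Family, and a placeholder hashArrW8 table (the real one is not shown in the question). The name buzzHash' and its initial-value argument are my own choices.

```haskell
import Control.Monad (forever)
import Control.Monad.State.Strict (StateT, get, put)
import Data.Array (Array, listArray, (!))
import Data.Bits (rotate, xor)
import Data.Word (Word8, Word64)
import Lens.Family (view)
import Pipes
import Pipes.Lift (evalStateP)
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as P

-- Placeholder table (assumption): the question leaves hashArrW8 undefined.
hashArrW8 :: Array Word8 Word64
hashArrW8 = listArray (minBound, maxBound)
  [fromIntegral i * 0x9E3779B97F4A7C15 | i <- [0 .. 255 :: Int]]

-- The question's step, repeated indefinitely with forever.
buzzHash :: Monad m => Pipe Word8 Word64 (StateT Word64 m) r
buzzHash = forever $ do
  x <- await
  h <- lift get                              -- previous value
  let h' = rotate h 1 `xor` (hashArrW8 ! x)  -- new value
  lift (put h')                              -- save it
  yield h'

-- State localized to this stage: evalStateP supplies the initial
-- value, so the rest of the pipeline runs in the plain base monad.
buzzHash' :: Monad m => Word64 -> Pipe Word8 Word64 m r
buzzHash' h0 = evalStateP h0 buzzHash

main :: IO ()
main = runEffect $ view PB.unpack PB.stdin >-> buzzHash' 0 >-> P.print
```

With the state hidden inside buzzHash', main no longer needs execStateT or evalStateT, and the other stages need no hoist lift.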