I'm trying to calculate rolling hash values (buzzhash) for a big file using pipes.
This is what I have so far, but I don't know how to write a pipe that maintains state.

    import qualified Data.ByteString.Lazy as L
    import Data.Word
    import Data.Bits(xor,rotate)
    import Data.Array
    import Pipes
    import Control.Monad.State.Strict
    import Control.Monad(forever)

    produceFromList (x:xs) = do
        yield x
        produceFromList xs

    buzzHash = do
        x <- await
        h <- lift $ get -- pull out previous value
        let h' = rotate h 1 `xor` (hashArrW8!x) -- calculate new value
        lift $ put h' -- save new value
        yield h'

    stdoutLn :: Consumer Word64 IO ()
    stdoutLn = do
        a <- await
        lift $ print a

    main = do
        bs <- L.unpack `fmap` L.getContents
        runEffect $ produceFromList bs >-> buzzHash >-> stdoutLn

    hashArrW8 :: Array Word8 Word64

How do I make buzzHash save the previous value and use it to calculate the next value? The initial state value should be 0.
You were almost there; you just need to run the state.
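
A minimal sketch of what running the state could look like. The three pipes are the question's, with type signatures and an empty-list case added; each still performs a single step, so only the first hash is printed. The contents of hashArrW8 are a placeholder I made up, since the question elides the real table. evalStateT is used because it discards the final state; execStateT would return it instead.

```haskell
import Control.Monad.State.Strict
import Data.Array
import Data.Bits (rotate, xor)
import qualified Data.ByteString.Lazy as L
import Data.Word
import Pipes

-- Placeholder table (an assumption): the question elides the real one.
hashArrW8 :: Array Word8 Word64
hashArrW8 =
  listArray (0, 255) [fromIntegral i * 0x9E3779B97F4A7C15 | i <- [0 .. 255 :: Int]]

produceFromList :: Monad m => [a] -> Producer a m ()
produceFromList [] = return ()
produceFromList (x : xs) = do
  yield x
  produceFromList xs

-- The question's pipe: one await, one state update, one yield.
buzzHash :: Pipe Word8 Word64 (StateT Word64 IO) ()
buzzHash = do
  x <- await
  h <- lift get
  let h' = rotate h 1 `xor` (hashArrW8 ! x)
  lift (put h')
  yield h'

stdoutLn :: Consumer Word64 IO ()
stdoutLn = do
  a <- await
  lift (print a)

main :: IO ()
main = do
  bs <- L.unpack `fmap` L.getContents
  -- Run the StateT layer with initial hash 0; hoist lift moves stdoutLn
  -- from IO into StateT Word64 IO so all three stages share one monad.
  flip evalStateT 0 $ runEffect $
    produceFromList bs >-> buzzHash >-> hoist lift stdoutLn
```
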
I assume you don't want to recover the final state, so I use evalStateT rather than runStateT.
The only curiosity here is that stdoutLn was marked as Consumer Word64 IO (), so I use hoist lift to make it Consumer Word64 (StateT Word64 IO) (). Everything in the series a >-> b >-> c must agree in the underlying monad and return type.
Here are a few further comments that might save you time. First, produceFromList is each.
Moreover, you could have avoided the hoist lift by relabeling your stdoutLn with a more general type.
But here there is some trouble: you are not repeating the action. This should pretty clearly be a loop:
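
A sketch of both points, under my own assumptions about the relabeling: stdoutLn is generalized over any MonadIO (so it fits a StateT Word64 IO pipeline without hoist lift) and wrapped in forever so that it keeps printing. The exact signature is a guess at what was intended, not a quotation.

```haskell
import Control.Monad (forever)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Word (Word64)
import Pipes

-- Polymorphic in the base monad; never returns on its own, hence the free r.
stdoutLn :: MonadIO m => Consumer Word64 m r
stdoutLn = forever $ do
  a <- await
  liftIO (print a)

main :: IO ()
main = runEffect $ each [1, 2, 3] >-> stdoutLn
```

The pipeline still terminates, because each returns once the list is exhausted and that ends the whole effect.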
In fact this is already available as P.print (print from Pipes.Prelude), so we can use P.print in place of stdoutLn.
If I understand you, buzzHash is meant to be repeated indefinitely too; that is, the pipe becomes forever buzzHash, where the inner buzzHash is yours.
Finally, if you
we see we don't need the lazy bytestring IO, which doesn't stream properly anyway. Pipes.ByteString already has the unpack we want, packaged as a lens, so that we use view PB.unpack where elsewhere we would use B.unpack. So in the end the whole program can be written as a single pipeline.
Once it is in this form we see we aren't using the underlying state of the pipeline except in buzzHash, so we can localize the state there
or, if you like, you can rewrite
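
Both options can be sketched as follows. The first keeps the question's buzzHash and runs its state locally with evalStateP from Pipes.Lift; the second threads the hash through as an argument. The names localized and buzzHash' are mine, and the table contents are a placeholder, since the question elides the real one.

```haskell
import Control.Monad (forever)
import Control.Monad.State.Strict (StateT, get, put)
import Data.Array (Array, listArray, (!))
import Data.Bits (rotate, xor)
import Data.Word (Word64, Word8)
import Pipes
import Pipes.Lift (evalStateP)
import qualified Pipes.Prelude as P

-- Placeholder table (an assumption): the question elides the real one.
hashArrW8 :: Array Word8 Word64
hashArrW8 =
  listArray (0, 255) [fromIntegral i * 0x9E3779B97F4A7C15 | i <- [0 .. 255 :: Int]]

-- One step of the question's pipe, with the hash kept in StateT.
buzzHash :: Monad m => Pipe Word8 Word64 (StateT Word64 m) ()
buzzHash = do
  x <- await
  h <- lift get
  let h' = rotate h 1 `xor` (hashArrW8 ! x)
  lift (put h')
  yield h'

-- Option 1: loop buzzHash and discharge its StateT layer inside the pipe.
localized :: Monad m => Pipe Word8 Word64 m r
localized = evalStateP 0 (forever buzzHash)

-- Option 2: no StateT at all; the previous hash is a plain argument.
buzzHash' :: Monad m => Word64 -> Pipe Word8 Word64 m r
buzzHash' h = do
  x <- await
  let h' = rotate h 1 `xor` (hashArrW8 ! x)
  yield h'
  buzzHash' h'

main :: IO ()
main = do
  let input = [1, 2, 3] :: [Word8]
  print (P.toList (each input >-> localized))
  print (P.toList (each input >-> buzzHash' 0))
```

Either way the rest of the pipeline runs in the plain base monad, so no stage needs hoist lift.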
Then you would write
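
A sketch of what the final program might look like, assuming the explicit-argument variant (called buzzHash' here, a name of my own) and input on standard input. PB.stdin, the choice of view from Lens.Family, and the placeholder table are my assumptions; any van Laarhoven-compatible view should work with PB.unpack.

```haskell
import Data.Array (Array, listArray, (!))
import Data.Bits (rotate, xor)
import Data.Word (Word64, Word8)
import Lens.Family (view) -- assumption: lens-family-core; other lens libraries also provide view
import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as P

-- Placeholder table (an assumption): the question elides the real one.
hashArrW8 :: Array Word8 Word64
hashArrW8 =
  listArray (0, 255) [fromIntegral i * 0x9E3779B97F4A7C15 | i <- [0 .. 255 :: Int]]

-- Rolling hash with the previous value passed as an argument.
buzzHash' :: Monad m => Word64 -> Pipe Word8 Word64 m r
buzzHash' h = do
  x <- await
  let h' = rotate h 1 `xor` (hashArrW8 ! x)
  yield h'
  buzzHash' h'

main :: IO ()
main =
  -- Stream bytes from stdin chunk by chunk, hash them starting from 0,
  -- and print each intermediate hash.
  runEffect $ view PB.unpack PB.stdin >-> buzzHash' 0 >-> P.print
```
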