httpSink and parsing JSON


I think this is a bit advanced for me, but my goal is to get the raw JSON from an HTTP API, parse the first list from it, do whatever I need to do with that, then move on to the next list, and so on. My hope is that this would allow only one list at a time to be loaded in memory (each list is pretty small, but there are a LOT of lists in the JSON). I tried it with Aeson, but it ate up all the RAM and ran for hours; I ended up having to kill it.

If I understand it correctly, httpSink should be the way to go, with maybe json-stream to do the actual parsing. I read the tutorial about conduits, but I'm clearly not understanding it properly since I can't make that work.

I know how to use parseByteString to decode a ByteString the way I need (at least my tests seem to work), but I can't figure out a way to use parseByteString as a Sink for httpSink's second parameter. Am I missing something obvious, or am I mistaken about the way conduit works?

Thanks


There are 2 best solutions below

Best answer

I haven't tested this, since I'm honestly not that familiar with the library, but I think this adapter function will make it work with conduit:

module Data.JsonStream.Parser.Conduit
  ( jsonConduit
  , JsonStreamException (..)
  ) where

import Data.Conduit
import Data.JsonStream.Parser
import Data.ByteString (ByteString)
import Control.Monad.Catch
import Data.Typeable

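-- | Turn a json-stream 'Parser' into a conduit that consumes ByteString
-- chunks and yields each parsed value as soon as it is complete.
jsonConduit
  :: MonadThrow m
  => Parser a
  -> ConduitM ByteString a m ()
jsonConduit =
    go . runParser
  where
    -- A value is ready: emit it downstream, then continue parsing.
    go (ParseYield x p) = yield x >> go p
    -- The parser needs more input: feed it the next chunk, or fail if
    -- upstream is already exhausted.
    go (ParseNeedData f) = await >>= maybe
      (throwM JsonStreamNotEnoughData)
      (go . f)
    go (ParseFailed str) = throwM $ JsonStreamException str
    -- Parsing finished: hand any unconsumed bytes back to the stream.
    go (ParseDone bs) = leftover bs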

data JsonStreamException
  = JsonStreamException !String
  | JsonStreamNotEnoughData
  deriving (Show, Typeable)
instance Exception JsonStreamException
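
A minimal sketch of how an adapter like this might be plugged into httpSink, not tested against a real endpoint. It assumes the response body is a single top-level JSON array whose elements are lists of integers. The names listsParser, collectLists and streamLists are hypothetical, and the adapter is repeated in compact form (reporting errors with userError instead of JsonStreamException) only so the block compiles on its own. It needs the conduit, http-conduit, json-stream and exceptions packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Conduit (ConduitT, await, leftover, mapM_C, runConduit, sinkList, yield, yieldMany, (.|))
import Control.Monad.Catch (MonadThrow, throwM)
import Data.ByteString (ByteString)
import Data.JsonStream.Parser (ParseOutput (..), Parser, arrayOf, runParser, value)
import Network.HTTP.Simple (httpSink, parseRequest)

-- Compact copy of the adapter, with simplified error reporting.
jsonConduit :: MonadThrow m => Parser a -> ConduitT ByteString a m ()
jsonConduit = go . runParser
  where
    go (ParseYield x next) = yield x >> go next
    go (ParseNeedData feed) =
      await >>= maybe (throwM (userError "unexpected end of input")) (go . feed)
    go (ParseFailed msg) = throwM (userError msg)
    go (ParseDone rest) = leftover rest

-- Assumption: the document looks like [[1,2],[3,4],...].
-- 'arrayOf value' yields each inner list as soon as it has been parsed.
listsParser :: Parser [Int]
listsParser = arrayOf value

-- In-memory variant, handy for checking the parser without a network call.
collectLists :: [ByteString] -> IO [[Int]]
collectLists chunks =
  runConduit $ yieldMany chunks .| jsonConduit listsParser .| sinkList

-- HTTP variant: the fused conduit is what httpSink expects as its second
-- argument. 'handleList' runs once per inner list as it arrives.
streamLists :: String -> ([Int] -> IO ()) -> IO ()
streamLists url handleList = do
  req <- parseRequest url
  httpSink req $ \_response -> jsonConduit listsParser .| mapM_C handleList

main :: IO ()
main = do
  -- The chunk boundary deliberately falls in the middle of a list.
  lists <- collectLists ["[[1,2],[3", ",4],[5]]"]
  mapM_ print lists
```

With a real endpoint, something like `streamLists "http://example.com/data.json" print` would process one list at a time, assuming the URL serves JSON of the shape described above.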

You wrote:

I read the tutorial about conduits, but I'm clearly not understanding it properly since I can't make that work.

I can't figure out a way to use parseByteString as a Sink for httpSink's second parameter.

The thing to notice here is that Sink is just shorthand for a conduit:

type Sink i m r = ConduitM i Void m r

A Sink is a conduit with no downstream output: it only consumes its input and returns a final result.
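
To make that concrete, here is a small sketch (the names byteCount and nonEmptyChunks are hypothetical) showing that anything with the shape `ConduitT ByteString Void m r` can serve as a sink, including a transformer fused with a consumer via `.|`. It uses ConduitT, which recent conduit versions prefer over the older Sink synonym, and it assumes the conduit package is available.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Conduit (ConduitT, filterC, foldlC, lengthC, runConduit, yieldMany, (.|))
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Void (Void)

-- A plain consumer: takes ByteString chunks, emits nothing (Void),
-- and returns the total number of bytes seen.
byteCount :: Monad m => ConduitT ByteString Void m Int
byteCount = foldlC (\n chunk -> n + BS.length chunk) 0

-- A transformer fused with a consumer still has Void as its output type,
-- so it can be used anywhere a sink is expected.
nonEmptyChunks :: Monad m => ConduitT ByteString Void m Int
nonEmptyChunks = filterC (not . BS.null) .| lengthC

main :: IO ()
main = do
  let chunks = ["ab", "", "cde"] :: [ByteString]
  total <- runConduit $ yieldMany chunks .| byteCount
  count <- runConduit $ yieldMany chunks .| nonEmptyChunks
  print (total, count)
```

The same reasoning applies to httpSink: its second argument is a function returning such a conduit, so a parsing stage fused with a consuming stage fits there.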

Conduits are the solution you need, and I assume that this is the tutorial you read. If you are not comfortable with some of the concepts in it, try asking a specific question about them.