I'm trying to traverse/sequence a large stream (e.g. scala.collection.immutable.Stream
) using Scalaz' (version 7.1.2) Traverse
typeclass, but I'm constantly running into a java.lang.OutOfMemoryError: GC overhead limit exceeded
issue.
My traversal looks roughly like the following:
import scalaz._
import Scalaz._
import scala.collection.immutable.Stream
Stream.range(1, 1000000000).traverse[MTrans, Int](f)
where MTrans
is a monad transformer stack including EitherT
and StateT
and f: Int => MTrans[Int]
.
I'm generally just interested to sequence the elements (passing on state) and only need the final result (MTrans[Int]
), not the whole (materialized) sequence/stream.
I have versions running with traverseKTrampoline
but that doesn't help since this is not a StackOverflowError
issue as described in other similar posts. I also tried combinations of using EphemeralStream
and sequence
but had no success.
How can I (memory-)efficiently traverse/sequence such a stream?
Update 1
The following is a more complete example of what I'm trying to do. It closely resembles the structure that I have and exhibits the same problem (GC overhead limit exceeds at some point).
object Main {
import scalaz._
import Scalaz._
import Kleisli._
import scala.collection.immutable.Stream
import scala.language.higherKinds
case class IState(s: Int)
type IStateT[A] = StateT[Id, IState, A]
type MTransT[S[_], A] = EitherT[S, String, A]
type MTrans[A] = MTransT[IStateT, A]
def eval(k: Int): MTrans[Int] = {
for {
state <- get[IState].liftM[MTransT]
_ <- put(state.copy(s = state.s % k)).liftM[MTransT]
} yield (k + 1)
}
def process(i: Int, k: Int): MTrans[Int] = {
for {
state <- get[IState].liftM[MTransT]
_ <- put(state.copy(s = state.s + i)).liftM[MTransT]
res <- eval(k)
} yield res
}
def run() = {
val m = Stream
.range(1, 1000000000)
.traverseKTrampoline[MTrans, Int, Int](i => kleisli(process(i, _))).run(7)
m.run(IState(0))
}
}
Update 2
Based on some input from Eric and from Applicative vs. monadic combinators and the free monad in Scalaz I came up with the following simple foldLeft
-based solution using the applicative *>
:
val m = Stream
.range(1, 1000000000)
.toEphemeralStream
.foldLeft(0.point[MTrans]) { acc => i =>
acc *> process(i, 3)
}
While this (still) seems to be stack safe, it requires large amounts of heap space and runs really slow.