Something as simple as adding 3 nested reduce is giving me an out of memory error.
Execution error (OutOfMemoryError) at .../my-large-lazy$iter$fn$fn$iter$fn$fn$iter$fn$fn (serial_write.clj:39).
Java heap space: failed reallocation of scalar replaced objects
Issue happens when I evaluate (add-cords 1000) I am running with
java -Xmx1G ...
(defn my-large-lazy
[size]
(for [x (range size)]
(for [y (range size)]
(for [z (range size)]
{:x x :y y :z z}))))
(defn add-coords
[size]
(reduce (fn [sum-1 x-1]
(reduce (fn [sum-2 x-2]
(reduce (fn [sum-3 x-3]
(+ sum-3 (:x x-3) (:y x-3) (:z x-3)))
sum-2 x-2))
sum-1 x-1))
0 (my-large-lazy size)))
I expect it to compute the sum taking may be a few KB of memory, not error at 1GB. Ideally the memory requirement for this should be O(1). Larger the size larger the memory it takes. Avoiding dictionary in {:x x :y y :z z} helps, but in the original program it is unavoidable. I am looking at how to implement add-cords efficiently.
I know about holding head of sequence and other usual gotchas. But this seem to be evading me. I tried a profiler which measures allocations, but I can't figure out why they are not being deallocated. I tried a few variants of implementing this function, but all gets into same OOM issue. I also tried various JVMs, with minor differences in overall memory they all fail.
This is a very interesting question. It is true that the head is not retained of the sequence during reduce, but then what causes the OOM error?
Even though the head of the original lazy sequence is not retained during reducing over it, the current LazySeq object is kept in memory, because it holds the reference to the rest of the sequence. And since the current LazySeq instance is kept, its head element is also kept.
And since you have a nested structure of lazy sequences, everything referenced in the outmost current sequence element is also kept, even if those have already been processed, until the outmost current sequence is moved to the next element.
For your example, when we are processing
x=9,y=9,z=9, then the outer lazy seq holdingx=9is still in memory, which means thatx=9,y=0,z=0,x=9,y=0,z=1, ...x=9,y=9,z=9are also there. And these are kept until we move tox=10in the outermost sequence.The solution is to reorganize the code a bit: intead of a nested structure of lazy sequences, flatten it out to a single lazy sequence: