I am interested in playing around with the Accelerate library, and I would like to perform some operations on data stored in a CSV file. I've read this excellent introduction to Accelerate, but I'm not sure how to read CSVs into Accelerate efficiently. The only approach I can think of is to parse the entire CSV file into one long list, and then feed that list into Accelerate.
My data sets will be quite large, and it doesn't seem efficient to read a 1 GB+ file into memory only to copy it somewhere else. I noticed there is a CSV Enumerator package on Hackage, but I'm not sure how to use it with Accelerate's generate function. Another constraint is that the dimensions of the array, or at least the number of elements, apparently must be known before generating an array with Accelerate.
Has anyone dealt with this kind of problem before?
Thanks!
I am not sure if this is 100% applicable to `accelerate` or `repa`, but here is one way I've handled this for Vector in the past:

It basically allocates a block of empty slots and proceeds to fill them. Once it hits the ceiling, it grows the underlying vector once again. I haven't benchmarked anything, but it appears to perform OK in practice. I am curious to see if there will be other, more efficient answers here.

Hope this helps in some way. I do see there's a `fromVector` function in `repa`, and perhaps that's your golden ticket in combination with this method.
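For illustration, the grow-and-fill idea described above could look roughly like the following. This is a minimal sketch using the `vector` package, not the original code: the names `fillGrowing` and `chunkSize`, the chunk size of 1024, and the `Double` element type are all assumptions made for the example. The input is a plain list standing in for whatever stream of parsed CSV fields you have.

```haskell
import Control.Monad (foldM)
import Control.Monad.ST (ST, runST)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- | Hypothetical chunk size: how many empty slots to add on each growth.
chunkSize :: Int
chunkSize = 1024

-- | Fill an unboxed vector from values whose count is not known up front.
-- The buffer starts with 'chunkSize' empty slots and is grown by another
-- 'chunkSize' slots every time it becomes full.
fillGrowing :: [Double] -> U.Vector Double
fillGrowing xs = runST $ do
  buf0     <- MU.new chunkSize
  (buf, n) <- foldM step (buf0, 0) xs
  -- Keep only the slots that were actually written, then freeze a copy.
  U.freeze (MU.take n buf)
  where
    step :: (MU.MVector s Double, Int) -> Double
         -> ST s (MU.MVector s Double, Int)
    step (buf, n) x = do
      -- Grow the underlying vector once the ceiling is hit.
      buf' <- if n == MU.length buf
                then MU.grow buf chunkSize
                else return buf
      MU.write buf' n x
      return (buf', n + 1)
```

Once the input is exhausted, `U.length` of the result gives the element count, so the size is known at the point where it has to be handed on as an array shape. Growing by a constant chunk keeps the sketch simple; doubling the capacity instead would reduce the number of copies for very large inputs.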