Camel large CSV file processing issue

I am trying to process a large CSV file of approximately 1 million records. After reading the rows (line by line or in chunks), I need to push them to camel-flatpack to create a map of field names to their values.

My requirement is to feed all the CSV records through a flatpack config and generate a java.util.Map out of it.

There have been several posts on Stack Overflow about solving this with a splitter. My process runs fast up to roughly 35,000 records, but after that it slows down.

I even tried adding a throttler, but it still doesn't work; I get a GC OutOfMemoryError. I also raised JAVA_MIN_MEM, JAVA_MAX_MEM, JAVA_PERM_MEM, and JAVA_MAX_PERM_MEM, but the result is the same. The Hawtio console shows that heap usage climbs above 95% after about 5-6 minutes.

Here is my code snippet:

    <route id="poller-route"> 
        <from uri="file://temp/output?noop=true&amp;maxMessagesPerPoll=10&amp;delay=5000"/>
        <split streaming="true" stopOnException="false">            
            <tokenize token="\n" />
            <to uri="flatpack:delim:flatpackConfig/flatPackConfig.pzmap.xml?ignoreFirstRecord=false"/>              
        </split>
    </route>

    <route id="output-route">
        <from uri="flatpack:delim:flatpackConfig/flatPackConfig.pzmap.xml?ignoreFirstRecord=false"/>
        <convertBodyTo type="java.util.Map"/>
        <to uri="mock:result"/>
    </route>

1 Answer


One potential problem is that when you create a hash map and continuously add data to it, it periodically needs to resize and rehash. For example, if I have a hash table of size 3 with a hash function of key mod 3 and insert 0, 1, 2, 3 into it, the key 3 would also map to slot 0, creating an overflow; I would either need to store the overflow somewhere or rebuild a larger table and rehash everything.
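
Here is a toy illustration of that overflow-and-rehash idea (a deliberately simplified model, not Java's actual HashMap internals):

    import java.util.Arrays;

    public class ToyHash {
        public static void main(String[] args) {
            // A 3-slot table with hash function key % 3: inserting 0, 1, 2 fills it.
            Integer[] table = new Integer[3];
            for (int key : new int[]{0, 1, 2}) {
                table[key % table.length] = key;
            }
            System.out.println(Arrays.toString(table)); // [0, 1, 2] -- full

            // Inserting 3 collides: slot 3 % 3 == 0 is taken, so grow and rehash.
            int key = 3;
            if (table[key % table.length] != null) {
                Integer[] bigger = new Integer[7];
                for (Integer k : table) {
                    if (k != null) {
                        bigger[k % bigger.length] = k; // rehash every existing entry
                    }
                }
                bigger[key % bigger.length] = key;
                table = bigger;
            }
            System.out.println(Arrays.toString(table)); // all four keys placed
        }
    }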

I'm not certain this is exactly how Java implements its HashMap, but you could try setting your HashMap's initial capacity to the number of records, as sketched below.
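
A minimal sketch of that suggestion (1,000,000 is the approximate record count from the question; 0.75f is java.util.HashMap's default load factor): sizing the map up front means it never has to resize and rehash mid-load.

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedMap {
        public static void main(String[] args) {
            int expectedRecords = 1_000_000; // approximate CSV size from the question

            // HashMap resizes once size > capacity * loadFactor (0.75 by default),
            // so request enough capacity that the full load never triggers a rehash.
            int initialCapacity = (int) (expectedRecords / 0.75f) + 1;
            Map<String, Object> records = new HashMap<>(initialCapacity);

            records.put("row-1", "first parsed record"); // hypothetical entry
            System.out.println(records.size());
        }
    }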