Premature EOF when loading large CSV file using univocity parser.

Caused by: java.io.IOException: Premature EOF
    at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:565)
    at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
    at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3393)
    at org.glassfish.jersey.client.internal.HttpUrlConnector$2.read(HttpUrlConnector.java:228)
    at org.glassfish.jersey.message.internal.EntityInputStream.read(EntityInputStream.java:102)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.read1(BufferedReader.java:210)
    at java.io.BufferedReader.read(BufferedReader.java:286)
    at com.univocity.parsers.common.input.concurrent.CharBucket.fill(CharBucket.java:70)
    at com.univocity.parsers.common.input.concurrent.ConcurrentCharLoader.readBucket(ConcurrentCharLoader.java:71)
    at com.univocity.parsers.common.input.concurrent.ConcurrentCharLoader.run(ConcurrentCharLoader.java:88)
    at java.lang.Thread.run(Thread.java:748)

The parser configuration is as follows:

com.univocity.parsers.common.TextParsingException: java.io.IOException - Premature EOF
Parser Configuration: CsvParserSettings:
        Auto configuration enabled=true
        Autodetect column delimiter=true
        Autodetect quotes=true
        Column reordering enabled=true
        Delimiters for detection=[]
        Empty value=null
        Escape unquoted values=false
        Header extraction enabled=null
        Headers=null
        Ignore leading whitespaces=true
        Ignore leading whitespaces in quotes=false
        Ignore trailing whitespaces=true
        Ignore trailing whitespaces in quotes=false
        Input buffer size=8388608
        Input reading on separate thread=true
        Keep escape sequences=false
        Keep quotes=false
        Length of content displayed on error=-1
        Line separator detection enabled=true
        Maximum number of characters per column=4096
        Maximum number of columns=512
        Normalize escaped line separators=true
        Null value=null
        Number of records to read=all
        Processor=none
        Restricting data in exceptions=false
        RowProcessor error handler=null
        Selected fields=none
        Skip bits as whitespace=true
        Skip empty lines=true
        Unescaped quote handling=null
Format configuration:
        CsvFormat:
                Comment character=#
                Field delimiter=,
                Line separator (normalized)=\n
                Line separator sequence=\n
                Quote character="
                Quote escape character="
                Quote escape escape character=null

The internal state when the error was thrown:

Internal state when error was thrown: line=1171815, column=4, record=1171815, charIndex=134217728, headers=[Counter, FirstName, LastName, IdNumber, StartDate, Salary, SecurityCleared, ManagerFName, ManagerLName, ManagerId, ProfileId, DateEvaluated, FriendFname, FriendLname, Friend], content parsed=201
    at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:369)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:595)

There is 1 answer below

Author of the library here.

The server seems to be sending invalid chunking data, or prematurely terminating the connection. This doesn't seem to be the parser's fault.

Are you able to save that file locally using something like apache-commons-io FileUtils.copyURLToFile?
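One way to run that check is to copy the response straight to disk with no parser involved. The sketch below is a JDK-only stand-in for FileUtils.copyURLToFile, not the Commons IO implementation. The class name CsvDownload, the method name saveLocally and the timeout values are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: downloads a URL to a local file without any parsing.
class CsvDownload {

    // Copies the resource at `source` to `target` and returns the number of bytes written.
    // If the server truncates the response, the IOException is thrown from this method.
    static long saveLocally(URL source, Path target) throws IOException {
        URLConnection connection = source.openConnection();
        connection.setConnectTimeout(10_000); // assumed value, in milliseconds
        connection.setReadTimeout(60_000);    // assumed value, in milliseconds
        try (InputStream in = connection.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CsvDownload <url>");
            return;
        }
        URL source = URI.create(args[0]).toURL();
        Path target = Files.createTempFile("download", ".csv");
        long bytes = saveLocally(source, target);
        System.out.println("Saved " + bytes + " bytes to " + target);
    }
}
```

If this copy also fails with Premature EOF, the problem is on the server or connection side. If it succeeds, the parser can then read the local file instead of the HTTP stream.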

If you can, also avoid giving a BufferedReader to the parser as it has its own internal buffer.