Using com.opencsv.CSVReader on Windows stops reading lines prematurely

I have two files that are identical except for their line endings. The one that uses the newline character (Linux/Unix) works and reads all 550 rows of data. The one that uses carriage return plus line feed (Windows) stops returning lines after reading 269 lines. In both cases the data is read correctly up to the point where reading stops. If I run dos2unix on the file that fails, the resulting file works.

I would like to be able to read CSV files regardless of their origin. If I could at least detect that the file is in the wrong format before reading part of the data, that would be helpful. Even if I could tell at some point in the middle of reading the file that it was not going to work, I could output an error. The current behavior, reading half the file and terminating with no error, is dangerous.

There is 1 best solution below

The problem appears to be that, under the covers, openCSV uses a BufferedReader, which reads a line from the stream until it reaches the system's line.separator.

If you know beforehand what the line separator of the file is, then in your application just call System.setProperty("line.separator", newLine), where newLine is either "\n" or "\r\n" depending on the file you are about to parse. Alternatively, you can pass it in as a parameter.

If you want to detect the file's line ending automatically, create a method that takes the file you want, creates a BufferedReader, and reads a single line. If the last character is a '\r', then your system uses "\n" but you want to set it to "\r\n". Otherwise, if line.contains("\n") returns true, you are on a system that uses "\r\n" and you want to set it to "\n". Otherwise the system and the file you are reading have compatible line endings.
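A minimal sketch of such a detection method follows. It is an illustration, not part of openCSV: the class name LineEndingDetector and the method detect are hypothetical. It also swaps in a slightly different technique from the one described above. Rather than calling readLine(), which strips the line terminator from the string it returns, it scans raw characters until it meets the first '\r' or '\n'. One known limitation: a line break inside a quoted CSV field before the end of the first record would be reported as the file's line ending.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not an openCSV class.
final class LineEndingDetector {

    private LineEndingDetector() {}

    /**
     * Scans characters from the reader until the first line terminator.
     * Returns "\r\n", "\n" or "\r", or null if the input contains no terminator.
     * The reader is consumed up to (and possibly one character past) the
     * terminator, so open a fresh reader before parsing the file.
     */
    static String detect(Reader in) {
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '\n') {
                    return "\n";            // Unix style
                }
                if (c == '\r') {
                    int next = in.read();   // look one character ahead
                    return next == '\n' ? "\r\n" : "\r"; // Windows or bare CR
                }
            }
            return null;                    // single line, no terminator seen
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a real file you would typically pass a buffered reader, for example Files.newBufferedReader(path), and compare the result with the separator you expect before handing the file to the CSV parser. That at least lets you report an error up front instead of silently reading half the data.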

Just note that if you do change the system line separator, be sure to set it back after processing the file, in case your program is processing multiple files.
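The set-and-restore pattern could be sketched as below. The names LineSeparatorScope and withLineSeparator are hypothetical, and the sketch only shows how to change the property and reliably put it back. It assumes the parsing code re-reads the line.separator property. Note that System.lineSeparator() always returns the value the property had at JVM startup, so code that relies on that method will not see the change.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; not an openCSV class.
final class LineSeparatorScope {

    private LineSeparatorScope() {}

    /**
     * Runs the given work with the "line.separator" system property set to
     * newLine, then restores the previous value even if the work throws.
     */
    static <T> T withLineSeparator(String newLine, Supplier<T> work) {
        String previous = System.getProperty("line.separator");
        System.setProperty("line.separator", newLine);
        try {
            return work.get();
        } finally {
            if (previous != null) {
                System.setProperty("line.separator", previous);
            } else {
                System.clearProperty("line.separator");
            }
        }
    }
}
```

A call might look like LineSeparatorScope.withLineSeparator("\r\n", () -> parse(file)), where parse is whatever method of yours reads the CSV. Because system properties are global to the JVM, this is not safe if other threads read or write files at the same time.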