How to use fgetcsv() when the CSV has several double quotes """" or if the entire line is wrapped in quotes?

Some CSV files that we import to our server cannot be parsed correctly.

We are reading the CSV file with PHP's fgetcsv():

while (($line = fgetcsv($file)) !== false) { ... }

However, when the CSV line is wrapped in quotes (and contains two double quotes inside), for example:

"first entry,"""","""",Data Chunk,2022-05-30"

The fgetcsv() function cannot handle the line correctly and sees the first entry,"""","""",Data Chunk,2022-05-30 as one entry.

How can we make sure the function treats first entry as a separate entry and also interprets the other parts """" as empty entries?


On more research I found:

Fields containing double quotes ("), Line Break (CRLF) and Comma must be enclosed with double quotes.

If Fields enclosed by double quotes (") contain double quotes character then the double quotes inside the field must be preceded with another double quote as an escape sequence. Source

This is likely the issue that we face here.
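
To illustrate the quoting rules above, here is a minimal sketch using str_getcsv(), which parses a single string the way fgetcsv() parses a line of a file. The $wellFormed line is only an assumption about what the export should have looked like; it is not taken from the actual file.

```php
<?php
// The line from the question: the whole record is wrapped in one pair of quotes,
// so every inner "" is read as an escaped literal quote inside a single field.
$wrapped = '"first entry,"""","""",Data Chunk,2022-05-30"';

// Assumed well-formed equivalent: quotes enclose individual fields only.
$wellFormed = 'first entry,"","",Data Chunk,2022-05-30';

// Delimiter, enclosure and escape are passed explicitly (the long-standing defaults).
$fromWrapped    = str_getcsv($wrapped, ',', '"', '\\');
$fromWellFormed = str_getcsv($wellFormed, ',', '"', '\\');

var_dump(count($fromWrapped));    // int(1): the whole record is one field
var_dump(count($fromWellFormed)); // int(5): five separate fields
var_dump($fromWellFormed[1]);     // string(0) "": the quoted empty field
```

The parser is following the rules here: a field that opens with a double quote runs until the matching closing quote, so the commas inside it never act as separators.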


A more complete data example of the CSV:

Allgemeines
Subject,Body,Attachment,Author,Created At,Updated At
"Hello everyone, this is a sample. Kind regards,"""","""",Author name (X),2022-05-30 14:54:32 UTC,2022-05-30 14:54:37 UTC"
","""",https://padlet-uploads.storage.googleapis.com/456456456/testfile.docx,Author name (X),2022-05-15 13:53:04 UTC,2022-05-15 13:54:40 UTC"
",""Hello everyone!"

This is some fun text.
More to come.
Another sentence.
And more text.

Even more text

See you soon.




","",Author name (X),2021-07-22 09:41:06 UTC,2021-07-23 16:12:42 UTC
""
Important Things to Know in 2022
Subject,Body,Attachment,Author,Created At,Updated At
"","

01.01.2022 First day of new year
02.02.2202 Second day of new year

Please plan ahead.
","",Author name (X),2021-07-22 09:58:19 UTC,2022-03-24 14:16:50 UTC
""

Note: Each affected line starts with a double quote and ends with a double quote followed by a carriage return and line feed (CRLF).

There is 1 solution below


Turns out the CSV data was corrupted.

The user edited the CSV in Excel and, as stated in the comments, likely overwrote the original CSV, which caused the quotes to be escaped twice.
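
A minimal sketch of how such double escaping can arise, assuming the original export contained the line stored in $original (that line is my guess, not recovered data). The helper to_csv_line() is hypothetical. If a tool treats an already valid CSV line as a single field and writes it out as CSV again, every inner quote gets doubled and the whole line gets wrapped:

```php
<?php
// Hypothetical helper: writes one record with fputcsv() and returns it as a string.
function to_csv_line(array $fields): string
{
    $handle = fopen('php://memory', 'w+');
    fputcsv($handle, $fields, ',', '"', '\\');
    rewind($handle);
    $line = stream_get_contents($handle);
    fclose($handle);
    return rtrim($line, "\r\n"); // drop the line ending added by fputcsv()
}

// Assumed original line: already valid CSV with five fields.
$original = 'first entry,"","",Data Chunk,2022-05-30';

// Treat the whole line as ONE field and write it as CSV a second time.
$twice = to_csv_line([$original]);

echo $twice, PHP_EOL;
// "first entry,"""","""",Data Chunk,2022-05-30"
```

The output matches the problematic line from the question character for character, which fits the double escaping explanation, although it does not prove that Excel was the cause.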

For anyone facing the same issue:

  1. Do not waste your time trying to recover corrupted CSV files with a custom parser.

  2. Ask your user to give you access to the original CSV export site and generate the CSV yourself.

  3. Check the CSV integrity. See code below.

    $file = fopen($csvfile, 'r');

    if ($file === false) 
    {
        return 'Cannot open CSV!';
    }

    // validate that all records have the same number of fields:
    // empty lines (count 1), full entry (count 6) - depends on your CSV structure

    $length_array = array();

    // length 0 = no line length limit; a limit of 1000 would split longer lines
    while (($data = fgetcsv($file, 0, ",")) !== false) 
    {
        // count number of fields in this record
        $length_array[] = count($data);
    }

    // keep only the distinct field counts
    $length_array = array_unique($length_array);

    // release the file handle
    fclose($file);
    
    // depending on your CSV structure, count($length_array) is 1 or 2
    if (count($length_array) > 2) 
    {
        // count mismatch
        return 'Invalid CSV!';
    }
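
The check above only looks at how many distinct field counts occur. As a possible refinement, here is a sketch that also tallies how many records have each count. The function name csv_field_count_tally() and the sample file contents are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: returns an array [fieldCount => numberOfRecords].
 */
function csv_field_count_tally(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $tally = [];
    // Length 0 means no line length limit.
    while (($record = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $n = count($record);
        $tally[$n] = ($tally[$n] ?? 0) + 1;
    }
    fclose($handle);

    ksort($tally); // order by field count for readable output
    return $tally;
}

// Usage with a small made-up file: a title line, a header,
// one well-formed record and one record wrapped in quotes.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, implode("\n", [
    'Title',
    'Subject,Body,Attachment,Author,Created At,Updated At',
    'Hi,"","",Author name (X),2022-05-30,2022-05-30',
    '"Hi,"""","""",Author name (X),2022-05-30,2022-05-30"',
]) . "\n");

$tally = csv_field_count_tally($path);
unlink($path);

print_r($tally); // two records with 1 field, two records with 6 fields
```

Note that a fully wrapped record parses as a single field, so it lands in the same bucket as title and empty lines. Comparing the totals per count against what you expect from the file can reveal this kind of corruption where the number of distinct counts alone would not.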