Parse large CSV file in a short time in PHP

I need to search a CSV file for the row where one column matches a given value, and then return the value of another column in that row.

This is my function, and it works fine, but only on small files:

function find_user($filename, $id) {
    $f = fopen($filename, "r");
    $result = false;
    // Scan row by row: column 7 ($row[6]) holds the ID to match,
    // column 6 ($row[5]) holds the value to return.
    while ($row = fgetcsv($f, 0, ";")) {
        if ($row[6] == $id) {
            $result = $row[5];
            break;
        }
    }
    fclose($f);
    return $result;
}
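
I call it like this (the file name and ID here are just examples):

$name = find_user("users.csv", "12345");
if ($name !== false) {
    echo $name;
}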

The problem is that the actual file I have to work with is 4GB, and searching it takes a very long time.

Browsing Stack Overflow, I found the following post: file_get_contents => PHP Fatal error: Allowed memory exhausted

There, the following function is given, which (as far as I understand) reads a huge file in chunks so it can be processed without exhausting memory:

function file_get_contents_chunked($file,$chunk_size,$callback)
{
    try
    {
        $handle = fopen($file, "r");
        // fopen() does not throw exceptions; check for failure explicitly,
        // otherwise the feof() loop below never terminates.
        if ($handle === false) {
            return false;
        }
        $i = 0;
        while (!feof($handle))
        {
            call_user_func_array($callback,array(fread($handle,$chunk_size),&$handle,$i));
            $i++;
        }

        fclose($handle);

    }
    catch(Exception $e)
    {
         trigger_error("file_get_contents_chunked::" . $e->getMessage(),E_USER_NOTICE);
         return false;
    }

    return true;
}

And the way of using it seems to be the following:

$success = file_get_contents_chunked("my/large/file",4096,function($chunk,&$handle,$iteration){
    /*
        * Do what you will with the {$chunk} here
        * {$handle} is passed in case you want to seek
        * to different parts of the file
        * {$iteration} is the number of the chunk that has been read,
        * so ($iteration * 4096) is your current offset within the file.
    */

});

if(!$success)
{
    //It Failed
}

The problem is that I do not know how to adapt my initial code to work with the function above so that searching a large CSV becomes faster. My PHP knowledge is not very advanced.

1 Answer

No matter how you read the file, there is no way to make the search faster: you always have to scan the data while looking for the matching row and column. The worst case is when the row you are looking for is the last one in the file.
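
For illustration, here is a sketch of the chunked approach from the question applied to the search. I am assuming the same semicolon-delimited layout as find_user(); rows are split on newlines, so quoted fields containing line breaks are not handled. It still has to scan the file from the beginning:

function find_user_chunked($filename, $id, $chunk_size = 1048576) {
    $handle = fopen($filename, "r");
    if ($handle === false) {
        return false;
    }
    $buffer = "";
    $result = false;
    while (!feof($handle)) {
        $buffer .= fread($handle, $chunk_size);
        // The last element may be an incomplete row; carry it over
        // to the next chunk instead of parsing it now.
        $lines = explode("\n", $buffer);
        $buffer = array_pop($lines);
        foreach ($lines as $line) {
            $row = str_getcsv($line, ";");
            if (isset($row[6]) && $row[6] == $id) {
                $result = $row[5];
                break 2;
            }
        }
    }
    // Handle a final row without a trailing newline.
    if ($result === false && $buffer !== "") {
        $row = str_getcsv($buffer, ";");
        if (isset($row[6]) && $row[6] == $id) {
            $result = $row[5];
        }
    }
    fclose($handle);
    return $result;
}

In the worst case this is no faster than the original fgetcsv() loop, because the bottleneck is the linear scan itself, not how the bytes are read.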

You should import your CSV into a properly indexed database and modify your application to save new records to that database instead of the CSV file.

Here's a rudimentary example using SQLite. I created a CSV file with 100 million records (~5GB) and tested with it.

Create a SQLite database and import your CSV file into it:

$f = fopen('db.csv', 'r');
$db = new SQLite3('data.db');
$db->exec('CREATE TABLE "user" ("id" INT PRIMARY KEY, "name" TEXT,
    "c1" TEXT, "c2" TEXT, "c3" TEXT, "c4" TEXT, "c5" TEXT)');
$stmt = $db->prepare('INSERT INTO "user"
    ("id", "name", "c1", "c2", "c3", "c4", "c5") VALUES (?, ?, ?, ?, ?, ?, ?)');
// bindParam() binds by reference, so re-assigning the variables in the
// loop below changes the values used by each execute().
$stmt->bindParam(1, $id, SQLITE3_INTEGER);
$stmt->bindParam(2, $name, SQLITE3_TEXT);
$stmt->bindParam(3, $c1, SQLITE3_TEXT);
$stmt->bindParam(4, $c2, SQLITE3_TEXT);
$stmt->bindParam(5, $c3, SQLITE3_TEXT);
$stmt->bindParam(6, $c4, SQLITE3_TEXT);
$stmt->bindParam(7, $c5, SQLITE3_TEXT);
// A single transaction around all the inserts is essential for speed;
// committing each row separately would take many times longer.
$db->exec('BEGIN TRANSACTION');
while ($row = fgetcsv($f, 0, ';')) {
    list($c1, $c2, $c3, $c4, $c5, $name, $id) = $row;
    $stmt->execute();
}
$db->exec('COMMIT');
fclose($f);

This takes a long time, over 15 minutes on my computer, and results in a 6.5GB database file.
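
If the import time matters, a common trick is to relax SQLite's durability settings during the bulk load (a general suggestion on my part, not something measured in the test above); run these before the inserts:

// Optional: speed up the bulk import by trading durability for speed.
// A crash mid-import can corrupt the database in this mode, so only
// use it while loading and re-import from scratch if it fails.
$db->exec('PRAGMA journal_mode = MEMORY');
$db->exec('PRAGMA synchronous = OFF');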

Search from a database:

$id = 99999999;
$db = new SQLite3('data.db');
$stmt = $db->prepare('SELECT "name" FROM "user" WHERE "id" = ?');
$stmt->bindValue(1, $id, SQLITE3_INTEGER);
$result = $stmt->execute();
print_r($result->fetchArray());

This executes virtually instantaneously: "id" is the table's primary key, so the query is answered with an index lookup instead of a full table scan.