I'm using php-resque to parse and validate data in large files then import that data into a mysql database.
I already know that LOAD DATA INFILE can be used to read rows from a text file into a table, but it doesn't perform any validation whatsoever.
My database structure:
ItemsFile Table:
id filename filepath valid_items invalid_items processed_items processed
Item Table:
id uid item file_id created_at
My Resque Job Class looks like this:
php-resque forks a child process and instantiates the ItemsFileProcessor class, then:
- setUp() gets called
- perform() gets called
/**
 * Read and validate items from a file, and store them in a database.
 */
class ItemsFileProcessor {
    // ItemsFile Model instance
    private $items_file = null;

    // Item Model instance
    private $item = null;

    // Retrieved from the ItemsFile table.
    private $file = null;

    public function __construct() {
        $this->items_file = new ItemsFile();
        $this->item = new Item();
    }

    public function setUp() {
        if (isset($this->args['file_id'])) {
            // Get the file from the ItemsFile table by id.
            $this->file = $this->items_file->getFile($this->args['file_id']);
            if (empty($this->file)) {
                // End job processing if the file does not exist.
                exit(-1);
            }
        }
    }

    public function perform() {
        // NodeJs, socket.io, redis broadcasting system
        EventBroadcaster::broadcast('app-jobs-channel', 'file_processing_started');

        $processed_items = 0;
        $valid_items = 0;
        $invalid_items = 0;

        // Item validation class instance
        $item_validator = new ItemValidator();

        try {
            $tmp_file = new SplFileObject($this->file->filepath);

            // Read items from the file, and validate each item.
            while ($tmp_file->valid()) {
                $line = trim($tmp_file->fgets());
                if ($line !== '') {
                    if ($item_validator->isValid($line, new ItemValidationRule())) {
                        // Store the item in the Item table.
                        $this->item->create([
                            'uid'     => 'foo',
                            'item'    => $line,
                            'file_id' => $this->file->id,
                        ]);
                        $valid_items++;
                    } else {
                        $invalid_items++;
                    }
                    $processed_items++;
                }
            }

            // Update the ItemsFile table record.
            $this->items_file->update(
                $this->file->id,
                [
                    'processed_items' => $processed_items,
                    'valid_items'     => $valid_items,
                    'invalid_items'   => $invalid_items,
                    'processed'       => 'Processed',
                ]
            );

            EventBroadcaster::broadcast('app-jobs-channel', 'file_processing_completed');
        } catch (LogicException $exception) {
            // Broadcast failure.
            EventBroadcaster::broadcast('app-jobs-channel', 'file_processing_failed');
            Logger::getInstance()->log('ItemsFileProcessor Exception: ' . $exception->getMessage(), Logger::LOGTYPE_ERROR);
            exit(-1);
        }
    }
}
My Problems:
- Processing a file takes too long.
- MySQL has to process each INSERT request one by one; LOAD DATA INFILE would be much faster.
My Question:
Is there a way to optimize this, or perhaps a way to introduce LOAD DATA INFILE somehow?
Managing large files in PHP can be a performance problem in itself. I suggest parsing the file with a shell script that builds one string representing a single INSERT statement containing all of your rows; then you only have to execute that one query.
I can elaborate if this isn't clear.
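The same idea also works in plain PHP without shelling out: buffer the validated rows and flush them as one multi-row INSERT every N rows, instead of one INSERT per item. A minimal sketch, assuming a PDO handle; the helper name `buildBatchInsert` and the 500-row batch size are my own choices, while the `item` table and its columns come from the schema above:

```php
<?php

/**
 * Build one parameterized multi-row INSERT for the given rows.
 * Each row is [uid, item, file_id]; returns [sql, flattenedParams].
 */
function buildBatchInsert(array $rows): array
{
    // One "(?, ?, ?)" group per row, e.g. "(?, ?, ?), (?, ?, ?)" for 2 rows.
    $placeholders = implode(', ', array_fill(0, count($rows), '(?, ?, ?)'));
    $sql = "INSERT INTO item (uid, item, file_id) VALUES $placeholders";
    return [$sql, array_merge(...$rows)];
}

// Inside perform(), instead of calling $this->item->create() per line
// (PDO handle $pdo assumed):
//
//     $batch[] = ['foo', $line, $this->file->id];
//     if (count($batch) >= 500) {
//         [$sql, $params] = buildBatchInsert($batch);
//         $pdo->prepare($sql)->execute($params);
//         $batch = [];
//     }
//     // ...and flush any remaining rows after the loop.
```

This cuts round trips and lets MySQL apply the whole batch in one statement, which is usually the bulk of the win over row-by-row inserts.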
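To actually use LOAD DATA INFILE as the question asks while keeping validation in PHP, you can validate line by line, write only the valid rows to a temporary CSV, and bulk-load that CSV in a single statement. A sketch, with a plain callable standing in for the ItemValidator class and the PDO wiring assumed (the 'foo' uid mirrors the hard-coded value in the job above):

```php
<?php

/**
 * Write the valid lines of $sourcePath to a temporary CSV
 * (columns: uid, item, file_id) and return [csvPath, valid, invalid].
 */
function buildValidatedCsv(string $sourcePath, callable $isValid, int $fileId): array
{
    $csvPath = tempnam(sys_get_temp_dir(), 'items_');
    $out = fopen($csvPath, 'w');
    $src = fopen($sourcePath, 'r');
    $valid = 0;
    $invalid = 0;

    while (($line = fgets($src)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines, as the job does
        }
        if ($isValid($line)) {
            fputcsv($out, ['foo', $line, $fileId]); // uid, item, file_id
            $valid++;
        } else {
            $invalid++;
        }
    }

    fclose($src);
    fclose($out);
    return [$csvPath, $valid, $invalid];
}

// Then one bulk load replaces thousands of INSERTs (local_infile must be
// enabled on both the MySQL server and the client):
//
//     $pdo = new PDO($dsn, $user, $pass, [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);
//     $pdo->exec("LOAD DATA LOCAL INFILE " . $pdo->quote($csvPath) . "
//                 INTO TABLE item
//                 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
//                 (uid, item, file_id)");
```

The valid/invalid counters can still be written back to the ItemsFile record exactly as the job does now, and the temp file should be unlinked afterwards.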