PHP PDO fetch() loop dies after processing part of a large dataset
I have a PHP script that processes a "large" dataset (about 100k records) from a PDO statement into a set of objects in a typical loop:
while ($record = $query->fetch()) {
$obj = new Thing($record);
/* do some processing */
$list[] = $obj;
$count++;
}
error_log('Processed '.$count.' records');
This loop processes about 50% of the dataset and then inexplicably breaks.
Things I've tried:
- Memory profiling: logging memory_get_peak_usage() periodically shows about 63 MB just before the loop ends. The memory limit is 512 MB, set via php.ini.
- Using set_time_limit() to raise the maximum execution time to 1 hour (3600 seconds). The loop breaks long before that, and I don't see the usual max-execution-time error in the log.
- Setting PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false to avoid buffering the entire dataset (see the sketch after this list).
- Logging $query->errorInfo() immediately after the loop breaks. That didn't help: the error code was "00000".
- Checking the MySQL error log. Nothing remarkable there before, during, or after the failure.
- Batching the processing into chunks of 20,000 records. No difference; the loop broke in the same place. However, by "clearing" the PDO statement object at the end of each batch, I was able to get up to 54% of the records processed.
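For reference, a minimal sketch of how the unbuffered-query attribute can be set (the $pdo connection and the $sql query string are placeholders for the real ones):
// Disable client-side buffering for this connection so rows are streamed
// from the server instead of being loaded into memory all at once.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$query = $pdo->prepare($sql);
$query->execute();

while ($record = $query->fetch(PDO::FETCH_ASSOC)) {
    /* do some processing */
}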
Other strange behavior:
- When I raise the memory limit with ini_set('memory_limit', '1024MB'), the loop actually dies earlier than with the lower limit, at around 20%.
- While the loop runs, the PHP process sits at 100% CPU, but once it breaks, usage drops to 2%, even though the records are processed again in another loop immediately afterwards. Presumably the connection to the MySQL server in the first loop is what's so resource-intensive.
I am doing this all locally using MAMP PRO if that matters.
Is there something else that might consistently break this loop that I haven't tested? Is this just not a viable strategy for handling many records?
UPDATE
After switching to the batching strategy (20k increments), I started to consistently see the MySQL error "MySQL server has gone away" around the third batch; possibly a symptom of a long-running unbuffered query.
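A minimal sketch of that batching strategy, assuming a hypothetical "things" table and a $pdo connection; each batch is a separate, short-lived query, so no single statement stays open for the whole run:
$batchSize = 20000;
$offset    = 0;
$count     = 0;
$list      = [];

do {
    // Fetch one batch; the values are internal integers, so interpolating
    // them directly into the LIMIT/OFFSET clause is safe here.
    $sql     = sprintf('SELECT * FROM things ORDER BY id LIMIT %d OFFSET %d', $batchSize, $offset);
    $query   = $pdo->query($sql);
    $records = $query->fetchAll(PDO::FETCH_ASSOC);
    $query->closeCursor();

    foreach ($records as $record) {
        $obj = new Thing($record);
        /* do some processing */
        $list[] = $obj;
        $count++;
    }

    $offset += $batchSize;
} while (count($records) === $batchSize);

error_log('Processed '.$count.' records');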
If you really need to process 100K records on the fly, you should do the processing in SQL and fetch only the output you need - this should save a lot of time.
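For example, if the per-record work boils down to something the database can do itself (the table and column names here are hypothetical), only a handful of summary rows ever reach PHP:
// Hypothetical example: let MySQL aggregate instead of building 100k
// objects in PHP; "things", "category" and "amount" are placeholder names.
$sql = 'SELECT category, COUNT(*) AS cnt, SUM(amount) AS total
        FROM things
        GROUP BY category';

foreach ($pdo->query($sql) as $row) {
    /* work with a few summary rows instead of 100k records */
}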
But you probably can't do that for some reason. Since you process every row from the statement anyway, use fetchAll() once - and let MySQL off the hook after that, like this:
$records = $query->fetchAll();
foreach ($records as $record) {
    $obj = new Thing($record);
    /* do some processing */
    $list[] = $obj;
    $count++;
}
error_log('Processed '.$count.' records');
Also, select only the columns you will actually be using. If that doesn't help, you can try setting the PDO connection timeout.
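A minimal sketch of setting that timeout (DSN and credentials are placeholders; exactly what PDO::ATTR_TIMEOUT applies to is driver-dependent - for MySQL it acts as the connection timeout in seconds):
$pdo = new PDO(
    'mysql:host=localhost;dbname=test;charset=utf8mb4',
    'user',
    'secret',
    [
        // Driver-dependent: for pdo_mysql this sets the connect timeout.
        PDO::ATTR_TIMEOUT => 60,
    ]
);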