PHP PDO fetch() loop dies after processing part of a large dataset

I have a PHP script that processes a "large" dataset (about 100k records) from a PDO query into a set of objects in a typical loop:

$list  = [];
$count = 0;

while ($record = $query->fetch()) {
    $obj = new Thing($record);

    /* do some processing */

    $list[] = $obj;
    $count++;
}

error_log('Processed '.$count.' records');

This loop processes about 50% of the dataset and then inexplicably breaks.

Things I've tried:

  • Memory profiling with memory_get_peak_usage(): it consistently reports about 63 MB right before the loop breaks. The memory limit is 512 MB, set via php.ini.
  • Using set_time_limit() to raise the maximum execution time to 1 hour (3600 seconds). The loop breaks long before that, and I don't see the usual timeout error in the log.
  • Setting PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false to avoid buffering the entire result set (see the sketch after this list).
  • Logging $query->errorInfo() immediately after the loop breaks. It didn't help; the error code was "00000".
  • Checking the MySQL error log. Nothing remarkable there before, during, or after the failure.
  • Batching the processing into chunks of 20,000 records. No difference; the loop broke at the same point. However, by "clearing" the PDO statement object at the end of each batch, I got as far as 54% processed.
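
For reference, this is a minimal sketch of how the unbuffered attempt is wired up; $pdo and the SQL are placeholders for my actual MAMP connection and query:

// With buffering off, mysqlnd streams rows instead of copying the whole
// result set into PHP memory first ($pdo and the query are placeholders).
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$query = $pdo->prepare('SELECT * FROM things');
$query->execute();

while ($record = $query->fetch(PDO::FETCH_ASSOC)) {
    // ...same per-record processing as in the loop above...
}

// Nothing else can run on this connection until the unbuffered result set
// is fully read or the cursor is closed.
$query->closeCursor();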

Other strange behavior:

  • When I raise the memory limit with ini_set('memory_limit', '1024MB'), the loop actually dies earlier than with the lower limit, at around 20%.
  • During the loop the PHP process sits at 100% CPU, but as soon as the loop breaks, utilization drops to about 2%, even though another loop starts processing immediately afterwards. It seems the connection to the MySQL server in the first loop is the resource-intensive part.

I am doing this all locally using MAMP PRO if that matters.

Is there something else that might consistently break this loop that I haven't tested? Is this just not a viable strategy for handling many records?

UPDATE

After switching to the batching strategy (20k increments), I started to consistently see the MySQL error "MySQL server has gone away" around the third batch; possibly a symptom of a long-running unbuffered query.
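
For context, the batching is wired up roughly like this (LIMIT/OFFSET paging; $pdo, the table name, and the ordering column are placeholders, and the per-record processing is elided):

$batchSize = 20000;
$offset    = 0;
$list      = [];
$count     = 0;

do {
    // Table and ordering column are placeholders for my actual schema.
    $query = $pdo->prepare('SELECT * FROM things ORDER BY id LIMIT :limit OFFSET :offset');
    $query->bindValue(':limit', $batchSize, PDO::PARAM_INT);
    $query->bindValue(':offset', $offset, PDO::PARAM_INT);
    $query->execute();

    $rows = $query->fetchAll();
    foreach ($rows as $record) {
        $list[] = new Thing($record);
        $count++;
    }

    // "Clearing" the statement between batches, as mentioned above.
    $query->closeCursor();
    $offset += $batchSize;
} while (count($rows) === $batchSize);

error_log('Processed '.$count.' records');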


1 answer


If you really need to process 100K records on the fly, do the processing in SQL and fetch only the output you need - that should save a lot of time.
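
For example, if the per-record work can be expressed in SQL, something along these lines keeps it on the server (the table, columns, and GROUP BY are only an illustration, not your schema):

// Illustrative only: push the aggregation into MySQL instead of building
// 100K Thing objects in PHP. Table and column names are made up.
$sql = 'SELECT category, COUNT(*) AS total, SUM(amount) AS amount_sum
        FROM things
        GROUP BY category';

foreach ($pdo->query($sql) as $row) {
    /* use the already-aggregated row */
}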

But you probably can't do that for some reason. Since you process every row from the statement anyway, fetch them all with fetchAll() once and let MySQL go after that, like this:



$records = $query->fetchAll();
foreach ($records as $record) {
    $obj = new Thing($record);

    /* do some processing */

    $list[] = $obj;
    $count++;
}
error_log('Processed '.$count.' records');

Also, select only the rows you will actually use. If that doesn't help, you can try setting the PDO connection timeout, for example:
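
A rough sketch of both suggestions; the DSN, credentials, table, and column names are placeholders, and how PDO::ATTR_TIMEOUT is interpreted (a connect timeout for pdo_mysql) is driver-specific:

// Timeout in seconds, passed at connect time; driver-specific behavior.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass', [
    PDO::ATTR_TIMEOUT => 60,
]);

// Narrow the result set to only the rows (and columns) that will be used.
$query = $pdo->prepare('SELECT id, name, amount FROM things WHERE needs_processing = 1');
$query->execute();
$records = $query->fetchAll(PDO::FETCH_ASSOC);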
