Inserting a lot of data with Doctrine

I am working on importing 300,000 rows spread across many CSV files.

First I take a CSV and import each row into a table in my database.

Then I want to parse each imported row and insert it into the right table, together with some relations to this data.
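
Step one looks roughly like this (a minimal sketch assuming a plain fgetcsv() loop; the real code maps many more columns):

    $handle = fopen($csvPath, 'r');

    while (($columns = fgetcsv($handle)) !== false) {
        // Illustrative mapping only: the actual column order and fields differ
        $flow = new FlowAndata();
        $flow->setCode($columns[0]);
        // ... set the other fields from the remaining columns ...

        $this->entityManager->persist($flow);
    }

    fclose($handle);
    $this->entityManager->flush();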

So I tried this:

    $qb = $this->entityManager->createQueryBuilder();
    $flows = $qb->select('flow')
        ->from('AppBundle:FlowAndata', 'flow')
        ->getQuery()
        ->getResult();

    $countRows = 0;
    foreach ($flows as $row) {
         //some check 
         $entity = new TestTable();
         $entity->setCode($row->getCode());
         //many other fields
         $this->entityManager->persist($entity);
         $this->entityManager->flush();
    }


In this case, the whole procedure took about 5 seconds for each row!

Now if I add setMaxResults like this:

    $qb = $this->entityManager->createQueryBuilder();
    $flows = $qb->select('flow')
        ->from('AppBundle:FlowAndata', 'flow')
        ->setMaxResults(100)
        ->getQuery()
        ->getResult();


It took less than 1 second!

So I decided to fetch all the rows, count them, and then process them in chunks using setMaxResults, like this:

    $qb = $this->entityManager->createQueryBuilder();
    $flows = $qb->select('flow')
        ->from('AppBundle:FlowAndata', 'flow')
        ->getQuery()
        ->getResult();

    $countFlows = count($flows);
    $numberOfQueries = ceil($countFlows / 100);

    for ($i = 0; $i < $numberOfQueries; $i++) {
        $this->entityManager->clear();
        $qb = $this->entityManager->createQueryBuilder();
        $flows = $qb->select('flow')
            ->from('AppBundle:FlowAndata', 'flow')
            ->setFirstResult($i * 100)
            ->setMaxResults(100)
            ->getQuery()
            ->getResult();

        // ... build, persist, and flush the entities for this chunk ...
    }


This way I end up running many queries of 100 rows each. Is this good practice, or is there a better way to process and insert that many rows?



1 answer


The efficient way recommended by the official Doctrine documentation is to take advantage of the transactional write-behind behavior of the EntityManager.

Iterating Big Results for Data Processing

You can use the iterate() method only for that kind of job: iterating over a large result with no UPDATE or DELETE intent. The IterableResult instance returned from $query->iterate() implements the Iterator interface, so you can process a large result without memory problems using the following approach. (See example)
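
The referenced example boils down to something like this (a minimal sketch adapted to the entity from this question; with Doctrine ORM 2.x, iterate() yields one-element arrays whose first entry is the hydrated entity):

    $q = $this->_em->createQuery('SELECT f FROM AppBundle:FlowAndata f');

    foreach ($q->iterate() as $row) {
        $flow = $row[0]; // $row[0] is always the hydrated entity

        // ... read-only processing of $flow goes here ...

        // Detach from Doctrine, so that it can be garbage-collected immediately
        $this->_em->detach($flow);
    }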

Bulk inserts



Bulk inserts in Doctrine are best performed in batches, taking advantage of the transactional write-behind behavior of the EntityManager. [...] You may need to experiment with the batch size to find the size that works best for you. Larger batch sizes mean more internal reuse of prepared statements, but also mean more work during flush. (See example)
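
The batching example from the documentation, adapted here to the TestTable entity from the question (the loop bound and the 'code_' values are placeholders), is roughly:

    $batchSize = 100;

    for ($i = 1; $i <= 10000; ++$i) {
        $entity = new TestTable();
        $entity->setCode('code_' . $i); // placeholder value
        // ... set the other fields ...

        $this->_em->persist($entity);

        if (($i % $batchSize) === 0) {
            $this->_em->flush();
            $this->_em->clear(); // detaches all managed entities, freeing memory
        }
    }

    $this->_em->flush(); // flush the remaining partial batch
    $this->_em->clear();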

A version mixing both techniques (inside an entity repository):

    $q = $this->_em->createQuery('SELECT f FROM AppBundle:FlowAndata f');
    $iterableResult = $q->iterate();

    $i = 0;
    $batchSize = 100;

    foreach ($iterableResult as $row) {
        // $row[0] is always the hydrated FlowAndata entity
        /** @var \AppBundle\Entity\FlowAndata $flow */
        $flow = $row[0];

        // some check
        $entity = new TestTable();
        $entity->setCode($flow->getCode());
        // many other fields

        $this->_em->persist($entity);

        $i++;
        if (($i % $batchSize) === 0) {
            $this->_em->flush();
            // Detaches all objects from Doctrine!
            $this->_em->clear();
        } else {
            // Detach the source entity so it can be garbage-collected immediately
            $this->_em->detach($flow);
        }
    }

    $this->_em->flush(); // Persist objects that did not make up an entire batch
    $this->_em->clear();
