Inserting a lot of data with Doctrine
I am working on importing 300,000 rows contained in many CSV files.
First I take a CSV file and import each row into a table in my database.
Then I want to parse every row and insert it into the right table, with some relations to that data.
So I tried this:
$qb = $this->entityManager->createQueryBuilder();
$flows = $qb->select('flow')
    ->from('AppBundle:FlowAndata', 'flow')
    ->getQuery()
    ->getResult();

$countRows = 0;
foreach ($flows as $row) {
    // some check
    $entity = new TestTable();
    $entity->setCode($row->getCode());
    // many other fields
    $this->entityManager->persist($entity);
    $this->entityManager->flush();
}
In this case, the whole procedure took about 5 seconds per row!
Now if I add setMaxResults like this:
$qb = $this->entityManager->createQueryBuilder();
$flows = $qb->select('flow')
    ->from('AppBundle:FlowAndata', 'flow')
    ->setMaxResults(100)
    ->getQuery()
    ->getResult();
It took less than 1 second!
So I decided to count all the rows and then fetch them in chunks of 100, using setFirstResult and setMaxResults like this:
$qb = $this->entityManager->createQueryBuilder();
$flows = $qb->select('flow')
    ->from('AppBundle:FlowAndata', 'flow')
    ->getQuery()
    ->getResult();

$countFlows = count($flows);
$numberOfQuery = $countFlows / 100;
for ($i = 0; $i <= $numberOfQuery; $i++) {
    $this->entityManager->clear();
    $qb = $this->entityManager->createQueryBuilder();
    $flows = $qb->select('flow')
        ->from('AppBundle:FlowAndata', 'flow')
        ->setFirstResult($i * 100)
        ->setMaxResults(100)
        ->getQuery()
        ->getResult();
    // ... same checks and inserts as above, for each chunk of 100 rows
}
This way I am running many queries of 100 rows each. Is this good practice, or is there a better way to parse many rows and insert them?
The efficient way recommended by the official Doctrine documentation is to take advantage of the transactional write-behind behavior of the EntityManager.
Iterating Big Results for Data Processing
You can use the iterate() method just to iterate over a large result when there is no UPDATE or DELETE intention. The IterableResult instance returned from $query->iterate() implements the Iterator interface, so you can process a large result without memory problems using the following approach. (See example.)
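For illustration only (this is a sketch, not code from the documentation), a minimal read-only pass over the question's AppBundle:FlowAndata entity with iterate() might look like this; the detach() call is how I would keep memory flat inside the loop:

$q = $this->_em->createQuery('SELECT f FROM AppBundle:FlowAndata f');
foreach ($q->iterate() as $row) {
    // iterate() hydrates one row at a time; the entity is always at index 0
    $flow = $row[0];
    // read-only work with $flow goes here
    $this->_em->detach($flow); // drop it from the identity map so it can be garbage-collected
}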
Bulk inserts
Bulk inserts in Doctrine are best performed in batches, taking advantage of the transactional write-behind behavior of the EntityManager. [...] You may need to experiment with the batch size to find the size that works best for you. Larger batch sizes mean more prepared statement reuse internally, but also mean more work during flush. (See example.)
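And a minimal sketch of the batched-insert pattern on its own, assuming the question's TestTable entity (the loop bound and the placeholder value are just for illustration):

$batchSize = 100;
for ($i = 1; $i <= 10000; $i++) {
    $entity = new TestTable();
    $entity->setCode('example-code'); // placeholder value; set the real fields here
    $this->_em->persist($entity);
    if (($i % $batchSize) === 0) {
        $this->_em->flush(); // executes all pending inserts in a single transaction
        $this->_em->clear(); // detaches all entities so the UnitOfWork stays small
    }
}
$this->_em->flush(); // flush the final partial batch
$this->_em->clear();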
A version combining both techniques (inside an entity repository):
$q = $this->_em->createQuery('SELECT f FROM AppBundle:FlowAndata f');
$iterableResult = $q->iterate();

$i = 0;
$batchSize = 100;

foreach ($iterableResult as $row) {
    // do stuff with the data in the row, $row[0] is always the object
    /** @var AppBundle\Entity\FlowAndata $flow */
    $flow = $row[0];

    // some check
    $entity = new TestTable();
    $entity->setCode($flow->getCode());
    // many other fields

    $this->_em->persist($entity);

    $i++;
    if (($i % $batchSize) === 0) {
        $this->_em->flush();
        // Detaches all objects from Doctrine!
        $this->_em->clear();
    } else {
        // Detach the processed row from Doctrine so it can be garbage-collected immediately
        $this->_em->detach($flow);
    }
}

$this->_em->flush(); // Persist objects that did not make up an entire batch
$this->_em->clear();
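Two practical notes (not part of the quoted documentation): clear() detaches every managed entity, so anything you loaded before a batch boundary has to be re-fetched (or obtained via $this->_em->getReference()) if you still need it afterwards. And for long imports it usually helps to disable the SQL logger with $this->_em->getConnection()->getConfiguration()->setSQLLogger(null); so Doctrine does not keep every executed statement in memory.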