What's the best way to compare / insert / update products in MySQL db from .CSV file

At our company, we pull the .CSV file from the suppliers' FTP server and update our product details (price, promotions, ..) every morning.

We wrote a cron for this task as it should start automatically.

The current script works in most cases. However, sometimes we get an error: "The allowed memory size of 134217728 bytes has been exhausted (tried to allocate 75 bytes)".

We are using CodeIgniter with ORM DataMapper. A possible design mistake could be the fact that the script is working on objects, not an array ...

49000 lines are checked each time.

Can anyone help us find another way to do this?


The next script function is the function that runs after the files have been copied.

// Include auth connection params
$udb = $this->_completeParams($db);
// Check if an update was downloaded
$supplier = new Supplier(NULL,$udb);
$supplier->where(array('alias'=>'XX','name'=>'xxxxxxxxx'))->get(1);

$cronStart = date('Y-m-d H:i:s');
$cronStartDate = date('Y-m-d');
//mail($this->adminMail, 'CRON', 'Gestart:' .$cronStart, $this->headerMail);

//$message .= '1: '.memory_get_usage()."\r\n";
if($supplier->import_found) {
//if(true) {
    $rows = 0;
    $updated = 0;
    $new = 0;

    //$aAvailable = array();

    $message .= '<h3>Start: '.$cronStart.'</h3>' . "\r\n";

    $object = new Supplier_product(NULL,$udb);
    $cat = new Supplier_category(NULL, $udb);
    $manu = new Supplier_manufacturer(NULL, $udb);

    $auvibel = new Supplier_auvibel(NULL, $udb);
    $bebat = new Supplier_bebat(NULL, $udb);
    $recupel = new Supplier_recupel(NULL, $udb);
    $reprobel = new Supplier_reprobel(NULL, $udb);

    $files = glob($this->tempDir.'XXXXX/prices/*');
    foreach($files as $file) {

        $ext = pathinfo($file, PATHINFO_EXTENSION);
        $data = ($ext == 'txt')?$this->_csvToArray($file, ';'):false;

        // If the CSV data is in $data
        if($data !== false) {
            $totalCount = count($data);
            for($i = 0; $i <= $totalCount; $i++) {

                //$aAvailable[] = $data[$i]['ArtID'];
                $rows++;
                //$message .= 'loop start: '.memory_get_usage()."\r\n";

                $object->where(array('art_id'=>$data[$i]['ArtID'],'supplier_id'=>$supplier->id))->get(1);

                $auvibel->select('value')->where(array('art_id'=>$data[$i]['ArtID'], 'supplier_id'=>$supplier->id))->get(1);
                $auvibel->value = ($auvibel->exists())?$auvibel->value:0;

                $bebat->select('value')->where(array('art_id'=>$data[$i]['ArtID'], 'supplier_id'=>$supplier->id))->get(1);
                $bebat->value = ($bebat->exists())?$bebat->value:0;

                $recupel->select('value')->where(array('art_id'=>$data[$i]['ArtID'], 'supplier_id'=>$supplier->id))->get(1);
                $recupel->value = ($recupel->exists())?$recupel->value:0;

                $reprobel->select('value')->where(array('art_id'=>$data[$i]['ArtID'], 'supplier_id'=>$supplier->id))->get(1);
                $reprobel->value = ($reprobel->exists())?$reprobel->value:0;

                $intrastat = 0;

                $data[$i]['LP_Eur'] = ($data[$i]['LP_Eur'] != '')?str_replace(',', '.', $data[$i]['LP_Eur']):0;
                $data[$i]['DE_Eur'] = ($data[$i]['DE_Eur'] != '')?str_replace(',', '.', $data[$i]['DE_Eur']):0;
                $data[$i]['D1_Eur'] = ($data[$i]['D1_Eur'] != '')?str_replace(',', '.', $data[$i]['D1_Eur']):0;
                $data[$i]['D1_Eur'] = ($data[$i]['D2_Eur'] != '')?str_replace(',', '.', $data[$i]['D2_Eur']):0;
                $data[$i]['PricePersonal_Eur'] = ($data[$i]['PricePersonal_Eur'] != '')?str_replace(',', '.', $data[$i]['PricePersonal_Eur']):0;
                $data[$i]['BackorderDate'] = ($data[$i]['BackorderDate'] != '')?date('Y-m-d', strtotime($data[$i]['BackorderDate'])):NULL;
                $data[$i]['ModifDate'] = ($data[$i]['ModifDate'] != '')?date('Y-m-d', strtotime($data[$i]['ModifDate'])):NULL;

                if($object->exists()) {
                    if($object->allow_cron_update) { //if($data[$i]['ModifDate'] != $object->modified) {

                        // Check if category group exists
                        $cat->select('id')->where(array(
                            'supplier_id' => $supplier->id,
                            'name_a' => $data[$i]['Class1'],
                            'name_b' => $data[$i]['Class2'],
                            'name_c' => $data[$i]['Class3'],
                        ))->get(1);
                        if(!$cat->exists()) {

                            // Category should be added
                            $cat->supplier_id = $supplier->id;
                            $cat->name_a = $data[$i]['Class1'];
                            $cat->name_b = $data[$i]['Class2'];
                            $cat->name_c = $data[$i]['Class3'];
                            $cat->save();

                            // Log as notification: New supplier categorie
                            $this->_notify('Niewe categorie',array(
                                'body' => $supplier->name.' heeft "'.$cat->name_a.' - '.$cat->name_b.' - '.$cat->name_c.'" als nieuwe categorie toegevoegd.',
                                'controller' => 'leveranciers',
                                'trigger' => 'new_supplier_category',
                                'url' => base_url().'leveranciers/item/'.$supplier->id.'/categorien',
                                'icon' => 'icon-truck',
                                'udb' => $udb,
                            ));
                        }

                        // Check if manufacturer exists
                        $manu->select('id')->where(array(
                            'name' => $data[$i]['PublisherName']
                        ))->get(1);
                        if(!$manu->exists()) {

                            // Manufacturer should be added
                            $manu->name = $data[$i]['PublisherName'];
                            $manu->save($supplier);
                        }

                        // Add the product to the database
                        $object->art_id = $data[$i]['ArtID'];
                        $object->supplier_id = $supplier->id;
                        $object->supplier_category_id = $cat->id;
                        $object->supplier_manufacturer_id = $manu->id;
                        $object->part_id = $data[$i]['PartID'];
                        $object->ean_code = $data[$i]['EanCode'];
                        $object->name = $data[$i]['Description'];
                        $object->description = NULL;
                        $object->version = $data[$i]['Version'];
                        $object->language = $data[$i]['Language'];
                        $object->media = $data[$i]['Media'];
                        $object->trend = $data[$i]['Trend'];
                        $object->price_group = $data[$i]['PriceGroup'];
                        $object->price_code = $data[$i]['PriceCode'];
                        $object->eur_lp = $data[$i]['LP_Eur'];
                        $object->eur_de = $data[$i]['DE_Eur'];
                        $object->eur_d1 = $data[$i]['D1_Eur'];
                        $object->eur_d2 = $data[$i]['D2_Eur'];
                        $object->eur_personal = $data[$i]['PricePersonal_Eur'];
                        $object->stock = $data[$i]['Stock'];
                        $object->backorder = ($data[$i]['BackorderDate'] != '' && !empty($data[$i]['BackorderDate']))?$data[$i]['BackorderDate']:NULL;
                        $object->modified = ($data[$i]['ModifDate'] != '' && !empty($data[$i]['ModifDate']))?$data[$i]['ModifDate']:NULL;
                        $object->flag = 'MODIFIED';
                        $object->auvibel = $auvibel->value;
                        $object->bebat = $bebat->value;
                        $object->intrastat = $intrastat;
                        $object->recupel = $recupel->value;
                        $object->reprobel = $reprobel->value;
                        $object->save();

                        $updated++;
                    }
                    elseif(($object->auvibel != $auvibel) || ($object->bebat != $bebat) || ($object->recupel != $recupel) || ($object->reprobel != $reprobel)) {
                        $object->auvibel = $auvibel->value;
                        $object->bebat = $bebat->value;
                        $object->intrastat = $intrastat;
                        $object->recupel = $recupel->value;
                        $object->reprobel = $reprobel->value;
                        $object->save();
                    }
                }
                else {

                    // Check if category group exists
                    $cat->select('id')->where(array(
                        'supplier_id' => $supplier->id,
                        'name_a' => $data[$i]['Class1'],
                        'name_b' => $data[$i]['Class2'],
                        'name_c' => $data[$i]['Class3'],
                    ))->get(1);
                    if(!$cat->exists()) {

                        // Category should be added
                        $cat->supplier_id = $supplier->id;
                        $cat->name_a = $data[$i]['Class1'];
                        $cat->name_b = $data[$i]['Class2'];
                        $cat->name_c = $data[$i]['Class3'];
                        $cat->save();

                        // Log as notification: New supplier categorie
                        $this->_notify('Niewe categorie',array(
                            'body' => $supplier->name.' heeft "'.$cat->name_a.' - '.$cat->name_b.' - '.$cat->name_c.'" als nieuwe categorie toegevoegd.',
                            'controller' => 'leveranciers',
                            'trigger' => 'new_supplier_category',
                            'url' => '[hidden-url]'.$supplier->id.'/categorien',
                            'icon' => 'icon-truck',
                            'udb' => $udb,
                        ));
                    }

                    // Check if manufacturer exists
                    $manu->select('id')->where(array(
                        'name' => $data[$i]['PublisherName']
                    ))->get(1);
                    if(!$manu->exists()) {

                        // Manufacturer should be added
                        $manu->name = $data[$i]['PublisherName'];
                        $manu->save($supplier);
                    }

                    // Add the product to the database
                    $object->art_id = $data[$i]['ArtID'];
                    $object->supplier_id = $supplier->id;
                    $object->supplier_category_id = $cat->id;
                    $object->supplier_manufacturer_id = $manu->id;
                    $object->part_id = $data[$i]['PartID'];
                    $object->ean_code = $data[$i]['EanCode'];
                    $object->name = $data[$i]['Description'];
                    $object->description = NULL;
                    $object->version = (($data[$i]['Version'] != '')?$data[$i]['Version']:NULL);
                    $object->language = (($data[$i]['Language'] != '')?$data[$i]['Language']:NULL);
                    $object->media = (($data[$i]['Media'] != '')?$data[$i]['Media']:NULL);
                    $object->trend = (($data[$i]['Trend'] != '')?$data[$i]['Trend']:NULL);
                    $object->price_group = (($data[$i]['PriceGroup'] != '')?$data[$i]['PriceGroup']:NULL);
                    $object->price_code = (($data[$i]['PriceCode'] != '')?$data[$i]['PriceCode']:NULL);
                    $object->eur_lp = (($data[$i]['LP_Eur'] != '')?$data[$i]['LP_Eur']:NULL);
                    $object->eur_de = (($data[$i]['DE_Eur'] != '')?$data[$i]['DE_Eur']:NULL);
                    $object->eur_d1 = (($data[$i]['D1_Eur'] != '')?$data[$i]['D1_Eur']:NULL);
                    $object->eur_d2 = (($data[$i]['D2_Eur'] != '')?$data[$i]['D2_Eur']:NULL);
                    $object->eur_personal = $data[$i]['PricePersonal_Eur'];
                    $object->stock = $data[$i]['Stock'];
                    $object->backorder = ($data[$i]['BackorderDate'] != '' && !empty($data[$i]['BackorderDate']))?$data[$i]['BackorderDate']:NULL;
                    $object->modified = ($data[$i]['ModifDate'] != '' && !empty($data[$i]['ModifDate']))?$data[$i]['ModifDate']:NULL;
                    $object->flag = NULL;
                    $object->auvibel = $auvibel->value;
                    $object->bebat = $bebat->value;
                    $object->intrastat = $intrastat;
                    $object->recupel = $recupel->value;
                    $object->reprobel = $reprobel->value;
                    $object->save();
                    //$object->clear_cache();

                    $new++;
                }

                //$message .= 'loop end A: '.memory_get_usage().' - '.$i."\r\n";

                $object->clear();
                $cat->clear();
                $manu->clear();
                $auvibel->clear();
                $bebat->clear();
                $recupel->clear();
                $reprobel->clear();

                unset($data[$i]);

                //$message .= 'loop end B: '.memory_get_usage()."\r\n";
            }
        }
        unset($manu);
        unset($auvibel);
        unset($bebat);
        unset($recupel);
        unset($reprobel);

        if(is_file($file)) {
            unlink($file);
        }

        $object->clear();
        //$message .= 'BEFORE MARK EOL: '.memory_get_usage()."\r\n";
        /**
         * Mark products as EOL when not found in file
         */
        $eolCount = 0;
        $eol = $object
            ->group_start()
                ->where('flag IS NULL')
                ->or_where('flag !=', 'EOL')
            ->group_end()
            ->where('supplier_id', $supplier->id)
            ->group_start()
                ->group_start()->where('updated IS NOT NULL')->where('updated <',$cronStart)->group_end()
                ->or_group_start()->where('updated IS NULL')->where('created <',$cronStart)->group_end()
            ->group_end()
            ->get_iterated();

        $p = new Product(NULL,$udb);
        //unset($aAvailable);
        foreach($eol as $i => $product) {
            $product->flag = "EOL";
            $product->save();

            if($product->art_id != NULL) {
                // The 'copied' products should be marked eol in the webshop!
                $p->where('art_code',$product->art_id)->where('supplier_product_id', $product->id)->get();
                if($p->exists()) {
                    $p->eol = date('Y-m-d H:i:s');
                    $p->save();
                }
                $p->clear();
            }

            $product->clear();
            $eolCount++;
            //unset($eol[$i]);
            //$message .= 'INSIDE MARK EOL: '.memory_get_usage()."\r\n";
        }
        unset($product);
        $object->clear();
        //$message .= 'AFTER MARK EOL: '.memory_get_usage()."\r\n";
        if($eolCount > 0) {
            // Log as notification: supplier products marked EOL
            $this->_notify('EOL melding',array(
                'body' => "Er ".(($eolCount == 1)?'is een product':'zijn '.$eolCount.' producten')." gemarkeerd als EOL",
                'controller' => 'leveranciers',
                'trigger' => 'eol_supplier_product',
                'url' => '[hidden-url]'.$supplier->id.'/artikels',
                'icon' => 'icon-truck',
                'udb' => $udb,
            ));
        }
    }

    // After looping files build e-mail.
    $message .= 'Totaal: '.$rows. "\r\n";
    $message .= 'new: '.$new. "\r\n";
    $message .= 'updated: '.$updated. "\r\n";
    $message .= 'EOL: '.$eolCount. "\r\n";
    $subject = 'Import XXXXX Update';
}
// No updates found
else {
    $subject = 'Import XXXXX No Update Found';
    $message .= "\r\n";
}
$message .= '<h3>Einde: '.date('Y-m-d H:i:s').'</h3>' . "\r\n";
mail($this->adminMail, $subject, $message, $this->headerMail);

// Remove import_found marker for supplier
$supplier->import_found = false;
$supplier->save();

      

+3


source to share


2 answers


We had a similar situation. After many attempts to make the script better, we decided we needed another approach to get our import job done, rather than taking ~ 10 hours.

What we did was dump all PHP code and instead use it mysqlimport

to load the contents of the CSV file directly into the table. This table now contains everything we need, but not in a form that is useful to us (no structure needed, some fields need some processing, etc.)



However, since everything is now in the database, we can do whatever we want with a query. For example, deleting all data that is no longer in the import file is simple DELETE FROM structured_table AS st LEFT JOIN unstructured_table AS ut ON st.someField = ut.someField WHERE ut.someField IS NULL;

, updating existing records is simple UPDATE structured_table AS st INNER JOIN unstructured_table AS ut ON st.someField = ut.someField SET st.anotherField = CONCAT(ut.aField, ' ', ut.yetAnotherField);

.

Obviously for complex script imports, your queries will be more complex and you will need more of them. You might even need to throw some stored procedures to be processed on separate fields. But if you take this approach, you end up with a process that can handle a lot of data and is very scalable.

+2


source


I have a similar situation ... Compare about 20 million records every day to update multiple records with changes and add / remove delta. The data source is CSV. I am using perl, although I think php works as well.

  • Each entry must have a link, product SKU? Or something like that. May already be the primary key / unique key in your DB table.
  • You know the lst of fields you want to compare and update.

Step 1: read ALL records from the DB, store them in an array using the bind key as the named index.

1.1: the concat value of all fields to be compared, or the md5 () of the concat result to save memory.

Step 2: loop through the CSV file, extract the bind key and new values ​​in the line.



2.1: if the reference to the reference is NOT in the array, the INSERT action for the DB.

2.2: isset () returns true, so compare values ​​(or md5 () from concat value) if different, UPDATE action for DB.

2.3: remove this entry from the array.

Step 3: By the end of the CSV reading, the records remain in the array, these are DELETE records.

In my case, it uses less than 2GB of RAM for the process and runs for about 3 minutes, which should be feasible and acceptable.

0


source







All Articles