Oracle / Mysql editing a large number of records, recommendations?

First I want to thank you for your help on the page, it was really helpful. Now I am faced with a situation that I cannot understand, and I hope you can help me, here it is:

At my workplace, I have access to an Oracle database with a table filled by users on it (first names, last name, date of birth, etc.). It has 17 million records, and for each record I need to generate a unique key (let's say the first two letters of the name plus the date of birth concatenation is actually more complicated, but this is an example), so for this key I need to get the record to calculate key and then update the record with the key. Now over 17 million records are killing the server!

So DB at work is Oracle, I copied the data to my machine into MYSQL database for testing. I planned to test locally, doing the calculations with php, and then create a Java application to generate the key on the server, but this is a huge workload! I don’t know where to go. Simple selection takes 10 minutes on php (in mysql via command line takes 1.49 minutes to COUNT () records)

I know it's better to compute the key as records come in. They come in 500K packets, but since I just came in, the data has already been merged and I have to work on these entries.

So what would you recommend this poor soul to do with this task. I was thinking about triggers or pl / sql, but I really lost what would be better in terms of performance.

Many thanks!!!

----------- as a request from ogres and hopefully you can help me, here's the code in php for the calculation I need to do for each line

$vn=0;//holds the value calculated for name
$sql="select * from roots";//table holding triplets for character|number (ex: 1|A|13), field names (consecutive|character|code)
$resultados=mysql_query($sql,$idcon);
while($dato=mysql_fetch_array($resultados))//i put all of the pairs in an associative array for quicker acces
{
    $consulta[$dato['v_propio']]=array($dato['caracter'],$dato['v_primo']);
}
//coding the name, for every char in the name i add to $vn, the consecutive times the position of the character in the string, plus the code for the character, if null, i add some values
$pos=1;
for ($i=0;$i<strlen($nombre);$i++)
{
    $char=substr($nombre,$i,1);
    $char=charnum($char);
    if($char!=NULL)
    {
    $vn=$vn+($char*$pos)+$consulta[$char][1];
    }
    else
    $vn=$vn + 28 + 107;
    $pos++;
}
//end of name coding
// i format the code for the name to 4 digits
if ( $vn < 1000 and $vn > 99 )
    $vn = '0'.$vn ;
else if ( $vn < 100 and $vn > 9 ) 
    $vn = '00'.$vn; 
else if ( $vn < 10 ) 
    $vn = '000'.$vn; 
 else
    $vn=$vn; 

      

// FINALLY I AM THE CONTACT OF THE FIRST TWO ENCHANCES IN THE NAME WITH THE CALCULATION CODE AND BIRTHDAY EX: JH235408071984 WILL COME FOR JHON BORN ON 1984/08/07 WITH THE CALCULATION CODE = 2354

$CODE=trim(substr($nombre,0,2)).trim($vn).formatFecha($fnac);

      

HOPES SHE HELPS AND YOU CAN GET ME SOME POINTED !!

+3


source to share


2 answers


, , , , -, , ( ), 1000 , , , , ( , ), , :

SELECT MIN(ID) AS MinID, MAX(ID) AS MaxID FROM Users

      

Assuming you have a PK id with id. This request should be pretty damn fast compared to a full COUNT (*) or COUNT (1). Then you can check the table mentioned above to see if the data exists, if not, start over, if so, start working on those ids from wherever you left off. It will probably take a long time to run with that many entries, but it can be done to make it possible to repeat as needed, or just run forever until it is done.

In the end it will look something like this (a lot of pseudo-functions, since I don't know which platform you will be using or not):



define("NUM_PER_ITERATION", 1000);

// Get our ID range
$query = "SELECT MIN(ID) AS MinID, MAX(ID) AS MaxID FROM Users";
$array = $MyDB->GetSingleRow($query);
$minid = (int) $array["MinID"];
$maxid = (int) $array["MaxID"];

// Get our last starting point
$startingpoint = LoadLastWorkPosition();
if (!$startingpoint || $startingpoint < $minid) {
  $startingpoint = $minid;
} else if ($startingpoint > $maxid) {
  echo("Already done!");
  exit;
}

// Run through the values
$curstart = $startingpoint;
while ($curstart <= $maxid) {
  $curend = $curstart + NUM_PER_ITERATION - 1;

  // Set a time out so it will keep running, you'll know way better
  // than I how long this should be for each loop
  set_time_limit(300);

  // Handle a number of results
  HandleResults($curstart, $curend);

  // Set the start of the next entry
  $curstart = $curend + 1;

  // Save our current progress
  SaveLastWorkPosition($curstart);
}

echo("All done!");

      

You will need to create LoadLastWorkPosition (which tries to load the last position and returns 0 or false or whatever if not present), SaveLastWorkPosition (so you can keep track of where you left off ...) a separate script to check the value to see where it is for the progress bar or tracker) and HandleResults (load ids in a specific range and create unique values ​​for them).

Anyway, hope this helps you get started if nothing else!

+1


source


you don't need to use php to calculate, firstly add a column for unique_key to your table and then create an index / unique constraint on that column, after which you can update the whole table once with sql, generate_value is unique_key for a row. computed in SQL not php

update table set unique_key = generated_value

      



after that, if you want to find a person, you will compute his unique_key and then use the query

select * from table where unique_key = generated_value

      

+1


source







All Articles