Pandas: updating SQL efficiently
I am using Python pandas to load data from a MySQL database, modify it, and then update another table. There are more than 100,000 rows, so the UPDATE queries take a while.
Is there a more efficient way to update the data in the database than using df.iterrows() and running an UPDATE query on every row?
The problem here isn't pandas, it's the UPDATE operations. Each row runs its own UPDATE query, which means a lot of overhead for the database connector to process.
You are better off using df.to_csv('filename.csv') to dump your dataframe to CSV, then reading that CSV file into your MySQL database using LOAD DATA INFILE. Load it into a new table, then DROP the old table and RENAME the new one to the old name.
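The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the helper names (swap_statements, bulk_replace), the staging table name, and the CSV options are made up for the example, cursor is any DB-API cursor from your MySQL connector, and the dataframe columns are assumed to be in the same order as the table columns. Note that LOAD DATA INFILE reads the file on the database server host, so the CSV path must be readable there.

```python
def swap_statements(csv_path, target, staging):
    """Build the MySQL statements for the load-then-swap approach.

    All names are illustrative; identifiers are interpolated directly,
    so only pass trusted table names and paths.
    """
    return [
        f"DROP TABLE IF EXISTS `{staging}`",
        # Copy the structure of the existing table into the staging table.
        f"CREATE TABLE `{staging}` LIKE `{target}`",
        (
            f"LOAD DATA INFILE '{csv_path}' INTO TABLE `{staging}` "
            "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
            # IGNORE 1 LINES skips the header row written by to_csv.
            "LINES TERMINATED BY '\\n' IGNORE 1 LINES"
        ),
        f"DROP TABLE `{target}`",
        f"RENAME TABLE `{staging}` TO `{target}`",
    ]


def bulk_replace(df, cursor, csv_path, target, staging):
    """Dump df to CSV, then run the load-and-swap statements in order."""
    # index=False keeps the dataframe index out of the file.
    # Assumes '\n' line endings, which is what to_csv writes on Linux.
    df.to_csv(csv_path, index=False)
    for statement in swap_statements(csv_path, target, staging):
        cursor.execute(statement)
```

A call might look like bulk_replace(df, cursor, '/tmp/prices.csv', 'prices', 'prices_new'), where the path and table names are placeholders. The table is briefly missing between the DROP and the RENAME, so this fits best where short gaps are acceptable.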
Also, I suggest you do the same when loading data into pandas. Use the MySQL SELECT ... INTO OUTFILE command and then load that file into pandas with pd.read_csv().
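A rough sketch of that read path, with the same caveats: export_statement and load_export are hypothetical helper names, and the CSV options are just one reasonable choice. MySQL writes the OUTFILE on the server host and without a header row, so the column names have to be supplied when reading. The na_values setting assumes MySQL's default escaping, where NULL is written as \N.

```python
def export_statement(query, out_path):
    """Append an INTO OUTFILE clause to a plain SELECT (no trailing semicolon)."""
    return (
        f"{query} INTO OUTFILE '{out_path}' "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )


def load_export(out_path, columns):
    """Read a file produced by export_statement into a dataframe."""
    # Imported here so the SQL helper above has no pandas dependency.
    import pandas as pd

    # header=None because the export has no header row;
    # "\\N" is the literal two-character marker \N used for NULL.
    return pd.read_csv(out_path, header=None, names=columns, na_values=["\\N"])
```

You would run cursor.execute(export_statement('SELECT id, price FROM prices', '/tmp/prices.csv')) and then call load_export('/tmp/prices.csv', ['id', 'price']), with the query, path, and column names standing in for your own. This only works when the script can read the server's file system, for example when both run on the same machine.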