Pandas: updating SQL efficiently

I am using Python pandas to load data from a MySQL database, change it, and then update another table. There are more than 100,000 rows, so the UPDATE queries take a while.

Is there a more efficient way to update the data in the database than using df.iterrows() and running an UPDATE query on every row?
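For reference, the per-row pattern described above looks roughly like this. This is a hypothetical sketch using an in-memory SQLite database as a stand-in for MySQL; the table and column names are illustrative:

```python
import sqlite3
import pandas as pd

# Stand-in for the MySQL connection in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, 0.0), (2, 0.0)])

df = pd.DataFrame({"id": [1, 2], "value": [10.5, 20.5]})

# One round trip per row: this is what becomes slow at 100,000+ rows.
for _, row in df.iterrows():
    conn.execute("UPDATE target SET value = ? WHERE id = ?",
                 (float(row["value"]), int(row["id"])))
conn.commit()
```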


1 answer


The problem here isn't pandas, it's the UPDATE operations. Each row runs its own UPDATE query, which means a lot of overhead for the database connector to process.

You are better off using df.to_csv('filename.csv') to dump your dataframe to a CSV file, then loading that file into your MySQL database with LOAD DATA INFILE. Load it into a new table, then DROP the old table and RENAME the new one to the old name.
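A minimal sketch of that approach, assuming illustrative table names (my_table, my_table_new) and a server-side file path. Only the CSV dump runs here; the SQL is meant to be executed through your MySQL connector or client against a live server:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": [10.5, 20.5]})

# Plain rows, no header or index, so LOAD DATA can consume the file directly.
df.to_csv("filename.csv", index=False, header=False)

# Run these on the MySQL server (not runnable without one); the path to the
# CSV file is hypothetical and must be readable by the server.
load_and_swap = """
CREATE TABLE my_table_new LIKE my_table;
LOAD DATA INFILE '/path/to/filename.csv'
INTO TABLE my_table_new
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n';
DROP TABLE my_table;
RENAME TABLE my_table_new TO my_table;
"""
```

Loading into a fresh table and swapping names keeps the old table available until the bulk load has fully succeeded.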

Also, I suggest you do the same when loading data into pandas: use the SELECT ... INTO OUTFILE MySQL command, then load the resulting file into pandas with pd.read_csv().
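A sketch of the export direction, under the same assumptions (illustrative table name and server path). SELECT ... INTO OUTFILE writes the CSV on the database server's filesystem, so the file may need to be copied locally before pandas can read it; here a small stand-in file takes its place:

```python
import pandas as pd

# Runs on the MySQL server; the output path is hypothetical.
export_sql = """
SELECT id, value
FROM my_table
INTO OUTFILE '/tmp/export.csv'
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n';
"""

# Stand-in for the server-side export so the read step is demonstrable.
with open("export.csv", "w") as f:
    f.write("1,10.5\n2,20.5\n")

# INTO OUTFILE produces no header row, so supply column names yourself.
df = pd.read_csv("export.csv", header=None, names=["id", "value"])
```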






