Fastest way to get a table from MySQL into Pandas

I am trying to figure out the fastest way to fetch data from MySQL into Pandas. So far, I have tried three different approaches:

Approach 1: Using pymysql and overriding its field-type converters (inspired by Fastest way to load numeric data in python / pandas / numpy array from MySQL)

import pymysql
from pymysql.converters import conversions
from pymysql.constants import FIELD_TYPE

# Return DECIMAL columns as Python floats instead of Decimal objects
conversions[FIELD_TYPE.DECIMAL] = float
conversions[FIELD_TYPE.NEWDECIMAL] = float

conn = pymysql.connect(host=host, port=port, user=user, passwd=passwd, db=db)

Approach 2: Using MySQLdb

import MySQLdb
from MySQLdb.converters import conversions
from MySQLdb.constants import FIELD_TYPE

# Return DECIMAL columns as Python floats instead of Decimal objects
conversions[FIELD_TYPE.DECIMAL] = float
conversions[FIELD_TYPE.NEWDECIMAL] = float

conn = MySQLdb.connect(host=host, port=port, user=user, passwd=passwd, db=db)

Approach 3: Using SQLAlchemy

import sqlalchemy as SQL

# Connection URL has the form mysql+mysqldb://user:password@host:port/database
engine = SQL.create_engine('mysql+mysqldb://{0}:{1}@{2}:{3}/{4}'.format(user, passwd, host, port, db))

Approach 2 is the fastest of the three and takes 4 seconds on average to fetch my table. However, retrieving the same table takes only 2 seconds in MySQL Workbench. How can I get rid of those 2 extra seconds? Does anyone know of an alternative way to achieve this?
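
For reference, a number like the 4 seconds above can be reproduced with a simple wall-clock measurement along these lines (the connection settings and table name are placeholders, not from the original post):

import time

import pandas as pd
import pymysql

# Placeholder connection, mirroring Approach 1; note that newer pandas
# versions warn when given a raw DB-API connection instead of a SQLAlchemy engine
conn = pymysql.connect(host=host, port=port, user=user, passwd=passwd, db=db)

start = time.perf_counter()
df = pd.read_sql('SELECT * FROM my_table', conn)  # hypothetical table name
print('fetched %d rows in %.2f s' % (len(df), time.perf_counter() - start))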



1 answer


I think you can find your answer using a dedicated library like peewee, or the pd.read_sql_query function from the pandas library. To use pd.read_sql_query:

import pandas as pd
from sqlalchemy import create_engine

MyEngine = create_engine('[YourDatabase]://[User]:[Pass]@[Host]/[DatabaseName]', echo=True)
df = pd.read_sql_query('SELECT * FROM [TableName]', con=MyEngine)

Also, to load data from a dataframe into SQL:

df.to_sql('[TableName]', MyEngine, if_exists='append', index=False)

You should pass if_exists='append' if the table already exists; otherwise to_sql defaults to if_exists='fail' and raises an error. You can also pass if_exists='replace' if you want to drop the existing table and write it as a new one.
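
For example, the 'replace' variant looks like this (same placeholder table name as above):

# 'replace' drops the existing table, recreates it, and inserts the dataframe
df.to_sql('[TableName]', MyEngine, if_exists='replace', index=False)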



To ensure data integrity, it is nice to use pandas for both uploading and downloading, because of how well it handles the data. Depending on the size of the table, it should be quite efficient as well.

If you want to take the extra step, peewee queries can speed up your fetches, although I haven't personally tested the speed. Peewee is an ORM library, like SQLAlchemy, that I found very simple and expressive to develop with, and you can use dataframes with it as well. Just go through the documentation: you define a model for your table, build and execute the query, and then convert the result to a dataframe.
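
A minimal sketch of such a model definition (the table and column names here are hypothetical, mirroring the query below):

from peewee import MySQLDatabase, Model, CharField

# Hypothetical schema; replace the names with your actual table layout
mysql_db = MySQLDatabase(db, host=host, port=port, user=user, password=passwd)

class TableName(Model):
    column = CharField()

    class Meta:
        database = mysql_db

With the model in place, the query and the conversion look like this: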

MyQuery = TableName.select().where(TableName.column == "value")
df = pd.DataFrame(list(MyQuery.dicts()))

Hope this helps.
