Combine two data frames on a common column

Question


I want to join two data sources, orders and customers:

orders is a SQL Server table:

orderid| customerid | orderdate | ordercost
------ | -----------| --------- | --------
12000  | 1500       |2008-08-09 |  38610

and customers is a CSV file:

customerid,first_name,last_name,starting_date,ending_date,country
1500,Sian,Read,2008-01-07,2010-01-07,Greenland

I want to join these two tables in my Python application, so I wrote the following code:

```python
import pandas as pd
import pypyodbc

# Connect to SQL Server with the pypyodbc library
connection = pypyodbc.connect("connection string here")
cursor = connection.cursor()
cursor.execute("SELECT * FROM orders")
result = cursor.fetchall()

# Convert the result to a pandas DataFrame
df1 = pd.DataFrame(result, columns=['orderid', 'customerid', 'orderdate', 'ordercost'])

# Read the CSV file (customer_csv holds its path)
df2 = pd.read_csv(customer_csv)

# Merge the two DataFrames on the shared key
merged = pd.merge(df1, df2, on='customerid', how='inner')
print(merged[['first_name', 'country']])
```

I expect:

first_name | country
-----------|--------
Sian       | Greenland

But I get an empty result.

When both DataFrames are read from CSV files, the same code works fine. Any help?

Thanks.


sql join pandas dataframe

User193452 22 Mar '17 at 9:22
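To reproduce the setup without a SQL Server instance, the sketch below swaps in an in-memory SQLite table. The question does not show the real column type, so declaring customerid as VARCHAR is a hypothetical assumption; the point is only that a database driver hands back whatever type the column has, while read_csv infers its own. It stops at comparing dtypes because, depending on the pandas version, merging an integer key with a string key may return an empty frame or raise a ValueError.

```python
import io
import sqlite3

import pandas as pd

# Assumption: SQLite stands in for SQL Server, and customerid is stored
# as text (hypothetical schema; the question does not show the real one).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (orderid INTEGER, customerid VARCHAR(10), "
    "orderdate TEXT, ordercost INTEGER)"
)
conn.execute("INSERT INTO orders VALUES (12000, '1500', '2008-08-09', 38610)")

# Same pattern as the question: fetch rows, then wrap them in a DataFrame.
cursor = conn.cursor()
cursor.execute("SELECT orderid, customerid, orderdate, ordercost FROM orders")
rows = cursor.fetchall()
df1 = pd.DataFrame(rows, columns=["orderid", "customerid", "orderdate", "ordercost"])
conn.close()

# The CSV side: read_csv infers customerid as an integer column.
csv_text = (
    "customerid,first_name,last_name,starting_date,ending_date,country\n"
    "1500,Sian,Read,2008-01-07,2010-01-07,Greenland\n"
)
df2 = pd.read_csv(io.StringIO(csv_text))

print(df1["customerid"].dtype)  # object, or a string dtype in newer pandas
print(df2["customerid"].dtype)  # int64
```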


2 answers

An empty DataFrame from pd.merge means there are no matching key values in the two frames. Have you checked the data types? Use

```python
df1['customerid'].dtype
```

to check, and compare it with df2['customerid'].dtype.
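A minimal sketch of that check applied to both frames at once. key_dtypes is a hypothetical helper written for illustration, not part of pandas, and the toy frames assume the SQL side delivered customerid as strings while the CSV side delivered integers.

```python
import pandas as pd


def key_dtypes(left: pd.DataFrame, right: pd.DataFrame, key: str):
    """Return the dtype of `key` in each frame and whether they agree.

    Hypothetical helper for illustration; not a pandas function.
    """
    left_dtype = left[key].dtype
    right_dtype = right[key].dtype
    return left_dtype, right_dtype, left_dtype == right_dtype


# Toy frames mirroring the question: one string key, one integer key.
orders = pd.DataFrame({"orderid": [12000], "customerid": ["1500"]})
customers = pd.DataFrame({"customerid": [1500], "first_name": ["Sian"]})

left_dtype, right_dtype, same = key_dtypes(orders, customers, "customerid")
print(left_dtype, right_dtype, same)
```

If same comes back False, the keys need converting to a common type before the merge can match anything.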

As well as converting after import (as suggested in another answer), you can tell pandas which dtype you want when you read the CSV:

```python
df2 = pd.read_csv(customer_csv, dtype={'customerid': str})
```
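A self-contained sketch of the read-time approach, using io.StringIO in place of the real file. It assumes, as the answer implies, that the SQL side already returned customerid as strings; the inline df1 is a stand-in for the query result.

```python
import io

import pandas as pd

csv_text = (
    "customerid,first_name,last_name,starting_date,ending_date,country\n"
    "1500,Sian,Read,2008-01-07,2010-01-07,Greenland\n"
)

# dtype={'customerid': str} keeps the key as text instead of letting
# read_csv infer an integer column.
df2 = pd.read_csv(io.StringIO(csv_text), dtype={"customerid": str})

# Assumption: stand-in for the SQL result, with customerid as strings.
df1 = pd.DataFrame(
    [(12000, "1500", "2008-08-09", 38610)],
    columns=["orderid", "customerid", "orderdate", "ordercost"],
)

merged = pd.merge(df1, df2, on="customerid", how="inner")
print(merged[["first_name", "country"]])
```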


Stael 22 Mar '17 at 9:26


jezrael · Accepted Answer · 2017-03-22T09:24:56+0000

I think the problem is that the customerid columns have different dtypes in the two DataFrames, so the values don't match.

Therefore, you need to convert both columns to int, or both to str:

```python
df1['customerid'] = df1['customerid'].astype(int)
df2['customerid'] = df2['customerid'].astype(int)
```

Or:

```python
df1['customerid'] = df1['customerid'].astype(str)
df2['customerid'] = df2['customerid'].astype(str)
```
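A runnable sketch of the integer conversion end to end, assuming customerid arrived as strings from SQL and as integers from the CSV (the inline data stands in for both sources). It uses the explicit 'int64' dtype rather than int, because on Windows with older NumPy releases int can map to int32; spelling it out keeps both key columns identical.

```python
import io

import pandas as pd

# Assumption: stand-in for the SQL result, with customerid as strings.
df1 = pd.DataFrame(
    [(12000, "1500", "2008-08-09", 38610)],
    columns=["orderid", "customerid", "orderdate", "ordercost"],
)
# The CSV side, where read_csv infers customerid as an integer column.
df2 = pd.read_csv(io.StringIO(
    "customerid,first_name,last_name,starting_date,ending_date,country\n"
    "1500,Sian,Read,2008-01-07,2010-01-07,Greenland\n"
))

# Bring both key columns to the same integer dtype before merging.
df1["customerid"] = df1["customerid"].astype("int64")
df2["customerid"] = df2["customerid"].astype("int64")

merged = pd.merge(df1, df2, on="customerid")
print(merged[["first_name", "country"]])
```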

You can also omit how='inner', because 'inner' is the default for merge:

```python
merged = pd.merge(df1, df2, on='customerid')
```
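A quick check of that default, using small toy frames whose key columns already share the int64 dtype:

```python
import pandas as pd

df1 = pd.DataFrame({"orderid": [12000], "customerid": [1500]})
df2 = pd.DataFrame({"customerid": [1500], "country": ["Greenland"]})

# Without how=..., merge performs an inner join.
default = pd.merge(df1, df2, on="customerid")
explicit = pd.merge(df1, df2, on="customerid", how="inner")
print(default.equals(explicit))  # True
```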
