Combine two data frames on a common column

I want to join two data sources, orders and customers:

orders is a SQL Server table:

orderid| customerid | orderdate | ordercost
------ | -----------| --------- | --------
12000  | 1500       |2008-08-09 |  38610


and customers is a CSV file:



I want to join these two tables in my Python application, so I wrote the following code:

# Connect to SQL Server with the pypyodbc library
import pandas as pd
import pypyodbc

connection = pypyodbc.connect("connection string here")
cursor = connection.cursor()
cursor.execute("SELECT * FROM orders")
result = cursor.fetchall()

# convert the result to pandas Dataframe
df1 = pd.DataFrame(result, columns= ['orderid','customerid','orderdate','ordercost'])

# Read the CSV file (customer_csv is the path to the customers file)
df2 = pd.read_csv(customer_csv)

# Merge two dataframes
merged= pd.merge( df1, df2, on= 'customerid', how='inner')
print(merged[['first_name', 'country']])


I expect:

first_name | country
Sian       | Greenland


But I get an empty result.

When I run the same code on two DataFrames that both come from CSV files, it works fine. Any help?




2 answers

I think the problem is that the customerid columns have different dtypes in the two DataFrames, so the values don't match.

Therefore, you need to convert both columns to int or both to str:


df1['customerid'] = df1['customerid'].astype(int)
df2['customerid'] = df2['customerid'].astype(int)



df1['customerid'] = df1['customerid'].astype(str)
df2['customerid'] = df2['customerid'].astype(str)
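
As a rough illustration of the conversion above, here is a minimal, self-contained sketch. The two frames are hypothetical stand-ins built from the sample row in the question, and it is only an assumption that the SQL side delivers customerid as text while the CSV side holds integers; the real dtypes need to be checked on the actual data.

```python
import pandas as pd

# Hypothetical stand-ins for the two sources. Assumption: the SQL result
# delivers customerid as text, while the CSV side holds integers.
df1 = pd.DataFrame({
    "orderid": [12000],
    "customerid": ["1500"],
    "ordercost": [38610],
})
df2 = pd.DataFrame({
    "customerid": [1500],
    "first_name": ["Sian"],
    "country": ["Greenland"],
})

# Bring both key columns to the same dtype before merging.
df1["customerid"] = df1["customerid"].astype(int)
df2["customerid"] = df2["customerid"].astype(int)

merged = pd.merge(df1, df2, on="customerid", how="inner")
print(merged[["first_name", "country"]])
```

Converting both sides to str instead works the same way, as long as the text representations are identical (no stray spaces or leading zeros).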


It is also possible to omit how='inner', because that is the default for merge:


merged= pd.merge( df1, df2, on= 'customerid')
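
To see that leaving out how gives the same result as how='inner', a small sketch with made-up sample frames (the names left and right and all values are illustrative only):

```python
import pandas as pd

# Made-up sample data: only customerid 1500 appears in both frames.
left = pd.DataFrame({"customerid": [1500, 1501], "ordercost": [38610, 120]})
right = pd.DataFrame({"customerid": [1500, 1600], "country": ["Greenland", "Chile"]})

default_merge = pd.merge(left, right, on="customerid")
inner_merge = pd.merge(left, right, on="customerid", how="inner")

# Both calls keep only the keys present in both frames.
print(default_merge.equals(inner_merge))
```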




An empty DataFrame from pd.merge means the two frames have no matching values in the key column. Have you checked the data types of customerid in both frames?
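
One possible way to do that check is to print the dtype of the key column on each side and compare them. This is only a sketch: the frames below are hypothetical, and the text-versus-integer mismatch is an assumption about what the real data might look like.

```python
import pandas as pd

# Hypothetical frames: customerid is text on one side, integer on the other.
df1 = pd.DataFrame({"customerid": ["1500"], "orderid": [12000]})
df2 = pd.DataFrame({"customerid": [1500], "first_name": ["Sian"]})

# Inspect the dtype of the key column in each frame.
print(df1["customerid"].dtype)
print(df2["customerid"].dtype)

# A quick programmatic comparison of the two dtypes.
same_dtype = df1["customerid"].dtype == df2["customerid"].dtype
print(same_dtype)
```

df1.dtypes and df2.dtypes show the dtypes of all columns at once if you want a broader view.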

As well as converting after import (as suggested in the other answer), you can also tell pandas which dtype you want when you read the CSV:

df2 = pd.read_csv(customer_csv, dtype={'customerid': str})
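
A self-contained sketch of the same idea, using io.StringIO in place of a real file. The CSV contents are a hypothetical reconstruction from the column names in the question, and the order frame assumes customerid arrives as text from the database.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the customers file.
csv_text = "customerid,first_name,country\n1500,Sian,Greenland\n"

# Read customerid as text instead of letting pandas infer an integer dtype.
df2 = pd.read_csv(io.StringIO(csv_text), dtype={"customerid": str})

# Assumption: the order side also holds customerid as text.
df1 = pd.DataFrame({"orderid": [12000], "customerid": ["1500"]})

merged = pd.merge(df1, df2, on="customerid")
print(merged[["first_name", "country"]])
```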



