Python: search value in header of another dataframe and replace / match corresponding value
I have a dataframe with index members that looks like this (A, B, C, ... are company names):
df_members
Date 1 2 3 4
0 2016-01-01 A B C D
1 2016-01-02 B C D E
2 2016-01-03 C D E F
3 2016-01-04 F A B C
4 2016-01-05 B C D E
5 2016-01-06 A B C D
and I have a second table including prices for example:
df_prices
Date A B C D E F
0 2015-12-30 1 2 3 4 5 6
1 2015-12-31 7 8 9 10 11 12
2 2016-01-01 13 14 15 16 17 18
3 2016-01-02 20 21 22 23 24 25
4 2016-01-03 27 28 29 30 31 32
5 2016-01-04 34 35 36 37 38 39
6 2016-01-05 41 42 43 44 45 46
7 2016-01-06 48 49 50 51 52 53
The goal is to replace all the company names in df1 with the price with df_prices, resulting in df_result:
df_result
Date 1 2 3 4
0 2016-01-01 13 14 15 16
1 2016-01-02 21 22 23 24
2 2016-01-03 29 30 31 32
3 2016-01-04 39 34 35 36
4 2016-01-05 42 43 44 45
5 2016-01-06 48 49 50 51
I already have a solution where I iterate over all cells in df_members, look for values ββin df_prices and write them to a new df_result dataframe. The problem is that my data frames are very large and this process takes about 7 hours.
I've already tried using merge / join, map or lookup function but couldn't solve the problem.
My approach is as follows:
# Create new dataframes
df_result = pd.DataFrame(columns=df_members.columns, index=unique_dates_list)
# Load prices
df_prices = prices
# Search ticker & write values in new dataframe
for i in range(0,len(df_members)):
for j in range(0,len(df_members.columns)):
if str(df_members.iloc[i, j]) != 'nan' and df_members.iloc[i, j] in df_prices.columns:
df_result.iloc[i, j] = df_prices.iloc[i, df_prices.columns.get_loc(df_members.iloc[i, j])]
Question: Is there a way to display values ββmore efficiently?
source to share
pandas.lookup()
will do what you need:
Code:
df_result = pd.DataFrame(columns=[], index=df_members.index)
for column in df_members.columns:
df_result[column] = df_prices.lookup(
df_members.index, df_members[column])
Test code:
import pandas as pd
df_members = pd.read_fwf(StringIO(
u"""
Date 1 2 3 4
2016-01-01 A B C D
2016-01-02 B C D E
2016-01-03 C D E F
2016-01-04 F A B C
2016-01-05 B C D E
2016-01-06 A B C D"""
), header=1).set_index('Date')
df_prices = pd.read_fwf(StringIO(
u"""
Date A B C D E F
2015-12-30 1 2 3 4 5 6
2015-12-31 7 8 9 10 11 12
2016-01-01 13 14 15 16 17 18
2016-01-02 20 21 22 23 24 25
2016-01-03 27 28 29 30 31 32
2016-01-04 34 35 36 37 38 39
2016-01-05 41 42 43 44 45 46
2016-01-06 48 49 50 51 52 53"""
), header=1).set_index('Date')
df_result = pd.DataFrame(columns=[], index=df_members.index)
for column in df_members.columns:
df_result[column] = df_prices.lookup(
df_members.index, df_members[column])
print(df_result)
Results:
1 2 3 4
Date
2016-01-01 13 14 15 16
2016-01-02 21 22 23 24
2016-01-03 29 30 31 32
2016-01-04 39 34 35 36
2016-01-05 42 43 44 45
2016-01-06 48 49 50 51
source to share