Python: search value in header of another dataframe and replace / match corresponding value

I have a dataframe with index members that looks like this (A, B, C, ... are company names):

df_members

        Date  1  2  3  4
0 2016-01-01  A  B  C  D
1 2016-01-02  B  C  D  E
2 2016-01-03  C  D  E  F
3 2016-01-04  F  A  B  C
4 2016-01-05  B  C  D  E
5 2016-01-06  A  B  C  D

      

and I have a second table including prices for example:

df_prices

         Date   A   B   C   D   E   F
0  2015-12-30   1   2   3   4   5   6
1  2015-12-31   7   8   9  10  11  12
2  2016-01-01  13  14  15  16  17  18
3  2016-01-02  20  21  22  23  24  25
4  2016-01-03  27  28  29  30  31  32
5  2016-01-04  34  35  36  37  38  39
6  2016-01-05  41  42  43  44  45  46
7  2016-01-06  48  49  50  51  52  53

      

The goal is to replace all the company names in df1 with the price with df_prices, resulting in df_result:

df_result

         Date   1   2   3   4 
0  2016-01-01  13  14  15  16
1  2016-01-02  21  22  23  24
2  2016-01-03  29  30  31  32
3  2016-01-04  39  34  35  36
4  2016-01-05  42  43  44  45
5  2016-01-06  48  49  50  51

      

I already have a solution where I iterate over all cells in df_members, look for values ​​in df_prices and write them to a new df_result dataframe. The problem is that my data frames are very large and this process takes about 7 hours.

I've already tried using merge / join, map or lookup function but couldn't solve the problem.

My approach is as follows:

# Create new dataframes
df_result = pd.DataFrame(columns=df_members.columns, index=unique_dates_list)

# Load prices
df_prices = prices

# Search ticker & write values in new dataframe
for i in range(0,len(df_members)):
    for j in range(0,len(df_members.columns)):
        if str(df_members.iloc[i, j]) != 'nan' and df_members.iloc[i, j] in df_prices.columns:
            df_result.iloc[i, j] = df_prices.iloc[i, df_prices.columns.get_loc(df_members.iloc[i, j])]

      

Question: Is there a way to display values ​​more efficiently?

+3


source to share


1 answer


pandas.lookup()

will do what you need:

Code:

df_result = pd.DataFrame(columns=[], index=df_members.index)
for column in df_members.columns:
    df_result[column] = df_prices.lookup(
        df_members.index, df_members[column])

      

Test code:



import pandas as pd

df_members = pd.read_fwf(StringIO(
    u"""
          Date  1  2  3  4
    2016-01-01  A  B  C  D
    2016-01-02  B  C  D  E
    2016-01-03  C  D  E  F
    2016-01-04  F  A  B  C
    2016-01-05  B  C  D  E
    2016-01-06  A  B  C  D"""
), header=1).set_index('Date')

df_prices = pd.read_fwf(StringIO(
    u"""
          Date   A   B   C   D   E   F
    2015-12-30   1   2   3   4   5   6
    2015-12-31   7   8   9  10  11  12
    2016-01-01  13  14  15  16  17  18
    2016-01-02  20  21  22  23  24  25
    2016-01-03  27  28  29  30  31  32
    2016-01-04  34  35  36  37  38  39
    2016-01-05  41  42  43  44  45  46
    2016-01-06  48  49  50  51  52  53"""
), header=1).set_index('Date')

df_result = pd.DataFrame(columns=[], index=df_members.index)
for column in df_members.columns:
    df_result[column] = df_prices.lookup(
        df_members.index, df_members[column])

print(df_result)

      

Results:

             1   2   3   4
Date                      
2016-01-01  13  14  15  16
2016-01-02  21  22  23  24
2016-01-03  29  30  31  32
2016-01-04  39  34  35  36
2016-01-05  42  43  44  45
2016-01-06  48  49  50  51

      

+1


source







All Articles