Index matching in Python

Question

I have a large dataset that I am trying to manipulate for further analysis. The relevant part of the data frame is shown below.

Loan   Closing Balance Date
1      175,000         2010-10-31
1      150,000         2010-11-30
1      125,000         2010-12-31
2      275,000         2010-10-31
2      250,000         2010-11-30
2      225,000         2010-12-31
3      375,000         2010-10-31
3      350,000         2010-11-30
3      320,000         2010-12-31

I would like to create a new column, "Opening Balance", that holds the closing balance at the end of the previous month. For example, the opening balance for the second row should be 175,000, which is the closing balance in the first row.

Since the dataset starts at 2010-10-31, there is no balance for 2010-09-30. For any row dated 2010-10-31, I want the opening balance to equal that row's closing balance.

This is how it should look:

Loan   Closing Balance Date         Opening Balance
1      175,000         2010-10-31   175,000
1      150,000         2010-11-30   175,000
1      125,000         2010-12-31   150,000
2      275,000         2010-10-31   275,000
2      250,000         2010-11-30   275,000
2      225,000         2010-12-31   250,000
3      375,000         2010-10-31   375,000
3      350,000         2010-11-30   375,000
3      320,000         2010-12-31   350,000
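A different technique from the merge-based approaches in this thread, sketched here as an assumption rather than anyone's posted solution: sort by loan and date, take the previous row's closing balance within each loan with `groupby(...).shift(1)`, and fall back to the row's own closing balance where there is no previous row. This assumes each loan has exactly one row per month with no gaps; if months can be missing, a date-based merge is safer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Loan": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Closing Balance": [175000, 150000, 125000,
                        275000, 250000, 225000,
                        375000, 350000, 320000],
    "Date": pd.to_datetime(["2010-10-31", "2010-11-30", "2010-12-31"] * 3),
})

# shift() works on row position, so order rows by loan and date first
df = df.sort_values(["Loan", "Date"]).reset_index(drop=True)

# Previous row's closing balance within the same loan (NaN on each loan's first row)
df["Opening Balance"] = df.groupby("Loan")["Closing Balance"].shift(1)

# No earlier month exists for the first row: use that row's own closing balance
df["Opening Balance"] = df["Opening Balance"].fillna(df["Closing Balance"])

print(df)
```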

In Excel, I would normally do this with a fairly complex INDEX/MATCH using the EOMONTH function, but I'm not sure how to do it in Python (I'm still very new to it).
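For the EOMONTH part, the `pd.offsets.MonthEnd` offset in pandas is a rough counterpart. The sketch below is an illustration rather than an exact port of Excel's semantics; it assumes the dates are already datetimes and computes the analogues of `EOMONTH(date, 0)` and `EOMONTH(date, -1)`, including one mid-month date to show the roll-forward behaviour.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2010-10-31", "2010-11-15", "2010-12-31"]))

# ~ EOMONTH(date, 0): roll forward to the month end (month-end dates stay unchanged)
month_end = dates + pd.offsets.MonthEnd(0)

# ~ EOMONTH(date, -1): the end of the previous month
prev_month_end = dates - pd.offsets.MonthEnd(1)

print(pd.DataFrame({"date": dates,
                    "eomonth_0": month_end,
                    "eomonth_-1": prev_month_end}))
```

The previous-month-end column can then serve as the lookup key in a `pd.merge` on the loan and that date, which takes the place of the INDEX/MATCH step.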

Any help would be appreciated.

Thanks, I tried the approach suggested by Santosh and got this:

    Closing Balance_x     Date_x  Closing Balance_y
0              175000 2010-09-30           150000.0
1              175000 2010-09-30           250000.0
2              175000 2010-09-30           350000.0
3              150000 2010-10-31           125000.0
4              150000 2010-10-31           225000.0
5              150000 2010-10-31           320000.0
6              125000 2010-11-30                NaN
7              275000 2010-09-30           150000.0
8              275000 2010-09-30           250000.0
9              275000 2010-09-30           350000.0
10             250000 2010-10-31           125000.0
11             250000 2010-10-31           225000.0
12             250000 2010-10-31           320000.0
13             225000 2010-11-30                NaN
14             375000 2010-09-30           150000.0
15             375000 2010-09-30           250000.0
16             375000 2010-09-30           350000.0
17             350000 2010-10-31           125000.0
18             350000 2010-10-31           225000.0
19             350000 2010-10-31           320000.0
20             320000 2010-11-30                NaN
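The 21 rows above (for 9 input rows) are what one would expect when the merge keys do not include `Loan`: each month key on the left matches the rows of all three loans on the right. A minimal sketch reproducing this, assuming the question's sample data and the answer's `year*100 + month` keys; passing `validate="many_to_one"` to `pd.merge` makes pandas raise `MergeError` instead of silently duplicating rows.

```python
import pandas as pd
from pandas.errors import MergeError

df = pd.DataFrame({
    "Loan": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Closing Balance": [175000, 150000, 125000,
                        275000, 250000, 225000,
                        375000, 350000, 320000],
    "Date": pd.to_datetime(["2010-10-31", "2010-11-30", "2010-12-31"] * 3),
})
# Year-month keys as in the answer, e.g. 201010; previous month via DateOffset
df["cmonth"] = df["Date"].dt.year * 100 + df["Date"].dt.month
prev = df["Date"] - pd.DateOffset(months=1)
df["pmonth"] = prev.dt.year * 100 + prev.dt.month

# Month key only: each month on the left matches all three loans on the right
month_only = pd.merge(df, df, how="left", left_on="cmonth", right_on="pmonth")
print(len(month_only))  # 21

# Loan + month key: at most one match per row
with_loan = pd.merge(df, df, how="left",
                     left_on=["Loan", "pmonth"], right_on=["Loan", "cmonth"])
print(len(with_loan))  # 9

# validate= turns the silent duplication into an error
try:
    pd.merge(df, df, how="left", left_on="cmonth", right_on="pmonth",
             validate="many_to_one")
except MergeError as exc:
    print("MergeError:", exc)
```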

I then modified the code to merge on the loan ID and Date/pDate:

final_df = pd.merge(df, df, how="left", left_on=['Loan', 'Date'], right_on=['Loan', 'pDate'])

      Loan  Closing Balance_x     Date_x           Opening Balance
    0    1             175000 2010-09-30           150000.0
    1    1             150000 2010-10-31           125000.0
    2    1             125000 2010-11-30                NaN
    3    2             275000 2010-09-30           250000.0
    4    2             250000 2010-10-31           225000.0
    5    2             225000 2010-11-30                NaN
    6    3             375000 2010-09-30           350000.0
    7    3             350000 2010-10-31           320000.0
    8    3             320000 2010-11-30                NaN

Now I'm not sure why I get NaN for every November observation. The opening balance for loan 1 in November should be 150,000, and in October it should be 175,000. The September opening balance should just default to the September closing balance, since I have no balance for August.
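The NaNs on the last month suggest the merge runs in the opposite direction from the one intended: matching the left `Date` against the right `pDate` pairs each row with the following month's record, so the final month has nothing to match. A minimal sketch under that reading, using the question's sample data and a previous-month-end `pDate` computed with `pd.offsets.MonthEnd` (a substitution chosen so month-end dates line up exactly), compares both directions.

```python
import pandas as pd

df = pd.DataFrame({
    "Loan": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Closing Balance": [175000, 150000, 125000,
                        275000, 250000, 225000,
                        375000, 350000, 320000],
    "Date": pd.to_datetime(["2010-10-31", "2010-11-30", "2010-12-31"] * 3),
})
# Previous month end for each row, e.g. 2010-11-30 -> 2010-10-31
df["pDate"] = df["Date"] - pd.offsets.MonthEnd(1)

# Left Date = right pDate: the matched row is the NEXT month, so the last month is NaN
forward = pd.merge(df, df, how="left",
                   left_on=["Loan", "Date"], right_on=["Loan", "pDate"])

# Left pDate = right Date: the matched row is the PREVIOUS month, so the first month is NaN
backward = pd.merge(df, df, how="left",
                    left_on=["Loan", "pDate"], right_on=["Loan", "Date"])

print(forward[["Loan", "Date_x", "Closing Balance_y"]])
print(backward[["Loan", "Date_x", "Closing Balance_y"]])
```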

Update

I think I have solved the problem. I changed my merge code to:

final_df = pd.merge(df, df, how="left", left_on=['Loan','pDate'], right_on=['Loan','Date'])

This still gives me NaNs for September observations, but that's ok as I can manually replace those values.
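Two follow-ups on the updated merge, sketched under the assumption that `pDate` was built as in the answer with `pd.DateOffset(months=1)`. First, that offset keeps the day of the month where it can, so 2010-11-30 minus one month is 2010-10-30 rather than 2010-10-31, and an exact date match against the October row can silently fail; this is worth checking on the real data. Subtracting `pd.offsets.MonthEnd(1)` lands on the previous month end instead. Second, the first-month NaNs can be filled in one step with `fillna`. The helper `previous_balance` is hypothetical, introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Loan": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Closing Balance": [175000, 150000, 125000,
                        275000, 250000, 225000,
                        375000, 350000, 320000],
    "Date": pd.to_datetime(["2010-10-31", "2010-11-30", "2010-12-31"] * 3),
})

def previous_balance(frame, prev_dates):
    """Hypothetical helper: left-merge each row's previous date against the same loan's Date."""
    left = frame.assign(pDate=prev_dates)
    return pd.merge(left, frame, how="left",
                    left_on=["Loan", "pDate"], right_on=["Loan", "Date"])

# pDate via DateOffset keeps day 30 where valid: 2010-11-30 -> 2010-10-30
by_offset = previous_balance(df, df["Date"] - pd.DateOffset(months=1))
# pDate as the previous month end: 2010-11-30 -> 2010-10-31
by_month_end = previous_balance(df, df["Date"] - pd.offsets.MonthEnd(1))

print(by_offset["Closing Balance_y"].isna().sum())     # 6: November rows miss too
print(by_month_end["Closing Balance_y"].isna().sum())  # 3: only each loan's first month

# Replace the remaining NaNs with the row's own closing balance
by_month_end["Opening Balance"] = by_month_end["Closing Balance_y"].fillna(
    by_month_end["Closing Balance_x"])
print(by_month_end[["Loan", "Date_x", "Closing Balance_x", "Opening Balance"]])
```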

python merge indexing excel match

user2335564, Jul 26 '17 at 17:06

1 answer

yesemsanthoshkumar · Answer 1 · 2017-07-26T17:20:14+0000

I suggest adding another column that holds Date minus one month, and then joining on the loan and month fields to get the opening balance.

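import pandas as pd

df["Date"] = pd.to_datetime(df["Date"])  # make sure Date is a datetime column
# Year-month keys for the current and previous month, e.g. 201010 for October 2010
df["cmonth"] = df.Date.apply(lambda x: x.year*100 + x.month)
df["pDate"] = df.Date.apply(lambda x: x - pd.DateOffset(months=1))
df["pmonth"] = df.pDate.apply(lambda x: x.year*100 + x.month)
# Match each row's previous month to the same loan's current month
final_df = pd.merge(df, df, how="left", left_on=["Loan", "pmonth"], right_on=["Loan", "cmonth"])
print(final_df[["Loan", "Closing Balance_x", "Date_x", "Closing Balance_y"]])
# Closing Balance_y is your opening balance (NaN in each loan's first month)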
