Finding a percentage of profitable investments with pandas groupby functionality
I have a pandas DataFrame like this; it shows the history of investing in stocks. In the "Profit" column, 1 means "profitable" and "0" means "unprofitable".
Stock Year Profit Count
AAPL 2012 0 23
AAPL 2012 1 19
AAPL 2013 0 20
AAPL 2013 1 10
GOOG 2012 0 26
GOOG 2012 1 20
GOOG 2013 0 23
GOOG 2013 1 11
I need to find out the percentage of profitable investments:
Stock Year Profit CountPercent
AAPL 2012 1 38.77
AAPL 2013 1 33.33
GOOG 2012 1 43.47
GOOG 2013 1 32.35
I tried using the method in this post
but it shows 'TypeError: Join on level between two MultiIndex objects is ambiguous'
.
source to share
I have loaded your data into a dataframe called "stocks".
# Get the count of profitable trades, indexed by stock+year:
count_profitable = stocks[ stocks['Profit']==1 ].set_index(['Stock','Year']).Count
# Get the count of all trades, indexed by stock + year:
count_all = stocks.groupby(['Stock','Year']).Count.sum()
# Render nice percentages
pandas.options.display.float_format = '{:.2f}%'.format
(count_profitable/count_all) * 100
This will give:
Stock Year
AAPL 2012 45.24%
2013 33.33%
GOOG 2012 43.48%
2013 32.35%
Name: Count, dtype: float64
source to share
You can use pivot_table :
In [38]: result = df.pivot_table(index=['Stock', 'Year'], columns='Profit', values='Count', aggfunc='sum')
In [39]: result['CountPercent'] = result[1]/(result[0]+result[1])
In [41]: result['CountPercent']
Out[41]:
Stock Year
AAPL 2012 0.452381
2013 0.333333
GOOG 2012 0.434783
2013 0.323529
Name: CountPercent, dtype: float64
source to share
Assuming your DataFrame is in a consistent format (namely that 0 is preceded by 1 in the Profit column), you can do the following groupby
:
>>> grouped = df.groupby(['Stock', 'Year'])
>>> perc = grouped['Count'].last() / grouped['Count'].sum()
>>> perc.reset_index()
Stock Year Count
0 AAPL 2012 0.452381
1 AAPL 2013 0.333333
2 GOOG 2012 0.434783
3 GOOG 2013 0.323529
This is a normal normal DataFrame, so you just need to rename the "Count" column, round it to two decimal places and add the "Profit" column again.
source to share