Finding a percentage of profitable investments with pandas groupby functionality

I have a pandas DataFrame like this; it shows the history of investing in stocks. In the "Profit" column, 1 means "profitable" and "0" means "unprofitable".

Stock  Year   Profit  Count
 AAPL  2012    0       23
 AAPL  2012    1       19
 AAPL  2013    0       20
 AAPL  2013    1       10
GOOG   2012    0       26
GOOG   2012    1       20
GOOG   2013    0       23
GOOG   2013    1       11

      

I need to find out the percentage of profitable investments:

Stock  Year   Profit  CountPercent
 AAPL  2012    1       38.77
 AAPL  2013    1       33.33
GOOG   2012    1       43.47
GOOG   2013    1       32.35

      

I tried using the method in this post  but it shows 'TypeError: Join on level between two MultiIndex objects is ambiguous'

.

+3


source to share


3 answers


I have loaded your data into a dataframe called "stocks".

# Get the count of profitable trades, indexed by stock+year:
count_profitable = stocks[ stocks['Profit']==1 ].set_index(['Stock','Year']).Count
# Get the count of all trades, indexed by stock + year:
count_all        = stocks.groupby(['Stock','Year']).Count.sum()
# Render nice percentages
pandas.options.display.float_format = '{:.2f}%'.format 
(count_profitable/count_all) * 100

      



This will give:

Stock  Year
AAPL   2012   45.24%
       2013   33.33%
GOOG   2012   43.48%
       2013   32.35%
Name: Count, dtype: float64

      

+2


source


You can use pivot_table :



In [38]: result = df.pivot_table(index=['Stock', 'Year'], columns='Profit', values='Count', aggfunc='sum')

In [39]: result['CountPercent'] = result[1]/(result[0]+result[1])

In [41]: result['CountPercent']
Out[41]: 
Stock  Year
AAPL   2012    0.452381
       2013    0.333333
GOOG   2012    0.434783
       2013    0.323529
Name: CountPercent, dtype: float64

      

+2


source


Assuming your DataFrame is in a consistent format (namely that 0 is preceded by 1 in the Profit column), you can do the following groupby

:

>>> grouped = df.groupby(['Stock', 'Year'])
>>> perc = grouped['Count'].last() / grouped['Count'].sum()
>>> perc.reset_index()
  Stock  Year     Count
0  AAPL  2012  0.452381
1  AAPL  2013  0.333333
2  GOOG  2012  0.434783
3  GOOG  2013  0.323529

      

This is a normal normal DataFrame, so you just need to rename the "Count" column, round it to two decimal places and add the "Profit" column again.

+1


source







All Articles