Analyzing data across multiple data frames? Panels or multi-index?
When I fetch data for multiple stocks using web.DataReader, I get a Panel as output.
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import datetime as dt
import re
startDate = '2010-01-01'
endDate = '2016-09-07'
stocks_query = ['AAPL','OPK']
stocks = web.DataReader(stocks_query, data_source='yahoo',
start=startDate, end=endDate)
stocks = stocks.swapaxes('items','minor_axis')
This gives the following Panel:
Dimensions: 2 (items) x 1682 (major_axis) x 6 (minor_axis)
Items axis: AAPL to OPK
Major_axis axis: 2010-01-04 00:00:00 to 2016-09-07 00:00:00
Minor_axis axis: Open to Adj Close
A single panel data block looks like this:
stocks['OPK']
Open High Low Close Volume Adj Close log_return \
Date
2010-01-04 1.80 1.97 1.76 1.95 234500.0 1.95 NaN
2010-01-05 1.64 1.95 1.64 1.93 135800.0 1.93 -0.010309
2010-01-06 1.90 1.92 1.77 1.79 546600.0 1.79 -0.075304
2010-01-07 1.79 1.94 1.76 1.92 138700.0 1.92 0.070110
2010-01-08 1.92 1.94 1.86 1.89 62500.0 1.89 -0.015748
I plan on doing a lot of data manipulation across all the data frames: adding new columns, comparing two data frames, and so on. I was advised to look into MultiIndex, since Panels are outdated.
This is my first time working with Panels. To add a new column to both data frames (AAPL, OPK), I had to do something like this:
for i in stocks:
    stocks[i]['log_return'] = np.log(stocks[i]['Close'] / stocks[i]['Close'].shift(1))
If multi_indexing is indeed recommended for dealing with multiple dataframes, how exactly would I go about converting my data into a form that I can easily work with? Will I have one main index and the next level is stocks, and columns will be contained in each stock?
I went through the docs, which give many examples, but they either use tuples, which I didn't understand, or work with single dataframes. http://pandas.pydata.org/pandas-docs/stable/advanced.html
So how exactly do I convert my panel to the multi_index framework?
I would like to extend @piRSquared's answer with a few examples:
In [40]: stocks.to_frame()
Out[40]:
AAPL OPK
Date minor
2010-01-04 Open 2.134300e+02 1.80
High 2.145000e+02 1.97
Low 2.123800e+02 1.76
Close 2.140100e+02 1.95
Volume 1.234324e+08 234500.00
Adj Close 2.772704e+01 1.95
2010-01-05 Open 2.146000e+02 1.64
High 2.155900e+02 1.95
Low 2.132500e+02 1.64
Close 2.143800e+02 1.93
... ... ...
2016-09-06 Low 1.075100e+02 9.19
Close 1.077000e+02 9.36
Volume 2.688040e+07 3026900.00
Adj Close 1.066873e+02 9.36
2016-09-07 Open 1.078300e+02 9.39
High 1.087600e+02 9.60
Low 1.070700e+02 9.38
Close 1.083600e+02 9.59
Volume 4.236430e+07 2632400.00
Adj Close 1.073411e+02 9.59
[10092 rows x 2 columns]
However, if you want to convert it to a MultiIndex DataFrame, it is better to leave the original pandas_datareader Panel as it is (without the swapaxes call):
In [38]: p = web.DataReader(stocks_query, data_source='yahoo', start=startDate, end=endDate)
In [39]: p.to_frame()
Out[39]:
Open High Low Close Volume Adj Close
Date minor
2010-01-04 AAPL 213.429998 214.499996 212.380001 214.009998 123432400.0 27.727039
OPK 1.800000 1.970000 1.760000 1.950000 234500.0 1.950000
2010-01-05 AAPL 214.599998 215.589994 213.249994 214.379993 150476200.0 27.774976
OPK 1.640000 1.950000 1.640000 1.930000 135800.0 1.930000
2010-01-06 AAPL 214.379993 215.230000 210.750004 210.969995 138040000.0 27.333178
OPK 1.900000 1.920000 1.770000 1.790000 546600.0 1.790000
2010-01-07 AAPL 211.750000 212.000006 209.050005 210.580000 119282800.0 27.282650
OPK 1.790000 1.940000 1.760000 1.920000 138700.0 1.920000
2010-01-08 AAPL 210.299994 212.000006 209.060005 211.980005 111902700.0 27.464034
OPK 1.920000 1.940000 1.860000 1.890000 62500.0 1.890000
... ... ... ... ... ... ...
2016-08-31 AAPL 105.660004 106.570000 105.639999 106.099998 29662400.0 105.102360
OPK 9.260000 9.260000 9.070000 9.100000 2793300.0 9.100000
2016-09-01 AAPL 106.139999 106.800003 105.620003 106.730003 26701500.0 105.726441
OPK 9.310000 9.540000 9.190000 9.290000 3515300.0 9.290000
2016-09-02 AAPL 107.699997 108.000000 106.820000 107.730003 26802500.0 106.717038
OPK 9.340000 9.390000 9.160000 9.330000 2061200.0 9.330000
2016-09-06 AAPL 107.900002 108.300003 107.510002 107.699997 26880400.0 106.687314
OPK 9.320000 9.480000 9.190000 9.360000 3026900.0 9.360000
2016-09-07 AAPL 107.830002 108.760002 107.070000 108.360001 42364300.0 107.341112
OPK 9.390000 9.600000 9.380000 9.590000 2632400.0 9.590000
[3364 rows x 6 columns]
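Panel was deprecated and later removed from pandas, so p.to_frame() may not be available in a current installation. The following is a minimal sketch of building the same (Date, minor) layout with pd.concat instead, assuming you already hold one DataFrame per ticker. The names frames and dates are illustrative, and the values are the first three Open/Close rows from the outputs above rather than a fresh download:

```python
import pandas as pd

# Three trading days, copied from the outputs shown above.
dates = pd.DatetimeIndex(['2010-01-04', '2010-01-05', '2010-01-06'], name='Date')

# One DataFrame per ticker (hypothetical stand-in for the downloaded data).
frames = {
    'AAPL': pd.DataFrame({'Open': [213.429998, 214.599998, 214.379993],
                          'Close': [214.009998, 214.379993, 210.969995]},
                         index=dates),
    'OPK': pd.DataFrame({'Open': [1.80, 1.64, 1.90],
                         'Close': [1.95, 1.93, 1.79]},
                        index=dates),
}

# Stack the frames; the dict keys become the outer index level.
df = pd.concat(frames, names=['minor', 'Date'])

# Reorder to (Date, minor) and sort, matching the to_frame() layout above.
df = df.swaplevel().sort_index()
print(df)
```

The level name 'minor' is kept only to match the output above; a name such as 'Ticker' would work just as well.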
How to work with the MultiIndex DF (here, selecting only the AAPL rows):
In [46]: df = p.to_frame()
In [47]: df.loc[pd.IndexSlice[:, ['AAPL']], :]
Out[47]:
Open High Low Close Volume Adj Close
Date minor
2010-01-04 AAPL 213.429998 214.499996 212.380001 214.009998 123432400.0 27.727039
2010-01-05 AAPL 214.599998 215.589994 213.249994 214.379993 150476200.0 27.774976
2010-01-06 AAPL 214.379993 215.230000 210.750004 210.969995 138040000.0 27.333178
2010-01-07 AAPL 211.750000 212.000006 209.050005 210.580000 119282800.0 27.282650
2010-01-08 AAPL 210.299994 212.000006 209.060005 211.980005 111902700.0 27.464034
2010-01-11 AAPL 212.799997 213.000002 208.450005 210.110003 115557400.0 27.221758
2010-01-12 AAPL 209.189995 209.769995 206.419998 207.720001 148614900.0 26.912110
2010-01-13 AAPL 207.870005 210.929995 204.099998 210.650002 151473000.0 27.291720
2010-01-14 AAPL 210.110003 210.459997 209.020004 209.430000 108223500.0 27.133657
2010-01-15 AAPL 210.929995 211.599997 205.869999 205.930000 148516900.0 26.680198
... ... ... ... ... ... ...
2016-08-24 AAPL 108.570000 108.750000 107.680000 108.029999 23675100.0 107.014213
2016-08-25 AAPL 107.389999 107.879997 106.680000 107.570000 25086200.0 106.558539
2016-08-26 AAPL 107.410004 107.949997 106.309998 106.940002 27766300.0 105.934466
2016-08-29 AAPL 106.620003 107.440002 106.290001 106.820000 24970300.0 105.815591
2016-08-30 AAPL 105.800003 106.500000 105.500000 106.000000 24863900.0 105.003302
2016-08-31 AAPL 105.660004 106.570000 105.639999 106.099998 29662400.0 105.102360
2016-09-01 AAPL 106.139999 106.800003 105.620003 106.730003 26701500.0 105.726441
2016-09-02 AAPL 107.699997 108.000000 106.820000 107.730003 26802500.0 106.717038
2016-09-06 AAPL 107.900002 108.300003 107.510002 107.699997 26880400.0 106.687314
2016-09-07 AAPL 107.830002 108.760002 107.070000 108.360001 42364300.0 107.341112
[1682 rows x 6 columns]
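As for adding a column for every stock at once, one option on a (Date, minor) MultiIndex frame is to group by the ticker level, so that shift(1) never crosses from one ticker into another. This is a minimal sketch on a small hand-built frame that holds the first three Close values from the outputs above, standing in for the downloaded data:

```python
import numpy as np
import pandas as pd

# Small stand-in for the (Date, minor) frame; values copied from the outputs above.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06']), ['AAPL', 'OPK']],
    names=['Date', 'minor'])
df = pd.DataFrame(
    {'Close': [214.009998, 1.95, 214.379993, 1.93, 210.969995, 1.79]},
    index=index)

# Previous close within each ticker; the first row of every ticker becomes NaN.
prev_close = df.groupby(level='minor')['Close'].shift(1)

# One assignment adds the column for all tickers.
df['log_return'] = np.log(df['Close'] / prev_close)

# Cross-section for a single ticker, indexed by Date only.
opk = df.xs('OPK', level='minor')
print(opk)
```

This replaces the for loop over Panel items with a single vectorized assignment, and df.xs('OPK', level='minor') gives back a per-stock view similar to stocks['OPK'].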