Analyzing data across multiple data frames? Panels or multi-index?

When I fetch data for multiple stocks using web.DataReader, I get a Panel as output.

import numpy as np
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import datetime as dt
import re



startDate = '2010-01-01'
endDate = '2016-09-07'  
stocks_query = ['AAPL','OPK']


stocks = web.DataReader(stocks_query, data_source='yahoo',
                  start=startDate, end=endDate)
stocks = stocks.swapaxes('items','minor_axis')

      

This results in:

Dimensions: 2 (items) x 1682 (major_axis) x 6 (minor_axis)
Items axis: AAPL to OPK
Major_axis axis: 2010-01-04 00:00:00 to 2016-09-07 00:00:00
Minor_axis axis: Open to Adj Close

      

A single panel data block looks like this:

stocks['OPK']

            Open  High   Low  Close      Volume  Adj Close  log_return  \
Date                                                                     
2010-01-04  1.80  1.97  1.76   1.95    234500.0       1.95         NaN   
2010-01-05  1.64  1.95  1.64   1.93    135800.0       1.93   -0.010309   
2010-01-06  1.90  1.92  1.77   1.79    546600.0       1.79   -0.075304   
2010-01-07  1.79  1.94  1.76   1.92    138700.0       1.92    0.070110   
2010-01-08  1.92  1.94  1.86   1.89     62500.0       1.89   -0.015748  

      

I plan on doing a lot of data manipulation across all the data frames: adding new columns, comparing two data frames, etc. I was advised to look into multi-indexing, as Panels are outdated.

This is my first time working with Panels. To add a new column to both data frames (AAPL, OPK), I had to do something like this:

# Iterating over the Panel yields its items (the stock symbols)
for i in stocks:
    stocks[i]['log_return'] = np.log(stocks[i]['Close'] / stocks[i]['Close'].shift(1))

      

If multi-indexing is indeed recommended for dealing with multiple data frames, how exactly would I go about converting my data into a form that I can easily work with? Would I have one main index with the stocks as the next level, and the columns contained in each stock?

I went through the docs, which give many examples, but they either use tuples that I didn't understand or work with single data frames. http://pandas.pydata.org/pandas-docs/stable/advanced.html

So how exactly do I convert my panel to the multi_index framework?


2 answers


I would like to extend @piRSquared's answer with a few examples:

In [40]: stocks.to_frame()
Out[40]:
                              AAPL         OPK
Date       minor
2010-01-04 Open       2.134300e+02        1.80
           High       2.145000e+02        1.97
           Low        2.123800e+02        1.76
           Close      2.140100e+02        1.95
           Volume     1.234324e+08   234500.00
           Adj Close  2.772704e+01        1.95
2010-01-05 Open       2.146000e+02        1.64
           High       2.155900e+02        1.95
           Low        2.132500e+02        1.64
           Close      2.143800e+02        1.93
...                            ...         ...
2016-09-06 Low        1.075100e+02        9.19
           Close      1.077000e+02        9.36
           Volume     2.688040e+07  3026900.00
           Adj Close  1.066873e+02        9.36
2016-09-07 Open       1.078300e+02        9.39
           High       1.087600e+02        9.60
           Low        1.070700e+02        9.38
           Close      1.083600e+02        9.59
           Volume     4.236430e+07  2632400.00
           Adj Close  1.073411e+02        9.59

[10092 rows x 2 columns]

      

But if you want to convert it to a MultiIndex DataFrame, it is better to leave the original pandas_datareader Panel as it is (without swapping the axes):



In [38]: p = web.DataReader(stocks_query, data_source='yahoo', start=startDate, end=endDate)

In [39]: p.to_frame()
Out[39]:
                        Open        High         Low       Close       Volume   Adj Close
Date       minor
2010-01-04 AAPL   213.429998  214.499996  212.380001  214.009998  123432400.0   27.727039
           OPK      1.800000    1.970000    1.760000    1.950000     234500.0    1.950000
2010-01-05 AAPL   214.599998  215.589994  213.249994  214.379993  150476200.0   27.774976
           OPK      1.640000    1.950000    1.640000    1.930000     135800.0    1.930000
2010-01-06 AAPL   214.379993  215.230000  210.750004  210.969995  138040000.0   27.333178
           OPK      1.900000    1.920000    1.770000    1.790000     546600.0    1.790000
2010-01-07 AAPL   211.750000  212.000006  209.050005  210.580000  119282800.0   27.282650
           OPK      1.790000    1.940000    1.760000    1.920000     138700.0    1.920000
2010-01-08 AAPL   210.299994  212.000006  209.060005  211.980005  111902700.0   27.464034
           OPK      1.920000    1.940000    1.860000    1.890000      62500.0    1.890000
...                      ...         ...         ...         ...          ...         ...
2016-08-31 AAPL   105.660004  106.570000  105.639999  106.099998   29662400.0  105.102360
           OPK      9.260000    9.260000    9.070000    9.100000    2793300.0    9.100000
2016-09-01 AAPL   106.139999  106.800003  105.620003  106.730003   26701500.0  105.726441
           OPK      9.310000    9.540000    9.190000    9.290000    3515300.0    9.290000
2016-09-02 AAPL   107.699997  108.000000  106.820000  107.730003   26802500.0  106.717038
           OPK      9.340000    9.390000    9.160000    9.330000    2061200.0    9.330000
2016-09-06 AAPL   107.900002  108.300003  107.510002  107.699997   26880400.0  106.687314
           OPK      9.320000    9.480000    9.190000    9.360000    3026900.0    9.360000
2016-09-07 AAPL   107.830002  108.760002  107.070000  108.360001   42364300.0  107.341112
           OPK      9.390000    9.600000    9.380000    9.590000    2632400.0    9.590000

[3364 rows x 6 columns]
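Note that Panel has been removed from recent pandas releases, so `to_frame()` on a Panel is not available there. As a rough sketch of how the same (Date, minor) layout could be built without a Panel, assuming you already have one DataFrame per stock: the `frames` dict and its prices below are made-up stand-ins for the downloaded data, not real DataReader output.

```python
import pandas as pd

# Made-up prices standing in for the downloaded data (assumption).
dates = pd.date_range("2010-01-04", periods=3, freq="D", name="Date")
frames = {
    "AAPL": pd.DataFrame({"Open": [213.43, 214.60, 214.38],
                          "Close": [214.01, 214.38, 210.97]}, index=dates),
    "OPK": pd.DataFrame({"Open": [1.80, 1.64, 1.90],
                         "Close": [1.95, 1.93, 1.79]}, index=dates),
}

# concat on a dict uses the keys as the outer index level: (minor, Date).
df = pd.concat(frames, names=["minor", "Date"])

# Swap to (Date, minor) and sort, mirroring the layout of p.to_frame().
df = df.swaplevel().sort_index()
print(df)
```

The level name `minor` is kept only to match the output above; something like `symbol` would probably read better in your own code.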

      

How to work with a MultiIndex DataFrame:

In [46]: df = p.to_frame()

In [47]: df.loc[pd.IndexSlice[:, ['AAPL']], :]
Out[47]:
                        Open        High         Low       Close       Volume   Adj Close
Date       minor
2010-01-04 AAPL   213.429998  214.499996  212.380001  214.009998  123432400.0   27.727039
2010-01-05 AAPL   214.599998  215.589994  213.249994  214.379993  150476200.0   27.774976
2010-01-06 AAPL   214.379993  215.230000  210.750004  210.969995  138040000.0   27.333178
2010-01-07 AAPL   211.750000  212.000006  209.050005  210.580000  119282800.0   27.282650
2010-01-08 AAPL   210.299994  212.000006  209.060005  211.980005  111902700.0   27.464034
2010-01-11 AAPL   212.799997  213.000002  208.450005  210.110003  115557400.0   27.221758
2010-01-12 AAPL   209.189995  209.769995  206.419998  207.720001  148614900.0   26.912110
2010-01-13 AAPL   207.870005  210.929995  204.099998  210.650002  151473000.0   27.291720
2010-01-14 AAPL   210.110003  210.459997  209.020004  209.430000  108223500.0   27.133657
2010-01-15 AAPL   210.929995  211.599997  205.869999  205.930000  148516900.0   26.680198
...                      ...         ...         ...         ...          ...         ...
2016-08-24 AAPL   108.570000  108.750000  107.680000  108.029999   23675100.0  107.014213
2016-08-25 AAPL   107.389999  107.879997  106.680000  107.570000   25086200.0  106.558539
2016-08-26 AAPL   107.410004  107.949997  106.309998  106.940002   27766300.0  105.934466
2016-08-29 AAPL   106.620003  107.440002  106.290001  106.820000   24970300.0  105.815591
2016-08-30 AAPL   105.800003  106.500000  105.500000  106.000000   24863900.0  105.003302
2016-08-31 AAPL   105.660004  106.570000  105.639999  106.099998   29662400.0  105.102360
2016-09-01 AAPL   106.139999  106.800003  105.620003  106.730003   26701500.0  105.726441
2016-09-02 AAPL   107.699997  108.000000  106.820000  107.730003   26802500.0  106.717038
2016-09-06 AAPL   107.900002  108.300003  107.510002  107.699997   26880400.0  106.687314
2016-09-07 AAPL   107.830002  108.760002  107.070000  108.360001   42364300.0  107.341112

[1682 rows x 6 columns]
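For the `log_return` column from the question, the loop over stocks can probably be replaced by a `groupby` on the `minor` level. This is only a sketch on a tiny hand-made frame with the same (Date, minor) index; the three rows per stock reuse Close values from the output above, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in for df = p.to_frame(), with the same (Date, minor) index.
dates = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"])
index = pd.MultiIndex.from_product([dates, ["AAPL", "OPK"]],
                                   names=["Date", "minor"])
df = pd.DataFrame({"Close": [214.01, 1.95, 214.38, 1.93, 210.97, 1.79]},
                  index=index)

# Shift Close by one row within each stock, then take the log of the ratio.
prev_close = df.groupby(level="minor")["Close"].shift(1)
df["log_return"] = np.log(df["Close"] / prev_close)

# Pull out a single stock as an ordinary date-indexed frame.
opk = df.xs("OPK", level="minor")

# Put the stocks side by side (one column per stock) to compare them.
wide = df["log_return"].unstack("minor")
print(wide)
```

The first row of each stock is NaN because there is no previous close to compare against, just as in the per-stock loop.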

      


You will like this:



stocks.to_frame()

      
