Converting dataframe to numpy matrix where indices stored in dataframe

Question


I have a dataframe that looks like this

       time                  usd   hour  day
0      2015-08-30 07:56:28   1.17     7    0
1      2015-08-30 08:56:28   1.27     8    0
2      2015-08-30 09:56:28   1.28     9    0
3      2015-08-30 10:56:28   1.29    10    0
4      2015-08-30 11:56:28   1.29    11    0
...
14591  2017-04-30 23:53:46   9.28    23  609

With that in mind, how would I go about building a NumPy 2D matrix with hour as one axis and day as the other axis, and usd as the value stored in the matrix?
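Because hour and day are already integer positions, the matrix can also be filled directly with NumPy fancy indexing, without any reshaping in pandas. The sketch below is not taken from any of the answers; it assumes hours run 0-23, days run from 0 up to the largest day value, and each (hour, day) pair appears at most once (if a pair repeats, NumPy does not guarantee which value ends up in the cell).

```python
import numpy as np
import pandas as pd

# A few rows shaped like the question's data (only the columns used here)
df = pd.DataFrame({'usd': [1.17, 1.27, 9.28], 'hour': [7, 8, 23], 'day': [0, 0, 609]})

# Assumed grid size: 24 hours by every day from 0 to the largest day value
n_hours, n_days = 24, int(df['day'].max()) + 1

# NaN marks hour/day cells that have no reading
matrix = np.full((n_hours, n_days), np.nan)

# hour selects the row and day selects the column; usd is written into each cell
matrix[df['hour'].to_numpy(), df['day'].to_numpy()] = df['usd'].to_numpy()
```

Rows are hours and columns are days, so matrix[7, 0] holds the 07:00 reading of day 0.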


python numpy pandas

Tommie jones 02 May '17 at 17:30


3 answers

I would make a pivot_table and leave the data as a pandas DataFrame, but converting to a numpy array is trivial if you don't need the labels.

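import pandas as pd

# data stands in for the question's DataFrame (a few of its rows shown here)
data = pd.DataFrame({'usd': [1.17, 1.27, 1.28], 'hour': [7, 8, 9], 'day': [0, 0, 0]})
matrix = data.pivot_table(values='usd', index='hour', columns='day').values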

Edit: Thanks to @piRSquared for the .values tip (changed np.array(data) to df ...).
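One practical difference from pivot: pivot_table aggregates rows that share the same index/column pair (its default aggfunc is the mean), and its fill_value argument replaces the NaN that empty cells would otherwise get. A small sketch with made-up values, where hour 7 of day 0 appears twice:

```python
import pandas as pd

# Hypothetical toy data: two readings fall in hour 7 of day 0
data = pd.DataFrame({
    'usd':  [1.17, 1.19, 1.27, 9.28],
    'hour': [7, 7, 8, 23],
    'day':  [0, 0, 0, 1],
})

# Duplicate (hour, day) pairs are averaged; empty cells become 0 instead of NaN
table = data.pivot_table(values='usd', index='hour', columns='day', fill_value=0)
matrix = table.values
print(table)
```

Whether 0 or NaN is the better marker for "no data" depends on what the matrix is used for afterwards; NaN keeps missing cells distinguishable from a genuine zero.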


Back2Basics May 02 '17 at 17:41


You can use the pivot functionality of pandas, as described here. You will get NaN values for usd wherever a given day or hour has no data.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'usd': [1.17, 1.27, 1.28, 1.29, 1.29, 9.28], 'hour': [7, 8, 9, 10, 11, 23], 'day': [0, 0, 0, 0, 0, 609]})

In [3]: df
Out[3]: 
   day  hour   usd
0    0     7  1.17
1    0     8  1.27
2    0     9  1.28
3    0    10  1.29
4    0    11  1.29
5  609    23  9.28

In [4]: df.pivot(index='hour', columns='day', values='usd')
Out[4]: 
day    0     609
hour            
7     1.17   NaN
8     1.27   NaN
9     1.28   NaN
10    1.29   NaN
11    1.29   NaN
23     NaN  9.28
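The pivoted frame only has rows and columns for the hours and days that actually occur. If a fixed-size matrix is wanted (24 hours by every day from 0 up to the largest day), one option, assuming day is a 0-based integer counter as in the question, is to reindex before converting:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'usd': [1.17, 1.27, 9.28], 'hour': [7, 8, 23], 'day': [0, 0, 609]})

# Same pivot as in the answer: hours as rows, days as columns
wide = df.pivot(index='hour', columns='day', values='usd')

# Add the missing hours and days so the shape no longer depends on which rows exist
wide = wide.reindex(index=range(24), columns=range(df['day'].max() + 1))

# Convert to a plain NumPy array; cells with no data stay NaN
matrix = wide.to_numpy()
```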


Michael Gecht May 02 '17 at 17:35


piRSquared · Accepted Answer · 2017-05-02T17:39:18+0000

Consider a dataframe df

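import numpy as np
import pandas as pd

# Synthetic data: 14000 hourly timestamps and a random-walk usd price
df = pd.DataFrame(dict(
        time=pd.date_range('2015-08-30', periods=14000, freq=pd.offsets.Hour()),
        usd=(np.random.randn(14000) / 100 + 1.0005).cumprod()
    ))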

Then we can set the index to the date and the hour of column df.time and unstack. We take the values of this result to access the numpy array.

a = df.set_index([df.time.dt.date, df.time.dt.hour]).usd.unstack().values
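To see what shape this produces, the sketch below reruns the answer's approach with a fixed seed. The two index levels are renamed to date and hour, a change from the answer made only so they don't both carry the name time. Note the orientation: here rows are calendar dates and columns are hours 0-23, the transpose of the pivot answers.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed only so the run is reproducible
df = pd.DataFrame(dict(
    time=pd.date_range('2015-08-30', periods=14000, freq=pd.offsets.Hour()),
    usd=(np.random.randn(14000) / 100 + 1.0005).cumprod()
))

# Rows = calendar dates, columns = hours 0-23
a = (df.set_index([df.time.dt.date.rename('date'), df.time.dt.hour.rename('hour')])
       .usd.unstack()
       .values)

print(a.shape)                # (584, 24): 14000 hours = 583 full days + 8 hours
print(np.isnan(a[-1]).sum())  # 16: the last day only has hours 0-7
```

Transposing with a.T gives the hour-by-day layout the question describes.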
