Plotting the difference between two datetime64[ns] columns

Hi, I have a dataframe that contains two columns of type datetime64[ns]. I am cleaning the data to remove null (NaT) values (removing rows where a null occurs in both columns) and subtracting one column from the other to get the difference.

What is the best way to plot a histogram of this data with minute bins and 10 minute bins?

I've tried numpy.histogram, which raised an error:

TypeError: ufunc add cannot use operands with types dtype('<m8[ns]') and dtype('float64')

and hist(series), which raised an error:

KeyError: 0

series.dtype returns <m8[ns].



2 answers


Let's generate some data:

import numpy as np
d1 = np.arange(np.datetime64('2014-11-01 12:00'), np.datetime64('2014-11-01 14:00'))
d2 = d1.copy()
np.random.shuffle(d2)
diff = d2 - d1

      

As you know, the difference is now of type timedelta64, here with minute resolution:

>>> diff.dtype
dtype('<m8[m]')

      

But we need floats or integers, so we cast our data:

>>> import matplotlib.pyplot as plt
>>> plt.hist(diff.astype(np.int32))
(array([  3.,   9.,  11.,  17.,  17.,  27.,  10.,  12.,  11.,   3.]), array([-115. ,  -92.2,  -69.4,  -46.6,  -23.8,   -1. ,   21.8,   44.6,
         67.4,   90.2,  113. ]), <a list of 10 Patch objects>)
>>> plt.xlabel('time difference [min]')

      




The key step is the cast: we converted the timedeltas to integers (floats would also work, but minute-resolution data does not need them).

>>> diff.astype(np.int32)
array([  78,   47,   55,   25,   22,   58,  113,    0,   -3,    7,   95,
        104,   10,   69,   16,   34,   87,   -2,   83,   16,   77,   48,
         10,   30,   52,   31,   47,   54,   83,  -21,   16,   76,   85,
         58,   68,   12,   74,    1,   68,   21,  -15,  -27,   -6,    1,
         -3,   43,  -34,   32,   46,  -22,    5,  -48,   16,  -33,   55,
        -37,  -25,  -53,  -21,  -48,   54,  -51,  -33,   20,  -12,   48,
         14,  -34,    6,   -2,  -36,    6,   20,  -67,  -55,   43,   32,
        -12,   11,   16,    5,  -31,   34,   21,  -20,   11,  -77,  -26,
        -18,    1,  -18,  -68,    6,   19,  -92,   -9,   -9,  -26,  -40,
        -98,  -34,   -1,  -43,  -82,  -65,  -88,  -52,  -32,  -84,  -58,
        -97,  -49,  -13,  -73,  -71, -115,  -71,  -24,  -76,  -35], dtype=int32)

      

Or, if you need them in seconds, convert them first to seconds and then to integers:

>>> diff.astype('m8[s]').astype(np.int32)
array([ 4680,  2820,  3300,  1500,  1320,  3480,  6780,     0,  -180,
         420,  5700,  6240,   600,  4140,   960,  2040,  5220,  -120,
        4980,   960,  4620,  2880,   600,  1800,  3120,  1860,  2820,
        3240,  4980, -1260,   960,  4560,  5100,  3480,  4080,   720,
        4440,    60,  4080,  1260,  -900, -1620,  -360,    60,  -180,
        2580, -2040,  1920,  2760, -1320,   300, -2880,   960, -1980,
        3300, -2220, -1500, -3180, -1260, -2880,  3240, -3060, -1980,
        1200,  -720,  2880,   840, -2040,   360,  -120, -2160,   360,
        1200, -4020, -3300,  2580,  1920,  -720,   660,   960,   300,
       -1860,  2040,  1260, -1200,   660, -4620, -1560, -1080,    60,
       -1080, -4080,   360,  1140, -5520,  -540,  -540, -1560, -2400,
       -5880, -2040,   -60, -2580, -4920, -3900, -5280, -3120, -1920,
       -5040, -3480, -5820, -2940,  -780, -4380, -4260, -6900, -4260,
       -1440, -4560, -2100], dtype=int32)

      

Depending on the precision of the data, you may need to use float instead of integer:

diff.astype('m8[s]').astype(np.float32)
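
The question also asks for 1-minute and 10-minute bins, which the default 10 equal-width bins above do not give. Here is a minimal sketch of one way to get them, assuming the same kind of minute-resolution sample data; `fixed_width_edges` is a hypothetical helper written for this example, not a NumPy function. Dividing by `np.timedelta64(1, 'm')` is an alternative to `astype` that yields float minutes directly.

```python
import numpy as np

# Sample data in the spirit of the example above (seeded for repeatability).
rng = np.random.default_rng(0)
d1 = np.arange(np.datetime64('2014-11-01T12:00'), np.datetime64('2014-11-01T14:00'))
d2 = d1[rng.permutation(d1.size)]

# Dividing by a one-minute timedelta gives plain float64 minutes.
minutes = (d2 - d1) / np.timedelta64(1, 'm')

def fixed_width_edges(values, width):
    """Hypothetical helper: bin edges at multiples of `width` covering all values."""
    lo = np.floor(values.min() / width) * width
    hi = np.ceil(values.max() / width) * width
    if hi == lo:  # all values identical: still produce one bin
        hi = lo + width
    # The half-step margin keeps `hi` in the result despite float rounding.
    return np.arange(lo, hi + width / 2, width)

edges_1 = fixed_width_edges(minutes, 1)    # 1-minute bins
edges_10 = fixed_width_edges(minutes, 10)  # 10-minute bins

counts_1, _ = np.histogram(minutes, bins=edges_1)
counts_10, _ = np.histogram(minutes, bins=edges_10)
```

The same edge arrays can be passed to matplotlib, for example `plt.hist(minutes, bins=edges_10)`, since `plt.hist` accepts a sequence of bin edges.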

      



This may be late, but it might help someone. I wanted to do the same thing and tried the cast, but got an error. My timedelta was in [ns], so I had to cast to [s] first; then it could be cast to float.



time_delta.astype('m8[s]').astype(np.float32)
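
For nanosecond data like in the question, here is a small sketch of that cast next to an alternative. The arrays `start` and `end` are made-up sample values, and NaT entries are dropped with `np.isnat` before converting. Note that casting to `m8[s]` discards the sub-second part, while dividing by `np.timedelta64(1, 's')` keeps it.

```python
import numpy as np

# Made-up sample data at nanosecond resolution, with one missing start time.
start = np.array(['2014-11-01T12:00:00.000000000',
                  '2014-11-01T12:05:00.500000000',
                  'NaT'], dtype='datetime64[ns]')
end = np.array(['2014-11-01T12:01:30',
                '2014-11-01T12:05:01',
                '2014-11-01T13:00:00'], dtype='datetime64[ns]')

delta = end - start                 # dtype timedelta64[ns]; NaT propagates
valid = delta[~np.isnat(delta)]     # drop missing differences

# Cast to whole seconds first, then to float: sub-second parts are lost.
seconds_truncated = valid.astype('m8[s]').astype(np.float64)

# Divide by a unit timedelta instead: fractional seconds are kept.
seconds_exact = valid / np.timedelta64(1, 's')
minutes = valid / np.timedelta64(1, 'm')
```

With a pandas Series of dtype timedelta64[ns], `series.dt.total_seconds()` should give the same float seconds as the division above, and the result can go straight into `plt.hist` or `np.histogram`.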

      
