Plotting the difference between two datetime64[ns] columns
Hi, I have a DataFrame that contains two columns of type datetime64[ns]. I am cleaning the data to remove null (NaT) values (removing rows where null occurs in both columns) and subtracting one column from the other to get the difference.
What is the best way to plot a histogram of this data with 1-minute bins and with 10-minute bins?
I've tried numpy.histogram, which raised:
TypeError: ufunc add cannot use operands with types dtype('<m8[ns]') and dtype('float64')
and hist(series), which raised:
KeyError: 0
When I check series.dtype, it returns <m8[ns].
Let's generate some data:
import numpy as np
import matplotlib.pyplot as plt

d1 = np.arange(np.datetime64('2014-11-01 12:00'), np.datetime64('2014-11-01 14:00'))
d2 = d1.copy()
np.random.shuffle(d2)
diff = d2 - d1
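To mirror the question's setup more closely, here is a minimal pandas sketch. It assumes two datetime64[ns] columns with the hypothetical names start and end, and it drops a row if either value is NaT, since the subtraction needs both:

```python
import pandas as pd

# Hypothetical columns 'start' and 'end'; the question does not name its columns.
df = pd.DataFrame({
    'start': pd.to_datetime(['2014-11-01 12:00', '2014-11-01 12:05', None, '2014-11-01 12:30']),
    'end': pd.to_datetime(['2014-11-01 12:07', None, None, '2014-11-01 13:02']),
})

# Keep only rows where both timestamps are present (not NaT).
clean = df.dropna(subset=['start', 'end'])

# Column subtraction yields a Series of timedelta values.
delta = clean['end'] - clean['start']
print(delta.dtype)
```

Note that clean keeps the original index labels (here 0 and 3), so label-based access such as delta[0] no longer finds a row labelled 0.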
As you know, the difference now has a timedelta type, here with a resolution of minutes:
>>> diff.dtype
dtype('<m8[m]')
But hist needs floats or integers, so we convert the data before passing it in:
>>> plt.hist(diff.astype(np.int32))
(array([ 3., 9., 11., 17., 17., 27., 10., 12., 11., 3.]), array([-115. , -92.2, -69.4, -46.6, -23.8, -1. , 21.8, 44.6,
67.4, 90.2, 113. ]), <a list of 10 Patch objects>)
>>> plt.xlabel('time difference [min]')
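plt.hist uses 10 equal-width bins by default. For the 1-minute and 10-minute bins the question asks for, one option is to pass explicit bin edges. A minimal sketch, assuming the differences are timedelta64[ns] as in the question (the values below are illustrative); dividing by np.timedelta64(1, 'm') gives float minutes, and edges is a hypothetical helper name:

```python
import numpy as np

# Illustrative differences, stored at nanosecond resolution like the question's data.
diff_ns = np.array([90, 150, 610, -30, 2400], dtype='m8[s]').astype('m8[ns]')

# Dividing by a one-minute timedelta yields float64 minutes (no int32 overflow).
minutes = diff_ns / np.timedelta64(1, 'm')

def edges(values, width):
    """Hypothetical helper: bin edges `width` minutes apart, aligned to multiples of `width`."""
    lo = np.floor(values.min() / width) * width
    hi = np.ceil(values.max() / width) * width
    return np.arange(lo, hi + width, width)

counts_1, edges_1 = np.histogram(minutes, bins=edges(minutes, 1))     # 1-minute bins
counts_10, edges_10 = np.histogram(minutes, bins=edges(minutes, 10))  # 10-minute bins
print(counts_10, edges_10)
```

The same edges can be passed to matplotlib, e.g. plt.hist(minutes, bins=edges(minutes, 10)).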
The key detail is that we converted the timedeltas to integers (floats would also work, but we don't need them here):
>>> diff.astype(np.int32)
array([ 78, 47, 55, 25, 22, 58, 113, 0, -3, 7, 95,
104, 10, 69, 16, 34, 87, -2, 83, 16, 77, 48,
10, 30, 52, 31, 47, 54, 83, -21, 16, 76, 85,
58, 68, 12, 74, 1, 68, 21, -15, -27, -6, 1,
-3, 43, -34, 32, 46, -22, 5, -48, 16, -33, 55,
-37, -25, -53, -21, -48, 54, -51, -33, 20, -12, 48,
14, -34, 6, -2, -36, 6, 20, -67, -55, 43, 32,
-12, 11, 16, 5, -31, 34, 21, -20, 11, -77, -26,
-18, 1, -18, -68, 6, 19, -92, -9, -9, -26, -40,
-98, -34, -1, -43, -82, -65, -88, -52, -32, -84, -58,
-97, -49, -13, -73, -71, -115, -71, -24, -76, -35], dtype=int32)
Or, if you need them in seconds, convert them first to seconds and then to integers:
>>> diff.astype('m8[s]').astype(np.int32)
array([ 4680, 2820, 3300, 1500, 1320, 3480, 6780, 0, -180,
420, 5700, 6240, 600, 4140, 960, 2040, 5220, -120,
4980, 960, 4620, 2880, 600, 1800, 3120, 1860, 2820,
3240, 4980, -1260, 960, 4560, 5100, 3480, 4080, 720,
4440, 60, 4080, 1260, -900, -1620, -360, 60, -180,
2580, -2040, 1920, 2760, -1320, 300, -2880, 960, -1980,
3300, -2220, -1500, -3180, -1260, -2880, 3240, -3060, -1980,
1200, -720, 2880, 840, -2040, 360, -120, -2160, 360,
1200, -4020, -3300, 2580, 1920, -720, 660, 960, 300,
-1860, 2040, 1260, -1200, 660, -4620, -1560, -1080, 60,
-1080, -4080, 360, 1140, -5520, -540, -540, -1560, -2400,
-5880, -2040, -60, -2580, -4920, -3900, -5280, -3120, -1920,
-5040, -3480, -5820, -2940, -780, -4380, -4260, -6900, -4260,
-1440, -4560, -2100], dtype=int32)
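If the differences live in a pandas Series, as in the question, Series.dt.total_seconds() does the seconds conversion in one step and returns float64, so int32 overflow is not a concern. A small sketch with illustrative values; flooring to multiples of 10 minutes and counting is just one way to tally 10-minute bins:

```python
import pandas as pd

# Illustrative timedelta Series standing in for the column difference.
delta = pd.Series(pd.to_timedelta([78, -3, 113, 0], unit='min'))

# Float64 seconds, computed directly from the nanosecond values.
seconds = delta.dt.total_seconds()
minutes = seconds / 60

# Label each value with the start of its 10-minute bin, then count per bin.
bucket_counts = ((minutes // 10) * 10).value_counts().sort_index()
print(bucket_counts)
```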
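Depending on the precision of the data, you may need floats instead of integers. Also keep the range in mind: int32 tops out at about 2.1 × 10^9, which is only about 2 seconds when counted in nanoseconds, so with datetime64[ns] data either convert to a coarser unit first, as here, or use np.int64:
>>> diff.astype('m8[s]').astype(np.float32)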
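On what "precision" means here: casting to m8[s] drops anything below a second before the float conversion, whereas dividing by a one-second timedelta keeps the fraction. A small check with an illustrative 90.5-second difference:

```python
import numpy as np

# 90.5 seconds stored at nanosecond resolution (illustrative value).
d = np.array([90_500_000_000], dtype='m8[ns]')

# Unit cast first: the half second is lost before converting to float.
truncated = d.astype('m8[s]').astype(np.float32)

# Division by a 1-second timedelta: float64 seconds, fraction kept.
exact = d / np.timedelta64(1, 's')

print(truncated[0], exact[0])
```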