Pandas: Difference of two datetime64 objects gives NaT instead of correct timedelta value
This question has been asked a lot, but after looking closely at the other answers I haven't found a solution that works in my case. It's a shame that this is still such a sticking point.
I have a pandas dataframe with a column datetime and I just want to calculate the time range covered by the data, in seconds (say).
import pandas as pd
# You can create fake datetime entries any way you like, e.g.
df = pd.DataFrame({'datetime': pd.date_range('10/1/2001 10:00:00',
                                             periods=3, freq='10H'),
                   'B': [4, 5, 6]})
# (a) This yields NaT:
timespan_a = df['datetime'][-1:] - df['datetime'][:1]
print(timespan_a)
# 0 NaT
# 2 NaT
# Name: datetime, dtype: timedelta64[ns]
# (b) This does work - but why?
timespan_b = (df['datetime'][-1:].values.astype("timedelta64")
              - df['datetime'][:1].values.astype("timedelta64"))
print(timespan_b)
# [72000000000000]
- Why doesn't (a) work?
- Why is (b) required instead? (It also gives a singleton numpy array, not a timedelta object.)
My pandas version is 0.20.3, which rules out a previously known bug.
Is this a dynamic range issue?
The problem is that the two one-row Series have different indices (2 and 0). pandas aligns on index labels before subtracting, so neither row finds a partner and both results are NaT.
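To make the alignment behaviour concrete, here is a minimal sketch. It builds the same three timestamps by hand (the variable names `s`, `last`, and `first` are only for illustration) and uses `.iloc` slices, which select the same rows as the `[-1:]` and `[:1]` slices in the question.

```python
import pandas as pd

# Same three timestamps as in the question, spaced 10 hours apart.
s = pd.Series(pd.to_datetime(['2001-10-01 10:00:00',
                              '2001-10-01 20:00:00',
                              '2001-10-02 06:00:00']), name='datetime')

last = s.iloc[-1:]   # one-element Series, index label 2
first = s.iloc[:1]   # one-element Series, index label 0

# Subtraction aligns on index labels. The result carries the union of the
# labels (0 and 2), and each label is missing from one operand, so every
# entry of the result is NaT.
diff = last - first
print(diff.index.tolist())
print(diff.isna().all())
```

The result has two rows rather than one, which matches the two NaT lines printed in the question.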
One solution is to convert either the first or the second operand to a numpy array with .values, which drops the index:
timespan_a = df['datetime'][-1:]-df['datetime'][:1].values
print (timespan_a)
2 20:00:00
Name: datetime, dtype: timedelta64[ns]
Or set both indices equal:
a = df['datetime'][-1:]
b = df['datetime'][:1]
print (a)
2 2001-10-02 06:00:00
Name: datetime, dtype: datetime64[ns]
a.index = b.index
print (a)
0 2001-10-02 06:00:00
Name: datetime, dtype: datetime64[ns]
print (b)
0 2001-10-01 10:00:00
Name: datetime, dtype: datetime64[ns]
timespan_a = a - b
print (timespan_a)
0 20:00:00
Name: datetime, dtype: timedelta64[ns]
If you want to work with scalars:
a = df.loc[df.index[-1], 'datetime']
b = df.loc[0, 'datetime']
print (a)
2001-10-02 06:00:00
print (b)
2001-10-01 10:00:00
timespan_a = a - b
print (timespan_a)
0 days 20:00:00
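Another solution, thanks to Anton vBR (note that get_value was deprecated in pandas 0.21 and removed in 1.0; df.at[len(df)-1, 'datetime'] is the current equivalent):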
timespan_a = df.get_value(len(df)-1,'datetime')- df.get_value(0,'datetime')
print (timespan_a)
0 days 20:00:00
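Since the question asks for the range in seconds, the scalar approach can be taken one step further. The following is a sketch rather than the only way to do it: it rebuilds the example frame from explicit timestamps, takes the first and last values by position with `.iloc`, and converts the resulting `Timedelta` with `total_seconds()`.

```python
import pandas as pd

# Rebuild the example frame with explicit timestamps (10 hours apart).
df = pd.DataFrame({'datetime': pd.to_datetime(['2001-10-01 10:00:00',
                                               '2001-10-01 20:00:00',
                                               '2001-10-02 06:00:00']),
                   'B': [4, 5, 6]})

# Positional scalar access returns Timestamps, so no index alignment happens.
span = df['datetime'].iloc[-1] - df['datetime'].iloc[0]   # a pandas Timedelta
seconds = span.total_seconds()
print(seconds)  # 72000.0

# If the rows are not guaranteed to be sorted by time, max - min gives the
# covered range regardless of row order.
span_any_order = df['datetime'].max() - df['datetime'].min()
print(span_any_order.total_seconds())
```

For this data both expressions agree, because the rows are already in chronological order.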