Error while subtracting datetime columns in pandas
I have the following frame.
  Date Returned  Start Date
0    2017-06-02  2017-04-01
1    2017-06-02  2017-04-01
2    2017-06-02  2017-04-01
3    2017-06-02  2017-02-28
4    2017-06-02  2017-02-28
5    2017-06-02  2016-07-20
6    2017-06-02  2016-07-20
Both columns are of type datetime64.
subframe[['Date Returned','Start Date']].dtypes
Out[9]:
Date Returned datetime64[ns]
Start Date datetime64[ns]
dtype: object
However, when I try to find the timedeltas between two date columns, I get this error.
subframe['Delta']=subframe['Date Returned'] - subframe['Start Date']
TypeError: data type "datetime" not understood
Is there a fix for this? I tried everything I could think of and pulled out most of my hair at this point. Any help is appreciated. I found that someone posted the same problem, but no one answered it.
I got the same error in pandas 0.18.1. Here's a workaround, iteratively working on separate start-end pairs:
d['diff'] = [ret - start for start, ret in zip(d['Start'], d['Returned'])]
d
Now:
    Returned      Start     diff
0 2017-06-02 2017-04-01  62 days
1 2017-06-02 2017-04-01  62 days
2 2017-06-02 2017-04-01  62 days
3 2017-06-02 2017-02-28  94 days
4 2017-06-02 2017-02-28  94 days
5 2017-06-02 2016-07-20 317 days
6 2017-06-02 2016-07-20 317 days
This workaround is much slower than I would imagine a native pandas implementation would be. Sigh.
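For anyone who wants to try the pairwise workaround end to end, here is a minimal sketch. The frame `d` is rebuilt by hand from the data in the question (an assumption, since the original construction code was not posted), using the shortened column names from this answer.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data; column names follow this answer.
d = pd.DataFrame({
    "Returned": pd.to_datetime(["2017-06-02"] * 7),
    "Start": pd.to_datetime(
        ["2017-04-01"] * 3 + ["2017-02-28"] * 2 + ["2016-07-20"] * 2
    ),
})

# Subtract one Timestamp pair at a time instead of subtracting whole columns.
# Each element of the list is a pandas Timedelta.
d["diff"] = [ret - start for start, ret in zip(d["Start"], d["Returned"])]

print(d)
```

Because the subtraction runs in a Python-level loop, the cost grows with the number of rows, which is why this is only worth using when column-wise subtraction fails.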
I think the problem may have been resolved in later versions of pandas (and perhaps, correspondingly, NumPy), and it may always have been Windows-specific. However, on the computer I'm working on (pandas 0.18.0, NumPy 1.13, Windows 7), it still occurs.
For those in the same situation as me, there is a workaround that is faster than @blacksite's:
subframe['Delta'] = subframe['Date Returned'].values - subframe['Start Date'].values
Silly as it seems, appending ".values" returns the underlying NumPy datetime64 arrays, which subtract correctly. When the result is assigned to a pandas DataFrame column, pandas converts it back to its own type (a timedelta column in this case, since the difference of two datetimes is a timedelta).
On my dataframe (about 90k rows) it takes less than 0.01s (all of which is spent creating the new pandas column and converting back from NumPy), whereas the other workaround takes about 1.5 seconds.
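A self-contained sketch of the same idea, for reference. The frame is rebuilt by hand from the question's data (an assumption), and the comparison against direct column subtraction only applies on a pandas version where that subtraction works.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame.
subframe = pd.DataFrame({
    "Date Returned": pd.to_datetime(["2017-06-02"] * 7),
    "Start Date": pd.to_datetime(
        ["2017-04-01"] * 3 + ["2017-02-28"] * 2 + ["2016-07-20"] * 2
    ),
})

# .values exposes the underlying NumPy datetime64 arrays;
# subtracting them yields a NumPy timedelta64 array.
delta = subframe["Date Returned"].values - subframe["Start Date"].values

# Assigning the array to a column stores it as a pandas timedelta column.
subframe["Delta"] = delta
print(subframe["Delta"].dtype)

# On pandas versions without the bug, plain column subtraction gives the same values.
direct = subframe["Date Returned"] - subframe["Start Date"]
print((subframe["Delta"] == direct).all())
```

The subtraction happens in a single vectorized NumPy operation, which is consistent with the large speed difference reported above.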