Error while subtracting datetime columns in pandas

Question


I have the following frame.

  Date Returned Start Date
0    2017-06-02 2017-04-01
1    2017-06-02 2017-04-01
2    2017-06-02 2017-04-01
3    2017-06-02 2017-02-28
4    2017-06-02 2017-02-28
5    2017-06-02 2016-07-20
6    2017-06-02 2016-07-20

Both columns are of type datetime64.

subframe[['Date Returned','Start Date']].dtypes
Out[9]: 
Date Returned    datetime64[ns]
Start Date       datetime64[ns]
dtype: object

However, when I try to compute the timedeltas between the two date columns, I get this error.

subframe['Delta']=subframe['Date Returned'] - subframe['Start Date']

TypeError: data type "datetime" not understood

Is there a fix for this? I tried everything I could think of and pulled out most of my hair at this point. Any help is appreciated. I found that someone posted the same problem, but no one answered it.


python pandas datetime timedelta

bemery · June 16, 2017 at 15:32


2 answers

blacksite · Answer 1 · 2017-08-10T20:41:19+0000

I got the same error in pandas 0.18.1. Here's a workaround that subtracts each start/return pair individually in a list comprehension:

d['diff'] = [ret - start for start, ret in zip(d['Start'], d['Returned'])]

d

Now:

    Returned      Start     diff
0 2017-06-02 2017-04-01  62 days
1 2017-06-02 2017-04-01  62 days
2 2017-06-02 2017-04-01  62 days
3 2017-06-02 2017-02-28  94 days
4 2017-06-02 2017-02-28  94 days
5 2017-06-02 2016-07-20 317 days
6 2017-06-02 2016-07-20 317 days

This workaround is much slower than I would imagine a native pandas implementation would be. Sigh.

Marco spinaci · Answer 2 · 2017-09-14T15:46:29+0000

I think the problem may have been fixed in later versions of pandas (and perhaps, correspondingly, of numpy), and it may always have been Windows-specific. However, on the computer I'm working on (pandas 0.18.0, numpy 1.13, Windows 7), it still occurs.

For those in the same situation as me, there is a workaround that is faster than @blacksite's:

subframe['Delta'] = subframe['Date Returned'].values - subframe['Start Date'].values

Silly as it seems, adding ".values" returns the underlying NumPy datetime64 arrays, which subtract correctly. Assigning the result to a pandas DataFrame column brings it back into pandas as a timedelta64 column.
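A minimal, self-contained sketch of that round trip. The frame below is rebuilt by hand from three of the rows in the question, so its construction is an assumption rather than the asker's actual code:

```python
import pandas as pd

# Rebuild a small frame resembling the one in the question (assumed data).
subframe = pd.DataFrame({
    "Date Returned": pd.to_datetime(["2017-06-02", "2017-06-02", "2017-06-02"]),
    "Start Date": pd.to_datetime(["2017-04-01", "2017-02-28", "2016-07-20"]),
})

# .values exposes the underlying NumPy datetime64 arrays.
returned = subframe["Date Returned"].values
start = subframe["Start Date"].values

# NumPy subtracts the arrays element-wise, giving a timedelta64 array.
delta = returned - start

# Assigning the array back stores it as a pandas timedelta column.
subframe["Delta"] = delta

print(subframe["Delta"].dtype)
print(subframe["Delta"].dt.days.tolist())
```

The day counts should match the 62, 94 and 317 days shown in the first answer's output.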

On my dataframe (about 90k rows) it takes less than 0.01 s (all of it spent creating the new pandas column and converting back from numpy), whereas the other workaround takes about 1.5 seconds.
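A rough way to reproduce that comparison is sketched below. The synthetic data, the column names `Start` and `Returned`, and the helper names `with_zip` and `with_values` are assumptions for illustration; absolute timings will vary with the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame of roughly the size mentioned above (assumed data).
n = 90_000
rng = np.random.default_rng(0)
offsets = pd.to_timedelta(rng.integers(0, 365, n), unit="D")
df = pd.DataFrame({
    "Start": pd.Timestamp("2016-01-01") + offsets,
    "Returned": pd.Timestamp("2017-06-02"),
})


def with_zip():
    # Row-by-row subtraction, as in the first answer.
    return [ret - s for s, ret in zip(df["Start"], df["Returned"])]


def with_values():
    # Vectorised subtraction on the underlying NumPy arrays.
    return df["Returned"].values - df["Start"].values


t_zip = timeit.timeit(with_zip, number=1)
t_values = timeit.timeit(with_values, number=1)
print(f"zip: {t_zip:.3f}s, .values: {t_values:.4f}s")

# Both approaches should yield the same day counts.
same = (pd.Series(with_zip()).dt.days.tolist()
        == pd.Series(with_values()).dt.days.tolist())
print(same)
```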
