Error while subtracting datetime columns in pandas

I have the following frame.

  Date Returned Start Date
0    2017-06-02 2017-04-01
1    2017-06-02 2017-04-01
2    2017-06-02 2017-04-01
3    2017-06-02 2017-02-28
4    2017-06-02 2017-02-28
5    2017-06-02 2016-07-20
6    2017-06-02 2016-07-20

      

Both columns are of type datetime64:

subframe[['Date Returned','Start Date']].dtypes
Out[9]: 
Date Returned    datetime64[ns]
Start Date       datetime64[ns]
dtype: object

      

However, when I try to compute the timedeltas between the two date columns, I get this error:

subframe['Delta']=subframe['Date Returned'] - subframe['Start Date']

TypeError: data type "datetime" not understood 

      

Is there a fix for this? I tried everything I could think of and pulled out most of my hair at this point. Any help is appreciated. I found that someone posted the same problem, but no one answered it.



2 answers


I got the same error in pandas 0.18.1. Here's a workaround that subtracts each start/return pair individually:

d['diff'] = [ret - start for start, ret in zip(d['Start'], d['Returned'])]

      

Now d is:



Returned      Start     diff
0 2017-06-02 2017-04-01  62 days
1 2017-06-02 2017-04-01  62 days
2 2017-06-02 2017-04-01  62 days
3 2017-06-02 2017-02-28  94 days
4 2017-06-02 2017-02-28  94 days
5 2017-06-02 2016-07-20 317 days
6 2017-06-02 2016-07-20 317 days

      

This workaround is much slower than I'd expect a native pandas implementation to be. Sigh.
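For anyone who wants to try the element-wise approach on a fresh frame, here is a minimal, self-contained sketch. The frame d and its column names mirror the output above, the data is retyped from the question, and the final check on the day counts is my own addition rather than part of the original workaround.

```python
import pandas as pd

# Rebuild the example frame (values retyped from the question).
d = pd.DataFrame({
    "Returned": pd.to_datetime(["2017-06-02"] * 7),
    "Start": pd.to_datetime(
        ["2017-04-01"] * 3 + ["2017-02-28"] * 2 + ["2016-07-20"] * 2
    ),
})

# Subtract one Timestamp pair at a time instead of whole columns.
# Iterating a datetime column yields pandas Timestamp objects, and
# Timestamp - Timestamp gives a Timedelta.
d["diff"] = [ret - start for start, ret in zip(d["Start"], d["Returned"])]

print(d)

# Whole days per row, as plain integers.
days = d["diff"].dt.days.tolist()
print(days)
```

Because every row goes through a Python-level loop, the cost grows with the number of rows, which is why this is only worth using when the column-wise subtraction fails.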



I think the problem may have been resolved in later versions of pandas (and perhaps of numpy as well), and it may always have been Windows-specific. However, on the computer I'm working on (pandas 0.18.0, numpy 1.13, under Windows 7), it still hasn't been resolved.

For those in the same situation as me, there is a workaround that is faster than @blacksite's:

subframe['Delta'] = subframe['Date Returned'].values - subframe['Start Date'].values

      



Silly as it seems, appending ".values" converts each column to a NumPy datetime64 array, and those arrays subtract correctly. The result is a NumPy timedelta64 array; assigning it to the DataFrame column turns it back into a regular pandas timedelta column.

On my DataFrame (about 90k rows) it takes less than 0.01 s (all of which is spent creating the new pandas column and converting back from NumPy), whereas the other workaround takes about 1.5 s.
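To see what ".values" actually hands back, here is a small sketch on the frame from the question. The data is retyped from the question; the intermediate name raw and the dtype printouts are assumptions of mine, added only for illustration.

```python
import pandas as pd

# Rebuild the question's frame (values retyped from the post).
subframe = pd.DataFrame({
    "Date Returned": pd.to_datetime(["2017-06-02"] * 7),
    "Start Date": pd.to_datetime(
        ["2017-04-01"] * 3 + ["2017-02-28"] * 2 + ["2016-07-20"] * 2
    ),
})

# .values returns plain NumPy datetime64 arrays, so the subtraction
# is done by NumPy and yields a NumPy timedelta64 array.
raw = subframe["Date Returned"].values - subframe["Start Date"].values
print(type(raw), raw.dtype)

# Assigning the array back gives an ordinary pandas timedelta column.
subframe["Delta"] = raw
print(subframe["Delta"].dtype)

delta_days = subframe["Delta"].dt.days.tolist()
print(delta_days)  # [62, 62, 62, 94, 94, 317, 317]
```

The whole subtraction happens in one vectorized NumPy call, with no Python-level loop over rows, which would explain the speed difference described above.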
