Max / min of dates across pandas columns when the columns include NaN values

I am trying to create a new column in a pandas DataFrame holding the maximum (or minimum) date from two other date columns. But when there is a NaN in either of these columns, the entire min / max column becomes NaN. What gives? With numeric columns this works fine, but with dates the new column is all NaN. Here's some sample code to illustrate the problem:

from datetime import date
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[np.nan, date(2000,11,1)],
                        [date(2000,12,1), date(2000,9,1)],
                        [date(2000,4,1), np.nan],
                        [date(2000,12,2), np.nan]], columns=['col1','col2'])

df['col3'] = df[['col1','col2']].max(axis=1)

      

I know it can be done with loc and a combination of <,>, isnull and so on. But how to make it work with regular max / min functions?

1 answer


Your columns hold datetime.date objects. If you convert them to datetime with pd.to_datetime, max works as expected:

In[10]:
df['col1'] = pd.to_datetime(df['col1'])
df['col2'] = pd.to_datetime(df['col2'])
df

Out[10]: 
        col1       col2  col3
0        NaT 2000-11-01   NaN
1 2000-12-01 2000-09-01   NaN
2 2000-04-01        NaT   NaN
3 2000-12-02        NaT   NaN

In[11]:
df['col3'] = df[['col1','col2']].max(axis=1)
df

Out[11]: 
        col1       col2       col3
0        NaT 2000-11-01 2000-11-01
1 2000-12-01 2000-09-01 2000-12-01
2 2000-04-01        NaT 2000-04-01
3 2000-12-02        NaT 2000-12-02
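
To check whether a column is in this situation before converting it, inspecting its dtype is usually enough. The snippet below is a minimal sketch on a small illustrative Series (the names s and converted are not from the question): a column mixing NaN with datetime.date objects reports object dtype, and after pd.to_datetime it reports a datetime dtype with the NaN turned into NaT.

```python
from datetime import date

import numpy as np
import pandas as pd

# Illustrative Series mixing a float NaN with datetime.date objects.
s = pd.Series([np.nan, date(2000, 12, 1), date(2000, 4, 1)])

# Mixed Python objects are stored with the generic object dtype.
print(s.dtype)

# After conversion the Series has a datetime64 dtype and the NaN becomes NaT.
converted = pd.to_datetime(s)
print(converted.dtype)
print(converted.isna().sum())
```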

      

If you had instead called max on one of the original, unconverted columns:

df['col3'] = df['col1'].max()

this raises TypeError: '>=' not supported between instances of 'float' and 'datetime.date'

The missing values are NaN, which is a float, so each column ends up mixing floats with date objects, and the row-wise max returns NaN. If you had no missing values it would work as expected. With missing values, you have to convert the dtype to datetime so that the missing values become NaT, which max handles correctly.
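
Putting the steps above together, here is a minimal end-to-end sketch using the sample data from the question. The names date_cols, col_max and col_min are chosen for illustration and do not appear in the original post. It assumes every value in the columns is either a date or NaN, so pd.to_datetime can convert them without extra arguments.

```python
from datetime import date

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[np.nan, date(2000, 11, 1)],
          [date(2000, 12, 1), date(2000, 9, 1)],
          [date(2000, 4, 1), np.nan],
          [date(2000, 12, 2), np.nan]],
    columns=['col1', 'col2'])

# Convert each date column to a datetime dtype; NaN becomes NaT.
date_cols = ['col1', 'col2']
for col in date_cols:
    df[col] = pd.to_datetime(df[col])

# Row-wise max / min now skip NaT, as they skip NaN for numeric columns.
df['col_max'] = df[date_cols].max(axis=1)
df['col_min'] = df[date_cols].min(axis=1)

print(df)
```

The same pattern works for min, so there is no need for loc with comparison operators and isnull once the columns are datetimes.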
