Pandas diff() on the first records of a time series: no previous data returns NaN

In pandas 0.14.1, diff() does not produce values at the start of a time series; the first records come out as NaN.

diff() seems to handle missing data differently from cumsum(), which effectively treats NaN as 0. Is there a way to make diff() treat the missing previous value (missing because it lies before the start of the time series) as 0?
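A minimal sketch of that contrast, run against a recent pandas rather than 0.14.1 and using a made-up three-element series: with the default skipna=True, cumsum() steps over the NaN (it adds nothing to the running total, much like a 0) but leaves the NaN in place, while diff() propagates it.

```python
import numpy as np
import pandas as pd

# A short series with a gap in the middle (values chosen for illustration only).
s = pd.Series([1.0, np.nan, 3.0])

# cumsum() skips NaN by default (skipna=True): the running total carries on
# past the gap, although the NaN position itself stays NaN in the output.
running = s.cumsum()

# diff() subtracts the previous element; anything minus NaN is NaN,
# and the very first element has no predecessor at all.
steps = s.diff()

print(running.tolist())
print(steps.tolist())
```

So cumsum() only "assumes NaN == 0" in the sense that a NaN does not break the running total; diff() has no such skipping behaviour.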

For example:

    >print df

    2014-05-01  A     Apple        1
                B     Banana       2
    2014-06-01  A     Apple        3
                B     Banana       4

      

leads to:

    >print df.groupby(level=[1,2]).diff()

    2014-05-01  A     Apple        NaN
                B     Banana       NaN
    2014-06-01  A     Apple        2
                B     Banana       2

      

whereas the desired result is:

    2014-05-01  A     Apple        1
                B     Banana       2
    2014-06-01  A     Apple        2
                B     Banana       2
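The example frame above can be rebuilt and the NaN output reproduced with a short script. This is a sketch for a recent pandas; the level names a, b, c and the column name d are borrowed from the session in the answer, not given in the question.

```python
import pandas as pd

# Rebuild the example frame: a three-level index (date, letter, fruit).
# Level names 'a', 'b', 'c' and column name 'd' are illustrative.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2014-05-01"), "A", "Apple"),
        (pd.Timestamp("2014-05-01"), "B", "Banana"),
        (pd.Timestamp("2014-06-01"), "A", "Apple"),
        (pd.Timestamp("2014-06-01"), "B", "Banana"),
    ],
    names=["a", "b", "c"],
)
df = pd.DataFrame({"d": [1, 2, 3, 4]}, index=index)

# Group by the letter and fruit levels, then diff within each group.
# The first row of each group has no predecessor, so it becomes NaN.
result = df.groupby(level=[1, 2]).diff()
print(result)
```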

      

Answer:


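As far as I know, groupby(...).diff() subtracts the previous value within each group. Unlike np.diff, which returns an array 1 (or n) shorter than the one passed to it, pandas keeps the original length, so the first row of each group, which has no previous value, comes out as NaN.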
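A quick comparison of the two behaviours on a made-up list of values, assuming a recent NumPy and pandas: np.diff drops the leading element, while Series.diff keeps the length and pads the first position with NaN.

```python
import numpy as np
import pandas as pd

values = [1, 3, 6]

# np.diff drops the leading element: the result is one shorter than the input.
np_result = np.diff(values)  # array([2, 3])

# pandas diff keeps the original length and marks the first position NaN.
pd_result = pd.Series(values).diff()  # [NaN, 2.0, 3.0]

print(np_result)
print(pd_result.tolist())
```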

But filling in the missing values afterwards should be pretty simple. Something like this?



    In [175]: df
    Out[175]:
                         d
    a          b c
    2014-05-01 A Apple   1
               B Banana  2
    2014-06-01 A Apple   3
               B Banana  4

    In [176]: df['diff'] = df.groupby(level=[1,2])['d'].diff()

    In [177]: df['diff'] = df['diff'].fillna(df['d'])

    In [178]: df
    Out[178]:
                         d  diff
    a          b c
    2014-05-01 A Apple   1     1
               B Banana  2     2
    2014-06-01 A Apple   3     2
               B Banana  4     2
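An alternative sketch, not part of the original answer, that follows the question's wording more literally: take the previous value within each group with a group-wise shift(), treat the missing predecessor as 0, and subtract. It assumes a recent pandas, and the index dates are plain strings here for brevity.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2014-05-01", "A", "Apple"), ("2014-05-01", "B", "Banana"),
     ("2014-06-01", "A", "Apple"), ("2014-06-01", "B", "Banana")],
    names=["a", "b", "c"],
)
df = pd.DataFrame({"d": [1, 2, 3, 4]}, index=index)

# Previous value within each (b, c) group; the first row of each group has
# none, and fillna(0) treats that missing predecessor as 0.
previous = df.groupby(level=[1, 2])["d"].shift().fillna(0)

# Subtracting a 0 predecessor leaves the first value unchanged, which gives
# the same result as the fillna(df['d']) step above for this data.
df["diff"] = df["d"] - previous
print(df)
```

Both approaches give 1, 2, 2, 2 here; the shift-based version simply makes the "previous value is 0" assumption explicit.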

      
