Creating new df columns through iteration

I have a dataframe df that looks like this:

            Open  High   Low  Close   Volume
Date                                        
2007-03-22  2.65  2.95  2.64   2.86   176389
2007-03-23  2.87  2.87  2.78   2.78    63316
2007-03-26  2.83  2.83  2.51   2.52    54051
2007-03-27  2.61  3.29  2.60   3.28   589443
2007-03-28  3.65  4.10  3.60   3.80  1114659
2007-03-29  3.91  3.91  3.33   3.57   360501
2007-03-30  3.70  3.88  3.66   3.71   185787

      

I am trying to create a new column that, for each row, takes the df.Open value 5 days ahead and subtracts the current df.Open value from it.

So, the loop I'm using is this:

for i in range(0, len(df.Open)):  #goes through indexes values
    df['5days'][i]=df.Open[i+5]-df.Open[i]    #I use those index values to locate 

      

However, this loop gives an error.

KeyError: '5days'

I do not know why. I got this to work temporarily by removing df['5days'][i], but it seems terribly slow. Is there a more efficient way to do this?

Thanks.



2 answers


Using diff

df['5Days'] = df.Open.diff(5)
print(df)

            Open  High   Low  Close   Volume  5Days
Date                                               
2007-03-22  2.65  2.95  2.64   2.86   176389    NaN
2007-03-23  2.87  2.87  2.78   2.78    63316    NaN
2007-03-26  2.83  2.83  2.51   2.52    54051    NaN
2007-03-27  2.61  3.29  2.60   3.28   589443    NaN
2007-03-28  3.65  4.10  3.60   3.80  1114659    NaN
2007-03-29  3.91  3.91  3.33   3.57   360501   1.26
2007-03-30  3.70  3.88  3.66   3.71   185787   0.83
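
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the Open column from the question (the name `sample` is mine, not from the original post). It also illustrates that diff(5) counts rows rather than calendar days, so the weekend gap between 2007-03-23 and 2007-03-26 is simply ignored.

```python
import pandas as pd

# Rebuild the Open column from the question; `sample` is an illustrative name.
dates = pd.to_datetime([
    "2007-03-22", "2007-03-23", "2007-03-26", "2007-03-27",
    "2007-03-28", "2007-03-29", "2007-03-30",
])
sample = pd.DataFrame(
    {"Open": [2.65, 2.87, 2.83, 2.61, 3.65, 3.91, 3.70]},
    index=pd.Index(dates, name="Date"),
)

# diff(5) subtracts the value 5 rows earlier from each row.
sample["5Days"] = sample["Open"].diff(5)

# The first 5 rows have no earlier partner, so they stay NaN.
n_missing = int(sample["5Days"].isna().sum())
last_two = sample["5Days"].dropna().tolist()
print(sample)
```

If you need a difference over calendar days instead of rows, that is a different problem and this sketch does not cover it.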

      



Your loop, however, looks ahead (Open[i+5] - Open[i]) and stores the result on the earlier row. To get that alignment, use a negative period and flip the sign:

df['5Days'] = -df.Open.diff(-5)
print(df)

            Open  High   Low  Close   Volume  5Days
Date                                               
2007-03-22  2.65  2.95  2.64   2.86   176389   1.26
2007-03-23  2.87  2.87  2.78   2.78    63316   0.83
2007-03-26  2.83  2.83  2.51   2.52    54051    NaN
2007-03-27  2.61  3.29  2.60   3.28   589443    NaN
2007-03-28  3.65  4.10  3.60   3.80  1114659    NaN
2007-03-29  3.91  3.91  3.33   3.57   360501    NaN
2007-03-30  3.70  3.88  3.66   3.71   185787    NaN
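
As for why the loop fails: as far as I can tell, df['5days'][i] first has to look up the column '5days', which does not exist yet, hence the KeyError. Even with the column in place, Open[i+5] would run past the end for the last five rows. Below is a hedged sketch of a corrected loop on a toy frame (only the Open values from the question, default integer index), meant only to show that it agrees with the vectorized version, not as something to use in practice.

```python
import numpy as np
import pandas as pd

# Toy frame with the Open values from the question (default integer index).
df = pd.DataFrame({"Open": [2.65, 2.87, 2.83, 2.61, 3.65, 3.91, 3.70]})

# Create the column first so there is something to assign into.
df["5days"] = np.nan

# Stop 5 rows before the end so that i + 5 stays in range.
for i in range(len(df) - 5):
    df.loc[df.index[i], "5days"] = df["Open"].iloc[i + 5] - df["Open"].iloc[i]

# The vectorized equivalent: look 5 rows ahead and flip the sign.
vectorized = -df["Open"].diff(-5)

same = np.allclose(df["5days"].to_numpy(), vectorized.to_numpy(), equal_nan=True)
print(same)
```

The loop assigns one cell at a time through .loc, which is why it gets slow on long frames; diff does the same subtraction in a single vectorized call.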

      



I think you need shift with sub:

df['5days'] = df.Open.shift(5).sub(df.Open)
print (df)
            Open  High   Low  Close   Volume  5days
Date                                               
2007-03-22  2.65  2.95  2.64   2.86   176389    NaN
2007-03-23  2.87  2.87  2.78   2.78    63316    NaN
2007-03-26  2.83  2.83  2.51   2.52    54051    NaN
2007-03-27  2.61  3.29  2.60   3.28   589443    NaN
2007-03-28  3.65  4.10  3.60   3.80  1114659    NaN
2007-03-29  3.91  3.91  3.33   3.57   360501  -1.26
2007-03-30  3.70  3.88  3.66   3.71   185787  -0.83

      

Or maybe you need to subtract the shifted Open column from Open:



df['5days'] = df.Open.sub(df.Open.shift(5))
print (df)
            Open  High   Low  Close   Volume  5days
Date                                               
2007-03-22  2.65  2.95  2.64   2.86   176389    NaN
2007-03-23  2.87  2.87  2.78   2.78    63316    NaN
2007-03-26  2.83  2.83  2.51   2.52    54051    NaN
2007-03-27  2.61  3.29  2.60   3.28   589443    NaN
2007-03-28  3.65  4.10  3.60   3.80  1114659    NaN
2007-03-29  3.91  3.91  3.33   3.57   360501   1.26
2007-03-30  3.70  3.88  3.66   3.71   185787   0.83

      


Or, to look 5 rows ahead as in your loop:

df['5days'] = -df.Open.sub(df.Open.shift(-5))
print (df)
            Open  High   Low  Close   Volume  5days
Date                                               
2007-03-22  2.65  2.95  2.64   2.86   176389   1.26
2007-03-23  2.87  2.87  2.78   2.78    63316   0.83
2007-03-26  2.83  2.83  2.51   2.52    54051    NaN
2007-03-27  2.61  3.29  2.60   3.28   589443    NaN
2007-03-28  3.65  4.10  3.60   3.80  1114659    NaN
2007-03-29  3.91  3.91  3.33   3.57   360501    NaN
2007-03-30  3.70  3.88  3.66   3.71   185787    NaN
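
The shift/sub forms here and the diff forms from the other answer should give the same numbers. A quick check, assuming a toy Series holding the Open values from the question (all variable names are illustrative):

```python
import pandas as pd

# Open values from the question, as a plain Series.
opens = pd.Series([2.65, 2.87, 2.83, 2.61, 3.65, 3.91, 3.70])

# Backward-looking difference: Open[i] - Open[i-5]
backward_shift = opens.sub(opens.shift(5))
backward_diff = opens.diff(5)

# Forward-looking difference: Open[i+5] - Open[i]
forward_shift = opens.shift(-5).sub(opens)
forward_diff = -opens.diff(-5)

# Series.equals treats NaNs in the same positions as equal.
backward_match = backward_shift.equals(backward_diff)
forward_match = forward_shift.equals(forward_diff)
print(backward_match, forward_match)
```

Which form to use is mostly a matter of taste; diff is shorter, while shift with sub makes the direction of the subtraction explicit.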

      
