Calculate new column based on values in current and subsequent rows with dplyr in R

I have a large dataset (10+ million rows x 30 variables) and I am trying to compute some new variables based on complex interactions of the existing ones. For clarity, I'm only including the important variables in the question. I have the following code in R, but I am interested in other opinions and suggestions. I am using the dplyr package to calculate new columns based on current/next row values from 3 other columns (more explanation below the code).

I'm wondering if there is a way to make this faster and more efficient, or maybe rewrite it entirely ...

library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Main function: data is a data frame; windowSize and ratio are ints
computeNewColumn <- function(data, windowSize, ratio) {

  # Helper used in the second mutate() below.
  # All args are ints; returns a boolean.
  windowAhead <- function(timeTo, window, reduction) {
    # Subset the original data frame: only observations with values of
    # TimeToGo between timeTo - 1 and window (basically the following X
    # rows from the current one)
    subframe <- data[(timeTo - 1 >= data$TimeToGo & data$TimeToGo >= window), ]
    isthere <- any(subframe$Price < reduction)
    return(isthere)
  }

  # Group by ID first and order by TimeToGo...
  data %<>% group_by(ID) %>%
    arrange(desc(TimeToGo)) %>%

    # ...create two new columns from simple interactions of existing ones...
    mutate(Window = ifelse(TimeToGo > windowSize, TimeToGo - windowSize, 0),
           Reduction = floor(Price - (ratio * Price))) %>%
    rowwise() %>%

    # ...now comes the more complex part: compute a third column
    # depending on the next (TimeToGo - Window) values of Price
    mutate(Advice = ifelse(windowAhead(TimeToGo, Window, Reduction), 1, 0))

  return(data)
}

      

We have a dataset with the following columns: ID, Price, TimeToGo.

First, we group by ID and calculate two new columns based on the current row's values (Window from TimeToGo and Reduction from Price). The next thing we would like to do is calculate a third column based on:

1. the current Reduction value

2. the next (TimeToGo - Window) Price values in the data frame.
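To make the two simple columns concrete, here is a minimal base R sketch on three rows of one ID taken from the sample data. The values windowSize = 5 and ratio = 0.05 are assumptions chosen purely for illustration; the question does not say which values are actually used.

```r
# Three rows of one ID from the sample data in the question.
a <- data.frame(
  ID       = c("AQSAFOTO30A", "AQSAFOTO30A", "AQSAFOTO30A"),
  TimeToGo = c(96, 94, 91),
  Price    = c(19, 17, 17)
)

# Assumed parameter values, for illustration only.
windowSize <- 5
ratio      <- 0.05

# Window: lower bound of the look-ahead range, never below 0.
a$Window <- ifelse(a$TimeToGo > windowSize, a$TimeToGo - windowSize, 0)

# Reduction: price threshold after applying the ratio, rounded down.
a$Reduction <- floor(a$Price - ratio * a$Price)

print(a)
```

With these assumed values, the first row gets Window = 91 and Reduction = 18, so the third column would ask whether any Price below 18 occurs among rows with TimeToGo between 95 and 91.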

I'm wondering if there is an easy way to reference subsequent values of a column from within mutate(). I am ideally looking for a sliding window function over a single column, where the window limits are set from two other columns of the current row. My solution for now is a custom function that manually subsets the original data frame, does the comparison, and returns a value to the calling mutate(). Any help and ideas would be much appreciated!
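As a point of reference, the sliding window described above could be sketched as a vector-in, vector-out function in base R. The helper name advice_for_group is hypothetical, and windowSize = 5 and ratio = 0.05 are assumed values; this is a plain illustration of the intended result for a single ID, not a tuned solution for 10+ million rows.

```r
# Hypothetical helper: for each row of ONE group, check whether any row
# whose TimeToGo lies in [window, time_to_go - 1] has a Price below that
# row's own reduction threshold. Returns 1L or 0L per row.
advice_for_group <- function(time_to_go, price, window, reduction) {
  vapply(seq_along(time_to_go), function(i) {
    in_window <- time_to_go <= time_to_go[i] - 1 & time_to_go >= window[i]
    as.integer(any(price[in_window] < reduction[i]))
  }, integer(1))
}

# Rows of one ID from the sample data; parameter values are assumptions.
ttg        <- c(96, 94, 91)
price      <- c(19, 17, 17)
windowSize <- 5
ratio      <- 0.05

window    <- ifelse(ttg > windowSize, ttg - windowSize, 0)
reduction <- floor(price - ratio * price)

advice <- advice_for_group(ttg, price, window, reduction)
print(advice)
```

Because the function takes whole columns, it could be called from a grouped mutate(), for example `mutate(Advice = advice_for_group(TimeToGo, Price, Window, Reduction))` after `group_by(ID)`, without rowwise(). That would restrict the lookup to rows of the same ID, whereas the helper in the code above subsets the full ungrouped data frame, so the two may give different results.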

P.S. Here's some sample data; please let me know if you need more information. Thanks!

> a
           ID TimeToGo Price
1 AQSAFOTO30A       96    19
2 AQSAFOTO20A       95    19
3 AQSAFOTO30A       94    17
4 AQSAFOTO20A       93    18
5 AQSAFOTO25A       92    19
6 AQSAFOTO30A       91    17
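For anyone who wants to run the code, the printed sample can be recreated roughly as follows. The column types (character ID, integer TimeToGo and Price) are an assumption, since the printout does not show them.

```r
# Recreate the six sample rows shown above (column types are assumed).
a <- data.frame(
  ID       = c("AQSAFOTO30A", "AQSAFOTO20A", "AQSAFOTO30A",
               "AQSAFOTO20A", "AQSAFOTO25A", "AQSAFOTO30A"),
  TimeToGo = c(96L, 95L, 94L, 93L, 92L, 91L),
  Price    = c(19L, 19L, 17L, 18L, 19L, 17L),
  stringsAsFactors = FALSE
)

print(a)
```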

      
