Calculate new column based on values in current and subsequent rows with dplyr in R
I have a large dataset (10+ million rows x 30 variables) and I am trying to compute some new variables based on complex interactions of the existing ones. For clarity, I'm only including the important variables in the question. I have the following code in R, but I am interested in other options and opinions. I am using the dplyr package to calculate new columns based on current/next row values from 3 other columns (more explanation below the code).
I'm wondering if there is a way to make this faster and more efficient, or maybe rewrite it entirely.
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Main function: data is a data frame, windowSize and ratio are ints
computeNewColumn <- function(data, windowSize, ratio) {
  # Helper function used in the second mutate() below.
  # All args are ints; it returns a single boolean.
  windowAhead <- function(timeTo, window, reduction) {
    # Subset the original data frame: only observations with values of
    # TimeToGo between timeTo - 1 and window (basically the following X rows
    # from the current one). Note that `data` here is the full input data
    # frame, so the subset is not restricted to the current ID.
    subframe <- data[(timeTo - 1 >= data$TimeToGo & data$TimeToGo >= window), ]
    isthere <- any(subframe$Price < reduction)
    return(isthere)
  }

  # Group by ID first and order by TimeToGo...
  data %<>%
    group_by(ID) %>%
    arrange(desc(TimeToGo)) %>%
    # ...create two new columns from simple interactions of existing ones...
    mutate(Window = ifelse(TimeToGo > windowSize, TimeToGo - windowSize, 0),
           Reduction = floor(Price - (ratio * Price))) %>%
    rowwise() %>%
    # ...now comes the more complex part: compute a third column depending
    # on the next (TimeToGo - Window) values of Price
    mutate(Advice = ifelse(windowAhead(TimeToGo, Window, Reduction), 1, 0))

  return(data)
}
We have a dataset with the following columns: ID, Price, TimeToGo.
First, we group by the ID values and calculate two new columns based on the current row values (Window from TimeToGo and Reduction from Price). The next thing we would like to do is calculate a new third column based on
1. the current Reduction value
2. the next (TimeToGo - Window) Price values in the data frame.
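To make the first step concrete, here is a minimal sketch that computes Window and Reduction for the sample rows from this question. The values windowSize = 5 and ratio = 0.05 are assumptions chosen only for illustration.

```r
library(dplyr)

# Sample rows copied from the question
a <- data.frame(
  ID = c("AQSAFOTO30A", "AQSAFOTO20A", "AQSAFOTO30A",
         "AQSAFOTO20A", "AQSAFOTO25A", "AQSAFOTO30A"),
  TimeToGo = c(96, 95, 94, 93, 92, 91),
  Price = c(19, 19, 17, 18, 19, 17),
  stringsAsFactors = FALSE
)

windowSize <- 5   # assumed value, for illustration only
ratio <- 0.05     # assumed value, for illustration only

step1 <- a %>%
  group_by(ID) %>%
  arrange(desc(TimeToGo), .by_group = TRUE) %>%
  # Same expressions as in computeNewColumn()
  mutate(Window = ifelse(TimeToGo > windowSize, TimeToGo - windowSize, 0),
         Reduction = floor(Price - (ratio * Price))) %>%
  ungroup()

print(step1)
```

With these assumed values, a TimeToGo of 96 gives Window = 91, and a Price of 19 gives Reduction = floor(18.05) = 18.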
I'm wondering if there is an easy way to reference nearby column values from within mutate()? Ideally I am looking for a sliding window function over a single column, where the sliding window limits are set from the values of two other columns in the current row. My solution for now is just a custom function that manually subsets the original data frame, does the comparison, and returns a value to the mutate() call. Any help and ideas would be much appreciated!
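One possible direction, as a rough sketch rather than a tested solution for 10+ million rows: do the window lookup inside a grouped mutate() on the group's own vectors, so that neither rowwise() nor repeated subsetting of the whole data frame is needed. The helper name window_ahead and the values windowSize = 5 and ratio = 0.05 are hypothetical. The sketch also assumes that the window should only look at rows with the same ID, whereas the function above subsets the full data frame regardless of ID.

```r
library(dplyr)

# Sample rows copied from the question
a <- data.frame(
  ID = c("AQSAFOTO30A", "AQSAFOTO20A", "AQSAFOTO30A",
         "AQSAFOTO20A", "AQSAFOTO25A", "AQSAFOTO30A"),
  TimeToGo = c(96, 95, 94, 93, 92, 91),
  Price = c(19, 19, 17, 18, 19, 17),
  stringsAsFactors = FALSE
)

windowSize <- 5   # assumed value, for illustration only
ratio <- 0.05     # assumed value, for illustration only

# Hypothetical helper: for each row i of one group, check whether any Price
# with TimeToGo in [window[i], time_to_go[i] - 1] is below reduction[i].
window_ahead <- function(time_to_go, price, window, reduction) {
  vapply(seq_along(time_to_go), function(i) {
    in_window <- time_to_go <= time_to_go[i] - 1 & time_to_go >= window[i]
    any(price[in_window] < reduction[i])
  }, logical(1))
}

result <- a %>%
  group_by(ID) %>%
  arrange(desc(TimeToGo), .by_group = TRUE) %>%
  mutate(Window = ifelse(TimeToGo > windowSize, TimeToGo - windowSize, 0),
         Reduction = floor(Price - (ratio * Price)),
         # Called once per group with that group's column vectors
         Advice = as.integer(window_ahead(TimeToGo, Price, Window, Reduction))) %>%
  ungroup()

print(result)
```

The inner loop is still quadratic in the size of each group, so whether this helps depends on how many rows each ID has.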
P.S. Here's some sample data. Please let me know if you need more information. Thanks!
> a
           ID TimeToGo Price
1 AQSAFOTO30A       96    19
2 AQSAFOTO20A       95    19
3 AQSAFOTO30A       94    17
4 AQSAFOTO20A       93    18
5 AQSAFOTO25A       92    19
6 AQSAFOTO30A       91    17