How do I calculate a column using a function that depends on the values in each row?
This is a mock-up, based on mtcars, of what I would like to do:
- calculate a column that counts the number of cars that have a smaller displacement (disp) than the current row within the same transmission type (am)
- the expected column holds the values I would like to get
- try1 is one attempt, using the findInterval function; the problem is that I can't make it count within the category-dependent subsets (am)
I've tried other approaches as well, but somehow I could never get the called function to work only on the subset that depends on the value of the variable in the row being processed (hopefully this makes sense).
x = mtcars[1:6, c("disp", "am")]
# expected values are the number of cars that have less disp while having the same am
x$expected = c(1, 1, 0, 1, 2, 0)
# this ordered table is for findInterval
a = x[order(x$disp), ]
a
# I use the findInterval function to get the number of values and I try subsetting the call
# -0.1 is to deal with the closed interval
x$try1 = findInterval(x$disp - 0.1, a$disp[a$am == x$am])
x
# try1 values are not computed depending on the subsetting of a
Any solution is welcome; using the findInterval function is not required.
I would prefer a more general solution that computes a column by calling a function that takes values from the current row to calculate the expected value.
As @dimitris_ps pointed out, the previous solution ignored duplicate values. The versions below fix that by shifting the sorted break points slightly upward, so that tied values are not counted as smaller.
library(dplyr)
x %>%
  group_by(am) %>%
  mutate(expected = findInterval(disp, sort(disp) + 0.0001))
library(data.table)
setDT(x)[, expected := findInterval(disp, sort(disp) + 0.0001), by = am]
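For the more general row-by-row case the question asks about, here is a minimal base R sketch. It is not part of the solutions above: count_smaller is a hypothetical helper introduced only for illustration, and the column names expected_rowwise and expected_rank are likewise made up. The first variant calls a function once per row; the second gets the same counts per group with ave and rank, assuming that "less" means strictly smaller.

```r
# Rebuild the example data from the question
x <- mtcars[1:6, c("disp", "am")]

# Hypothetical helper: for row i, count the rows that share its am value
# and have a strictly smaller disp
count_smaller <- function(i, data) {
  sum(data$am == data$am[i] & data$disp < data$disp[i])
}

# Row-wise variant: call the helper once per row index
x$expected_rowwise <- sapply(seq_len(nrow(x)), count_smaller, data = x)

# Grouped variant: within each am group, the minimum rank minus 1 equals
# the number of strictly smaller values, so ties need no offset
x$expected_rank <- ave(x$disp, x$am,
                       FUN = function(d) rank(d, ties.method = "min") - 1)

x
```

The row-wise variant compares every row against every other row, so it may become slow on large tables; the grouped variant avoids that, but only works for this particular kind of count.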