Loop to sum observations larger than the subject in R

Question

I have a dataset that looks like this:

set.seed(100)
da <- data.frame(exp = c(rep("A", 4), rep("B", 4)), diam = runif(8, 10, 30))

For each row in the dataset, I want to sum the observations (diam) that are larger than the diameter in that row and belong to the same "exp" level. To do this, I wrote a loop:

da$d2 <- 0
n <- nrow(da)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    # add diam[j] when it is larger than diam[i] and in the same exp level
    if (da$diam[i] < da$diam[j] && da$exp[i] == da$exp[j]) {
      da$d2[i] <- da$d2[i] + da$diam[j]
    }
  }
}

The loop works fine and gives the expected result:

  exp     diam       d2
1   A 16.15532 21.04645
2   A 15.15345 37.20177
3   A 21.04645  0.00000
4   A 11.12766 52.35522
5   B 19.37099 45.92347
6   B 19.67541 26.24805
7   B 26.24805  0.00000
8   B 17.40641 65.29445

However, my real dataset is much larger (> 40,000 rows and > 100 exp levels), so the loop is very slow. I'm hoping there is a function that can make this calculation faster.

loops r

Mateusz1981 May 20 '15 at 9:36

1 answer

docendo discimus · Accepted Answer · 2015-05-20T09:45:01+0000

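If you don't need to keep the original row order in the result, you can do this quite efficiently: sort each exp group by descending diam, then subtract each value from its running total.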

library(data.table)
setorder(setDT(da), exp, -diam)
da[, d2 := cumsum(diam) - diam, by = exp]

da
#   exp     diam       d2
#1:   A 21.04645  0.00000
#2:   A 16.15532 21.04645
#3:   A 15.15345 37.20177
#4:   A 11.12766 52.35522
#5:   B 26.24805  0.00000
#6:   B 19.67541 26.24805
#7:   B 19.37099 45.92347
#8:   B 17.40641 65.29445
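
If the original row order matters, one possible variant (a sketch, not part of the answer above) is to pass the ordering in `i` instead of calling `setorder()`. The cumulative sum is then computed over the rows in descending `diam` order, while `:=` writes the result back to the rows in their original positions. Like the code above, it assumes there are no tied `diam` values within an `exp` level.

```r
library(data.table)

# Same example data as in the question
set.seed(100)
da <- data.table(exp = c(rep("A", 4), rep("B", 4)), diam = runif(8, 10, 30))

# order(-diam) only controls the order in which rows are visited;
# the table itself is not re-sorted, so rows stay where they were.
da[order(-diam), d2 := cumsum(diam) - diam, by = exp]

da
```

The `d2` column should then match the loop output shown in the question, row for row.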

Using dplyr this would be:

library(dplyr)
da %>%
  arrange(exp, desc(diam)) %>%
  group_by(exp) %>%
  mutate(d2 = cumsum(diam) - diam)
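
Both versions above count equal values as "larger" when a group contains tied `diam` values, whereas the loop in the question uses a strict `<`. A minimal base R sketch that keeps the strict comparison and the original row order could look like this; the helper name `sum_larger` is made up for illustration.

```r
# Same example data as in the question
set.seed(100)
da <- data.frame(exp = c(rep("A", 4), rep("B", 4)), diam = runif(8, 10, 30))

# For each element of x, return the sum of the strictly larger elements of x.
# sum_larger is a hypothetical helper, not a function from any package.
sum_larger <- function(x) {
  ord <- order(x, decreasing = TRUE)
  xs <- x[ord]
  cs <- cumsum(xs) - xs      # sum of everything sorted before each element
  first <- match(xs, xs)     # position of the first copy of each value
  out <- numeric(length(x))
  out[ord] <- cs[first]      # tied values share the result of their first copy
  out
}

# ave() applies the helper within each exp level and keeps the row order
da$d2 <- ave(da$diam, da$exp, FUN = sum_larger)

da
```

Sorting costs O(n log n) per group instead of the O(n^2) pairwise comparison of the double loop, which is where the speed-up on a large dataset would come from.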
