Replacing a for-loop with a vectorized computation to improve performance (with weighted.mean)

I'm an R newbie, so hopefully this is a solvable problem for some of you. I have a dataframe containing over a million data points. My goal is to compute a weighted average with a varying starting point.

To illustrate, consider this frame: A <- data.frame(matrix(c(1, 2, 3, 2, 2, 1), 3, 2))

  X1 X2
1  1  2
2  2  2
3  3  1


where X1 is data and X2 is sample weight.

I want to compute the weighted average of X1 over the windows 1:3, 2:3, and 3:3, i.e. with a starting point that shifts toward the end.

With a loop, I just wrote:

B <- rep(NA,3) #empty result vector
for(i in 1:3){
  B[i] <- weighted.mean(x=A$X1[i:3],w=A$X2[i:3]) #shifting the starting point of the data and weights further to the end
} 


With my real data this is infeasible: each iteration subsets the data frame again, and the computation runs for hours without finishing.

Is there a way to implement the varying starting point with one of the apply functions to improve performance?

Best regards, Ruben



2 answers


Building on @joran's answer to get the correct result:

with(A, rev(cumsum(rev(X1*X2)) / cumsum(rev(X2))))
# [1] 1.800000 2.333333 3.000000
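To see why this works: the weighted mean over the suffix i:n is sum(X1[i:n] * X2[i:n]) / sum(X2[i:n]), and reversing, taking cumulative sums, and reversing back computes all suffix sums in a single vectorized pass. A minimal check against the loop version, using the example frame from the question:

```r
A <- data.frame(matrix(c(1, 2, 3, 2, 2, 1), 3, 2))

# Loop version: weighted mean over the suffix i:3 for each starting point i
B <- rep(NA, 3)
for (i in 1:3) {
  B[i] <- weighted.mean(x = A$X1[i:3], w = A$X2[i:3])
}

# Vectorized version: reversed cumulative sums yield all suffix sums at once
V <- with(A, rev(cumsum(rev(X1 * X2)) / cumsum(rev(X2))))

stopifnot(isTRUE(all.equal(B, V)))  # identical results: 1.8, 2.333..., 3.0
```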




Also note that this is much faster than the sapply/lapply approach.



You can use lapply to create your subsets and sapply to iterate over them, but I'd bet there is a faster way.



sapply(lapply(1:3, ":", 3), function(x) with(A[x, ], weighted.mean(X1, X2)))
[1] 1.800000 2.333333 3.000000








