Replace rbind in for-loop with lapply? (2nd circle of hell)
I am having a problem optimizing a piece of R code. The following example code should illustrate my optimization problem:
Some initializations and function definition:
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
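columns <- 6  # must be defined before it is used below
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)
myfunction <- function(frame,columns){
  # assumed: value starts as a vector of length columns, like values above
  value <- vector(length=columns)
  athing = 0
  if(columns == 5){
    athing = 100
  }
  else{
    athing = 1000
  }
  value[columns+1] = athing
  return(value)
}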
A problematic for-loop looks like this:
columns = 6
for(i in 1:nrow(myframe)){
  values <- myfunction(as.matrix(myframe[i,]), columns)
  values[columns+2] = i
  values[columns+3] = myframe[i,3]
  #more columns added with simple operations (i.e. sum)
  solution <- rbind(solution,values)
  #solution is a large matrix from outside the for-loop
}
The problem is the function rbind. I frequently get error messages about the size of solution, which seems to become too large after a while (more than 50 MB). I want to replace this loop and the rbind with a list and lapply and/or foreach. I started by converting myframe to a list.
myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])
I didn't really get beyond that, although I tried to apply this very good introduction to parallel processing.
How can I restructure the for-loop without having to change myfunction? Obviously, I am open to different solutions ...
Edit: This issue seems to be straight from the second circle of hell of the R Inferno. Any suggestions?
The reason using rbind in a loop like this is bad practice is that on each iteration you enlarge solution and then copy it to a new object, which is a very slow process and can also lead to memory problems. One way around this is to create a list whose i-th component stores the output of the i-th loop iteration. The final step is to call rbind on this list (just once at the end). It will look something like
my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
  # Call all necessary commands to create values
  my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
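As a rough sketch of how this might look with the data from the question, the loop can also be written with lapply so that rbind is called only once. The body of myfunction below is an assumption (the version in the question does not run as posted), and rows is a name introduced here purely for illustration:

```r
# Example data from the question
a <- c(10, 20, 30, 40, 50, 60, 70, 80)
b <- c("a", "b", "c", "d", "z", "g", "h", "r")
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
myframe <- data.frame(a, b, c)
columns <- 6

# Assumed stand-in for the question's myfunction:
# starts from a zero vector and writes one value at position columns + 1
myfunction <- function(frame, columns) {
  value <- numeric(columns)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

# One list element per row of myframe; nothing is copied or grown here
rows <- lapply(seq_len(nrow(myframe)), function(i) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  values
})

# Bind all rows in a single call at the end
solution <- do.call(rbind, rows)
dim(solution)
```

Under these assumptions solution ends up with one row per row of myframe and columns + 3 columns. If rows from an existing matrix have to be kept, they can be bound on once at the end, as in the snippet above.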
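A bit long for a comment, so I am adding it as an answer. If columns is known in advance:
myfunction <- function(frame){
  # columns is taken from the enclosing environment
  # assumed: value starts as a vector of length columns
  value <- vector(length=columns)
  athing = 0
  if(columns == 5){
    athing = 100
  }
  else{
    athing = 1000
  }
  value[columns+1] = athing
  return(value)
}
# MARGIN = 1 calls myfunction once per row, as the loop does
apply(myframe, 1, myfunction)
If columns is not provided via the environment, you can use:
apply(myframe, 1, myfunction, columns)
with your original myfunction definition.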
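A minimal sketch of the apply variant, assuming the goal is one call per row of myframe as in the original loop (hence MARGIN = 1). The shortened data frame and the body of myfunction are assumptions made for illustration. Note that apply returns one column per call, so the result is transposed to get one row per input row:

```r
# Shortened example data (assumption: three rows are enough to illustrate)
myframe <- data.frame(a = c(10, 20, 30),
                      b = c("a", "b", "c"),
                      c = c(1, 2, 3))
columns <- 6

# Assumed stand-in for the question's myfunction
myfunction <- function(frame, columns) {
  value <- numeric(columns)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

# Extra arguments after FUN are passed on to myfunction
res <- apply(myframe, 1, myfunction, columns)

# apply stores each result as a column; transpose for one row per input row
solution <- t(res)
dim(solution)
```

Keep in mind that apply converts the data frame to a matrix first, so with a character column like b every row reaches myfunction as a character vector.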