Replace rbind in for-loop with lapply? (2nd circle of hell)

I am having a problem optimizing a piece of R code. The following example code should illustrate my optimization problem:

Some initializations and function definition:

columns <- 6   # needed by the two initializations at the end of this block
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)

myfunction <- function(frame,columns){
   value = numeric(columns)   # result vector, extended by one element below
   athing = 0
   if(columns == 5){
      athing = 100
   }
   else{
      athing = 1000
   }
   value[columns+1] = athing
   return(value)
}

      

A problematic for-loop looks like this:

columns = 6
for(i in 1:nrow(myframe)){
   values <- myfunction(as.matrix(myframe[i,]), columns)
   values[columns+2] = i
   values[columns+3] = myframe[i,3]
   #more columns added with simple operations (i.e. sum)

   solution <- rbind(solution,values)
   #solution is a large matrix from outside the for-loop
}

      

The problem is the function rbind. I frequently get error messages about the size of solution, which becomes too large after a while (more than 50 MB). I want to replace this loop and the rbind with a list and lapply and/or foreach. I started by converting myframe to a list.

myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])

      

I didn't really get beyond that, although I tried to apply this very good introduction to parallel processing.

How can I restructure the for-loop without having to change myfunction? Obviously I am open to different solutions ...

Edit: This issue seems to come straight from the second circle of hell in The R Inferno. Any suggestions?

+3


2 answers


Using rbind in a loop like this is bad practice because on each iteration you enlarge solution and then copy it to a new object, which is very slow and can also lead to memory problems. One way around this is to create a list whose i-th component stores the output of the i-th loop iteration. The final step is to call rbind on this list (just once, at the end). It will look something like this:



my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
    # Call all necessary commands to create values
    my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
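
To make the outline above concrete, here is a minimal runnable sketch using the toy data from the question. It assumes a repaired myfunction in which value starts as a numeric vector of length columns (the question does not define it), and the name rows is made up for this illustration. It uses lapply instead of an explicit loop, but the idea is the same: collect one result per row, then bind once.

```r
# Toy data from the question; `columns` is set up front.
columns <- 6
myframe <- data.frame(a = c(10, 20, 30, 40, 50, 60, 70, 80),
                      b = c("a", "b", "c", "d", "z", "g", "h", "r"),
                      c = c(1, 2, 3, 4, 5, 6, 7, 8))

# Repaired version of the question's function. Assumption: `value`
# starts as a numeric vector of length `columns`.
myfunction <- function(frame, columns) {
  value <- numeric(columns)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

# One list element per row of myframe; the growing result is never
# copied while the rows accumulate. `rows` is a hypothetical name.
rows <- lapply(seq_len(nrow(myframe)), function(i) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  values
})

# Bind once at the end; each row vector has length columns + 3.
solution <- do.call(rbind, rows)
```

Because myfunction is called with the same arguments as in the original loop, it should not need to change; only the accumulation step differs.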

      

+6




A bit too long for a comment, so I am putting it here: If columns is known in advance:

    myfunction <- function(frame){
       value = numeric(columns)   # columns is looked up in the enclosing environment
       athing = 0
       if(columns == 5){
          athing = 100
       }
       else{
          athing = 1000
       }
       value[columns+1] = athing
       return(value)
    }

    apply(myframe, 2, myfunction)

      



If columns is not provided via the environment, you can use:

apply(myframe, 2, myfunction, columns)

with your original myfunction definition.
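
For orientation, here is a small sketch of how the MARGIN argument of apply behaves on a data frame shaped like the one in the question. The variable names per_column, per_row and row_class are made up for this illustration. Since the loop in the question runs over rows, MARGIN = 1 may be closer to what is needed there than MARGIN = 2.

```r
# A shortened version of the question's data frame.
myframe <- data.frame(a = c(10, 20, 30),
                      b = c("a", "b", "c"),
                      c = c(1, 2, 3))

# MARGIN = 2 calls the function once per column (3 calls here).
per_column <- apply(myframe, 2, function(x) length(x))

# MARGIN = 1 calls it once per row, like the for-loop in the question.
per_row <- apply(myframe, 1, function(x) length(x))

# apply() converts the data frame with as.matrix() first, so with a
# character column present every row arrives as a character vector.
row_class <- apply(myframe, 1, function(x) class(x))
```

The coercion to character is worth keeping in mind if the applied function needs the numeric columns as numbers.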

-1

