Loop to remove duplicates across many trials in R

I have a dataset (called eyeData) that, in a very short version, looks like this:

sNumber runningTrialNo  wordTar                             
1       1               vital       
1       1               raccoon                             
1       1               vital                               
1       1               accumulates                             
1       2               tornado                             
1       2               destroys                                
1       2               tornado                             
1       2               destroys                                
1       2               property                                
4       51              denounces                               
4       51              brings                              
4       51              illegible                               
4       51              frequently                              
4       51              brings                          
4       61              cerebrum
4       61              vital
4       61              knowledge
4       61              vital
4       61              cerebrum


I wrote a loop to remove all duplicates (repeated words) in the wordTar column for each trial separately, so the data would look like this:

   sNumber  runningTrialNo  wordTar                             
1           1               vital       
1           1               raccoon                         
1           1               accumulates                             
1           2               tornado                             
1           2               destroys                                
1           2               property                                
4           51              denounces                               
4           51              brings                              
4           51              illegible                               
4           51              frequently                  
4           61              cerebrum
4           61              vital
4           61              knowledge


Here's the code:

for (sno in eyeData$sNumber) {
  for (trial in eyeData$runningTrialNo) {
    ss <- subset(eyeData, sNumber == sno & runningTrialNo == trial)
    ss.s <- ss[!duplicated(ss$wordTar), ]
  }
}


However, it runs for a very long time, so I had to stop it. Since I am new to the R environment, I assume I am doing something wrong with the loop. Is there a way to improve it so that it gives the desired result?



1 answer


For loops are generally slow in R; usually you want to vectorize your code instead. (Note that your loop also has two other problems: it iterates over every row value of sNumber and runningTrialNo rather than the unique values, and it overwrites ss.s on every pass, so the results are never accumulated anywhere.) There are many ways to vectorize this; here is an example using the dplyr library:

library(dplyr)
eyeData %>% group_by(sNumber, runningTrialNo) %>%
            distinct(wordTar)
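If you would rather avoid extra packages, the same deduplication can be done in base R: duplicated() applied to the identifying columns flags every repeat of a (sNumber, runningTrialNo, wordTar) combination, so negating it keeps only the first occurrence. A minimal sketch, rebuilding a few rows of the sample data for illustration:

```r
# Reconstruct a small slice of the question's data for demonstration
eyeData <- data.frame(
  sNumber        = c(1, 1, 1, 1, 4, 4, 4, 4, 4),
  runningTrialNo = c(1, 1, 1, 1, 61, 61, 61, 61, 61),
  wordTar        = c("vital", "raccoon", "vital", "accumulates",
                     "cerebrum", "vital", "knowledge", "vital", "cerebrum"),
  stringsAsFactors = FALSE
)

# duplicated() on the three columns marks repeated combinations;
# negating it keeps only the first occurrence per subject/trial
deduped <- eyeData[!duplicated(eyeData[, c("sNumber", "runningTrialNo", "wordTar")]), ]
deduped
```

Like the dplyr version, this is a single vectorized pass over the data rather than one subset() call per row.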

This is much faster, as we can see using microbenchmark, which runs each expression 100 times and reports how long it takes:

library(microbenchmark)

microbenchmark(dplyr = eyeData %>% group_by(sNumber, runningTrialNo) %>%
                   distinct(wordTar),
               old = for (sno in eyeData$sNumber) {
                       for(trial in eyeData$runningTrialNo) {
                           ss <- subset(eyeData, sNumber == sno & runningTrialNo == trial)
                           ss.s <- ss[!duplicated(ss$wordTar), ]
                       }
                   })

Unit: milliseconds
  expr        min         lq       mean     median         uq       max neval
 dplyr   1.256438   1.287158   1.567518   1.495092   1.550579  12.29212   100
   old 102.203029 110.265423 112.664063 111.789698 113.166710 304.58312   100
