Loop to remove duplicates across many trials in R
I have a dataset (called eyeData) that, in a very abbreviated form, looks like this:
sNumber runningTrialNo wordTar
1 1 vital
1 1 raccoon
1 1 vital
1 1 accumulates
1 2 tornado
1 2 destroys
1 2 tornado
1 2 destroys
1 2 property
4 51 denounces
4 51 brings
4 51 illegible
4 51 frequently
4 51 brings
4 61 cerebrum
4 61 vital
4 61 knowledge
4 61 vital
4 61 cerebrum
I wrote a loop to remove all duplicates (repeated words) in the wordTar column for each trial separately, so that the data would look like this:
sNumber runningTrialNo wordTar
1 1 vital
1 1 raccoon
1 1 accumulates
1 2 tornado
1 2 destroys
1 2 property
4 51 denounces
4 51 brings
4 51 illegible
4 51 frequently
4 61 cerebrum
4 61 vital
4 61 knowledge
Here's the code:
for (sno in eyeData$sNumber) {
  for (trial in eyeData$runningTrialNo) {
    ss <- subset(eyeData, sNumber == sno & runningTrialNo == trial)
    ss.s <- ss[!duplicated(ss$wordTar), ]
  }
}
However, it runs for a very long time, so I end up killing it. Since I am new to R, I assume I am doing something wrong with the loop. Is there a way to improve my loop so that it gives me the desired result?
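For loops are generally slow in R. Your loop also iterates over every row of eyeData rather than over the unique subject and trial values, and it overwrites ss.s on each pass, so the result is never collected. Usually you want to vectorize your code. There are many ways to do this; here is an example using the dplyr library: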
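library(dplyr)
# Keep the first occurrence of each word within each subject/trial combination.
# .keep_all = TRUE retains any other columns in eyeData.
eyeData %>%
  group_by(sNumber, runningTrialNo) %>%
  distinct(wordTar, .keep_all = TRUE)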
This is much faster. We can see this with microbenchmark, which by default runs each expression 100 times and reports how long it takes:
library(microbenchmark)
microbenchmark(
  dplyr = eyeData %>% group_by(runningTrialNo) %>%
    distinct(wordTar),
  old = for (sno in eyeData$sNumber) {
    for (trial in eyeData$runningTrialNo) {
      ss <- subset(eyeData, sNumber == sno & runningTrialNo == trial)
      ss.s <- ss[!duplicated(ss$wordTar), ]
    }
  }
)
Unit: milliseconds
  expr        min         lq       mean     median         uq       max neval
 dplyr   1.256438   1.287158   1.567518   1.495092   1.550579  12.29212   100
   old 102.203029 110.265423 112.664063 111.789698 113.166710 304.58312   100
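If you would rather not add a package, base R can do the same thing in one vectorized step. The following is only a sketch: it rebuilds the sample data from the question, and the names keyCols, isRepeat and eyeDataUnique are made up for illustration. It relies on duplicated() accepting a data frame and comparing whole rows, so restricting it to the three key columns flags a word only when it repeats within the same subject and trial.

```r
# Sample data rebuilt from the table in the question.
eyeData <- data.frame(
  sNumber = c(rep(1, 9), rep(4, 10)),
  runningTrialNo = c(rep(1, 4), rep(2, 5), rep(51, 5), rep(61, 5)),
  wordTar = c("vital", "raccoon", "vital", "accumulates",
              "tornado", "destroys", "tornado", "destroys", "property",
              "denounces", "brings", "illegible", "frequently", "brings",
              "cerebrum", "vital", "knowledge", "vital", "cerebrum"),
  stringsAsFactors = FALSE
)

# Hypothetical helper names, chosen for this example only.
keyCols <- c("sNumber", "runningTrialNo", "wordTar")

# TRUE for every row whose subject/trial/word combination appeared earlier.
isRepeat <- duplicated(eyeData[, keyCols])

# Keep only the first occurrence of each word per subject and trial;
# all other columns of eyeData (if any) are carried along unchanged.
eyeDataUnique <- eyeData[!isRepeat, ]
print(eyeDataUnique)
```

Because the comparison covers sNumber and runningTrialNo as well as wordTar, a word such as "vital" that appears in two different trials is kept once in each.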