Avoiding the for loop in R

Suppose I have two different datasets, Data1 and Data2. For each record in Data1$Incidents I want to find the rows in Data2$Incidents that match it, and also keep track of the records that don't have a match. I then save the matching records to a new data frame, Data1_Matches. Next, for each record in Data2$Incidents I search for matching records in Data1_Matches$Incidents and build a similar data frame, Data2_Matches.

Suppose, for the sake of argument, my datasets look like this:

Day    Incidents
"Monday"    30
"Friday"    11
"Sunday"    27

      

At the moment my algorithm looks like this:

# Pass 1: for each row of Data1, look for the same Incidents value in Data2
Data1_Incs = as.integer(Data1$Incidents)
LEN1     = length(Data1_Incs)
No_Match = 0                              # dummy leading element, dropped later

for (k in 1:LEN1){
  Incs = which(Data2$Incidents == Data1_Incs[k])
  if (length(Incs) == 0){
    No_Match = c(No_Match, k)             # grow the vector of unmatched Data1 row indices
  }
}
No_Match = No_Match[-1]                   # drop the dummy 0

Data1_Match    <- Data1[-No_Match,]
Data1_No_Match <- Data1[ No_Match,]

# Pass 2: for each row of Data2, look for the same Incidents value in Data1_Match
Data2_Incs = Data2$Incidents
LEN2       = length(Data2_Incs)
Un_Match   = 0

for (j in 1:LEN2){
  Incs = which(as.integer(Data1_Match$Incidents) == Data2_Incs[j])
  if (length(Incs) == 0){
    Un_Match = c(Un_Match, j)             # unmatched Data2 row indices
  }
}
Un_Match = Un_Match[-1]

Data2_Match    <- Data2[-Un_Match,]
Data2_No_Match <- Data2[ Un_Match,]

      

What's the best way to accomplish this task without using a for loop? For reference, Data1 has about 15,000 entries, and Data2 closer to two million.

1 answer


Try using setdiff. I'll demonstrate with the first loop:



No_Match <- setdiff(unique(Data1$Incidents), unique(Data2$Incidents))
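
As a quick illustration with made-up vectors (hypothetical values, not your real data), setdiff keeps the distinct values of its first argument that never appear in its second:

# toy example with hypothetical values
setdiff(c(30, 11, 27), c(30, 27, 5))   # returns 11

Note that this gives you the unmatched Incidents values rather than the row numbers your loop collects.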

      

Not sure if this will satisfy your requirements.
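
If you need the matched and unmatched rows themselves (Data1_Match, Data1_No_Match, Data2_Match, Data2_No_Match), a vectorised %in% test avoids both loops. This is only a sketch, assuming Incidents can be compared as integers in both data frames, as in your code:

# Rows of Data1 whose Incidents value occurs anywhere in Data2
has_match1     <- as.integer(Data1$Incidents) %in% as.integer(Data2$Incidents)
Data1_Match    <- Data1[ has_match1, ]
Data1_No_Match <- Data1[!has_match1, ]

# Rows of Data2 whose Incidents value occurs in the matched Data1 rows
has_match2     <- as.integer(Data2$Incidents) %in% as.integer(Data1_Match$Incidents)
Data2_Match    <- Data2[ has_match2, ]
Data2_No_Match <- Data2[!has_match2, ]

Since %in% is built on match(), which hashes its table argument, this should cope with a two-million-row Data2 far better than growing No_Match inside a loop.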
