Delete rows in a data frame that have a specific value in the corresponding row of another data frame

I have two data frames.

The first one contains my actual data; let's call it data. The second is an indicator matrix, built with an if-else statement that checks whether each row contains at least one 1 or 2; let's call it indic.

Here's an example:

col1<-c(1,3,1,3,2)
col2<-c(3,4,2,3,"")
col3<-c(1,3,"","","")
col4<-c(2,"","","","")

data<-data.frame(cbind(col1,col2,col3,col4))

> data
  col1 col2 col3 col4
1    1    3    1    2
2    3    4    3     
3    1    2          
4    3    3          
5    2               

      

Each row of data must contain at least one 1 or 2, so here is my function:

remove<-function(x){

  if (((x[1] == "1") | (x[1] == "2")) | ((x[2] == "1") | (x[2] == "2"))
      | ((x[3] == "1") | (x[3] == "2")) | ((x[4] == "1") | (x[4] == "2"))){
    return(0)
  }

else{
  return(1)
}
}

indic<-data.frame(apply(data,1,remove))

> indic
        y
1       0
2       1
3       0
4       1
5       0

      

Looking at the data, rows 2 and 4 do not contain a 1 or a 2, which is confirmed by indic.

I would like to delete rows 2 and 4 of data, which correspond to rows 2 and 4 of indic. I've already tried the following:

finalMatrix<-class(array)

for(i in 1:nrow(indic)){
  if (indic[i,1] == "1"){
    finalMatrix = data[-i,]
  }
  else{
    data[i,] = data[i,]
  }
}

      

However, my output looks something like this:

> finalMatrix
  col1 col2 col3 col4
1    1    3    1    2
2    3    4    3     
3    1    2          
5    2               

      

This removes ONLY the fourth row. I think it might be because I have to create a new data frame after each iteration, but then the problem is that the number of rows to iterate over changes.

Am I on the right track with my code? Any suggestions would be great. I've gone over this several times.

-Soph


1 answer


You can create a TRUE/FALSE vector instead of your indicator vector of 0/1 values. This makes the final filtering more obvious.

> data
  col1 col2 col3 col4
1    1    3    1    2
2    3    4    3     
3    1    2          
4    3    3          
5    2        

      

Using any() makes it easy to check whether a row contains 1 or 2. The outer any() then tells you whether at least one of the two conditions was met. apply() iterates over all rows when its second argument is set to 1.

indic <- apply(data, 1, function(row) {
    any(c(any(row == 1), any(row == 2)))
})


> indic
[1]  TRUE FALSE  TRUE FALSE  TRUE

> data[indic,]
  col1 col2 col3 col4
1    1    3    1    2
3    1    2          
5    2   
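
If you would rather keep the 0/1 indicator from the question, the same kind of indexing works without a loop. This is only a minimal sketch: it assumes indic is a one-column data frame built as in the question, and the column name y and the names keep and filtered are illustrative.

```r
# Rebuild the example data from the question (cbind makes every column character)
col1 <- c(1, 3, 1, 3, 2)
col2 <- c(3, 4, 2, 3, "")
col3 <- c(1, 3, "", "", "")
col4 <- c(2, "", "", "", "")
data <- data.frame(cbind(col1, col2, col3, col4))

# 0/1 indicator as in the question: 0 = row contains a 1 or 2, 1 = it does not
remove <- function(x) if (any(x == "1" | x == "2")) 0 else 1
indic <- data.frame(y = apply(data, 1, remove))

# Convert the 0/1 column to a logical vector and index once, no loop needed
keep <- indic$y == 0
filtered <- data[keep, ]
print(filtered)
```

The loop in the question overwrites finalMatrix with data[-i,] on every match, so only the last matching row ends up removed; indexing with the whole vector at once avoids that.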

      



As the title of your question suggests, an indicator vector can also be applied to another data frame. In that case, make sure that the number of rows in the data frame matches the length of the indicator vector, unless you actually intend the vector to be recycled.

Upvote for @nicola's suggestion to use vectorization:
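
The size check can be made explicit before filtering. A hedged sketch: other is a made-up second data frame used only for illustration, and indic is assumed to be the logical vector computed for the example data.

```r
# Logical indicator as computed for the example data
indic <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

# 'other' is a hypothetical data frame with one row per row of 'data'
other <- data.frame(id = 1:5, label = c("a", "b", "c", "d", "e"))

# Fail early if the sizes differ, so the indicator is never silently recycled
stopifnot(length(indic) == nrow(other))

other_kept <- other[indic, ]
print(other_kept)
```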

data[rowSums(data=="1" | data=="2")>0,]

      

This is more efficient, and it avoids both the explicit loop and the need to create indic. That said, the TRUE/FALSE vector produced by rowSums(data=="1" | data=="2")>0 can still be stored in a variable.
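
For instance, the vector can be computed once and reused for both the kept and the dropped rows. A small sketch on the example data; the names keep, kept and dropped are only illustrative.

```r
# Example data from the question (all columns are character after cbind)
col1 <- c(1, 3, 1, 3, 2)
col2 <- c(3, 4, 2, 3, "")
col3 <- c(1, 3, "", "", "")
col4 <- c(2, "", "", "", "")
data <- data.frame(cbind(col1, col2, col3, col4))

# TRUE for rows that contain at least one "1" or "2"
keep <- rowSums(data == "1" | data == "2") > 0

kept    <- data[keep, ]   # rows with a 1 or 2
dropped <- data[!keep, ]  # rows without a 1 or 2
print(kept)
print(dropped)
```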
