Delete rows in a data frame that have a specific value in the corresponding row of another data frame
I have two data frames.
The first one contains my actual data; let's call it data. The second is an indicator matrix, built with an if-else statement that checks whether a row contains at least one 1 or 2; let's call it indic.
Here's an example:
col1<-c(1,3,1,3,2)
col2<-c(3,4,2,3,"")
col3<-c(1,3,"","","")
col4<-c(2,"","","","")
data<-data.frame(cbind(col1,col2,col3,col4))
> data
  col1 col2 col3 col4
1    1    3    1    2
2    3    4    3
3    1    2
4    3    3
5    2
Each row of data must contain at least one 1 or 2, so here is my function:
remove<-function(x){
  if (((x[1] == "1") | (x[1] == "2")) | ((x[2] == "1") | (x[2] == "2"))
      | ((x[3] == "1") | (x[3] == "2")) | ((x[4] == "1") | (x[4] == "2"))){
    return(0)
  }
  else{
    return(1)
  }
}
indic<-data.frame(apply(data,1,remove))
> indic
y
1 0
2 1
3 0
4 1
5 0
Looking at the data, rows 2 and 4 do not contain at least one 1 or 2, which is confirmed by indic.
I would like to delete rows 2 and 4 in data, which correspond to rows 2 and 4 in indic. I've already tried the following:
finalMatrix<-class(array)
for(i in 1:nrow(indic)){
  if (indic[i,1] == "1"){
    finalMatrix = data[-i,]
  }
  else{
    data[i,] = data[i,]
  }
}
However, my output looks something like this:
> finalMatrix
  col1 col2 col3 col4
1    1    3    1    2
2    3    4    3
3    1    2
5    2
This removes ONLY the fourth row. I think it might be because I would have to create a new data frame after each iteration, but then the number of rows changes while the loop is running.
I'm wondering if I'm on the right track with my code; any suggestions would be great. I have been going over this for a while.
-Soph
You can try creating a TRUE/FALSE vector instead of your indicator vector, which contains 0/1. This makes the final filtering more obvious.
> data
  col1 col2 col3 col4
1    1    3    1    2
2    3    4    3
3    1    2
4    3    3
5    2
Using any makes it easy to check whether a row contains a 1 or a 2. The outer any then tells you whether at least one of the two conditions was met. apply() iterates over all rows when its second argument is set to 1.
indic <- apply(data, 1, function(row) {
any(c(any(row == 1), any(row == 2)))
})
> indic
[1] TRUE FALSE TRUE FALSE TRUE
> data[indic,]
  col1 col2 col3 col4
1    1    3    1    2
3    1    2
5    2
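For comparison, the loop in the question appears to keep only the last deletion because every hit assigns data[-i,], computed from the unmodified data, to finalMatrix and so overwrites the previous result. A minimal sketch, assuming the question's data and a 0/1 indicator like the one described there (the names flag and kept are made up for illustration), that reproduces this and then drops all flagged rows in one indexing step:

```r
# Rebuild the question's data; cbind() on mixed input yields character values.
col1 <- c(1, 3, 1, 3, 2)
col2 <- c(3, 4, 2, 3, "")
col3 <- c(1, 3, "", "", "")
col4 <- c(2, "", "", "", "")
data <- data.frame(cbind(col1, col2, col3, col4))

# 0/1 indicator as in the question: 1 marks a row to delete.
# `flag` is a hypothetical name used only in this sketch.
flag <- apply(data, 1, function(row) if (any(row %in% c("1", "2"))) 0 else 1)

# Loop from the question, simplified: each hit overwrites finalMatrix with
# data[-i, ], always starting from the full data, so only the last flagged
# row ends up removed.
for (i in seq_along(flag)) {
  if (flag[i] == 1) finalMatrix <- data[-i, ]
}

# One indexing step removes every flagged row at once.
kept <- data[flag == 0, ]
```

Here finalMatrix still contains row 2, while kept holds only the rows whose indicator is 0.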
As the title of your question suggests, the indicator vector can also be applied to another data frame, but then it is important to make sure that the vector has as many elements as the data frame has rows, or that recycling the vector is actually intended.
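As a rough sketch of that case, the indicator computed from data can filter a second data frame once the lengths have been checked. The data frame other and its id column are hypothetical and stand in for whatever second table is meant:

```r
# The question's data, written out as character columns.
data <- data.frame(col1 = c("1", "3", "1", "3", "2"),
                   col2 = c("3", "4", "2", "3", ""),
                   col3 = c("1", "3", "", "", ""),
                   col4 = c("2", "", "", "", ""),
                   stringsAsFactors = FALSE)

# Hypothetical second data frame with one row per row of `data`.
other <- data.frame(id = c("a", "b", "c", "d", "e"),
                    stringsAsFactors = FALSE)

# TRUE for rows of `data` that contain a 1 or a 2.
indic <- apply(data, 1, function(row) any(row %in% c("1", "2")))

# Guard against silent recycling: the index must match the row count.
stopifnot(length(indic) == nrow(other))

# drop = FALSE keeps the result a data frame even with a single column.
other_kept <- other[indic, , drop = FALSE]
```

The stopifnot() call is only one way to enforce the size requirement; without it, a shorter logical vector would be recycled without any warning.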
Following up on @nicola's suggestion to use vectorization:
data[rowSums(data=="1" | data=="2")>0,]
This should be the most efficient option and makes both the loop and the creation of indic unnecessary, although the TRUE/FALSE vector returned by rowSums(data=="1" | data=="2")>0 can still be stored in a variable.
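A short sketch of storing that vector, assuming the question's data as character columns; the names keep, via_apply and filtered are chosen here for illustration. It also checks that the vectorized result agrees with the apply() version:

```r
# The question's data, written out as character columns.
data <- data.frame(col1 = c("1", "3", "1", "3", "2"),
                   col2 = c("3", "4", "2", "3", ""),
                   col3 = c("1", "3", "", "", ""),
                   col4 = c("2", "", "", "", ""),
                   stringsAsFactors = FALSE)

# Comparing a data frame with a scalar gives a logical matrix;
# rowSums() counts the TRUE cells per row.
keep <- rowSums(data == "1" | data == "2") > 0

# Same result computed row by row, for comparison.
via_apply <- apply(data, 1, function(row) any(row %in% c("1", "2")))
stopifnot(all(keep == via_apply))

# The stored vector can be reused for filtering.
filtered <- data[keep, ]
```

Keeping the vector in keep avoids recomputing the comparison if several data frames need the same filter.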