R: how to remove specific rows in a data.frame

> data = data.frame(a = c(100, -99, 322, 155, 256), b = c(23, 11, 25, 25, -999))
> data
    a    b
1 100   23
2 -99   11
3 322   25
4 155   25
5 256 -999

      

For a data.frame like this, I would like to remove any row that contains -99 or -999. My resulting data.frame should therefore consist only of rows 1, 3 and 4.

I was thinking about writing a loop for this, but I hope there is an easier way. (If my data.frame had columns a through z, the loop method would be very clunky.) My loop would probably look something like this:

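# Collect the row numbers first; deleting rows inside the loop
# would shift the indices and run past the end of the data.
drop = c()
for(i in 1:nrow(data)){
  if(data$a[i] < 0 || data$b[i] < 0){
    drop = c(drop, i)
  }
}
if(length(drop) > 0) data = data[-drop,]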

      

+3




3 answers


Perhaps this:



> ind <- Reduce(`|`, lapply(data, function(x) x %in% c(-99, -999)))
> data[!ind,]
    a  b
1 100 23
3 322 25
4 155 25
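
The one-liner above builds one logical vector per column with lapply and then combines them with an elementwise OR through Reduce. A sketch of the same idea wrapped in a function, in case it has to be reused; the name `drop_rows_with` and its arguments are hypothetical and not part of the answer:

```r
# Hypothetical helper: drop every row in which any column
# contains one of the values listed in `codes`.
drop_rows_with <- function(df, codes) {
  # One logical vector per column: TRUE where the cell matches a code.
  hits <- lapply(df, function(x) x %in% codes)
  # Elementwise OR across columns: TRUE if any column matched in that row.
  ind <- Reduce(`|`, hits)
  df[!ind, , drop = FALSE]
}

data <- data.frame(a = c(100, -99, 322, 155, 256),
                   b = c(23, 11, 25, 25, -999))
res <- drop_rows_with(data, c(-99, -999))
```

Because `%in%` returns FALSE rather than NA for missing values, this variant should also behave sensibly if the data already contains NA.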

      

+4




> data[rowSums(data == -99 | data == -999) == 0, ]
    a  b
1 100 23
3 322 25
4 155 25

      



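Both "==" and "|" (OR) act on data frames as they do on matrices, returning a logical matrix of the same dimensions, so rowSums can succeed. rowSums then counts the TRUE values in each row, and only rows with a count of 0 are kept.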
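
To see how the pieces fit together, the intermediate steps can be inspected one at a time. A minimal sketch using the data frame from the question; the names `bad`, `n_bad` and `clean` are introduced here for illustration only:

```r
data <- data.frame(a = c(100, -99, 322, 155, 256),
                   b = c(23, 11, 25, 25, -999))

# Logical matrix with the same dimensions as data:
# TRUE wherever a cell holds one of the sentinel values.
bad <- data == -99 | data == -999

# rowSums() treats TRUE as 1, so this counts the flagged cells in each row.
n_bad <- rowSums(bad)

# Keep only the rows with no flagged cells.
clean <- data[n_bad == 0, ]
```

If the data may already contain NA values, `rowSums(bad, na.rm = TRUE)` keeps NA out of the row index.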

+6




As @rawr's comment suggests, it probably makes sense to do this during import. However, if you already have the data, you can do this:

na.omit(replace(data, sapply(data,`%in%`,c(-99,-999)), NA))
#    a  b
#1 100 23
#3 322 25
#4 155 25
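
The comment referred to is not reproduced here. Assuming it means declaring the sentinel codes as missing values when the file is read, a minimal sketch with read.csv and its na.strings argument could look like this; the inline text stands in for a real file and is an assumption for illustration:

```r
# Inline CSV text standing in for a file on disk (illustrative only).
csv_text <- "a,b
100,23
-99,11
322,25
155,25
256,-999"

# na.strings turns the listed codes into NA while the data is being read.
data <- read.csv(text = csv_text, na.strings = c("-99", "-999"))

# Rows that contain an NA are then dropped in one step.
clean <- na.omit(data)
```

With a real file, the `text` argument would be replaced by the file path.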

      

+1

