Drop every factor level that has a missing value in one column (R)

I would like to remove every factor level (here, every ID) for which at least one row contains a missing value.

Example:

ID var1 var2
1  1    2
1  NA   3
2  1    2
2  2    4

      

So, in this hypothetical example, what would be left is:

ID var1 var2
2  1    2
2  2    4

      



4 answers


Here is a possible data.table solution

(sorry @rawr)

library(data.table)
setDT(df)[, if (all(!is.na(.SD))) .SD, ID]
#    ID var1 var2
# 1:  2    1    2
# 2:  2    2    4

      



If you only want to check var1, then

df[, if (all(!is.na(var1))) .SD, ID]
#    ID var1 var2
# 1:  2    1    2
# 2:  2    2    4

      



Assuming that the NAs occur in the var columns,

 df[with(df, !ave(!!rowSums(is.na(df[,-1])), ID, FUN=any)),]
 #   ID var1 var2
 #3  2    1    2
 #4  2    2    4

      

Or, if this only applies to var1

 df[with(df, !ave(is.na(var1), ID, FUN=any)),]
 #  ID var1 var2
 #3  2    1    2
 #4  2    2    4

      



Or using dplyr

 library(dplyr)
 df %>% 
     group_by(ID) %>%
     filter(all(!is.na(var1)))
 #   ID var1 var2
 #1  2    1    2
 #2  2    2    4
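
The dplyr call above only inspects var1. As a minimal sketch, assuming you want the same grouped filter to cover both var columns, the columns can be listed explicitly (the name res is only for illustration):

```r
library(dplyr)

df <- data.frame(ID   = c(1L, 1L, 2L, 2L),
                 var1 = c(1L, NA, 1L, 2L),
                 var2 = c(2L, 3L, 2L, 4L))

res <- df %>%
  group_by(ID) %>%
  # keep a group only if neither column contains an NA
  filter(!anyNA(var1), !anyNA(var2)) %>%
  ungroup()

res
```

Each condition evaluates to a single TRUE or FALSE per group, so a group is either kept whole or dropped whole.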

      

data

 df <- structure(list(ID = c(1L, 1L, 2L, 2L), var1 = c(1L, NA, 1L, 2L
 ), var2 = c(2L, 3L, 2L, 4L)), .Names = c("ID", "var1", "var2"
 ), class = "data.frame", row.names = c(NA, -4L))

      



Here's another option in base R. It checks all columns for NA.

df[!df$ID %in% df$ID[rowSums(is.na(df)) > 0],]
#  ID var1 var2
#3  2    1    2
#4  2    2    4

      

If you only want to check the "var1" column, you can do:

df[!with(df, ID %in% ID[is.na(var1)]),]
#  ID var1 var2
#3  2    1    2
#4  2    2    4
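
The two base R variants differ only in which columns are scanned for NA, so they can be folded into one function. This is a rough sketch, not part of the original answer: drop_groups_with_na and its arguments are hypothetical names, and it assumes a plain data.frame rather than a data.table.

```r
# Hypothetical helper: drop every group that has an NA in any of the
# columns named in `cols` (by default, every column except the group column).
drop_groups_with_na <- function(data, group, cols = setdiff(names(data), group)) {
  # TRUE for each row with at least one NA among the checked columns
  row_has_na <- rowSums(is.na(data[, cols, drop = FALSE])) > 0
  # group values that own at least one such row
  bad_groups <- unique(data[[group]][row_has_na])
  data[!data[[group]] %in% bad_groups, , drop = FALSE]
}

df <- data.frame(ID   = c(1L, 1L, 2L, 2L),
                 var1 = c(1L, NA, 1L, 2L),
                 var2 = c(2L, 3L, 2L, 4L))

res_all  <- drop_groups_with_na(df, "ID")                 # check var1 and var2
res_var2 <- drop_groups_with_na(df, "ID", cols = "var2")  # check var2 only
```

With the sample data, checking all columns drops ID 1, while checking only var2 keeps every row because var2 has no NA.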

      



In the current development release of data.table, there is a new implementation of na.omit for data.tables that takes the arguments cols = and invert =.

cols = lets you specify the columns in which to search for NAs, and invert = TRUE returns the rows that contain NAs instead of omitting them.

You can install the devel version by following these instructions, or you can wait for 1.9.6 to reach CRAN at some point. Using this, we can do:

require(data.table) ## 1.9.5+
setkey(setDT(df), ID)
df[!na.omit(df, invert = TRUE)]
#    ID var1 var2
# 1:  2    1    2
# 2:  2    2    4

      


How it works:

  • setDT converts the data.frame to a data.table by reference.

  • setkey sorts the data.table by the provided columns and marks those columns as key columns so that we can perform the join.

  • na.omit(df, invert = TRUE) gives only those rows that have an NA anywhere.

  • X[!Y] performs an anti-join by joining on the key column ID and returns all rows that do not match ID = 1 (from Y). Check this post to read about data.table joins in detail.
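
The steps listed above can be followed one at a time. This is a sketch under a few assumptions: a data.table version that supports the on = argument (1.9.6 or later), and illustrative object names (dt, na_rows, kept, kept_var1) that are not part of the original answer.

```r
library(data.table)

dt <- data.table(ID   = c(1L, 1L, 2L, 2L),
                 var1 = c(1L, NA, 1L, 2L),
                 var2 = c(2L, 3L, 2L, 4L))
setkey(dt, ID)

# Step 1: rows that contain an NA in any column
na_rows <- na.omit(dt, invert = TRUE)

# Step 2: anti-join on ID, keeping rows whose ID never appears in na_rows
kept <- dt[!na_rows, on = "ID"]

# Same idea, but only var1 is searched for NAs
kept_var1 <- dt[!na.omit(dt, cols = "var1", invert = TRUE), on = "ID"]
```

Spelling out on = "ID" makes the join column explicit instead of relying on the key alone.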

HTH
