Drop all rows of a factor level that has a missing value in one column (R)

Question


I would like to remove all rows of any factor level (here, ID) for which at least one row contains a missing value.

Example:

ID var1 var2
1  1    2
1  NA   3
2  1    2
2  2    4

So, in this hypothetical example, what would be left is:

ID var1 var2
2  1    2
2  2    4


r

gh0strider18 · Dec 18 '14 at 19:47


4 answers

David Arenburg · Answer 1 · 2014-12-18T19:59:44+0000

Here is a possible solution using data.table

(sorry @rawr)

library(data.table)
setDT(df)[, if (all(!is.na(.SD))) .SD, ID]
#    ID var1 var2
# 1:  2    1    2
# 2:  2    2    4

If you only want to check var1, then

df[, if (all(!is.na(var1))) .SD, ID]
#    ID var1 var2
# 1:  2    1    2
# 2:  2    2    4
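The data.table expression runs `if (all(!is.na(.SD))) .SD` once per ID group and keeps the group only when the condition holds. For readers without data.table, the same split-apply-combine logic can be sketched in base R as below. This is an illustration added here, not part of the original answer; the object names `groups`, `kept` and `result` are arbitrary.

```r
# Example data from the question (ID is the grouping column)
df <- data.frame(ID   = c(1L, 1L, 2L, 2L),
                 var1 = c(1L, NA, 1L, 2L),
                 var2 = c(2L, 3L, 2L, 4L))

# Split the rows by ID (roughly what `by = ID` does in data.table)
groups <- split(df, df$ID)

# Return a group unchanged if none of its non-ID columns contain NA,
# otherwise return NULL (mirrors `if (all(!is.na(.SD))) .SD`)
kept <- lapply(groups, function(g) if (all(!is.na(g[, -1]))) g)

# Recombine; rbind() silently drops the NULL entries
result <- do.call(rbind, unname(kept))
rownames(result) <- NULL

result  # keeps only the two rows with ID == 2
```

Note that if every group contains an NA, `do.call(rbind, ...)` returns `NULL` rather than an empty data frame, which the data.table version handles more gracefully.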

akrun · Answer 2 · 2014-12-18T19:54:08+0000

Assuming that NAs can occur in any of the var columns (i.e. every column except ID):

 df[with(df, !ave(!!rowSums(is.na(df[,-1])), ID, FUN=any)),]
 #   ID var1 var2
 #3  2    1    2
 #4  2    2    4

Or, if only var1 should be checked:

 df[with(df, !ave(is.na(var1), ID, FUN=any)),]
 #  ID var1 var2
 #3  2    1    2
 #4  2    2    4
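The `ave()` one-liners pack several steps into one expression. A sketch that unrolls the all-columns version into named intermediate values (the names `row_has_na`, `group_has_na` and `result` are only for illustration) may make the logic easier to follow:

```r
df <- data.frame(ID   = c(1L, 1L, 2L, 2L),
                 var1 = c(1L, NA, 1L, 2L),
                 var2 = c(2L, 3L, 2L, 4L))

# Step 1: does each row have at least one NA outside the ID column?
# rowSums() counts the NAs per row; `!!` coerces the counts to logical.
row_has_na <- !!rowSums(is.na(df[, -1]))
# values per row: FALSE, TRUE, FALSE, FALSE

# Step 2: ave() applies any() within each ID group and recycles the
# group-level answer back onto every row of that group.
group_has_na <- ave(row_has_na, df$ID, FUN = any)
# values per row: TRUE, TRUE, FALSE, FALSE

# Step 3: keep only the rows whose group is NA-free.
result <- df[!group_has_na, ]
```

The var1-only variant skips step 1 and passes `is.na(var1)` straight to `ave()`, since that is already one logical value per row.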

Or, using dplyr:

 library(dplyr)
 df %>% 
     group_by(ID) %>%
     filter(all(!is.na(var1)))
 #   ID var1 var2
 #1  2    1    2
 #2  2    2    4

data

 df <- structure(list(ID = c(1L, 1L, 2L, 2L), var1 = c(1L, NA, 1L, 2L
 ), var2 = c(2L, 3L, 2L, 4L)), .Names = c("ID", "var1", "var2"
 ), class = "data.frame", row.names = c(NA, -4L))

docendo discimus · Answer 3 · 2014-12-18T20:19:41+0000

Here's another option in base R. It checks all columns for NA.

df[!df$ID %in% df$ID[rowSums(is.na(df)) > 0],]
#  ID var1 var2
#3  2    1    2
#4  2    2    4

If you only want to check the "var1" column, you can do:

df[!with(df, ID %in% ID[is.na(var1)]),]
#  ID var1 var2
#3  2    1    2
#4  2    2    4
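If this filter is needed repeatedly, the `%in%` approach could be wrapped in a small function with a column selector. The helper below, `drop_groups_with_na`, is hypothetical (it is not from any package); by default it checks every column except the grouping column, which is a design choice that differs slightly from the all-columns version above.

```r
# Hypothetical helper: drop every row whose group contains an NA in at
# least one of the columns listed in `cols`.
drop_groups_with_na <- function(data, group,
                                cols = setdiff(names(data), group)) {
  # TRUE for rows with at least one NA in the checked columns
  na_rows <- rowSums(is.na(data[, cols, drop = FALSE])) > 0
  # group values that occur in any such row
  bad_groups <- unique(data[[group]][na_rows])
  # keep only rows whose group value is not among the bad ones
  data[!data[[group]] %in% bad_groups, , drop = FALSE]
}

df <- data.frame(ID   = c(1L, 1L, 2L, 2L),
                 var1 = c(1L, NA, 1L, 2L),
                 var2 = c(2L, 3L, 2L, 4L))

drop_groups_with_na(df, "ID")                 # checks var1 and var2
drop_groups_with_na(df, "ID", cols = "var2")  # var2 has no NA: all rows kept
```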

Arun · Answer 4 · 2014-12-19T16:08:15+0000

In the current development release of data.table (1.9.5), there is a new na.omit method for data.tables that takes the arguments cols = and invert =.

cols = lets you specify the columns in which to search for NAs, and invert = TRUE returns the rows that contain NAs instead of dropping them.

You can install the devel version by following these instructions, or wait for 1.9.6 to be released on CRAN. Using this, we can do:

require(data.table) ## 1.9.5+
setkey(setDT(df), ID)
df[!na.omit(df, invert = TRUE)]
#    ID var1 var2
# 1:  2    1    2
# 2:  2    2    4

How it works:

setDT converts the data.frame to a data.table by reference.

setkey sorts the data.table by the given columns and marks them as key columns, so that we can perform the join.

na.omit(df, invert = TRUE) returns only those rows that have an NA anywhere.

X[!Y] performs an anti-join on the key column ID and returns all rows that do not match ID = 1 (from Y). Check this post to read about data.table joins in detail.

HTH
