Drop the factor level for which there is a missing value in one column (R)
Assuming that the NAs occur in the var columns:
df[with(df, !ave(!!rowSums(is.na(df[,-1])), ID, FUN=any)),]
# ID var1 var2
#3 2 1 2
#4 2 2 4
Or, if this only applies to var1
df[with(df, !ave(is.na(var1), ID, FUN=any)),]
# ID var1 var2
#3 2 1 2
#4 2 2 4
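The one-liner above packs three steps into a single expression. A minimal sketch that spells them out on the same toy data (the intermediate names `row_has_na`, `group_has_na` and `res` are only illustrative and do not appear in the original answer):

```r
# Same toy data as in the "data" section of this answer.
df <- data.frame(ID   = c(1L, 1L, 2L, 2L),
                 var1 = c(1L, NA, 1L, 2L),
                 var2 = c(2L, 3L, 2L, 4L))

# Step 1: flag each row whose var1 is missing.
row_has_na <- is.na(df$var1)

# Step 2: ave() applies any() within each ID and spreads the
# group-level answer back over every row of that group.
group_has_na <- ave(row_has_na, df$ID, FUN = any)

# Step 3: keep only the rows belonging to groups without a missing var1.
res <- df[!group_has_na, ]
res
```

Because `ave()` returns a vector as long as its input, the result can be used directly as a row index, which is why no separate merge step is needed.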
Or using dplyr
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(all(!is.na(var1)))
# ID var1 var2
#1 2 1 2
#2 2 2 4
data
df <- structure(list(ID = c(1L, 1L, 2L, 2L), var1 = c(1L, NA, 1L, 2L
), var2 = c(2L, 3L, 2L, 4L)), .Names = c("ID", "var1", "var2"
), class = "data.frame", row.names = c(NA, -4L))
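One caveat, given that the question is about factor levels: in the data above ID is an integer, but if ID were stored as a factor (an assumption made only for this sketch), subsetting the rows would leave the removed level in `levels()`. Base R's `droplevels()` discards it. The names `kept`, `before` and `after` are illustrative:

```r
# Variant of the toy data in which ID is a factor (assumed for illustration).
df <- data.frame(ID   = factor(c(1, 1, 2, 2)),
                 var1 = c(1L, NA, 1L, 2L),
                 var2 = c(2L, 3L, 2L, 4L))

# Drop every ID group that has a missing var1, as in the answer.
kept <- df[!ave(is.na(df$var1), df$ID, FUN = any), ]

# The unused level is still recorded on the factor after subsetting.
before <- levels(kept$ID)

# droplevels() removes levels that no longer occur in the data.
kept  <- droplevels(kept)
after <- levels(kept$ID)
```

Here `before` still lists both levels, while `after` contains only the level of the group that was kept.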
In the current development version of data.table, there is a new implementation of na.omit for data.tables that takes the arguments cols = and invert =.
cols = lets you specify the columns in which to look for NAs, and invert = TRUE returns the rows that contain NAs instead of omitting them.
You can install the devel version by following these instructions, or you can wait for 1.9.6 to reach CRAN at some point. Using this, we can do:
require(data.table) ## 1.9.5+
setkey(setDT(df), ID)
df[!na.omit(df, invert = TRUE)]
# ID var1 var2
# 1: 2 1 2
# 2: 2 2 4
How it works:
- setDT converts the data.frame to a data.table by reference.
- setkey sorts the data.table by the provided columns and marks those columns as key columns so that we can perform a join.
- na.omit(df, invert = TRUE) returns only those rows that have an NA anywhere.
- X[!Y] performs an anti-join on the key column ID and returns all the rows that do not match ID = 1 (from Y). Check this post to read about data.table joins in detail.
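For readers without data.table at hand, the same anti-join idea can be sketched in base R. This is a swapped-in equivalent, not the data.table code above; `bad_ids` and `res` are illustrative names:

```r
# Same toy data as in the first answer.
df <- data.frame(ID   = c(1L, 1L, 2L, 2L),
                 var1 = c(1L, NA, 1L, 2L),
                 var2 = c(2L, 3L, 2L, 4L))

# IDs that have at least one row with an NA in any column
# (this plays the role of Y, i.e. na.omit(df, invert = TRUE)).
bad_ids <- unique(df$ID[!complete.cases(df)])

# Anti-join: keep the rows whose ID is not among the flagged IDs.
res <- df[!df$ID %in% bad_ids, ]
res
```

The `%in%` test stands in for the keyed join: each row is kept only if its ID has no match in the set of IDs with missing values.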
HTH