Subset a data frame to include only the levels of one factor that have values at both levels of another factor

I am working with a data frame of numeric measurements. Some individuals have been measured multiple times, both as juveniles and as adults. Reproducible example:

ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each=5)
size <- rnorm(10)

# e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
d <- data.frame(ID, age, size)

      

My goal is to subset this data frame, keeping only the IDs that appear at least once as a juvenile and at least once as an adult. I am not sure how to do this.

The resulting data frame should contain all measurements for individuals a1, a2, and a3, but exclude a4, a5, and a6, as they were not measured in both stages.

A similar question was asked 7 months ago but did not receive an answer (Subset data frame including only one level that matters at both levels of another factor).

Thanks!



3 answers


Here is one option, using data.table:

library(data.table)
setDT(d)[, .SD[all(c("juvenile", "adult") %in% age)], ID]
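To see what the per-group condition in the call above evaluates to, here is a minimal base R sketch that applies the same test to each ID with tapply. It rebuilds the example data from the question; the name has_both is only illustrative and is not part of the original answer.

```r
# Rebuild the example data from the question (size values are arbitrary).
ID  <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
d   <- data.frame(ID, age, size = rnorm(10))

# For each ID, test whether both stages occur among its age values.
# This is the same condition the data.table call evaluates per group.
has_both <- tapply(d$age, d$ID,
                   function(a) all(c("juvenile", "adult") %in% a))

# IDs for which the condition is TRUE
keep_ids <- names(has_both)[as.vector(has_both)]
keep_ids
# "a1" "a2" "a3"
```

Groups where the condition is FALSE (a4, a5, a6) contribute no rows to the result.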

      




Or a base R option with ave:

d[with(d, ave(as.character(age), ID, FUN = function(x) length(unique(x)))>1),]
#   ID      age       size
#1  a1 juvenile -1.4545407
#2  a2 juvenile -0.4695317
#3  a3 juvenile  0.2271316
#5  a1 juvenile  0.2961210
#6  a2    adult -0.8331993
#9  a1    adult -0.6924967
#10 a3    adult -0.4619550
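One thing to be aware of: because ave is given a character vector, it returns the counts as strings, so the `> 1` test above is a string comparison. That works here with only two stages, but a variant that counts on integer codes keeps the comparison numeric. This is a sketch rather than a correction of the answer; n_stages is an illustrative name.

```r
# Rebuild the example data from the question (size values are arbitrary).
ID  <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
d   <- data.frame(ID, age, size = rnorm(10))

# Count distinct stages per ID on integer codes, so ave() returns numbers
# and the comparison below is numeric rather than character-based.
n_stages <- ave(as.integer(factor(d$age)), d$ID,
                FUN = function(x) length(unique(x)))

# Keep rows whose ID was seen in more than one stage.
res <- d[n_stages > 1, ]
rownames(res)
# "1" "2" "3" "5" "6" "9" "10"
```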

      



With dplyr, you can use group_by %>% filter:



library(dplyr)
d %>% group_by(ID) %>% filter(all(c("juvenile", "adult") %in% age))

# A tibble: 7 x 3
# Groups:   ID [3]
#      ID      age       size
#  <fctr>   <fctr>      <dbl>
#1     a1 juvenile -0.6947697
#2     a2 juvenile -0.3665272
#3     a3 juvenile  1.0293555
#4     a1 juvenile  0.2745224
#5     a2    adult  0.5299029
#6     a1    adult  2.2247802
#7     a3    adult -0.4717160

      



split on age, intersect, and subset:

d[d$ID %in% Reduce(intersect, split(d$ID, d$age)),]
#   ID      age        size
#1  a1 juvenile  1.44761836
#2  a2 juvenile  1.70098645
#3  a3 juvenile  0.08231986
#5  a1 juvenile  0.91240568
#6  a2    adult -1.77318962
#9  a1    adult  0.13597986
#10 a3    adult -1.18575294
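Broken into steps, the one-liner above might look like the following sketch. It rebuilds the example data from the question; by_stage, both, and kept are illustrative names.

```r
# Rebuild the example data from the question (size values are arbitrary).
ID  <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
d   <- data.frame(ID, age, size = rnorm(10))

# Step 1: one vector of IDs per stage (list elements "adult" and "juvenile").
by_stage <- split(d$ID, d$age)

# Step 2: IDs common to every stage. Reduce() folds intersect() over the
# list, so this would also work with more than two stages.
both <- Reduce(intersect, by_stage)
sort(both)
# "a1" "a2" "a3"

# Step 3: keep every row belonging to one of those IDs.
kept <- d[d$ID %in% both, ]
nrow(kept)
# 7
```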

      
