Subset a data frame to include only the levels of one factor that have values at both levels of another factor

Question

I am working with a data frame that holds numeric measurements. Some individuals have been measured multiple times, both as juveniles and as adults. Reproducible example:

ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each=5)
size <- rnorm(10)

# e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
d <- data.frame(ID, age, size)

My goal is to subset this data frame, keeping only the IDs that appear at least once as a juvenile and at least once as an adult. I am not sure how to do this.

The resulting data frame should contain all measurements for individuals a1, a2, and a3, but exclude a4, a5, and a6, as they were not measured in both stages.

A similar question was asked 7 months ago but did not receive an answer (Subset data frame to include only levels of one factor that have values in both levels of another factor).

Thanks!

+3

r dataframe subset level

Mehdi.K 07 jul. 17 at 1:48

3 answers

With dplyr, you can use group_by %>% filter:

library(dplyr)
d %>% group_by(ID) %>% filter(all(c("juvenile", "adult") %in% age))

# A tibble: 7 x 3
# Groups:   ID [3]
#      ID      age       size
#  <fctr>   <fctr>      <dbl>
#1     a1 juvenile -0.6947697
#2     a2 juvenile -0.3665272
#3     a3 juvenile  1.0293555
#4     a1 juvenile  0.2745224
#5     a2    adult  0.5299029
#6     a1    adult  2.2247802
#7     a3    adult -0.4717160
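
To see what the filter condition evaluates for each ID, the same test can be sketched in base R. This is only an illustration of the per-group check, not how dplyr implements it; `set.seed(1)` and the names `has_both` and `res` are additions made for this sketch.

```r
# Rebuild the example data from the question
# (set.seed is added here so the sketch is reproducible)
set.seed(1)
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d <- data.frame(ID, age, size)

# For each ID, test whether both age classes occur among its rows
has_both <- vapply(
  split(as.character(d$age), d$ID),
  function(x) all(c("juvenile", "adult") %in% x),
  logical(1)
)

# Keep every row whose ID passed the test
res <- d[d$ID %in% names(has_both)[has_both], ]
```

`has_both` is a named logical vector with one entry per ID, so printing it shows directly which individuals were measured in both stages.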

+4

Psidom 07 jul. 17 at 1:53

split on age, intersect, and subset:

d[d$ID %in% Reduce(intersect, split(d$ID, d$age)),]
#   ID      age        size
#1  a1 juvenile  1.44761836
#2  a2 juvenile  1.70098645
#3  a3 juvenile  0.08231986
#5  a1 juvenile  0.91240568
#6  a2    adult -1.77318962
#9  a1    adult  0.13597986
#10 a3    adult -1.18575294
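
The one-liner above can be unpacked into its three steps. This is a sketch for readability only; `set.seed(1)` and the intermediate names `ids_by_age`, `both`, and `res` are hypothetical additions, not part of the original answer.

```r
# Rebuild the example data from the question
# (set.seed is added here so the sketch is reproducible)
set.seed(1)
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d <- data.frame(ID, age, size)

# Step 1: split the IDs into one vector per age class
ids_by_age <- split(d$ID, d$age)

# Step 2: intersect those vectors pairwise, leaving the IDs
# that occur in every age class
both <- Reduce(intersect, ids_by_age)

# Step 3: keep all rows belonging to those IDs
res <- d[d$ID %in% both, ]
```

Because `Reduce` folds `intersect` over the whole list, the same code would still work if `age` had more than two classes: an ID would then have to appear in all of them.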

+4

thelatemail 07 jul. 17 at 3:02

akrun · Accepted Answer · 2017-07-07T04:44:05+0000

Here's one option using data.table:

library(data.table)
setDT(d)[, .SD[all(c("juvenile", "adult") %in% age)], ID]

Or a base R option with ave:

d[with(d, ave(as.character(age), ID, FUN = function(x) length(unique(x)))>1),]
#   ID      age       size
#1  a1 juvenile -1.4545407
#2  a2 juvenile -0.4695317
#3  a3 juvenile  0.2271316
#5  a1 juvenile  0.2961210
#6  a2    adult -0.8331993
#9  a1    adult -0.6924967
#10 a3    adult -0.4619550
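
One detail of the ave call: because it is given a character vector, it also returns its counts as character strings, so the `> 1` comparison is done on strings. A variant that keeps the count numeric might look like the sketch below. The names `n_ages` and `res`, and the use of `set.seed(1)`, are additions for illustration. Note that "more than one distinct age" is equivalent to "both juvenile and adult" only because `age` has exactly two classes here.

```r
# Rebuild the example data from the question
# (set.seed is added here so the sketch is reproducible)
set.seed(1)
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d <- data.frame(ID, age, size)

# Encode age as integer codes so that ave() returns a numeric vector,
# then count the distinct age classes observed for each ID
n_ages <- ave(as.integer(factor(d$age)), d$ID,
              FUN = function(x) length(unique(x)))

# Keep rows whose ID was seen in more than one age class
res <- d[n_ages > 1, ]
```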
