Subset a data frame to include only levels of one factor that have values at both levels of another factor
I am working with a data frame of numeric measurements. Some individuals have been measured multiple times, both as juveniles and as adults. Reproducible example:
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each=5)
size <- rnorm(10)
# e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
d <- data.frame(ID, age, size)
My goal is to subset this data frame, keeping only the IDs that appear at least once as a juvenile and at least once as an adult. I am not sure how to do this.
The resulting data frame should contain all measurements for individuals a1, a2, and a3, but exclude a4, a5, and a6, as they were not measured at both stages.
A similar question was asked 7 months ago but received no answer (Subset data frame to include only levels of one factor that have values in both levels of another factor).
Thanks!
Here is one option using data.table:
library(data.table)
setDT(d)[, .SD[all(c("juvenile", "adult") %in% age)], ID]
Or a base R option with ave:
d[with(d, ave(as.character(age), ID, FUN = function(x) length(unique(x)))>1),]
# ID age size
#1 a1 juvenile -1.4545407
#2 a2 juvenile -0.4695317
#3 a3 juvenile 0.2271316
#5 a1 juvenile 0.2961210
#6 a2 adult -0.8331993
#9 a1 adult -0.6924967
#10 a3 adult -0.4619550
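As a further base R sketch (not part of the answer above), the same subset can be built by collecting the IDs seen at each stage and intersecting them, which makes the "both stages" condition explicit. The names juvenile_ids, adult_ids, both_ids and d_both are made up for illustration, and set.seed(1) is an addition so the example is repeatable.

```r
# Rebuild the example data from the question.
# set.seed() is an assumption added here for repeatability only.
set.seed(1)
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d <- data.frame(ID, age, size)

# IDs seen at each stage (as.character guards against factor columns)
juvenile_ids <- unique(as.character(d$ID[d$age == "juvenile"]))
adult_ids <- unique(as.character(d$ID[d$age == "adult"]))

# IDs measured at both stages
both_ids <- intersect(juvenile_ids, adult_ids)

# Keep every measurement that belongs to one of those IDs
d_both <- d[d$ID %in% both_ids, ]
```

With the example data, d_both should keep rows 1, 2, 3, 5, 6, 9 and 10, the same rows as the ave approach.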
With dplyr, you can use group_by %>% filter:
library(dplyr)
d %>% group_by(ID) %>% filter(all(c("juvenile", "adult") %in% age))
# A tibble: 7 x 3
# Groups: ID [3]
# ID age size
# <fctr> <fctr> <dbl>
#1 a1 juvenile -0.6947697
#2 a2 juvenile -0.3665272
#3 a3 juvenile 1.0293555
#4 a1 juvenile 0.2745224
#5 a2 adult 0.5299029
#6 a1 adult 2.2247802
#7 a3 adult -0.4717160
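To see what the grouped condition all(c("juvenile", "adult") %in% age) evaluates to for each ID, here is a small base R sketch that computes it one ID at a time with tapply. The name has_both is hypothetical, and set.seed(1) is added only to make the data repeatable.

```r
# Rebuild the example data from the question.
# set.seed() is an assumption added here for repeatability only.
set.seed(1)
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d <- data.frame(ID, age, size)

# For each ID, test whether both stages occur among its age values
has_both <- tapply(
  as.character(d$age),
  as.character(d$ID),
  function(x) all(c("juvenile", "adult") %in% x)
)

# IDs for which the condition holds: a1, a2, a3
names(has_both)[has_both]
```

Groups where the condition is FALSE (a4, a5, a6) are the ones the grouped filter drops as a whole.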