Expand a data frame from the minimum to the maximum value of each column
The reproducible data consist of random values for 2 covariates (cov1 and cov2), 2 animals (Cat and Dog) and 2 seasons (Summer and Winter).
library(dplyr); library(tidyr)
set.seed(123)
dat <- data.frame(Season = rep(c("Summer", "Winter"), each = 100),
Species = rep(c("Cat", "Dog", "Cat", "Dog"), each = 50),
cov1 = sample(1:100, 200, replace = TRUE),
cov2 = sample(1:100, 200, replace = TRUE))
head(dat)
Season Species cov1 cov2
1 Summer Cat 29 24
2 Summer Cat 79 97
3 Summer Cat 41 61
4 Summer Cat 89 52
5 Summer Cat 95 41
6 Summer Cat 5 89
I want to create a new df that contains a sequence from the minimum to the maximum value for each Season / Species combination. My initial thought was to first use dplyr to determine the min and max values.
RangeDat <- dat %>% group_by(Season, Species) %>%
summarise_each(funs(min, max)) %>%
as.data.frame()
> RangeDat
Season Species cov1_min cov2_min cov1_max cov2_max
1 Summer Cat 3 5 100 97
2 Summer Dog 1 1 99 99
3 Winter Cat 2 1 99 100
4 Winter Dog 12 2 99 100
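summarise_each() and funs() have since been deprecated in dplyr. Assuming dplyr 1.0.0 or later, the same min/max step can be sketched with across(). Note that the columns come out grouped by covariate (cov1_min, cov1_max, cov2_min, cov2_max) rather than in the order printed above, and the values may differ from that table because R 3.6.0 changed the default sample() algorithm.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across() and .groups

set.seed(123)
dat <- data.frame(Season = rep(c("Summer", "Winter"), each = 100),
                  Species = rep(c("Cat", "Dog", "Cat", "Dog"), each = 50),
                  cov1 = sample(1:100, 200, replace = TRUE),
                  cov2 = sample(1:100, 200, replace = TRUE))

# across() applies each named function to each selected column;
# result names follow "{.col}_{.fn}", e.g. cov1_min
RangeDat <- dat %>%
  group_by(Season, Species) %>%
  summarise(across(c(cov1, cov2), list(min = min, max = max)),
            .groups = "drop") %>%
  as.data.frame()

RangeDat
```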
From here I am not sure how to expand the df. Ideally, the resulting df would have 4 columns (Season, Species, cov1, cov2). The values of cov1 and cov2 would range from the minimum to the maximum for each Season / Species combination. As in the original dat df, the values of Season and Species would repeat down the rows as cov1 and cov2 increase.
Following up on the comments: can NA values be included where a Species / Season combination's sequence is shorter than the longest ("maximum") range?
Any suggestions are greatly appreciated!
We can summarise into list columns
library(dplyr)
dat %>%
group_by(Season, Species) %>%
summarise(cov1 = list(min(cov1):max(cov1)), cov2 = list(min(cov2):max(cov2)))
or using data.table
library(data.table)
setDT(dat)[, .(cov1 = list(min(cov1):max(cov1)),
cov2 = list(min(cov2):max(cov2))), by = .(Season, Species)]
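Each cell of these list columns holds one whole sequence per group, and the two sequences in a group generally differ in length, which is why they cannot simply be laid side by side as ordinary columns. A minimal sketch on a hypothetical one-group toy data frame (values made up for illustration; assumes dplyr >= 1.0.0 for .groups) makes this visible:

```r
library(dplyr)

# Hypothetical toy data: one group whose covariates span ranges of different widths
toy <- data.frame(Season = "Summer", Species = "Cat",
                  cov1 = c(3L, 50L, 10L),
                  cov2 = c(5L, 7L, 6L))

res <- toy %>%
  group_by(Season, Species) %>%
  summarise(cov1 = list(min(cov1):max(cov1)),
            cov2 = list(min(cov2):max(cov2)),
            .groups = "drop")

# Each list cell contains a complete integer sequence
lengths(res$cov1)  # 48 (3:50)
lengths(res$cov2)  # 3  (5:7)
```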
Update
Since the OP mentioned keeping the lengths the same by padding with NA, one option with dplyr would be
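f1 <- function(x1, x2){
  # full integer sequences from the min to the max of each covariate
  x1 <- min(x1):max(x1)
  x2 <- min(x2):max(x2)
  # lengthening a vector with `length<-` pads it with NA
  m1 <- max(c(length(x1), length(x2)))
  length(x1) <- m1
  length(x2) <- m1
  list(cov1 = x1, cov2 = x2)
}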
dat %>%
group_by(Season, Species) %>%
do(data.frame(Season = .$Season[1], Species = .$Species[1], f1(.$cov1, .$cov2)))
# A tibble: 396 x 4
# Groups: Season, Species [4]
# Season Species cov1 cov2
# <fctr> <fctr> <int> <int>
# 1 Summer Cat 3 5
# 2 Summer Cat 4 6
# 3 Summer Cat 5 7
# 4 Summer Cat 6 8
# 5 Summer Cat 7 9
# 6 Summer Cat 8 10
# 7 Summer Cat 9 11
# 8 Summer Cat 10 12
# 9 Summer Cat 11 13
#10 Summer Cat 12 14
# ... with 386 more rows
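The 396 rows in the tibble header can be checked from the RangeDat table in the question: a sequence a:b has b - a + 1 elements, and because f1() pads the shorter sequence, each group contributes the length of its longer one. A small base-R sketch of that arithmetic, using only the ranges printed above:

```r
# Ranges copied from the RangeDat output in the question
rng <- data.frame(Season   = c("Summer", "Summer", "Winter", "Winter"),
                  Species  = c("Cat", "Dog", "Cat", "Dog"),
                  cov1_min = c(3, 1, 2, 12),
                  cov2_min = c(5, 1, 1, 2),
                  cov1_max = c(100, 99, 99, 99),
                  cov2_max = c(97, 99, 100, 100))

# Rows per group = length of the longer of the two sequences
rng$n <- pmax(rng$cov1_max - rng$cov1_min + 1,
              rng$cov2_max - rng$cov2_min + 1)

rng$n       # 98 99 100 99
sum(rng$n)  # 396
```

For Winter / Dog, cov1 spans only 88 values against 99 for cov2, which is why the last rows of that group show NA for cov1.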
and a possible extension with the help of data.table would be
setDT(dat)[, f1(cov1, cov2), .(Season, Species)]
# Season Species cov1 cov2
# 1: Summer Cat 3 5
# 2: Summer Cat 4 6
# 3: Summer Cat 5 7
# 4: Summer Cat 6 8
# 5: Summer Cat 7 9
# ---
#392: Winter Dog NA 96
#393: Winter Dog NA 97
#394: Winter Dog NA 98
#395: Winter Dog NA 99
#396: Winter Dog NA 100
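In current dplyr, do() is marked as superseded. Assuming dplyr 1.1.0 or later, reframe() can call the same f1() directly: an unnamed data-frame result is spread into columns, and each group may return any number of rows. A minimal sketch on hypothetical toy data (not the question's dat):

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for reframe()

f1 <- function(x1, x2){
  x1 <- min(x1):max(x1)
  x2 <- min(x2):max(x2)
  m1 <- max(length(x1), length(x2))
  # `length<-` extends a vector with NA when the new length is larger
  length(x1) <- m1
  length(x2) <- m1
  list(cov1 = x1, cov2 = x2)
}

# Hypothetical toy data with two groups
toy <- data.frame(Season  = c("Summer", "Summer", "Winter", "Winter"),
                  Species = c("Cat", "Cat", "Dog", "Dog"),
                  cov1    = c(1L, 3L, 10L, 11L),
                  cov2    = c(5L, 9L, 20L, 20L))

# data.frame() turns f1()'s list into two columns; reframe() unpacks them
out <- toy %>%
  group_by(Season, Species) %>%
  reframe(data.frame(f1(cov1, cov2)))

out
```

Here Summer / Cat yields 5 rows (cov1 padded with two NA) and Winter / Dog yields 2 rows (cov2 padded with one NA).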