Expand the data frame from the minimum value to the maximum value of each column

Question

The reproducible data comprise random values for 2 covariates (cov1 and cov2), 2 animals (Cat and Dog), and 2 seasons (Summer and Winter).

library(dplyr); library(tidyr)
set.seed(123)
dat <- data.frame(Season = rep(c("Summer", "Winter"), each = 100),
                  Species = rep(c("Cat", "Dog", "Cat", "Dog"), each = 50),
                  cov1 = sample(1:100, 200, replace = TRUE),
                  cov2 = sample(1:100, 200, replace = TRUE))

head(dat)
  Season Species cov1 cov2
1 Summer     Cat   29   24
2 Summer     Cat   79   97
3 Summer     Cat   41   61
4 Summer     Cat   89   52
5 Summer     Cat   95   41
6 Summer     Cat    5   89

I want to create a new df that contains a sequence from the minimum to the maximum value for each Season / Species combination. My initial thought was to first use dplyr to determine the min and max values.

RangeDat <- dat %>% group_by(Season, Species) %>% 
  summarise_each(funs(min, max)) %>%
  as.data.frame()

> RangeDat
  Season Species cov1_min cov2_min cov1_max cov2_max
1 Summer     Cat        3        5      100       97
2 Summer     Dog        1        1       99       99
3 Winter     Cat        2        1       99      100
4 Winter     Dog       12        2       99      100

From here I am not sure how to expand the df. Ideally, the resulting df would have 4 columns (Season, Species, cov1, cov2). The values for cov1 and cov2 would run from the minimum to the maximum for each Season / Species combination. As in the original dat df, the values for Season and Species would repeat across the increasing values of cov1 and cov2.

Following up on the comments: can NA values be filled in where the sequence for a Season / Species combination is shorter than the "maximum" range?

Any suggestions are greatly appreciated!

r dplyr

B. Davis 03 jul. 17 at 1:03

1 answer

akrun · Accepted Answer · 2017-07-03T01:09:22+0000

We can summarise each group into a list column:

library(dplyr)
dat %>%
    group_by(Season, Species) %>% 
    summarise(cov1 = list(min(cov1):max(cov1)), cov2 = list(min(cov2):max(cov2)))

Or, using data.table:

library(data.table)
setDT(dat)[, .(cov1 = list(min(cov1):max(cov1)),
               cov2 = list(min(cov2):max(cov2))), by = .(Season, Species)]

Update

Since the OP mentioned keeping the lengths the same by padding with NA, one option with dplyr would be:

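f1 <- function(x1, x2) {
  # Full integer sequence from the minimum to the maximum of each covariate
  x1 <- min(x1):max(x1)
  x2 <- min(x2):max(x2)
  # Extend the shorter sequence with NA so both have the same length
  m1 <- max(length(x1), length(x2))
  length(x1) <- m1
  length(x2) <- m1
  list(cov1 = x1, cov2 = x2)
}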

dat %>%
    group_by(Season, Species) %>% 
    do(data.frame(Season = .$Season[1], Species = .$Species[1],  f1(.$cov1, .$cov2)))
# A tibble: 396 x 4
# Groups:   Season, Species [4]
#   Season Species  cov1  cov2
#   <fctr>  <fctr> <int> <int>
# 1 Summer     Cat     3     5
# 2 Summer     Cat     4     6
# 3 Summer     Cat     5     7
# 4 Summer     Cat     6     8
# 5 Summer     Cat     7     9
# 6 Summer     Cat     8    10
# 7 Summer     Cat     9    11
# 8 Summer     Cat    10    12
# 9 Summer     Cat    11    13
#10 Summer     Cat    12    14
# ... with 386 more rows
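
For readers on a recent dplyr, where do() is superseded, the same step might be written with reframe(). This is only a sketch and assumes dplyr >= 1.1.0; it repeats dat and f1 from above so that it runs on its own, and the name out is introduced here purely for illustration. Because sample() changed in R 3.6.0, the values (and possibly the row count) can differ from the output shown above.

```r
library(dplyr)

# Same example data as in the question
set.seed(123)
dat <- data.frame(Season = rep(c("Summer", "Winter"), each = 100),
                  Species = rep(c("Cat", "Dog", "Cat", "Dog"), each = 50),
                  cov1 = sample(1:100, 200, replace = TRUE),
                  cov2 = sample(1:100, 200, replace = TRUE))

# Same helper as in the answer: min:max sequences, padded with NA
f1 <- function(x1, x2) {
  x1 <- min(x1):max(x1)
  x2 <- min(x2):max(x2)
  m1 <- max(length(x1), length(x2))
  length(x1) <- m1
  length(x2) <- m1
  list(cov1 = x1, cov2 = x2)
}

# reframe() accepts a data frame per group and may return any number of rows
out <- dat %>%
  group_by(Season, Species) %>%
  reframe(as.data.frame(f1(cov1, cov2)))

head(out)
```

Unlike do(), reframe() returns an ungrouped result, and the grouping columns are carried along automatically, so there is no need to rebuild Season and Species by hand.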

and the equivalent with data.table would be:

setDT(dat)[, f1(cov1, cov2), .(Season, Species)]
#     Season Species cov1 cov2
#  1: Summer     Cat    3    5
#  2: Summer     Cat    4    6
#  3: Summer     Cat    5    7
#  4: Summer     Cat    6    8
#  5: Summer     Cat    7    9
# ---                         
#392: Winter     Dog   NA   96
#393: Winter     Dog   NA   97
#394: Winter     Dog   NA   98
#395: Winter     Dog   NA   99
#396: Winter     Dog   NA  100
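
The trailing NA values in cov1 come from the padding step in f1. As a quick check of that logic on a single group, here is a small sketch using made-up input vectors (the values are illustrative and not taken from dat):

```r
# Same helper as in the answer: min:max sequences, padded with NA
f1 <- function(x1, x2) {
  x1 <- min(x1):max(x1)
  x2 <- min(x2):max(x2)
  m1 <- max(length(x1), length(x2))
  length(x1) <- m1   # assigning a longer length pads with NA
  length(x2) <- m1
  list(cov1 = x1, cov2 = x2)
}

res <- f1(c(3L, 1L, 2L), c(5L, 9L))
res$cov1
# [1]  1  2  3 NA NA
res$cov2
# [1] 5 6 7 8 9
```

The shorter sequence (1 to 3) is extended to the length of the longer one (5 to 9), which is why each group in the result has as many rows as its wider covariate range.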
