Expand a data frame from the minimum to the maximum value of each column

The reproducible data comprise random values for 2 covariates (cov1 and cov2), 2 animals (Cat and Dog), and 2 seasons (Summer and Winter).

library(dplyr); library(tidyr)
set.seed(123)
dat <- data.frame(Season = rep(c("Summer", "Winter"), each = 100),
                  Species = rep(c("Cat", "Dog", "Cat", "Dog"), each = 50),
                  cov1 = sample(1:100, 200, replace = TRUE),
                  cov2 = sample(1:100, 200, replace = TRUE))

head(dat)
  Season Species cov1 cov2
1 Summer     Cat   29   24
2 Summer     Cat   79   97
3 Summer     Cat   41   61
4 Summer     Cat   89   52
5 Summer     Cat   95   41
6 Summer     Cat    5   89

      

I want to create a new df that contains a sequence from the minimum to the maximum value for each Season / Species combination. My initial thought was to first use dplyr to determine the min and max values.

RangeDat <- dat %>% group_by(Season, Species) %>% 
  summarise_each(funs(min, max)) %>%
  as.data.frame()

> RangeDat
  Season Species cov1_min cov2_min cov1_max cov2_max
1 Summer     Cat        3        5      100       97
2 Summer     Dog        1        1       99       99
3 Winter     Cat        2        1       99      100
4 Winter     Dog       12        2       99      100

      

From here I am not sure how to expand the df. Ideally, the resulting df would have 4 columns (Season, Species, cov1, cov2). The values for cov1 and cov2 would run from the minimum to the maximum for each Season / Species combination. As in the original dat df, the values for Season and Species would repeat alongside the increasing values of cov1 and cov2.

Regarding the comments: can NA values be included where the sequence for a Season / Species combination is shorter than the "maximum" range?

Any suggestions are greatly appreciated!


1 answer


We can summarise into a list column:

library(dplyr)
dat %>%
    group_by(Season, Species) %>% 
    summarise(cov1 = list(min(cov1):max(cov1)), cov2 = list(min(cov2):max(cov2)))

      


Or, using data.table:

library(data.table)
setDT(dat)[, .(cov1 = list(min(cov1):max(cov1)),
               cov2 = list(min(cov2):max(cov2))), by = .(Season, Species)]
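To see what a list column of ranges looks like in long form, here is a minimal sketch on made-up data. The names toy, ranges, and long are hypothetical and not from the original post. It only unnests a single column; as far as I know, unnesting two list columns together expects their lengths to match within each row, which is what motivates the NA padding in the update.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): two groups with different value ranges
toy <- data.frame(g = c("a", "a", "b"), v = c(2L, 4L, 7L))

# One row per group, with the full min:max sequence stored as a list element
ranges <- toy %>%
  group_by(g) %>%
  summarise(v = list(min(v):max(v)))

# Expand the list column so each sequence value gets its own row
long <- ranges %>%
  unnest(v)

long
```

Group "a" contributes the values 2, 3, 4 and group "b" contributes the single value 7, so long has four rows.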

      



Update

Since the OP mentioned keeping the lengths the same by padding with NA, one option with dplyr would be

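# Build both min:max sequences and pad the shorter one with NA
f1 <- function(x1, x2) {
  x1 <- min(x1):max(x1)
  x2 <- min(x2):max(x2)
  # length of the longer sequence
  m1 <- max(length(x1), length(x2))
  # assigning a longer length pads the vector with NA
  length(x1) <- m1
  length(x2) <- m1
  list(cov1 = x1, cov2 = x2)
}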
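The padding relies on base R's replacement form of length(): assigning a larger length to a vector extends it with NA. A small standalone illustration (the vector x is made up for this example):

```r
# A short integer sequence
x <- 3:5

# Extending the length pads the tail with NA
length(x) <- 5

x
# 3 4 5 NA NA
```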

dat %>%
    group_by(Season, Species) %>% 
    do(data.frame(Season = .$Season[1], Species = .$Species[1],  f1(.$cov1, .$cov2)))
# A tibble: 396 x 4
# Groups:   Season, Species [4]
#   Season Species  cov1  cov2
#   <fctr>  <fctr> <int> <int>
# 1 Summer     Cat     3     5
# 2 Summer     Cat     4     6
# 3 Summer     Cat     5     7
# 4 Summer     Cat     6     8
# 5 Summer     Cat     7     9
# 6 Summer     Cat     8    10
# 7 Summer     Cat     9    11
# 8 Summer     Cat    10    12
# 9 Summer     Cat    11    13
#10 Summer     Cat    12    14
# ... with 386 more rows

      

and a possible extension using data.table would be

setDT(dat)[, f1(cov1, cov2), .(Season, Species)]
#     Season Species cov1 cov2
#  1: Summer     Cat    3    5
#  2: Summer     Cat    4    6
#  3: Summer     Cat    5    7
#  4: Summer     Cat    6    8
#  5: Summer     Cat    7    9
# ---                         
#392: Winter     Dog   NA   96
#393: Winter     Dog   NA   97
#394: Winter     Dog   NA   98
#395: Winter     Dog   NA   99
#396: Winter     Dog   NA  100
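As a side note on the dplyr variant: do() is superseded in recent dplyr releases. Assuming dplyr 1.1.0 or later, a similar result can probably be obtained with reframe(), which allows each group to return several rows. The sketch below uses made-up data and a hypothetical helper, pad_ranges, that mirrors f1 but returns a data frame; it is an illustration under those assumptions, not the answerer's code.

```r
library(dplyr)

# Hypothetical helper mirroring f1, returning a data frame instead of a list
pad_ranges <- function(x1, x2) {
  s1 <- min(x1):max(x1)
  s2 <- min(x2):max(x2)
  n <- max(length(s1), length(s2))
  length(s1) <- n  # pad the shorter sequence with NA
  length(s2) <- n
  data.frame(cov1 = s1, cov2 = s2)
}

# Toy data (hypothetical): one Season / Species combination per pair of rows
toy <- data.frame(
  Season  = c("Summer", "Summer", "Winter", "Winter"),
  Species = c("Cat", "Cat", "Dog", "Dog"),
  cov1    = c(1L, 3L, 5L, 6L),
  cov2    = c(2L, 3L, 1L, 4L)
)

# reframe() unpacks the returned data frame into columns, one block per group
res <- toy %>%
  group_by(Season, Species) %>%
  reframe(pad_ranges(cov1, cov2))

res
```

Here Summer / Cat yields 3 rows (cov2 padded with one NA) and Winter / Dog yields 4 rows (cov1 padded with two NA), for 7 rows in total.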

      
