R + dplyr filtration time

I have some data that looks at a group of people and the fruits they eat over time. I want to use dplyr to look at each individual person until they eat a banana and sum up all the fruits they ate until they eat their first banana.

Data:

data <-  structure(list(user = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 
    1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 9584L, 9584L, 9584L, 
    9584L, 9584L, 9584L, 9584L, 9584L, 9584L, 4758L, 4758L, 4758L, 
    4758L, 4758L, 4758L), site = structure(c(1L, 6L, 1L, 1L, 6L, 
    5L, 5L, 3L, 4L, 1L, 2L, 6L, 1L, 6L, 5L, 5L, 3L, 2L, 6L, 6L, 6L, 
    4L, 2L, 5L, 5L, 4L, 2L), .Label = c("apple", "banana", "lemon", 
    "lime", "orange", "pear"), class = "factor"), time = c(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 5L, 6L, 7L, 8L, 9L, 10L), int = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("banana", 
    "other"), class = "factor")), .Names = c("user", "site", "time", 
    "int"), row.names = c(NA, -27L), class = "data.frame")

      

My initial thought was to group the data to find the first instance of each banana user:

data <- data %>% transform(var = ifelse(site=="banana", 'banana','other'))

data_ban <- data %>% 
    filter(var=='banana') %>% 
    group_by(user, var, time) %>%
    group_by(user) %>%
    summarise(first_banana = min(time))

      

But now I am stuck on how to actually apply this back to the original "datafile" and set a filter that says: For each user, only include data until the time specified in "data_ban". Any ideas?

+3


source to share


2 answers


You may try slice



data %>%
     group_by(user) %>% 
     slice(1:(which(int=='banana')[1L]))

      

+4


source


Something like this: the grouping user

and filtering is time

lower than the first time they eat a banana.

> data %>% group_by(user) %>% filter( time <= which(site=="banana")[1] )
Source: local data frame [17 x 4]
Groups: user

   user   site time    int
1  1234  apple    1  other
2  1234   pear    2  other
3  1234  apple    3  other
4  1234  apple    4  other
5  1234   pear    5  other
6  1234 orange    6  other
7  1234 orange    7  other
8  1234  lemon    8  other
9  1234   lime    9  other
10 1234  apple   10  other
11 1234 banana   11 banana
12 9584  apple    1  other
13 9584   pear    2  other
14 9584 orange    3  other
15 9584 orange    4  other
16 9584  lemon    5  other
17 9584 banana    6 banana

      



Otherwise it is possible anti_join

.

+2


source







All Articles