R + dplyr: filter each user's rows up to a point in time
I have data on a group of people and the fruits they eat over time. Using dplyr, I want to follow each person up to the first time they eat a banana and sum up all the fruits they ate before that first banana.
Data:
data <- structure(list(user = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L,
1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 9584L, 9584L, 9584L,
9584L, 9584L, 9584L, 9584L, 9584L, 9584L, 4758L, 4758L, 4758L,
4758L, 4758L, 4758L), site = structure(c(1L, 6L, 1L, 1L, 6L,
5L, 5L, 3L, 4L, 1L, 2L, 6L, 1L, 6L, 5L, 5L, 3L, 2L, 6L, 6L, 6L,
4L, 2L, 5L, 5L, 4L, 2L), .Label = c("apple", "banana", "lemon",
"lime", "orange", "pear"), class = "factor"), time = c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 5L, 6L, 7L, 8L, 9L, 10L), int = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("banana",
"other"), class = "factor")), .Names = c("user", "site", "time",
"int"), row.names = c(NA, -27L), class = "data.frame")
My initial thought was to group the data and find the time at which each user first eats a banana:
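library(dplyr)

# flag each row as banana / other (same information as the existing int column)
data <- data %>% transform(var = ifelse(site == "banana", "banana", "other"))

# one row per user: the time of that user's first banana
data_ban <- data %>%
  filter(var == "banana") %>%
  group_by(user) %>%
  summarise(first_banana = min(time))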
But now I am stuck on how to apply this back to the original data and filter it so that, for each user, only the rows up to the time given in data_ban are kept. Any ideas?
Something like this: group by user, then keep only the rows whose time is at or before the time of that user's first banana.
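> data %>% group_by(user) %>% filter(time <= time[site == "banana"][1])
Source: local data frame [19 x 4]
Groups: user

   user   site time    int
1  1234  apple    1  other
2  1234   pear    2  other
3  1234  apple    3  other
4  1234  apple    4  other
5  1234   pear    5  other
6  1234 orange    6  other
7  1234 orange    7  other
8  1234  lemon    8  other
9  1234   lime    9  other
10 1234  apple   10  other
11 1234 banana   11 banana
12 9584  apple    1  other
13 9584   pear    2  other
14 9584 orange    3  other
15 9584 orange    4  other
16 9584  lemon    5  other
17 9584 banana    6 banana
18 4758   lime    5  other
19 4758 banana    6 banana

Here time[site == "banana"][1] is the time of each user's first banana (assuming rows are in time order within each user). The tempting which(site == "banana")[1] returns a row position within the group instead, which only matches the time when a user's times start at 1; with it, user 4758 (times 5 to 10) is dropped entirely. Users who never eat a banana are dropped either way, because the comparison is NA for them.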
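The filter above stops at the first banana but does not add anything up yet. Assuming "sum up all the fruits" means counting what each user ate strictly before their first banana (the banana itself excluded), a minimal sketch is to switch to a strict < and then summarise(). It compares against the time of the first banana, time[site == "banana"][1], rather than its row position. The data frame is a compact rebuild of the question's sample (site as character, int column omitted), and before_banana, per_fruit and n_fruits are illustrative names.

```r
library(dplyr)

# Compact rebuild of the question's sample data (site as character, int omitted)
data <- data.frame(
  user = rep(c(1234L, 9584L, 4758L), times = c(12, 9, 6)),
  site = c("apple", "pear", "apple", "apple", "pear", "orange", "orange",
           "lemon", "lime", "apple", "banana", "pear",
           "apple", "pear", "orange", "orange", "lemon", "banana",
           "pear", "pear", "pear",
           "lime", "banana", "orange", "orange", "lime", "banana"),
  time = c(1:12, 1:9, 5:10)
)

# Total number of fruits each user ate strictly before their first banana
before_banana <- data %>%
  group_by(user) %>%
  filter(time < time[site == "banana"][1]) %>%
  summarise(n_fruits = n())

# Alternative reading: how many of each fruit, per user
per_fruit <- data %>%
  group_by(user) %>%
  filter(time < time[site == "banana"][1]) %>%
  ungroup() %>%
  count(user, site)

print(before_banana)
print(per_fruit)
```

A user whose very first row is a banana has no rows left after the filter, so they would be missing from both summaries rather than showing a count of 0.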
Otherwise, it is also possible to do this with a join (for example anti_join).
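A join lets the question's own data_ban table be applied back to the data. A minimal sketch, assuming the same compact rebuild of the sample data (site as character, int column omitted): inner_join() attaches each user's first_banana time to their rows so a plain filter() can compare against it, while the anti_join() variant builds the set of rows that come after the first banana and removes them. The names kept, kept2 and after_banana are illustrative.

```r
library(dplyr)

# Compact rebuild of the question's sample data (site as character, int omitted)
data <- data.frame(
  user = rep(c(1234L, 9584L, 4758L), times = c(12, 9, 6)),
  site = c("apple", "pear", "apple", "apple", "pear", "orange", "orange",
           "lemon", "lime", "apple", "banana", "pear",
           "apple", "pear", "orange", "orange", "lemon", "banana",
           "pear", "pear", "pear",
           "lime", "banana", "orange", "orange", "lime", "banana"),
  time = c(1:12, 1:9, 5:10)
)

# Time of each user's first banana, as in the question
data_ban <- data %>%
  filter(site == "banana") %>%
  group_by(user) %>%
  summarise(first_banana = min(time))

# Variant 1: attach first_banana to every row, keep rows up to it
kept <- data %>%
  inner_join(data_ban, by = "user") %>%
  filter(time <= first_banana) %>%
  select(-first_banana)

# Variant 2: collect the rows after the first banana, then drop them
after_banana <- data %>%
  inner_join(data_ban, by = "user") %>%
  filter(time > first_banana) %>%
  select(user, time)
kept2 <- anti_join(data, after_banana, by = c("user", "time"))
```

The two variants agree on this sample, but they differ for a user who never eats a banana: inner_join() drops that user because they have no row in data_ban, whereas anti_join() keeps all of their rows because none of them are listed in after_banana.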