R: Uniques (or dplyr distinct) + most recent date
I have a data frame whose rows include repeated names with different dates. I would like to filter this df down to one row per unique name, keeping the most recent event for each name. I'm a big fan of dplyr and have used combinations of distinct and select before, but the documentation gives the impression that it can't be done with distinct alone:
"Variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved."
This seems like a common problem, so I was wondering if anyone has any advice. Below is an example df that is representative of my real data, with Name as character class and Date as POSIXct, which I generated with the lubridate package.
structure(list(Name = c("John", "John", "Mary", "John", "Mary",
"Chad"), Date = structure(c(1430438400, 1433116800, 1335830400,
1422748800, 1435708800, 1427846400), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), .Names = c("Name", "Date"), row.names = c(NA, -6L
), class = "data.frame")
Desired output:
structure(list(Name = c("John", "Mary", "Chad"), Date = structure(c(1433116800,
1435708800, 1427846400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("Name",
"Date"), row.names = c(2L, 5L, 6L), class = "data.frame")
Thank you for your help.
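The easiest way is to sort by date, newest first, and then keep the first row for each name. In current dplyr versions, distinct(Name) returns only the Name column unless .keep_all = TRUE is set:
DF %>% arrange(desc(Date)) %>% distinct(Name, .keep_all = TRUE)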
If you really want the names to be kept in the same order, these also work (thanks to @akrun):
DF %>% group_by(Name) %>% slice(which.max(Date)) # @akrun's better idea: exactly one row per name
DF %>% group_by(Name) %>% filter(Date == max(Date)) # my idea: keeps every row tied for the latest date
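For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and keeps the latest row per name. It uses slice_max(), which is not part of the original answer and assumes dplyr 1.0.0 or later; the variable name latest is just for illustration.

```r
library(dplyr)

# Sample data from the question, rebuilt with as.POSIXct for readability.
DF <- data.frame(
  Name = c("John", "John", "Mary", "John", "Mary", "Chad"),
  Date = as.POSIXct(
    c("2015-05-01", "2015-06-01", "2012-05-01",
      "2015-02-01", "2015-07-01", "2015-04-01"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Assumption: dplyr >= 1.0.0, where slice_max() is available.
# with_ties = FALSE guarantees a single row per Name even if two rows
# share the same latest Date.
latest <- DF %>%
  group_by(Name) %>%
  slice_max(Date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(latest)
```

Setting with_ties = FALSE behaves like the slice(which.max(Date)) version above, while leaving it at its default of TRUE behaves like the filter(Date == max(Date)) version and can return more than one row per name.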