R: Uniques (or dplyr distinct) + most recent date

I have a data frame whose rows include repeated names from different dates. I would like to filter this df down to one row per unique name, keeping the most recent event for each name. I'm a big fan of dplyr and have used combinations of distinct and select before, but the documentation gives the impression that this can't be done with distinct alone:

"Variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved."

This seems like a common problem, so I was wondering if anyone has any advice. Below is an example df that reflects my real data: Name is of class character and Date is POSIXct, which I generated with the lubridate package.

structure(list(Name = c("John", "John", "Mary", "John", "Mary", 
"Chad"), Date = structure(c(1430438400, 1433116800, 1335830400, 
1422748800, 1435708800, 1427846400), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), .Names = c("Name", "Date"), row.names = c(NA, -6L
), class = "data.frame")

      

Desired output:

structure(list(Name = c("John", "Mary", "Chad"), Date = structure(c(1433116800, 
1435708800, 1427846400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("Name", 
"Date"), row.names = c(2L, 5L, 6L), class = "data.frame")

      

Thank you for your help.



1 answer


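The easiest way:

# Sort newest first; distinct() then keeps the first (most recent) row per Name.
# Current dplyr versions need .keep_all = TRUE to retain the Date column.
DF %>% arrange(desc(Date)) %>% distinct(Name, .keep_all = TRUE)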

      



If you really want the names to be kept in the same order, these also work (thanks to @akrun):

DF %>% group_by(Name) %>% slice(which.max(Date))  # @akrun's better idea: exactly one row per Name
DF %>% group_by(Name) %>% filter(Date==max(Date)) # my idea: keeps every row tied for the latest Date
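To check the approaches against each other, here is a small self-contained sketch that rebuilds the question's sample data and confirms that all three give the same latest date per name. The object names (by_distinct, by_slice, by_filter, sort_by_name) are only for illustration, and the sketch assumes a dplyr version that supports .keep_all in distinct().

```r
library(dplyr)

# Sample data from the question, rebuilt without lubridate
DF <- data.frame(
  Name = c("John", "John", "Mary", "John", "Mary", "Chad"),
  Date = as.POSIXct(c(1430438400, 1433116800, 1335830400,
                      1422748800, 1435708800, 1427846400),
                    origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Approach 1: sort newest first, then keep the first row seen for each Name
by_distinct <- DF %>%
  arrange(desc(Date)) %>%
  distinct(Name, .keep_all = TRUE)

# Approach 2: within each Name, keep the single row holding the latest Date
by_slice <- DF %>%
  group_by(Name) %>%
  slice(which.max(Date)) %>%
  ungroup()

# Approach 3: within each Name, keep every row whose Date equals the latest
by_filter <- DF %>%
  group_by(Name) %>%
  filter(Date == max(Date)) %>%
  ungroup()

# Row order can differ between approaches, so compare after sorting by Name
sort_by_name <- function(d) as.data.frame(d[order(d$Name), ])
same_dates <- all(sort_by_name(by_distinct)$Date == sort_by_name(by_slice)$Date) &&
  all(sort_by_name(by_slice)$Date == sort_by_name(by_filter)$Date)
print(same_dates)
```

The approaches only diverge when two rows for the same name share the latest date: filter(Date == max(Date)) returns all of them, while the other two return a single row. In dplyr 1.0.0 and later, slice_max(Date, n = 1, with_ties = FALSE) on the grouped data is another way to write the one-row-per-name version.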

      
