R: dataframe Select maximum row per id based on first timestamp

Question

I have a data frame that contains timestamped records. The toy example below contains one ID with two SMS records attached to it, each with a different timestamp. The real data has thousands of IDs, each with roughly 80-100 SMS types and dates.

toydf <- data.frame(ID = c(1045937900, 1045937900), 
                    SMS.Type = c("DF1", "WCB14"), 
                    SMS.Date = c("12/02/2015 19:51", "13/02/2015 08:38"))
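
Note that SMS.Date is stored as character strings in day/month/year order, so sorting it as text does not give chronological order. A minimal sketch, using two hypothetical timestamps in the same format and assuming UTC is an acceptable time zone, shows the difference:

```r
# Two hypothetical timestamps in the same dd/mm/yyyy HH:MM format as SMS.Date
dates_chr <- c("12/02/2016 19:51", "13/01/2015 08:38")

# Sorting the raw strings compares them character by character, so "12/..." comes first
sort(dates_chr)

# Parsing with an explicit format gives the true chronological order
dates_ct <- as.POSIXct(dates_chr, format = "%d/%m/%Y %H:%M", tz = "UTC")
order(dates_ct)
```

Here order(dates_ct) returns 2 1, putting the 2015 timestamp first, while the text sort puts the 2016 string first. This is why both approaches in the answer convert SMS.Date with as.POSIXct before ordering.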

I want to create a new data frame that contains, for each ID, only the SMS record with the first (earliest) SMS.Date, or alternatively the last one.

I looked into using duplicated. I also thought about sorting the date column in descending order within each ID and adding a new column that holds 1 for the first row of each ID and 0 whenever the current ID equals the previous one. I suspect this will get slow as the number of entries grows.
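
The sort-and-flag idea can be written in base R without comparing each row to the previous one by hand: order() sorts by ID and time in one vectorised call, and duplicated() marks every row after the first for each ID. This is a minimal sketch, not the accepted answer's method; the second ID (1045938123) and its SMS types HYP1/HYP2 are hypothetical, added only to make the grouping visible, and UTC is assumed as the time zone.

```r
# Toy data from the question, plus a hypothetical second ID to show the grouping
df <- data.frame(
  ID       = c(1045937900, 1045937900, 1045938123, 1045938123),
  SMS.Type = c("DF1", "WCB14", "HYP1", "HYP2"),
  SMS.Date = c("12/02/2015 19:51", "13/02/2015 08:38",
               "01/03/2015 10:00", "28/02/2015 09:15"),
  stringsAsFactors = FALSE
)

# Parse the day/month/year strings so they sort chronologically
df$SMS.Time <- as.POSIXct(df$SMS.Date, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Sort by ID, then by time (ascending)
ord <- df[order(df$ID, df$SMS.Time), ]

# The "1 for the first row of each ID, 0 otherwise" flag, computed without a loop
ord$first_flag <- as.integer(!duplicated(ord$ID))

# Earliest record per ID: keep the first occurrence of each ID
earliest <- ord[!duplicated(ord$ID), ]

# Latest record per ID: fromLast = TRUE keeps the last occurrence instead
latest <- ord[!duplicated(ord$ID, fromLast = TRUE), ]

earliest
latest
```

With this data, earliest holds DF1 and HYP2, and latest holds WCB14 and HYP1. The explicit flag column is only needed if you want to keep all rows and mark the first one; for filtering, the logical vector from duplicated() is enough.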

Does anyone know of a more elegant way to do this, perhaps using data.table?

Thank you for your time.

r distinct-values

John smith May 25 '15 at 12:48

1 answer

akrun · Accepted Answer · 2015-05-25T12:58:41+0000

Try

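library(dplyr)
toydf %>% 
   group_by(ID) %>%
   # parse the dd/mm/yyyy HH:MM strings and sort newest first (drop desc() for oldest first)
   arrange(desc(as.POSIXct(SMS.Date, format='%d/%m/%Y %H:%M'))) %>% 
   # keep the first row of each ID group, i.e. its latest SMS
   slice(1L)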

Or using data.table

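library(data.table)
# convert the day/month/year strings to date-times so they sort chronologically
toydf$SMS.Date <- as.POSIXct(toydf$SMS.Date, format='%d/%m/%Y %H:%M')
# setDT converts toydf by reference; setkey sorts it by ID, then SMS.Date.
# .SD[.N] takes the last (latest) row per ID; use .SD[1L] for the earliest
setkey(setDT(toydf), ID, SMS.Date)[, .SD[.N], ID]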
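
If adding dplyr or data.table is not an option, a base-R sketch can pick the row with the largest (or smallest) timestamp within each ID without sorting the whole table. This is an alternative technique, not part of the answer above; the second ID (1045938123) and its SMS types are hypothetical, and UTC is assumed as the time zone. On ties, which.max and which.min return the first matching row.

```r
# Toy data from the question, plus a hypothetical second ID
df <- data.frame(
  ID       = c(1045937900, 1045937900, 1045938123, 1045938123),
  SMS.Type = c("DF1", "WCB14", "HYP1", "HYP2"),
  SMS.Date = c("12/02/2015 19:51", "13/02/2015 08:38",
               "01/03/2015 10:00", "28/02/2015 09:15"),
  stringsAsFactors = FALSE
)

# Timestamps as plain numbers (seconds since the epoch) for which.max/which.min
time_num <- as.numeric(as.POSIXct(df$SMS.Date, format = "%d/%m/%Y %H:%M", tz = "UTC"))

# Row indices of df, grouped by ID
rows_by_id <- split(seq_len(nrow(df)), df$ID)

# For each ID, the row index with the latest time
latest_idx <- vapply(rows_by_id, function(i) i[which.max(time_num[i])], integer(1))

# For each ID, the row index with the earliest time
earliest_idx <- vapply(rows_by_id, function(i) i[which.min(time_num[i])], integer(1))

df[latest_idx, ]
df[earliest_idx, ]
```

With this data, latest_idx selects rows 2 and 3 (WCB14 and HYP1), and earliest_idx selects rows 1 and 4 (DF1 and HYP2).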
