R - get the highest value for each identifier
I have the following df:
>> animals_df:
animal_name age
cat 1
cat 1
cat 2
cat 3
cat 3
dog 1
dog 1
dog 3
dog 4
dog 4
dog 4
horse 1
horse 3
horse 5
horse 5
horse 5
I only want to keep the animals with the highest age within each species (including ties), so the output I want is:
animal_name age
cat 3
cat 3
dog 4
dog 4
dog 4
horse 5
horse 5
horse 5
I've tried using:
animals_df = do.call(rbind,lapply(split(animals_df, animals_df$animal_name), function(x) tail(x, 1) ) )
But this only returns one row per animal, not every row that ties for the maximum:
animal_name age
cat 3
dog 4
horse 5
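A minimal sketch of the same split/lapply/rbind approach that keeps ties: replace tail(x, 1), which takes only the last row of each group (and only gives the oldest animal when rows happen to be sorted by age), with a logical subset on the group maximum. The data.frame() call rebuilds the table above so the snippet runs on its own; the name oldest is illustrative.

```r
# Rebuild the example data from the question
animals_df <- data.frame(
  animal_name = rep(c("cat", "dog", "horse"), times = c(5, 6, 5)),
  age = c(1, 1, 2, 3, 3,
          1, 1, 3, 4, 4, 4,
          1, 3, 5, 5, 5)
)

# Same split/lapply/rbind pattern, but keep every row whose age equals
# the maximum age of its group instead of only the last row
oldest <- do.call(rbind, lapply(split(animals_df, animals_df$animal_name),
                                function(x) x[x$age == max(x$age), ]))
rownames(oldest) <- NULL  # drop the "cat.4"-style row names added by rbind
oldest
```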
2 answers
It's easy with dplyr / tidyverse:
library(tidyverse)
# How I read your data in, ignore since you already have your data available
df = read.table(file="clipboard", header=TRUE)
df %>%
  group_by(animal_name) %>%
  filter(age == max(age))
# Output:
Source: local data frame [8 x 2]
Groups: animal_name [3]
animal_name age
<fctr> <int>
1 cat 3
2 cat 3
3 dog 4
4 dog 4
5 dog 4
6 horse 5
7 horse 5
8 horse 5
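For comparison, a base R sketch of the same group-wise filter, assuming no extra packages are wanted: ave() computes the maximum age for each row's animal_name group, and the comparison keeps every row that ties with it, which mirrors filter(age == max(age)) after group_by(). The group_max and oldest names are illustrative.

```r
# Rebuild the example data from the question
animals_df <- data.frame(
  animal_name = rep(c("cat", "dog", "horse"), times = c(5, 6, 5)),
  age = c(1, 1, 2, 3, 3,
          1, 1, 3, 4, 4, 4,
          1, 3, 5, 5, 5)
)

# For every row, the maximum age within that row's animal_name group
group_max <- ave(animals_df$age, animals_df$animal_name, FUN = max)

# Keep all rows that tie with their group's maximum
oldest <- animals_df[animals_df$age == group_max, ]
oldest
```

Because ave() returns a vector aligned with the original rows, the result keeps the original row order and row names.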