How can I remove duplicate rows based on a different column state?
2 answers
You can try something like the following using "dplyr":
library(dplyr)
data %>% ## Your data
group_by(a) %>% ## grouped by "a"
filter(b == max(b)) ## filtered to only include the rows where b == max(b)
# Source: local data frame [2 x 3]
# Groups: a
#
# a b d
# 1 1 1 1
# 2 3 6 7
Note, however, that if there are more rows that match b == max(b)
, they will also be returned. So an alternative could be:
data %>% ## Your data
group_by(a) %>% ## grouped by "a"
arrange(desc(b)) %>% ## sorted by descending values of "b"
slice(1) ## with just the first row extracted
+6
source to share
Options using data.table
:
library(data.table)
setDT(data)[, .SD[which.max(b)], a]
# a b d
#1: 1 1 1
#2: 3 6 7
Or using .I
to get the index of the row (which will be slightly faster)
setDT(data)[data[, .I[which.max(b)], a]$V1]
# a b d
#1: 1 1 1
#2: 3 6 7
or
setkey(setDT(data), a,b)[,.SD[.N], a]
# a b d
#1: 1 1 1
#2: 3 6 7
If there are relationships for maximum values
setDT(data)[, .SD[max(b)==b], a]
# a b d
#1: 1 1 1
#2: 3 6 7
+3
source to share