na.locf with group_by from dplyr
I am trying to use na.locf from the zoo package on grouped data using dplyr. I am using the first solution from this question: Using dplyr window functions to create final values (fill in the NA values)
library(dplyr);library(zoo)
df1 <- data.frame(id=rep(c("A","B"),each=3),problem=c(1,NA,2,NA,NA,NA),ok=c(NA,3,4,5,6,NA))
df1
id problem ok
1 A 1 NA
2 A NA 3
3 A 2 4
4 B NA 5
5 B NA 6
6 B NA NA
The problem arises when all of the data within a group is NA. As you can see in the problem column, the value that na.locf fills in for id = B comes from a different group: it is the last value for id = A.
df1 %>% group_by(id) %>% na.locf()
Source: local data frame [6 x 3]
Groups: id [2]
id problem ok
<chr> <chr> <chr>
1 A 1 <NA>
2 A 1 3
3 A 2 4
4 B 2 5 #problem col is wrong
5 B 2 6 #problem col is wrong
6 B 2 6 #problem col is wrong
This is my expected result. The data for id = B should be independent of what is in id = A.
id problem ok
<chr> <chr> <chr>
1 A 1 <NA>
2 A 1 3
3 A 2 4
4 B NA 5
5 B NA 6
6 B NA 6
We need to use na.locf inside mutate_all. Because na.locf can be applied directly to a whole dataset, calling it on the full data frame does not respect the grouping, even though the data is grouped by "id". Inside mutate_all it is applied to each column separately within each group.
df1 %>%
group_by(id) %>%
mutate_all(funs(na.locf(., na.rm = FALSE)))
# id problem ok
# <fctr> <dbl> <dbl>
#1 A 1 NA
#2 A 1 3
#3 A 2 4
#4 B NA 5
#5 B NA 6
#6 B NA 6
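In recent dplyr releases funs() is deprecated and mutate_all() is superseded, so the same per-group fill may be easier to maintain when written with across(). The following is a minimal sketch, assuming dplyr 1.0.0 or later and the same df1 as in the question; the name filled is only illustrative.

```r
library(dplyr)
library(zoo)

# Same example data as in the question
df1 <- data.frame(
  id = rep(c("A", "B"), each = 3),
  problem = c(1, NA, 2, NA, NA, NA),
  ok = c(NA, 3, 4, 5, 6, NA)
)

# `filled` is an illustrative name, not from the original post.
# Inside a grouped mutate(), across(everything()) skips the grouping
# column, so na.locf() sees one column of one group at a time.
# na.rm = FALSE keeps leading NAs, so each vector keeps its length.
filled <- df1 %>%
  group_by(id) %>%
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE))) %>%
  ungroup()

filled
```

With this form the problem column for id = B should stay NA, because no value from id = A is visible while group B is being processed.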