na.locf with group_by from dplyr

I am trying to use na.locf from the zoo package on grouped data using dplyr. I am using the first solution from this question: Using dplyr window functions to create final values (fill in the NA values)

library(dplyr);library(zoo)
df1 <- data.frame(id=rep(c("A","B"),each=3),problem=c(1,NA,2,NA,NA,NA),ok=c(NA,3,4,5,6,NA))
df1
  id problem ok
1  A       1 NA
2  A      NA  3
3  A       2  4
4  B      NA  5
5  B      NA  6
6  B      NA NA

      

The problem arises when all values within a group are NA. As you can see in the problem column, the value that na.locf fills in for id = B comes from a different group: the last value for id = A.

df1 %>% group_by(id) %>% na.locf()

Source: local data frame [6 x 3]
Groups: id [2]

     id problem    ok
  <chr>   <chr> <chr>
1     A       1  <NA>
2     A       1     3
3     A       2     4
4     B       2     5 #problem col is wrong
5     B       2     6 #problem col is wrong
6     B       2     6 #problem col is wrong

      

This is my expected result. The values for id = B should be independent of whatever is in id = A.

     id problem    ok
  <chr>   <chr> <chr>
1     A       1  <NA>
2     A       1     3
3     A       2     4
4     B      NA     5
5     B      NA     6
6     B      NA     6

      



1 answer


We need to use na.locf inside mutate_all, because na.locf can also be applied directly to a whole dataset. Even though the data is grouped by "id", applying na.locf to the full dataset does not follow any grouping behavior: the grouping is ignored and values are carried across group boundaries.



df1 %>%
     group_by(id) %>%
     mutate_all(funs(na.locf(., na.rm = FALSE)))
#    id problem    ok
#  <fctr>   <dbl> <dbl>
#1      A       1    NA
#2      A       1     3
#3      A       2     4
#4      B      NA     5
#5      B      NA     6
#6      B      NA     6
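
For newer dplyr versions, the same idea can be sketched with across() instead of mutate_all() and funs(). This is only a sketch and assumes dplyr 1.0.0 or later, where funs() is deprecated and mutate_all() is superseded; the name filled is just an illustrative variable, not something from the question.

```r
library(dplyr)
library(zoo)

# Same example data as in the question.
df1 <- data.frame(
  id      = rep(c("A", "B"), each = 3),
  problem = c(1, NA, 2, NA, NA, NA),
  ok      = c(NA, 3, 4, 5, 6, NA)
)

# Inside a grouped mutate(), across(everything()) covers every
# non-grouping column, so na.locf() is called once per column per group.
# na.rm = FALSE keeps leading NAs so each result has the group's length.
filled <- df1 %>%
  group_by(id) %>%
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE))) %>%
  ungroup()

filled
```

Because na.locf only ever sees one group's values at a time here, the all-NA problem column for id = B should stay NA rather than picking up the last value from id = A.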

      
