Replace NA values in a column with the value in the row above plus 1

I have the following dataframe:

game <- c('game1','game1','game2','game2','game2','game3','game4', 'game4')
shot_number <- c(1,NA,1,NA,NA,1,1,NA)
df <- data.frame(game, shot_number)

      game     shot_number
      game1              1
      game1             NA
      game2              1
      game2             NA
      game2             NA
      game3              1
      game4              1
      game4             NA

      

I want to fill each NA by adding 1 to the value in the row above, so that df looks like this:

      game     shot_number
      game1              1
      game1              2
      game2              1
      game2              2
      game2              3
      game3              1
      game4              1
      game4              2

      

I don't know whether this can be done with na.locf from the "zoo" package, or whether I will need to write a for loop or some other kind of function.



4 answers


Using dplyr's group_by and cumsum:



library(dplyr)

df %>% 
  group_by(game) %>% 
  mutate(shot_number_new = cumsum(is.na(shot_number)) + 1)

# Source: local data frame [8 x 3]
# Groups: game [4]
# 
#     game shot_number shot_number_new
#   <fctr>       <dbl>           <dbl>
# 1  game1           1               1
# 2  game1          NA               2
# 3  game2           1               1
# 4  game2          NA               2
# 5  game2          NA               3
# 6  game3           1               1
# 7  game4           1               1
# 8  game4          NA               2
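
The counting trick above relies on each game starting at 1 with every later value missing. As a minimal sketch (written with base R's ave() rather than dplyr, and with the hypothetical names `filled`, `known` and `agrees`), the same computation can be checked against the values that were already recorded:

```r
# Rebuild the question's data frame
game <- c('game1','game1','game2','game2','game2','game3','game4','game4')
shot_number <- c(1, NA, 1, NA, NA, 1, 1, NA)
df <- data.frame(game, shot_number)

# Same idea as the dplyr pipeline: within each game, count the NAs
# seen so far and add 1
filled <- ave(df$shot_number, df$game,
              FUN = function(x) cumsum(is.na(x)) + 1)

# Sanity check: wherever a value was already recorded, the computed
# number should agree with it
known <- !is.na(df$shot_number)
agrees <- all(filled[known] == df$shot_number[known])
```

If `agrees` were FALSE on real data, the assumption would not hold and one of the more general approaches in the other answers would probably be a better fit.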

      



Here is a base R method that works for your example.

df$shot_number <- ave(df$shot_number, df$game,
                      FUN=function(i) pmin(tail(cumsum(c(1, is.na(i))), -1), i, na.rm=TRUE))

      

Here, ave runs the function by group (df$game). For each game, it computes the cumulative count of NA values, with a 1 prepended as the starting value. The first element is then dropped with tail(..., -1), since the result would otherwise be one element too long. Finally, pmin takes the element-wise minimum of that vector and the actual vector, ignoring any NA.

This returns



df
   game shot_number
1 game1           1
2 game1           2
3 game2           1
4 game2           2
5 game2           3
6 game3           1
7 game4           1
8 game4           2
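
To make the explanation concrete, here is a small sketch that traces the function body for a single group (the game2 values from the question). The names `step1` to `step4` are only for illustration and are not part of the answer's code:

```r
# Values of shot_number for one group (game2 in the question's data)
i <- c(1, NA, NA)

# A leading 1 followed by the NA flags (FALSE/TRUE become 0/1)
step1 <- c(1, is.na(i))                 # 1 0 1 1

# Running total
step2 <- cumsum(step1)                  # 1 1 2 3

# Drop the first element so the length matches the group again
step3 <- tail(step2, -1)                # 1 2 3

# Element-wise minimum with the original values, ignoring NA
step4 <- pmin(step3, i, na.rm = TRUE)   # 1 2 3
```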

      

data

df <-
structure(list(game = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 
4L), .Label = c("game1", "game2", "game3", "game4"), class = "factor"), 
    shot_number = c(1, NA, 1, NA, NA, 1, 1, NA)), .Names = c("game", 
"shot_number"), row.names = c(NA, -8L), class = "data.frame")

      



You can use group_by() with row_number(), without explicitly using the original column shot_number:

library(dplyr)

df %>%
  group_by(game) %>%
  mutate(shot_number2 = row_number())

      



The solutions below all handle the example data in the question, but they assume increasingly general cases. (4) is the most general, but the others may be preferred for their simplicity if the actual situation does not require full generality. No packages are used.

1) In the example data, the base number within each group is 1 and the rest of the numbers are NA, so if this is the general pattern we can use ave with seq_along like this:

transform(df, shot_number = ave(shot_number, game, FUN = seq_along))

      

2) If the base number is not necessarily 1, replace seq_along in (1) with f as shown:

f <- function(x) seq(x[1], length.out = length(x))
transform(df, shot_number = ave(shot_number, game, FUN = f))

      

2a) This will also work under the same assumptions as (2). It replaces each NA with 1 and then uses cumsum within each game group:

NAtoN <- function(x, N) replace(x, is.na(x), N)
transform(df, shot_number = ave(NAtoN(shot_number, 1), game, FUN = cumsum))

      

3) If the general case is that there is some mixture of numbers and NA, but the first element of each game group is known not to be NA, then we can form groups consisting of each non-NA together with the NAs that follow it (f is the function defined in (2)):

transform(df, shot_number = ave(shot_number, cumsum(!is.na(shot_number)), FUN = f))

      

4) If the first element of a game group could also be NA, then process subgroups defined by a non-NA followed by NAs, or by all NAs if the game group starts with NA. Use 0 as the base value in the case of leading NAs (or replace the 0 in f2 with some other number).

# Uses NAtoN() from (2a) and f() from (2)
f2 <- function(x) ave(NAtoN(x, 0), cumsum(!is.na(x)), FUN = f)
transform(df, shot_number = ave(shot_number, game, FUN = f2))
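
The question's data never exercises the extra generality of (4), so here is a small sketch on a made-up data frame. `df2` and `res` are hypothetical and not part of the question; the helpers are repeated from (2), (2a) and (4) so the block runs on its own. Group a starts with NAs, and group b has two different base numbers:

```r
# Helpers repeated from the answer so this block is self-contained
NAtoN <- function(x, N) replace(x, is.na(x), N)
f <- function(x) seq(x[1], length.out = length(x))
f2 <- function(x) ave(NAtoN(x, 0), cumsum(!is.na(x)), FUN = f)

# Hypothetical data: leading NAs in group "a", two bases in group "b"
df2 <- data.frame(
  game = c("a", "a", "a", "b", "b", "b", "b"),
  shot_number = c(NA, NA, 5, 3, NA, 7, NA)
)

res <- transform(df2, shot_number = ave(shot_number, game, FUN = f2))
res$shot_number
# 0 1 5 3 4 7 8
```

The leading NAs in group a count up from the base value 0, and every other run of NAs continues from the non-NA value just before it.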

      
