Replace NA values in the column with the value in the row above +1

Question


I have the following dataframe:

game <- c('game1','game1','game2','game2','game2','game3','game4', 'game4')
shot_number <- c(1,NA,1,NA,NA,1,1,NA)
df <- data.frame(game, shot_number)

      game     shot_number
      game1              1
      game1             NA
      game2              1
      game2             NA
      game2             NA
      game3              1
      game4              1
      game4             NA

I want to fill NA by adding 1 to the value on the line above, so df reads like this:

      game     shot_number
      game1              1
      game1              2
      game2              1
      game2              2
      game2              3
      game3              1
      game4              1
      game4              2

I don't know if there is a way to do this using the library "zoo" and na.locf, or if I will need to write a for loop or some kind of function.

+3

for-loop r zoo na

odenhem 05 May '17 at 19:59


4 answers

Here is a base R method that works for your example.

df$shot_number <- ave(df$shot_number, df$game,
                      FUN=function(i) pmin(tail(cumsum(c(1, is.na(i))), -1), i, na.rm=TRUE))

Here ave runs the function by group (df$game). For each game, take the cumulative sum of the NA indicators, with a 1 prepended as the starting value. Drop the first value with tail(..., -1), since the result would otherwise be one element too long. Then take the element-wise minimum of that and the actual vector with pmin, ignoring any NA.

This returns

df
   game shot_number
1 game1           1
2 game1           2
3 game2           1
4 game2           2
5 game2           3
6 game3           1
7 game4           1
8 game4           2
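
To see why this works, the intermediate values can be traced for a single group. This is a minimal sketch using the game2 values from the question; the variable names counts, shifted and result are illustrative only and not part of the answer.

```r
# Values of shot_number for one group (game2 in the question)
i <- c(1, NA, NA)

# Prepend 1, then accumulate the NA indicators (FALSE, TRUE, TRUE)
counts <- cumsum(c(1, is.na(i)))    # 1 1 2 3

# tail(x, -1) drops the first element, restoring the original length
shifted <- tail(counts, -1)         # 1 2 3

# Element-wise minimum with the original vector, ignoring NA
result <- pmin(shifted, i, na.rm = TRUE)
print(result)                       # 1 2 3
```

Because pmin takes the smaller of the counter and the existing value, this relies on each group starting at 1, as it does in the example data.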

data

df <-
structure(list(game = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 
4L), .Label = c("game1", "game2", "game3", "game4"), class = "factor"), 
    shot_number = c(1, NA, 1, NA, NA, 1, 1, NA)), .Names = c("game", 
"shot_number"), row.names = c(NA, -8L), class = "data.frame")

+1

lmo 05 May '17 at 20:11


You can use group_by() with row_number(), without explicitly using the original column shot_number:

library(dplyr)

df %>%
  group_by(game) %>%
  mutate(shot_number2 = row_number())

0

davechilders 05 May '17 at 20:19


The solutions below all handle the example data in the question, but assume increasingly complex general cases. (4) is the most general, but the others may be preferred for simplicity if the actual situation does not require complete generality. No packages are used.

1) In the example data, the first value within each group is 1 and the rest are NA, so if this is the general pattern we can use ave with seq_along like this:

transform(df, shot_number = ave(shot_number, game, FUN = seq_along))

2) If the first value is not necessarily 1, replace seq_along in (1) with f as shown:

f <- function(x) seq(x[1], length.out = length(x))
transform(df, shot_number = ave(shot_number, game, FUN = f))

2a) This will also work under the same assumptions as (2). It replaces each NA with 1 and then uses cumsum within each game group:

NAtoN <- function(x, N) replace(x, is.na(x), N)
transform(df, shot_number = ave(NAtoN(shot_number, 1), game, FUN = cumsum))

3) If the general case is that there is some mixture of numbers and NA, but the first element of each game group is known not to be NA, then we can form groups consisting of each non-NA value along with the NAs that follow it:

transform(df, shot_number = ave(shot_number, cumsum(!is.na(shot_number)), FUN = f))

4) If the first element of a game group could also be NA, then process subgroups defined by a non-NA value followed by NAs, or by all NAs if the game group starts with NA. Use 0 as the base value in case of a leading NA (or replace 0 in f2 with some other number).

 f2 <- function(x) ave(NAtoN(x, 0), cumsum(!is.na(x)), FUN = f)
 transform(df, shot_number = ave(shot_number, game, FUN = f2))
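
The differences between the variants only show up on data that is messier than the example. The following sketch runs (1), (3) and (4) on a hypothetical data frame df2, which is not from the question and was invented here to include a first value other than 1, a second value in the middle of a game, and a game that starts with NA. The helpers are repeated from the answer so that the block runs on its own.

```r
# Hypothetical data (not from the question)
df2 <- data.frame(
  game        = c("a", "a", "a", "b", "b", "b", "c", "c"),
  shot_number = c(5, NA, NA, 2, NA, 7, NA, NA)
)

# Helpers from the answer
f     <- function(x) seq(x[1], length.out = length(x))
NAtoN <- function(x, N) replace(x, is.na(x), N)
f2    <- function(x) ave(NAtoN(x, 0), cumsum(!is.na(x)), FUN = f)

# (1) numbers the rows of each game and ignores the existing values
r1 <- ave(df2$shot_number, df2$game, FUN = seq_along)

# (3) counts up from each non-NA value, but does not look at game,
#     so the leading NAs of game "c" continue from the 7 in game "b"
r3 <- ave(df2$shot_number, cumsum(!is.na(df2$shot_number)), FUN = f)

# (4) does the same within each game and starts from 0 after a leading NA
r4 <- ave(df2$shot_number, df2$game, FUN = f2)

print(data.frame(df2, r1, r3, r4))
# r1: 1 2 3 1 2 3 1 2
# r3: 5 6 7 2 3 7 8 9
# r4: 5 6 7 2 3 7 0 1
```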

0

G. Grothendieck 06 May '17 at 13:04


zx8754 · Accepted Answer · 2017-05-05T20:12:50+0000

Using dplyr's group_by and cumsum:

library(dplyr)

df %>%
  group_by(game) %>% 
  mutate(shot_number_new = cumsum(is.na(shot_number)) + 1)

# Source: local data frame [8 x 3]
# Groups: game [4]
# 
#     game shot_number shot_number_new
#   <fctr>       <dbl>           <dbl>
# 1  game1           1               1
# 2  game1          NA               2
# 3  game2           1               1
# 4  game2          NA               2
# 5  game2          NA               3
# 6  game3           1               1
# 7  game4           1               1
# 8  game4          NA               2
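
The expression cumsum(is.na(shot_number)) + 1 counts the NAs within each game, so it appears to reproduce the desired output only when every game starts at 1 and contains no other non-NA values, as in the question. A minimal sketch of that counting on single vectors, without dplyr; the helper name fill_count is hypothetical and not part of the answer:

```r
# Hypothetical helper wrapping the per-group expression from the answer
fill_count <- function(x) cumsum(is.na(x)) + 1

# Pattern from the question: a 1 followed by NAs
print(fill_count(c(1, NA, NA)))   # 1 2 3

# A group that starts at 5: the existing value is not preserved
print(fill_count(c(5, NA, NA)))   # 1 2 3
```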
