Replace NA values in the column with the value in the row above +1
I have the following dataframe:
game <- c('game1','game1','game2','game2','game2','game3','game4', 'game4')
shot_number <- c(1,NA,1,NA,NA,1,1,NA)
df <- data.frame(game, shot_number)
game shot_number
game1 1
game1 NA
game2 1
game2 NA
game2 NA
game3 1
game4 1
game4 NA
I want to fill NA by adding 1 to the value on the line above, so df reads like this:
game shot_number
game1 1
game1 2
game2 1
game2 2
game2 3
game3 1
game4 1
game4 2
I don't know if there is a way to do this using the library "zoo" and na.locf, or if I will need to write a for loop or some kind of function.
Using dplyr's group_by and cumsum:
library(dplyr)
df %>%
group_by(game) %>%
mutate(shot_number_new = cumsum(is.na(shot_number)) + 1)
# Source: local data frame [8 x 3]
# Groups: game [4]
#
# game shot_number shot_number_new
# <fctr> <dbl> <dbl>
# 1 game1 1 1
# 2 game1 NA 2
# 3 game2 1 1
# 4 game2 NA 2
# 5 game2 NA 3
# 6 game3 1 1
# 7 game4 1 1
# 8 game4 NA 2
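If the first shot number of a game is not always 1, a possible variant is to count up from the first value of each group instead. This is only a sketch: it assumes every game starts with a non-NA value and that the numbers simply continue from there, and the name filled is made up for illustration.

```r
library(dplyr)

df <- data.frame(
  game = c('game1','game1','game2','game2','game2','game3','game4','game4'),
  shot_number = c(1, NA, 1, NA, NA, 1, 1, NA)
)

# Assumption: the first shot_number in each game is not NA.
# Within each game, row i gets (first value) + (i - 1).
filled <- df %>%
  group_by(game) %>%
  mutate(shot_number = first(shot_number) + row_number() - 1) %>%
  ungroup()

filled$shot_number
# 1 2 1 2 3 1 1 2
```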
Here is a base R method that works for your example.
df$shot_number <- ave(df$shot_number, df$game,
FUN=function(i) pmin(tail(cumsum(c(1, is.na(i))), -1), i, na.rm=TRUE))
Here ave runs the function by group (df$game). For each game, it takes the cumulative sum of the NA indicators, with a 1 prepended as the starting value. That result is one element too long, so tail(..., -1) drops its first element. pmin then takes the element-wise minimum of this vector and the original one, with na.rm=TRUE so that the NA entries are ignored.
This returns
df
game shot_number
1 game1 1
2 game1 2
3 game2 1
4 game2 2
5 game2 3
6 game3 1
7 game4 1
8 game4 2
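To make the steps inside the anonymous function easier to follow, here is the same computation traced by hand for a single group (the game2 values). The intermediate names flags, running, counts and filled are only for illustration.

```r
i <- c(1, NA, NA)                          # shot_number values for game2

flags   <- c(1, is.na(i))                  # prepend 1 to the NA indicators: 1 0 1 1
running <- cumsum(flags)                   # running total:                  1 1 2 3
counts  <- tail(running, -1)               # drop the first element:         1 2 3
filled  <- pmin(counts, i, na.rm = TRUE)   # element-wise minimum, NA ignored

filled
# 1 2 3
```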
data
df <-
structure(list(game = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 4L,
4L), .Label = c("game1", "game2", "game3", "game4"), class = "factor"),
shot_number = c(1, NA, 1, NA, NA, 1, 1, NA)), .Names = c("game",
"shot_number"), row.names = c(NA, -8L), class = "data.frame")
The solutions below all handle the example data in the question, but they assume increasingly general cases. (4) is the most general, but the others may be preferable for their simplicity if the actual situation does not require full generality. No packages are used.
1) In the example data, the first number within each group is 1 and the rest are NA, so if this is the general pattern we can use ave with seq_along like this:
transform(df, shot_number = ave(shot_number, game, FUN = seq_along))
2) If the first number is not necessarily 1, replace seq_along in (1) with f as shown:
f <- function(x) seq(x[1], length = length(x))
transform(df, shot_number = ave(shot_number, game, FUN = f))
2a) This will also work under the same assumptions as (2). It replaces each NA with 1 and then uses cumsum within each game group:
NAtoN <- function(x, N) replace(x, is.na(x), N)
transform(df, shot_number = ave(NAtoN(shot_number, 1), game, FUN = cumsum))
3) If the general case is some mixture of numbers and NA, but the first element of each game group is known not to be NA, then we can form groups consisting of each non-NA together with the NAs that follow it:
transform(df, shot_number = ave(shot_number, cumsum(!is.na(shot_number)), FUN = f))
4) If the first element of a game group could also be NA, then process subgroups defined by a non-NA followed by NAs, or by all NAs if the game group starts with NA. Use 0 as the base value in the case of leading NAs (or replace the 0 in f2 with some other number).
f2 <- function(x) ave(NAtoN(x, 0), cumsum(!is.na(x)), FUN = f)
transform(df, shot_number = ave(shot_number, game, FUN = f2))
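As a rough check of how the assumptions behind (2), (3) and (4) differ, the sketch below runs them on small made-up inputs. The data frames d2, d3 and d4 and the result names r2, r3 and r4 are hypothetical and not from the question; f, NAtoN and f2 are repeated so the block runs on its own.

```r
f     <- function(x) seq(x[1], length.out = length(x))
NAtoN <- function(x, N) replace(x, is.na(x), N)
f2    <- function(x) ave(NAtoN(x, 0), cumsum(!is.na(x)), FUN = f)

# (2): each game starts with a known number that need not be 1
d2 <- data.frame(game = c("a", "a", "a", "b", "b"),
                 shot_number = c(5, NA, NA, 2, NA))
r2 <- transform(d2, shot_number = ave(shot_number, game, FUN = f))
r2$shot_number
# 5 6 7 2 3

# (3): numbers and NAs are mixed, but every game starts with a number
d3 <- data.frame(game = c("a", "a", "a", "b", "b"),
                 shot_number = c(1, NA, 4, 2, NA))
r3 <- transform(d3, shot_number = ave(shot_number,
                                      cumsum(!is.na(shot_number)), FUN = f))
r3$shot_number
# 1 2 4 2 3

# (4): a game may also start with NA; leading NAs count up from 0
d4 <- data.frame(game = c("a", "a", "a", "b", "b", "b", "b"),
                 shot_number = c(NA, NA, 3, 2, NA, 5, NA))
r4 <- transform(d4, shot_number = ave(shot_number, game, FUN = f2))
r4$shot_number
# 0 1 3 2 3 5 6
```

Note that (3) groups on cumsum(!is.na(shot_number)) over the whole column rather than on game, which is why it relies on every game starting with a non-NA value.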