Use replace_na conditionally

I want to conditionally replace missing Revenue values on or before July 16, 2017 with zero using tidyverse.

My data

library(tidyverse)
library(lubridate)

    df<- tribble(
                 ~Date, ~Revenue,
          "2017-07-01",      500,
          "2017-07-02",      501,
          "2017-07-03",      502,
          "2017-07-04",      503,
          "2017-07-05",      504,
          "2017-07-06",      505,
          "2017-07-07",      506,
          "2017-07-08",      507,
          "2017-07-09",      508,
          "2017-07-10",      509,
          "2017-07-11",      510,
          "2017-07-12",      NA,
          "2017-07-13",      NA,
          "2017-07-14",      NA,
          "2017-07-15",      NA,
          "2017-07-16",      NA,
          "2017-07-17",      NA,
          "2017-07-18",      NA,
          "2017-07-19",      NA,
          "2017-07-20",      NA
          )

df$Date <- ymd(df$Date)

      

Date until which I want to conditionally replace NAs

max.date <- ymd("2017-07-16")

      

The result that I desire

    # A tibble: 20 × 2
             Date Revenue
            <chr>   <dbl>
    1  2017-07-01     500
    2  2017-07-02     501
    3  2017-07-03     502
    4  2017-07-04     503
    5  2017-07-05     504
    6  2017-07-06     505
    7  2017-07-07     506
    8  2017-07-08     507
    9  2017-07-09     508
    10 2017-07-10     509
    11 2017-07-11     510
    12 2017-07-12       0
    13 2017-07-13       0
    14 2017-07-14       0
    15 2017-07-15       0
    16 2017-07-16       0
    17 2017-07-17      NA
    18 2017-07-18      NA
    19 2017-07-19      NA
    20 2017-07-20      NA

      

The only way I could work this out was to split the df into multiple parts, update the NAs, and then rbind the whole lot back together.

Can someone please help me do this efficiently using tidyverse?



1 answer


We can use mutate on the "Revenue" column and replace NA with 0, using a logical condition that checks whether the element is NA and "Date" is less than or equal to "max.date"

df %>% 
  mutate(Revenue = replace(Revenue, is.na(Revenue) & Date <= max.date, 0))
# A tibble: 20 x 2
#         Date Revenue
#       <date>   <dbl>
# 1 2017-07-01     500
# 2 2017-07-02     501
# 3 2017-07-03     502
# 4 2017-07-04     503
# 5 2017-07-05     504
# 6 2017-07-06     505
# 7 2017-07-07     506
# 8 2017-07-08     507
# 9 2017-07-09     508
#10 2017-07-10     509
#11 2017-07-11     510
#12 2017-07-12       0
#13 2017-07-13       0
#14 2017-07-14       0
#15 2017-07-15       0
#16 2017-07-16       0
#17 2017-07-17      NA
#18 2017-07-18      NA
#19 2017-07-19      NA
#20 2017-07-20      NA
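
Since the question title asks about replace_na specifically, here is a minimal sketch of one way tidyr's replace_na could be combined with dplyr's if_else to get the same result. The small tibble df_small is a shortened, hypothetical stand-in for the question's df (same column names and types assumed) so that the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Shortened stand-in for the question's data (assumption: same columns as df)
df_small <- tibble(
  Date    = ymd(c("2017-07-11", "2017-07-15", "2017-07-16", "2017-07-17")),
  Revenue = c(510, NA, NA, NA)
)
max.date <- ymd("2017-07-16")

# replace_na() fills every NA with 0; if_else() keeps that filled value
# only for rows on or before max.date and the original value otherwise
result <- df_small %>%
  mutate(Revenue = if_else(Date <= max.date, replace_na(Revenue, 0), Revenue))
```

Both branches of if_else are doubles here, which matters because if_else is strict about its two branches having compatible types.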

      


This can also be done with data.table by specifying a logical condition in i and assigning (:=) 0 to "Revenue"



library(data.table)
setDT(df)[is.na(Revenue) & Date <= max.date, Revenue := 0]

      


Or using base R

df$Revenue[is.na(df$Revenue) & df$Date <= max.date] <- 0

      
