Replacing specific values in a data frame as NA

Question

Suppose I have a data.frame:

names  <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark <- c(1, 2, 3, 1, 2, 3, 1)
df <- data.frame(names, gender, mark)
df

  names gender mark
1  John      M    1
2  Mark      M    2
3 Larry      M    3
4  Will      M    1
5  Kate      F    2
6 Daria      F    3
7   Tom      M    1

I can't figure out how to replace certain values with NA. For example, suppose I want mark for Kate, Daria and Tom to be NA:

  names gender mark
1  John      M    1
2  Mark      M    2
3 Larry      M    3
4  Will      M    1
5  Kate      F   NA
6 Daria      F   NA
7   Tom      M   NA

r dataframe na

Zlo, June 17, 2015 at 18:17

2 answers

is.na(df$mark[df$names %in% c('Kate', 'Daria', 'Tom')]) <- TRUE

This is a syntax that I find useful at times. In this case, though, it is not the fastest option, as the benchmark below shows.
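To make the replacement form of is.na() less mysterious, here is a minimal sketch on a plain vector. For the default method, is.na(x) <- value behaves like x[value] <- NA, so value can be a logical or an integer index. The vector x and the copies y and z are illustrative stand-ins for df$mark, not part of the original answer.

```r
# A plain numeric vector, standing in for df$mark (illustrative data).
x <- c(1, 2, 3, 1, 2, 3, 1)

# Logical index: positions marked TRUE become NA.
y <- x
is.na(y) <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

# Integer index: the same positions, given by number.
z <- x
is.na(z) <- 5:7

identical(y, z)  # TRUE
```

In the one-liner above, the subset df$mark[...] is selected first and every element of that subset is then flagged with a single TRUE, which is recycled over the subset.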

Benchmark

library(microbenchmark)

# Repeat the 7-row example 1,000 times to get a 7,000-row data frame.
big.df1 <- data.frame(names = rep(names, 1e3), 
                      gender = rep(gender, 1e3), 
                      mark = rep(mark, 1e3))
# One copy per approach, so each expression works on its own data frame.
big.df4 <- big.df3 <- big.df2 <- big.df1

microbenchmark(
  plafort = is.na(big.df1$mark[big.df1$names %in% c('Kate', 'Daria', 'Tom')]) <- TRUE,
  akrun1  = within(big.df2, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA)),
  akrun2  = big.df3$mark[big.df3$names %in% c('Kate', 'Daria', 'Tom')] <- NA,
  akrun3  = is.na(big.df4$mark) <- big.df4$names %in% c('Kate', 'Daria', 'Tom')
  )
# 
# Unit: microseconds
#     expr     min       lq     mean   median       uq       max neval
#  plafort 389.623 408.9660 484.6090 426.9275 540.8135   777.272   100
#   akrun1 287.381 319.3570 388.3125 357.2530 419.8220   661.214   100
#   akrun2 193.035 204.2860 627.6559 227.7735 327.8440 37325.194   100
#   akrun3 208.431 221.6555 274.1615 235.2740 287.3825  1110.445   100
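One caveat when reading these timings: the within() call returns a modified copy and leaves its input untouched unless the result is assigned back, whereas the other three expressions modify their data frame in place. The four entries therefore do not do exactly the same work. A small sketch of the difference, using a two-row data frame made up for illustration:

```r
# Tiny illustrative data frame (not the benchmark data).
df <- data.frame(names = c("John", "Kate"), mark = c(1, 2))

# within() evaluates the expression on a copy and returns that copy;
# without assigning the result back, df itself is unchanged.
res <- within(df, mark <- replace(mark, names %in% "Kate", NA))
anyNA(df$mark)   # FALSE
anyNA(res$mark)  # TRUE

# Subset assignment modifies df directly.
df$mark[df$names %in% "Kate"] <- NA
anyNA(df$mark)   # TRUE
```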

Pierre Lafortune, June 17, 2015 at 20:19

akrun · Accepted Answer · 2015-06-17T18:22:01+0000

Try

df <- within(df, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA))
df
#    names gender mark
#1  John      M    1
#2  Mark      M    2
#3 Larry      M    3
#4  Will      M    1
#5  Kate      F   NA
#6 Daria      F   NA
#7   Tom      M   NA

or

 df$mark[df$names %in% c('Kate', 'Daria', 'Tom')] <- NA

or

 is.na(df$mark) <- df$names %in% c('Kate', 'Daria', 'Tom')
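As a quick sanity check, the three variants can be run side by side on the question's data to confirm that they produce the same mark column. This is only a sketch: the names target, a, b and d are introduced here for illustration and do not appear in the original post.

```r
# Rebuild the question's data frame.
names  <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark   <- c(1, 2, 3, 1, 2, 3, 1)
df <- data.frame(names, gender, mark)

# Rows whose mark should become NA (illustrative helper variable).
target <- c("Kate", "Daria", "Tom")

# Variant 1: within() + replace(), result assigned to a new object.
a <- within(df, mark <- replace(mark, names %in% target, NA))

# Variant 2: plain subset assignment on a copy.
b <- df
b$mark[b$names %in% target] <- NA

# Variant 3: replacement form of is.na() on a copy.
d <- df
is.na(d$mark) <- d$names %in% target

identical(a$mark, b$mark) && identical(b$mark, d$mark)  # TRUE
```

Which one to use is then mostly a matter of readability; the subset assignment is the shortest, while within() avoids repeating the data frame name.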
