dplyr equivalent of df[df == X] <- Y

I am wondering if there is a dplyr equivalent for

df <- data.frame(A=1:5,B=2:6,C=-1:3)
df[df==2] <- 10

      

I'm looking for

df %>% <??>

      

That is, a statement that can be chained with other dplyr verbs.



2 answers


1) replace. Try this. It only requires the magrittr pipe, and dplyr re-exports %>% from magrittr, so it works with dplyr loaded too:

df %>% replace(. == 2, 10)

      

giving:

   A  B  C
1  1 10 -1
2 10  3  0
3  3  4  1
4  4  5 10
5  5  6  3

      

1a) Overwriting. Note that the above is non-destructive, so if you want to update df you will need to assign the result back:

df <- df %>% replace(. == 2, 10)

      



or

df %>% replace(. == 2, 10) -> df

      

or use the magrittr %<>% operator, which avoids referring to df twice:

df %<>% replace(. == 2, 10)
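One practical detail about the line above: attaching dplyr makes only %>% available, so %<>% needs magrittr itself attached. A minimal sketch, assuming magrittr is installed:

```r
library(magrittr)  # provides both %>% and %<>%; attaching dplyr alone gives only %>%

df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

# Compound assignment pipe: pipes df into replace() and assigns the result back to df
df %<>% replace(. == 2, 10)

df
```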

      

2) arithmetic. For an all-numeric data frame such as this one, this will also work:

df %>% { 10 * (. == 2) + . * (. != 2) }
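Since the question asks for something that can sit among other dplyr verbs, here is a small sketch of the replace() form inside a longer pipeline. The filter() and mutate() steps and the column name total are made up for illustration only:

```r
library(dplyr)

df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

result <- df %>%
  replace(. == 2, 10) %>%        # swap every 2 for 10 across the whole data frame
  filter(A > 1) %>%              # illustrative follow-up verb: drop rows where A is 1
  mutate(total = A + B + C)      # illustrative follow-up verb: row-wise sum of the columns

result
```

replace() returns an ordinary data frame, so the verbs after it see the already-updated values.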

      



The OP's question is how to replace values with dplyr, and that has been answered by G. Grothendieck. But I was curious how the various approaches based on dplyr, data.table and base R compare in speed, so I designed and ran the following benchmark.

# Load packages
library(dplyr)
library(data.table)
library(microbenchmark)

# Create example data frame
df <- data.frame(A = 1:5, B = 2:6, C = -1:3)
# Convert to data.table
dt <- as.data.table(df)

# Method 1: Use mutate_all and ifelse
F1 = function(df){df %>% mutate_all(funs(ifelse(. == 2, 10, .)))}
# Method 2: Use mutate_all and replace
F2 = function(df){df %>% mutate_all(funs(replace(., . == 2, 10)))}
# Method 3: Use replace
F3 = function(df){df %>% replace(. == 2, 10)}
# Method 4: Base R data frame assignment
F4 = function(df){
  df[df == 2] <- 10
  return(df)
}

# Benchmarking
microbenchmark(
  M1 = F1(df),
  M2 = F2(df),
  M3 = F3(df),
  M4 = F4(df),
  # Same as M4, but use data.table object as input
  M5 = F4(dt)
)

Unit: microseconds
 expr      min         lq       mean     median         uq       max neval
   M1 8634.974 13028.7975 17224.4669 14907.3735 19496.5275 79750.182   100
   M2 8925.565 12626.2675 16698.7412 15551.7410 18658.1125 35468.760   100
   M3  282.252   391.6240   591.2534   553.5980   647.8965  3290.797   100
   M4  163.578   252.1025   423.7627   349.6080   420.8125  5415.382   100
   M5  228.367   333.2495   596.1735   440.3775   555.5230  7506.609   100 

      



The results show that mutate_all with ifelse (M1) or replace (M2) is much slower than the other approaches: the median is about 15 ms, against roughly 0.35 to 0.55 ms for the rest. Using replace with the pipe (M3) is fast, but still slightly slower than base R (M4). Converting the data.frame to a data.table and then applying the same assignment (M5) is no faster than M4.
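A side note for anyone rerunning this on a recent dplyr: funs() has been deprecated since dplyr 0.8.0, and mutate_all() is superseded by across() as of dplyr 1.0.0. Below is a sketch of how Method 2 might be written in the newer style. The name F2_across is mine, and this variant was not part of the benchmark above, so the timings say nothing about it:

```r
library(dplyr)

df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

# Hypothetical rewrite of Method 2 using across() instead of mutate_all() + funs()
F2_across <- function(df) {
  df %>% mutate(across(everything(), ~ replace(.x, .x == 2, 10)))
}

# Base R assignment (Method 4 from the benchmark), used here as the reference result
F4 <- function(df) {
  df[df == 2] <- 10
  return(df)
}

# Compare the two results; all.equal() returns TRUE when they match
all.equal(F2_across(df), F4(df))
```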

So I think there is no particular need to use dplyr functions in this case, because they are not faster than the base R method (M4). There is also no need to convert the data.frame to a data.table. If a pipe operation is required, we can use the pipe with replace (M3), or we can define a function like F4 and put it in the pipe.
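To illustrate that last option, here is a sketch of a small wrapper around the base R assignment that can be dropped into a pipe. The name replace_value and its arguments old and new are hypothetical, chosen only for this example:

```r
library(magrittr)  # for %>%; library(dplyr) would work as well

# Hypothetical helper: same idea as F4, with the old and new values as arguments
replace_value <- function(df, old, new) {
  df[df == old] <- new
  df
}

df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

piped <- df %>% replace_value(2, 10)
piped
```

The original df is left untouched; only the piped result carries the replacement.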
