dplyr equivalent of DF[DF == X] <- Y
1) replace Try this. It only requires magrittr, although dplyr imports the relevant part of magrittr, so it works with dplyr too:
df %>% replace(. == 2, 10)
giving:
   A  B  C
1  1 10 -1
2 10  3  0
3  3  4  1
4  4  5 10
5  5  6  3
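For anyone who wants to run this without the question's data, here is a self-contained sketch. The df below is an assumption: it is the example data frame from the benchmark answer further down, which happens to reproduce the output shown. The names mask, result and expected are only for illustration.

```r
library(magrittr)  # provides %>%; library(dplyr) would work as well

# Assumed example data (same as the df in the benchmark answer)
df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

# df == 2 is a logical matrix with the same shape as df
mask <- df == 2

# Base R replace(x, list, values) does x[list] <- values on a copy and returns the copy
result <- df %>% replace(. == 2, 10)

# The base R idiom from the title, applied to a copy for comparison
expected <- df
expected[mask] <- 10

identical(result, expected)
```

Because replace works on a copy, df itself still contains the 2s afterwards.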
1a) overwriting Note that the above is non-destructive, so if you want to update df you will need to assign the result back:
df <- df %>% replace(. == 2, 10)
or
df %>% replace(. == 2, 10) -> df
or use the magrittr %<>% operator, which avoids referring to df twice:
df %<>% replace(. == 2, 10)
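One thing to watch: dplyr re-exports %>% but not %<>%, so magrittr has to be attached explicitly for this form. A minimal sketch, assuming the same example df as in the benchmark answer below:

```r
library(magrittr)  # %<>% is exported by magrittr, not re-exported by dplyr

# Assumed example data
df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

# Pipe df into replace() and assign the result back to df in one step
df %<>% replace(. == 2, 10)

# Check whether any 2s are left in the updated df
any(df == 2)
```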
2) arithmetic This will also work:
df %>% { 10 * (. == 2) + . * (. != 2) }
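The arithmetic version relies on logical values being coerced to 0 and 1: each cell becomes 10 * 1 + x * 0 where it equals 2, and 10 * 0 + x * 1 everywhere else. A small sketch on a plain vector may make this easier to see (x, is_two and y are made-up names for illustration):

```r
# Made-up example vector
x <- c(1, 2, 3, 2)

is_two <- x == 2        # FALSE TRUE FALSE TRUE

part_new  <- 10 * is_two      # 10 where x is 2, 0 elsewhere
part_kept <- x * (x != 2)     # original value where x is not 2, 0 elsewhere

y <- part_new + part_kept
y
```

Since it multiplies the data, this approach only suits numeric columns, whereas replace does not have that restriction.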
The OP's question is about how to replace values with dplyr, and G. Grothendieck's answer solves it. But I was curious how the approaches based on dplyr, data.table, and base R compare in speed, so I designed and ran the following benchmark.
# Load packages
library(dplyr)
library(data.table)
library(microbenchmark)

# Create example data frame
df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

# Convert to data.table
dt <- as.data.table(df)

# Method 1: Use mutate_all and ifelse
F1 = function(df){df %>% mutate_all(funs(ifelse(. == 2, 10, .)))}

# Method 2: Use mutate_all and replace
F2 = function(df){df %>% mutate_all(funs(replace(., . == 2, 10)))}

# Method 3: Use replace
F3 = function(df){df %>% replace(. == 2, 10)}

# Method 4: Base R data frame assignment
F4 = function(df){
  df[df == 2] <- 10
  return(df)
}

# Benchmarking
microbenchmark(
  M1 = F1(df),
  M2 = F2(df),
  M3 = F3(df),
  M4 = F4(df),
  # Same as M4, but use data.table object as input
  M5 = F4(dt)
)

Unit: microseconds
 expr      min         lq       mean     median         uq       max neval
   M1 8634.974 13028.7975 17224.4669 14907.3735 19496.5275 79750.182   100
   M2 8925.565 12626.2675 16698.7412 15551.7410 18658.1125 35468.760   100
   M3  282.252   391.6240   591.2534   553.5980   647.8965  3290.797   100
   M4  163.578   252.1025   423.7627   349.6080   420.8125  5415.382   100
   M5  228.367   333.2495   596.1735   440.3775   555.5230  7506.609   100
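A note for readers on newer dplyr: funs() has since been deprecated and mutate_all() superseded. Assuming dplyr 1.0.0 or later, methods 1 and 2 could be written with across() roughly as sketched below. F1_across and F2_across are hypothetical names, and these variants were not part of the benchmark, so the timings above say nothing about them.

```r
library(dplyr)

# Same example data as in the benchmark
df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

# Method 1 rewritten with across(): ifelse on every column
F1_across <- function(df) {
  df %>% mutate(across(everything(), ~ ifelse(.x == 2, 10, .x)))
}

# Method 2 rewritten with across(): replace on every column
F2_across <- function(df) {
  df %>% mutate(across(everything(), ~ replace(.x, .x == 2, 10)))
}

# Compare both against the base R assignment used in method 4
expected <- df
expected[expected == 2] <- 10

all.equal(F1_across(df), expected)
all.equal(F2_across(df), expected)
```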
The results show that mutate_all with ifelse (M1) or replace (M2) is much slower than the other approaches: median times of roughly 15,000 microseconds, against a few hundred for the rest. Using replace with a pipe (M3) is fast, but still slightly slower than base R (M4). Converting the data.frame to a data.table and then applying the assignment (M5) is no faster than M4.
So I think there is no particular need to use dplyr functions in this case, because they are not faster than the base R method (M4). There is also no need to convert the data.frame to a data.table. If a pipe operation is required, we can use a pipe with replace (M3), or define a function like F4 and put it in the pipe.
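As a rough sketch of that last option, the base R assignment can be wrapped in a small helper that takes the data as its first argument, which is all a pipe needs. The name replace_value and its old/new parameters are hypothetical, a generalisation of F4 rather than anything from dplyr or magrittr:

```r
library(magrittr)  # provides %>%; library(dplyr) would work as well

# Hypothetical helper wrapping the base R assignment so it can sit in a pipe
replace_value <- function(data, old, new) {
  data[data == old] <- new
  data
}

# Same example data as in the benchmark
df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

# The helper slots into a pipe like any other function; df itself is unchanged
df2 <- df %>% replace_value(2, 10)
```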