Replace strings in R with patterns and replacements both given as vectors
Let's say I have two vectors:
a <- c("this", "is", "test")
b <- c("that", "was", "boy")
I also have a string variable:
string <- "this is a story about a test"
I want to replace the values in the string so that it becomes the following:
string <- "that was a story about a boy"
I could do it using a for loop, but I want this to be vectorized. How should I do it?
If you're open to using a non-base package, stringi will work just fine:
stringi::stri_replace_all_fixed(string, a, b, vectorize_all = FALSE)
#[1] "that was a story about a boy"
Note that this also works for input strings of length > 1.
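As a quick illustration of the vector case, here is a minimal sketch. The second input string is made up for this example, and the code assumes stringi is installed.

```r
library(stringi)

a <- c("this", "is", "test")
b <- c("that", "was", "boy")

# Hypothetical input with more than one element
strings <- c("this is a story about a test", "is this a test")

# vectorize_all = FALSE applies every pattern/replacement pair, in order,
# to each element of `strings`
res <- stri_replace_all_fixed(strings, a, b, vectorize_all = FALSE)
print(res)
#[1] "that was a story about a boy" "was that a boy"
```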
To be on the safe side, you can adapt this - similar to RUser's answer - to check word boundaries before replacing:
stringi::stri_replace_all_regex(string, paste0("\\b", a, "\\b"), b, vectorize_all = FALSE)
This ensures that you don't accidentally change his to hwas.
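To make the difference concrete, the sketch below runs both variants on a made-up sentence containing "his" (the sentence is an assumption for illustration, not part of the question). It assumes stringi is installed.

```r
library(stringi)

a <- c("this", "is", "test")
b <- c("that", "was", "boy")

# Hypothetical sentence in which "is" also occurs inside another word
s <- "this is his test"

# Fixed matching replaces "is" wherever it occurs, including inside "his"
fixed_res <- stri_replace_all_fixed(s, a, b, vectorize_all = FALSE)

# Wrapping each pattern in \\b restricts matches to whole words
regex_res <- stri_replace_all_regex(s, paste0("\\b", a, "\\b"), b,
                                    vectorize_all = FALSE)

print(fixed_res)
#[1] "that was hwas boy"
print(regex_res)
#[1] "that was his boy"
```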
Here are some solutions. Each of them will work even if string is a character vector of strings, in which case the substitutions will be performed on each of its components.
1) Reduce. This does not use any packages.
Reduce(function(x, i) gsub(paste0("\\b", a[i], "\\b"), b[i], x), seq_along(a), string)
## [1] "that was a story about a boy"
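To see what Reduce is doing here, it can help to keep the intermediate results. The sketch below is the same call with accumulate = TRUE added (an addition for illustration only): the string is threaded through one gsub per pattern, in order.

```r
a <- c("this", "is", "test")
b <- c("that", "was", "boy")
string <- "this is a story about a test"

# Same fold as above, but accumulate = TRUE returns the initial value
# followed by the result after each pattern/replacement pair
steps <- Reduce(
  function(x, i) gsub(paste0("\\b", a[i], "\\b"), b[i], x),
  seq_along(a),
  string,
  accumulate = TRUE
)
print(steps)
## [1] "this is a story about a test" "that is a story about a test"
## [3] "that was a story about a test" "that was a story about a boy"
```

The last element is what the plain Reduce call returns.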
2) gsubfn. gsubfn is similar to gsub, but the replacement argument can be a list of substitutions (or some other object).
library(gsubfn)
gsubfn("\\w+", setNames(as.list(b), a), string)
## [1] "that was a story about a boy"
3) loop. This is not vectorized, but it is included for comparison. No packages are used.
out <- string
for(i in seq_along(a)) out <- gsub(paste0("\\b", a[i], "\\b"), b[i], out)
out
## [1] "that was a story about a boy"
Note: There is the question of whether cycles are possible. For example, if
a <- c("a", "A")
b <- rev(a)
do we want
- "a" to be replaced with "A" and then changed back to "a", or
- "a" and "A" to be exchanged?
All of the above solutions assume the first case. If we want the second case, perform the operation twice. We'll illustrate with (2) because it's the shortest, but the same idea applies to all of them:
# swap "a" and "A"
a <- c("a", "A")
b <- rev(a)
tmp <- gsubfn("\\w+", setNames(as.list(seq_along(a)), a), string)
gsubfn("\\w+", setNames(as.list(b), seq_along(a)), tmp)
## [1] "this is A story about A test"
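For comparison, here is a base-R sketch of the two cases. It is not one of the numbered solutions: the input string is made up so that it contains both "a" and "A", and the lookup table built with setNames is an illustrative helper. The first part replaces sequentially (case one); the second looks every word up exactly once, so the two values are exchanged without a placeholder step.

```r
a <- c("a", "A")
b <- rev(a)
string <- "A story about a test"   # hypothetical input containing both words

# Case 1: sequential replacement; "a" becomes "A", then every "A" becomes "a"
seq_res <- Reduce(
  function(x, i) gsub(paste0("\\b", a[i], "\\b"), b[i], x),
  seq_along(a),
  string
)

# Case 2: single pass; each word is looked up once in a named vector
lookup <- setNames(b, a)
swapped <- string
m <- gregexpr("\\w+", swapped)
regmatches(swapped, m) <- lapply(regmatches(swapped, m), function(w) {
  hit <- w %in% names(lookup)
  w[hit] <- unname(lookup[w[hit]])   # replace only the words found in the table
  w
})

print(seq_res)
## [1] "a story about a test"
print(swapped)
## [1] "a story about A test"
```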
Chiming in with a little function that depends only on base R:
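repWords <- function(string, toRep, Rep, sep = '\\s') {
  # split the string into words
  wrds <- unlist(strsplit(string, sep))
  # position of the first occurrence of each word to be replaced
  ix <- match(toRep, wrds)
  # put the replacements at those positions
  wrds[ix] <- Rep
  return(paste0(wrds, collapse = ' '))
}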
a <- c("this", "is", "test")
b <- c("that", "was", "boy")
string <- "this is a story about a test"
> repWords(string,a,b)
[1] "that was a story about a boy"
Note:
This assumes that you have the appropriate number of replacements. You can define the delimiter with sep.
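One caveat: match(toRep, wrds) returns only the first position of each word, so a repeated word is replaced once, and a word that is absent from the string yields an NA index. A possible variant is sketched below. The name repWordsAll is hypothetical, and the sketch assumes whole words separated by whitespace. It matches in the other direction, so every occurrence is replaced and missing words are simply skipped; it also accepts a character vector of strings.

```r
repWordsAll <- function(string, toRep, Rep, sep = "\\s+") {
  stopifnot(length(toRep) == length(Rep))
  vapply(strsplit(string, sep), function(wrds) {
    ix <- match(wrds, toRep)      # for each word: its index in toRep, or NA
    hit <- !is.na(ix)
    wrds[hit] <- Rep[ix[hit]]     # replace every word that has a match
    paste(wrds, collapse = " ")
  }, character(1))
}

a <- c("this", "is", "test")
b <- c("that", "was", "boy")

res1 <- repWordsAll("this is a test and this is not", a, b)
print(res1)
## [1] "that was a boy and that was not"

res2 <- repWordsAll(c("this is a story about a test", "no match here"), a, b)
print(res2)
## [1] "that was a story about a boy" "no match here"
```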