Replace substrings in R when both the patterns and the replacements are vectors

Let's say I have two vectors:

a <- c("this", "is", "test")
b <- c("that", "was", "boy")

      

I also have a string variable:

string <- "this is a story about a test"

      

I want to replace the values in the string so that it becomes the following:

string <- "that was a story about a boy"

      

I could do it using a for loop, but I want this to be vectorized. How should I do it?

+3




5 answers


If you're open to using a non-base package, stringi will work just fine:

stringi::stri_replace_all_fixed(string, a, b, vectorize_all = FALSE)
#[1] "that was a story about a boy"

      

Note that this also works when string is a character vector of length > 1.



To be on the safe side, you can adapt this - similar to RUser's answer - to check word boundaries before replacing:

stringi::stri_replace_all_regex(string, paste0("\\b", a, "\\b"), b, vectorize_all = FALSE)

      

This ensures that you don't accidentally change his to hwas.
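To see why the word-boundary version is safer, here is a minimal base-R sketch. It uses gsub rather than stringi, and the input string x is made up for illustration:

```r
x <- "this is his test"

# Plain fixed replacement of "is" also hits the "is" inside "this" and "his"
plain <- gsub("is", "was", x, fixed = TRUE)
plain
## [1] "thwas was hwas test"

# With word boundaries, only the standalone word "is" is replaced
bounded <- gsub("\\bis\\b", "was", x)
bounded
## [1] "this was his test"
```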

+8




Here are some solutions. Each of them will work even if string is a character vector of strings, in which case the substitutions will be performed on each of its components.

1) Reduce. This does not use packages.

Reduce(function(x, i) gsub(paste0("\\b", a[i], "\\b"), b[i], x), seq_along(a), string)
## [1] "that was a story about a boy"
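As a rough illustration of the claim that this also handles a character vector, the sketch below runs the same Reduce call on two strings. The vector strings and the name res are assumptions added for the example:

```r
a <- c("this", "is", "test")
b <- c("that", "was", "boy")
strings <- c("this is a story about a test", "is this his test")

# Reduce feeds the result of each gsub call into the next one,
# starting from `strings` and stepping through the indices of `a`
res <- Reduce(function(x, i) gsub(paste0("\\b", a[i], "\\b"), b[i], x),
              seq_along(a), strings)
res
## [1] "that was a story about a boy" "was that his boy"
```

Because gsub is itself vectorized over its input, each step rewrites every element of the vector at once.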

      

2) gsubfn. gsubfn is similar to gsub, but the replacement argument can be a list of substitutions (or certain other objects).

library(gsubfn)

gsubfn("\\w+", setNames(as.list(b), a), string)
## [1] "that was a story about a boy"
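For readers without gsubfn installed, the same lookup idea can be sketched in base R with gregexpr and regmatches. This is only an analogue written for illustration, not how gsubfn is implemented, and the names lookup and m are assumptions:

```r
a <- c("this", "is", "test")
b <- c("that", "was", "boy")
string <- "this is a story about a test"

# Named lookup table: names are the patterns, values the replacements
lookup <- setNames(b, a)

# Locate every word, then swap in the looked-up value where one exists
m <- gregexpr("\\w+", string)
regmatches(string, m) <- lapply(regmatches(string, m), function(w) {
  ifelse(w %in% names(lookup), lookup[w], w)
})
string
## [1] "that was a story about a boy"
```

Each word is looked up exactly once, so words without an entry in lookup are left as they are.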

      

3) loop. This is not vectorized, but it is included for comparison. No packages are used.

out <- string
for(i in seq_along(a)) out <- gsub(paste0("\\b", a[i], "\\b"), b[i], out)
out
## [1] "that was a story about a boy"

      



Note: There is some question of how cycles among the replacements should be handled. For example, if

a <- c("a", "A")
b <- rev(a)

      

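do we want

  • "a" to be replaced by "A" and then changed back to "a", or
  • "a" and "A" to be swapped?

All of the above solutions assume the first case. If we want the second case, perform the operation twice: first replace each pattern with a placeholder (here its index in a), then replace each placeholder with its final value. We'll illustrate with (2) because it's the shortest, but the same idea applies to all of them: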

# swap "a" and "A"
a <- c("a", "A")
b <- rev(a)

tmp <- gsubfn("\\w+", setNames(as.list(seq_along(a)), a), string)
gsubfn("\\w+", setNames(as.list(b), seq_along(a)), tmp)
## [1] "this is A story about A test"
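The two cases can also be compared in base R. The sketch below is only an illustration: the helper seq_replace and the placeholder tokens are hypothetical, and the tokens are assumed not to occur in the input.

```r
a <- c("a", "A")
b <- rev(a)
string <- "this is a story about a test"

# Hypothetical helper: apply the replacements one after another
seq_replace <- function(x, from, to) {
  Reduce(function(acc, i) gsub(paste0("\\b", from[i], "\\b"), to[i], acc),
         seq_along(from), x)
}

# Case 1: "a" becomes "A" and is then turned back into "a"
round_trip <- seq_replace(string, a, b)
round_trip
## [1] "this is a story about a test"

# Case 2: go through placeholder tokens so the two steps cannot interfere
# (assumption: these tokens never appear in the input)
placeholders <- paste0("zzTMP", seq_along(a), "zz")
tmp <- seq_replace(string, a, placeholders)
swapped <- seq_replace(tmp, placeholders, b)
swapped
## [1] "this is A story about A test"
```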

      

+4




> library(stringi)
> stri_replace_all_regex(string, "\\b" %s+% a %s+% "\\b", b, vectorize_all=FALSE)
#[1] "that was a story about a boy"

      

+3




Chipping in with a small function that depends only on base R:

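repWords <- function(string, toRep, Rep, sep = '\\s') {
  # split the input into words on the separator
  wrds <- unlist(strsplit(string, sep))
  # for each word, its position in toRep (NA if it is not to be replaced)
  ix <- match(wrds, toRep)
  # replace every matched word, leaving the others untouched
  wrds[!is.na(ix)] <- Rep[ix[!is.na(ix)]]
  paste0(wrds, collapse = ' ')
}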

a <- c("this", "is", "test")
b <- c("that", "was", "boy")

string <- "this is a story about a test"

> repWords(string,a,b)
[1] "that was a story about a boy"

      

Note: This assumes that you supply a matching number of replacements (Rep must be as long as toRep). You can set the delimiter with sep.

+2




Speaking of external packages, here's another one:

a <- c("this", "is", "test")
b <- c("that", "was", "boy")
x <- "this is a story about a test"


library(qdap)
mgsub(a,b,x)

      

which gives:

 "that was a story about a boy"

      

+2








