Match and replace spaces inside multiple quoted strings with regex

I want to replace all spaces inside quotes with underscores in R. I am not sure how to correctly identify the quoted strings when there is more than one. My initial attempt fails, and I have not even gotten to handling both single and double quotes.

require(stringi)
s = "The 'quick brown' fox 'jumps over' the lazy dog"
stri_replace_all(s, regex="('.*) (.*')", '$1_$2')
#> [1] "The 'quick brown' fox 'jumps_over' the lazy dog"

      

Thanks for the help.


2 answers


Suppose you need to match all non-overlapping substrings that start with ', then have 1 or more characters other than ', and then end with '. The pattern is '[^']+'.

Then you can use the following base R code:

x = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
gr <- gregexpr("'[^']+'", x)
mat <- regmatches(x, gr)
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
x
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
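
The question also mentions handling both single and double quotes. Below is a minimal sketch of one way to extend the same base R approach, assuming quotes are never nested or escaped and the text contains no stray apostrophes (as in don't). The helper name underscore_quoted is made up for illustration:

```r
# Hypothetical helper: replace whitespace with "_" inside '...' or "..." spans.
underscore_quoted <- function(x) {
  # Alternation: a single-quoted span OR a double-quoted span
  rx <- "'[^']+'|\"[^\"]+\""
  gr <- gregexpr(rx, x)
  mat <- regmatches(x, gr)
  # Only the matched (quoted) pieces are rewritten; the rest is untouched
  regmatches(x, gr) <- lapply(mat, gsub, pattern = "\\s", replacement = "_")
  x
}

s <- "The 'quick brown' fox \"jumps over\" the lazy dog"
cat(underscore_quoted(s), "\n")
## => The 'quick_brown' fox "jumps_over" the lazy dog
```

Each alternative only closes on the same quote character that opened it, so a ' inside a double-quoted span is left alone.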

      

See the R demo. Alternatively, use gsubfn:

> library(gsubfn)
> rx <- "'[^']+'"
> s = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
> gsubfn(rx, ~ gsub("\\s", "_", x), s)
[1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
> 

      

To support escape sequences (for example, \' inside a quoted string), you can use a considerably more complex PCRE regex:

(?<!\\)(?:\\{2})*\K'[^'\\]*(?:\\.[^'\\]*)*'

      



More details

  • (?<!\\) - no \ immediately before the current location
  • (?:\\{2})* - zero or more sequences of two \s
  • \K - the match reset operator: text matched so far is dropped from the reported match
  • ' - a single quote
  • [^'\\]* - zero or more characters other than ' and \
  • (?:\\.[^'\\]*)* - zero or more sequences of:
    • \\. - a \ followed by any character except a newline
    • [^'\\]* - zero or more characters other than ' and \
  • ' - a single quote.
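
To make the role of \K concrete, here is a small sketch that compares a simplified version of the pattern with and without it. The test string and the variable names are made up for illustration, and the simplified pattern omits the lookbehind and the escape handling inside the quotes:

```r
# The string is: \\'a b'  (an escaped backslash, then a real opening quote)
x <- "\\\\'a b'"

# Without \K, the leading backslash pair is part of the reported match
without_k <- regmatches(x, regexpr("(?:\\\\{2})*'[^']*'", x, perl = TRUE))

# With \K, the backslashes are still consumed but dropped from the match
with_k <- regmatches(x, regexpr("(?:\\\\{2})*\\K'[^']*'", x, perl = TRUE))

cat(without_k, with_k, sep = "\n")
## => \\'a b'
## => 'a b'
```

Because the match starts at the quote, the replacement step only ever touches the quoted text and never the backslashes in front of it.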

The R demo for the full pattern would look like this:

x = "The \\' \\\\\\' \\\\\\\\'quick \\'cunning\\' brown' fox 'jumps up \\'and\\' over' the lazy dog"
cat(x, sep="\n")
gr <- gregexpr("(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", x, perl=TRUE)
mat <- regmatches(x, gr)
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
cat(x, sep="\n")

      

Output:

The \' \\\' \\\\'quick \'cunning\' brown' fox 'jumps up \'and\' over' the lazy dog
The \' \\\' \\\\'quick_\'cunning\'_brown' fox 'jumps_up_\'and\'_over' the lazy dog

      



Try the following:



require(stringi)
s = "The 'quick brown' fox 'jumps over' the lazy dog"
stri_replace_all(s, regex="('[a-z]+) ([a-z]+')", '$1_$2')

      
