Match and replace spaces inside multiple quoted substrings with regex
I want to replace all spaces inside quotes with underscores in R. I am not sure how to correctly identify quoted strings when there are several of them. My initial attempt fails, and I haven't even gotten to handling both single and double quotes yet.
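library(stringi)
s <- "The 'quick brown' fox 'jumps over' the lazy dog"
# Greedy .* lets one match span both quoted parts, so only one space
# (the last one before the final ') is replaced
stri_replace_all(s, regex="('.*) (.*')", '$1_$2')
#> [1] "The 'quick brown' fox 'jumps_over' the lazy dog"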
Thanks for the help.
Suppose you need to match all non-overlapping substrings that start with ', then contain one or more characters other than ', and end with '. The pattern for that is '[^']+'.
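A quick way to see what this pattern matches before replacing anything is to extract the matches. The sample string below is only an illustration:

```r
# Sketch: list the substrings that '[^']+' matches in a sample string
x <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
quoted <- regmatches(x, gregexpr("'[^']+'", x))[[1]]
quoted
## => [1] "'quick cunning brown'" "'jumps up and over'"
```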
Then you can use the following base R code:
x <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
# Locate every non-overlapping '...' substring
gr <- gregexpr("'[^']+'", x)
mat <- regmatches(x, gr)
# Replace whitespace only inside the matched substrings, then write them back into x
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
x
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
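The question also mentions double quotes. As a sketch, assuming quotes are balanced, never escaped, and the text contains no apostrophes (a word like "don't" would break it), the same approach can match either kind of quoted substring with an alternation:

```r
# Sketch: same regmatches() approach, matching '...' or "..." substrings.
# Assumes balanced, unescaped quotes and no apostrophes in ordinary words.
x <- "Say \"hello there\" and 'good bye' now"
gr <- gregexpr("'[^']+'|\"[^\"]+\"", x)
regmatches(x, gr) <- lapply(regmatches(x, gr), gsub, pattern = "\\s", replacement = "_")
x
## => [1] "Say \"hello_there\" and 'good_bye' now"
```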
See this R demo. Alternatively, use gsubfn:
library(gsubfn)
rx <- "'[^']+'"
s <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
# In the formula, x is each match; its whitespace is replaced with underscores
gsubfn(rx, ~ gsub("\\s", "_", x), s)
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
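If you prefer a single gsub() call with no callback, a different technique (not part of the answer above) is to replace a whitespace char only when an odd number of ' follow it. This is a sketch that assumes balanced, unescaped single quotes and no apostrophes in the text:

```r
# Sketch: one gsub() call with a PCRE lookahead (perl = TRUE is required).
# Assumes quotes are balanced, never escaped, and not used as apostrophes.
x <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
# Lookahead: one quote, then zero or more pairs of quotes, then no more quotes
# until the end, i.e. an odd number of ' to the right of the whitespace char
res <- gsub("\\s(?=[^']*'(?:[^']*'[^']*')*[^']*$)", "_", x, perl = TRUE)
res
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
```

Each space is checked with its own lookahead scan to the end of the string, so this is best suited to short strings.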
To skip quotes escaped with a backslash and allow escape sequences inside the quoted substrings, you can use this much more complex PCRE regex:
(?<!\\)(?:\\{2})*\K'[^'\\]*(?:\\.[^'\\]*)*'
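More details:
- (?<!\\) - no \ immediately before the current location
- (?:\\{2})* - zero or more sequences of two \ chars (together with the lookbehind, this means the opening quote is preceded by an even number of backslashes, so it is not itself escaped)
- \K - the match reset operator: the text matched so far is dropped from the reported match
- ' - a single quote
- [^'\\]* - zero or more chars other than ' and \
- (?:\\.[^'\\]*)* - zero or more sequences of:
  - \\. - a \ followed by any char but a newline
  - [^'\\]* - zero or more chars other than ' and \
- ' - a single quote.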
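To check the escape handling on its own, you can extract the matches from a small test string. The string is made up for illustration: the first two quotes are escaped with a single \, the 'd e' quote is plain, and the quote before f g is preceded by an escaped backslash (\\), so it counts as a real opening quote:

```r
# Sketch: see which substrings the escape-aware pattern matches (perl = TRUE needed)
pat <- "(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'"
y <- "a \\'b c\\' 'd e' \\\\'f g'"   # the actual text is: a \'b c\' 'd e' \\'f g'
m <- regmatches(y, gregexpr(pat, y, perl = TRUE))[[1]]
m
## => [1] "'d e'" "'f g'"
```

Thanks to \K, the leading \\ before 'f g' is not part of the reported match, so regmatches() only returns the quoted substrings.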
The R demo looks like this:
x <- "The \\' \\\\\\' \\\\\\\\'quick \\'cunning\\' brown' fox 'jumps up \\'and\\' over' the lazy dog"
cat(x, sep="\n")   # print the real text, without R's string escaping
# perl=TRUE is required for the lookbehind and \K
gr <- gregexpr("(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", x, perl=TRUE)
mat <- regmatches(x, gr)
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
cat(x, sep="\n")
Output:
The \' \\\' \\\\'quick \'cunning\' brown' fox 'jumps up \'and\' over' the lazy dog
The \' \\\' \\\\'quick_\'cunning\'_brown' fox 'jumps_up_\'and\'_over' the lazy dog