Regex Behavior Matching
I'm trying out the regex (?:I\d-?)*I3(?:-?I\d)*. On the line
A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I3-I1-I1-I3-I2-L-K-I3-P-F-I2-I2
I get the matches I1-I3, I1-I1-I3-I1-I1-I3-I2 and I3, which is the desired behavior. However, in R:
x <- "A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I3-I1-I1-I3-I2-L-K-I3-P-F-I2-I2"
strsplit(x, "(?:I\d-?)*I3(?:-?I\d)*")
this returns an error:
Error: '\d' is an unrecognized escape in character string starting ""(?:I\d"
I tried perl=TRUE, but it makes no difference.
I also tried changing the regular expression to (?:I\\d-?)*I3(?:-?I\\d)*, but that does not give the result I want: it returns A-B-C-I1-I2-D-E-F-, -D-D-D-D-, -L-K- and -P-F-I2-I2. How can I reproduce the desired behavior in R?
If we want to split the string and get the substrings that match the pattern shown, we can use that pattern as the part to skip, with (*SKIP)(*F), and split the string on every remaining character.
v1 <- strsplit(x, '(?:I\\d-?)*I3(?:-?I\\d)*(*SKIP)(*F)|.', perl=TRUE)[[1]]
The empty elements can be removed using nzchar, which returns a logical TRUE/FALSE vector indicating whether each string is non-empty.
v1[nzchar(v1)]
#[1] "I1-I3" "I1-I1-I3-I1-I1-I3-I2" "I3"
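To see where the empty strings come from, it may help to run the same idea on a shorter input. This is only a sketch: the string y and the names pat and pieces are made up for illustration and are not part of the original question.

```r
# Hypothetical short input, chosen only for illustration
y <- "A-I1-I3-B"
pat <- "(?:I\\d-?)*I3(?:-?I\\d)*"

# Any single character outside a match of `pat` acts as a separator,
# so the protected match survives as one piece among empty strings.
pieces <- strsplit(y, paste0(pat, "(*SKIP)(*F)|."), perl = TRUE)[[1]]

# nzchar() is TRUE for non-empty strings, so this drops the "" pieces
pieces[nzchar(pieces)]
#[1] "I1-I3"
```

Note that perl=TRUE is required here, since (*SKIP) and (*F) are PCRE backtracking verbs.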
Or, since we are more interested in extracting the pattern, str_extract_all would be helpful.
library(stringr)
str_extract_all(x, '(?:I\\d-?)*I3(?:-?I\\d)*')[[1]]
#[1] "I1-I3" "I1-I1-I3-I1-I1-I3-I2" "I3"
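If adding a package is not an option, the same extraction can probably be done in base R with gregexpr and regmatches. The following is a sketch rather than part of the original answer; the names pat and m are chosen here for illustration.

```r
x <- "A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I3-I1-I1-I3-I2-L-K-I3-P-F-I2-I2"

# In an R string literal the backslash is doubled;
# the regex engine itself receives a single \d
pat <- "(?:I\\d-?)*I3(?:-?I\\d)*"

# gregexpr() locates every match, regmatches() pulls out the matched text
m <- regmatches(x, gregexpr(pat, x, perl = TRUE))[[1]]
m
#[1] "I1-I3" "I1-I1-I3-I1-I1-I3-I2" "I3"
```

This extracts the matches directly, so no empty strings need to be filtered out afterwards.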