Regex Behavior Matching

I'm testing the regex (?:I\d-?)*I3(?:-?I\d)* against this line:

A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I3-I1-I1-I3-I2-L-K-I3-P-F-I2-I2

I get the matches I1-I3, I1-I1-I3-I1-I1-I3-I2 and I3, which is the desired behavior. However, in R:

x <- "A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I3-I1-I1-I3-I2-L-K-I3-P-F-I2-I2"
strsplit(x, "(?:I\d-?)*I3(?:-?I\d)*")

      

This returns an error:

Error: '\d' is an unrecognized escape in character string starting ""(?:I\d"

      

I tried adding perl=TRUE, but it makes no difference.

I also tried doubling the backslashes in the regular expression, (?:I\\d-?)*I3(?:-?I\\d)*, but that does not give the desired result either: it returns A-B-C-I1-I2-D-E-F-, -D-D-D-D-, -L-K- and -P-F-I2-I2.

How can I reproduce the desired behavior in R?


1 answer


If we want to split the string and keep the substrings that match the pattern shown, we can mark that pattern to be skipped with (*SKIP)(*F) and split on every remaining character instead.

 v1 <- strsplit(x, '(?:I\\d-?)*I3(?:-?I\\d)*(*SKIP)(*F)|.', perl=TRUE)[[1]]

      

The empty elements can then be removed with nzchar, which returns a logical TRUE/FALSE vector indicating whether each string is non-empty.



 v1[nzchar(v1)]
 #[1] "I1-I3"                "I1-I1-I3-I1-I1-I3-I2" "I3"   
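
For context, the error in the question comes from R's string parser rather than the regex engine: inside an ordinary R string literal a backslash has to be escaped itself, so the regex \d is written "\\d". Once that is fixed, strsplit on the bare pattern returns the text between the matches, which is why the second attempt gave the complement of what was wanted. A minimal sketch putting the two calls side by side (the names pattern, between and kept are only illustrative):

```r
x <- "A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I3-I1-I1-I3-I2-L-K-I3-P-F-I2-I2"

# "\\d" in R source code is the two-character string \d that the regex engine sees.
pattern <- "(?:I\\d-?)*I3(?:-?I\\d)*"
stopifnot(nchar("\\d") == 2)

# Splitting on the bare pattern throws the matches away and keeps what lies between them.
between <- strsplit(x, pattern, perl = TRUE)[[1]]
print(between)

# With (*SKIP)(*F)|. appended, every character outside a match becomes the delimiter,
# so the matches themselves survive as the non-empty pieces.
v1 <- strsplit(x, paste0(pattern, "(*SKIP)(*F)|."), perl = TRUE)[[1]]
kept <- v1[nzchar(v1)]
print(kept)
```

The (*SKIP)(*F) verbs are PCRE features, so perl=TRUE is required for the second call.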

      

Or, since we are really interested in extracting the pattern, str_extract_all (from the stringr package) would be useful.

 library(stringr)
 str_extract_all(x, '(?:I\\d-?)*I3(?:-?I\\d)*')[[1]]
 #[1] "I1-I3"                "I1-I1-I3-I1-I1-I3-I2" "I3"  
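
If adding a stringr dependency is not desirable, the same extraction can probably be done in base R with gregexpr and regmatches. This is a sketch of that alternative rather than part of the original answer, and the variable names are illustrative:

```r
x <- "A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I3-I1-I1-I3-I2-L-K-I3-P-F-I2-I2"
pattern <- "(?:I\\d-?)*I3(?:-?I\\d)*"

# gregexpr() records the position and length of every match in x;
# regmatches() then pulls the matched substrings out using that record.
m <- gregexpr(pattern, x, perl = TRUE)
matches <- regmatches(x, m)[[1]]
print(matches)
```

Here perl=TRUE is passed so that the non-capturing groups and \d are handled by the PCRE engine, as in the strsplit call.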

      
