Regex to grab text in between, where the right boundary is an unescaped brace

I want to extract information from a series of R .Rd files. I want the examples (although it could be any TeX tag), so I need to extract the text between the opening tag and its closing curly brace. The extracted text should also include any closing curly braces that are escaped in some way ([}] or \\}, or some other way I haven't thought of).

Here is a sample and my attempt at extraction, which only captures up to the first escaped curly brace:

## fake tex
x <- "Here we go \\example{ x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n}\n\\end{here}"

## regex to extract
regmatches(x, gregexpr("(?<=\\\\example\\{)([^}]*)(?=\\})", x, perl = TRUE))

      

Current output

[[1]]
[1] " x <- 6\ngsub(\"\\"

      

Desired result

" x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n"

      



2 answers


One way to do this is to get rid of the escaped curly braces first and then put them back at the end:

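## swap the escaped braces for placeholders (assumes \001 and \002 do not occur in x)
x <- gsub("\\\\}","\001",x)
x <- gsub("\\[}\\]","\002",x)
## [[1]] takes the character vector of matches out of the list that regmatches returns
match <- regmatches(x, gregexpr("(?<=\\\\example\\{)([^}]*)(?=\\})", x, perl = TRUE))[[1]]
## put the escaped braces back
match <- gsub("\001","\\\\}",match)
match <- gsub("\002","[}]",match)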

      



This gives

> match
[1] " x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n"
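The same result can probably be reached in one pass by letting the pattern itself step over the escaped braces. The sketch below is an alternative under stated assumptions, not part of the answer above: it only knows the two escape forms mentioned in the question (\\} and [}]), and it still stops at the first unescaped }, so example code with nested, balanced braces would need a different approach. The names pat and res are made up for illustration.

```r
## fake tex from the question
x <- "Here we go \\example{ x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n}\n\\end{here}"

## After \example{ consume, in this order of preference:
##   \\\\.       a backslash plus the character it escapes (covers \})
##   \\[\\}\\]   the bracketed form [}]
##   [^}\\\\]    any other character that is neither } nor a backslash
## (?s) lets . match a newline; (?=\\}) requires a } right after the match.
pat <- "(?s)(?<=\\\\example\\{)(?:\\\\.|\\[\\}\\]|[^}\\\\])*(?=\\})"

## [[1]] pulls the character vector of matches out of the returned list
res <- regmatches(x, gregexpr(pat, x, perl = TRUE))[[1]]
res
```

Because the escaped forms are consumed as whole units, no placeholder characters are needed and x itself is left unchanged.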

      



The following gives the desired output, at least for the example you provided:



> gsub(".+example\\{(.+)}.+","\\1",x)
[1] " x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n"
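The caveat matters because this pattern leans on greedy matching: (.+) runs to the last } that still has text after it. A quick check with a made-up extra tag appended to the sample (x2, res2 and \\seealso{foo} are assumptions for illustration only) suggests the capture then runs past the end of the example block:

```r
## fake tex from the question
x <- "Here we go \\example{ x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n}\n\\end{here}"

## hypothetical: the same text with one more tag after it
x2 <- paste0(x, "\n\\seealso{foo}")

## same pattern as above, default (non-perl) regex engine
res2 <- gsub(".+example\\{(.+)}.+", "\\1", x2)

## TRUE here means the capture has spilled into the following \end{here} tag
grepl("\\end{here", res2, fixed = TRUE)
```

So this one-liner seems best suited to input where the example block is followed by exactly one more closing brace, as in the sample.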

      
