Regex to grab text between a tag and its closing brace, skipping escaped braces
I want to extract information from a series of R .Rd files. I want the examples (although it could be any TeX tag), so I need everything between the opening tag and its closing curly brace. The match should also include any closing curly braces that are escaped in some way ( [}] or \\} , or in some way I haven't thought of).
So, here I have a sample and my attempt to extract, but it only captures up to the first escaped curly brace:
## fake tex
x <- "Here we go \\example{ x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n}\n\\end{here}"
## regex to extract
regmatches(x, gregexpr("(?<=\\\\example\\{)([^}]*)(?=\\})", x, perl = TRUE))
Current output
[[1]]
[1] " x <- 6\ngsub(\"\\"
Desired result
" x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n"
2 answers
One way to do this is to get rid of the escaped curly braces first and then put them back at the end:
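# Swap the escaped braces for placeholder characters that do not occur in the text
x <- gsub("\\\\}", "\001", x)
x <- gsub("\\[}\\]", "\002", x)
# With no escaped braces left, [^}]* runs up to the real closing brace
match <- regmatches(x, gregexpr("(?<=\\\\example\\{)([^}]*)(?=\\})", x, perl = TRUE))[[1]]
# Put the escaped braces back into the extracted text
match <- gsub("\001", "\\\\}", match)
match <- gsub("\002", "[}]", match)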
This gives
> match
[1] " x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n"
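If you would rather not modify x, the same idea can probably be done in one pass by letting the pattern itself step over the escaped forms. This is only a sketch, not a tested general solution: it assumes the only escapes are a backslash followed by any character and the literal [}], and that the example body contains no unescaped nested braces. The names pattern and result are just for illustration.

```r
## fake tex, same sample as in the question
x <- "Here we go \\example{ x <- 6\ngsub(\"\\}\", \"\", x, perl=TRUE)\ngsub(\"[}]\", \"\", x, perl=TRUE)\n}\n\\end{here}"

## After \example{ consume, in this order of preference:
##   \\\\.       a backslash plus the character it escapes (covers \})
##   \\[\\}\\]   the literal three-character sequence [}]
##   [^}\\\\]    any other character that is neither } nor a backslash
## and stop in front of the first closing brace that none of these consumed.
## (?s) lets the dot match a newline as well.
pattern <- "(?s)(?<=\\\\example\\{)(?:\\\\.|\\[\\}\\]|[^}\\\\])*(?=\\})"

result <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
result
```

On the sample string this should return the same text as the placeholder approach. Neither version handles unescaped nested braces such as function(x) { ... } inside the example, so real .Rd files may need a proper brace-matching step instead.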