Matching parentheses across multiple lines (with awk?)
I want to filter footnotes from a LaTeX document using a bash script. It might look like one of the following examples:
Some text with a short footnote.\footnote{Some \textbf{explanation}.}
Some text with a longer footnote.%
  \footnote{Lorem ipsum dolor sit amet, etc. etc. etc.
    \emph{along \emph{multiple} lines}
    but all lines increased indent from the start.}
The leftovers should be:
Some text with a short footnote.
Some text with a longer footnote.%
I don't need extra spaces.
Since matching nested braces cannot be accomplished with regular expressions, I suppose I cannot use sed to do this. Is this possible with awk or some other tool?
2 answers
With GNU awk, which supports a multi-character RS and a null FS that splits each record into single characters:
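$ cat tst.awk
# Split the input at every \footnote; FS="" makes each character a field.
BEGIN { RS="[\\\\]footnote"; ORS=""; FS="" }
NR>1 {
    # Every record after the first starts with the footnote's opening brace.
    braceCnt=0
    for (charPos=1; charPos<=NF; charPos++) {
        if ($charPos == "{") { ++braceCnt }
        if ($charPos == "}") { --braceCnt }
        if (braceCnt == 0) { break }    # reached the matching closing brace
    }
    $0 = substr($0,charPos+1)           # keep only the text after the footnote
}
{ print }
$ awk -f tst.awk file
Some text with a short footnote.
Some text with a longer footnote.%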
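If GNU awk is not available, the same brace counting can be done with POSIX awk features only. The following is a minimal sketch, not a drop-in replacement: the function name strip_footnotes is made up for illustration, the whole input is read into memory, and escaped braces such as \{ inside a footnote are counted like ordinary braces.

```shell
#!/bin/sh
# strip_footnotes: hypothetical helper; reads LaTeX on stdin and writes it
# to stdout without \footnote{...} groups. Uses only POSIX awk features.
strip_footnotes() {
    awk '
    { text = text $0 "\n" }              # slurp the input, keeping newlines
    END {
        out = ""
        while ((start = index(text, "\\footnote{")) > 0) {
            out = out substr(text, 1, start - 1)               # text before the footnote
            text = substr(text, start + length("\\footnote"))  # now begins with "{"
            depth = 0
            n = length(text)
            for (pos = 1; pos <= n; pos++) {
                ch = substr(text, pos, 1)
                if (ch == "{") depth++
                else if (ch == "}" && --depth == 0) break      # matching close found
            }
            # If the braces never balance, pos runs past the end and the
            # remainder of the input is dropped.
            text = substr(text, pos + 1)
        }
        printf "%s", out text
    }'
}
```

It would be called as `strip_footnotes < file`.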
Using a recursive regex with perl on the command line, you can match balanced braces like this:
perl -00pe 's/%?\s*\\footnote(\{(?:[^{}]*|(?-1))*\})//g' file
Some text with a short footnote.
Some text with a longer footnote.
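The (?-1) construct recurses into the most recently opened capture group, which is what lets the pattern match nested braces to any depth.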