Matching parentheses across multiple lines (with awk?)

I want to remove footnotes from a LaTeX document using a bash script. A footnote might look like either of the following examples:

Some text with a short footnote.\footnote{Some \textbf{explanation}.}

Some text with a longer footnote.%
  \footnote{Lorem ipsum dolor
     sit amet, etc. etc. etc. \emph{along \emph{multiple} lines}
     but all lines increased indent from the start.}

      

The leftovers should be:

Some text with a short footnote.

Some text with a longer footnote.%

      

I don't need extra spaces.

Since balanced (nested) braces cannot be matched with plain regular expressions, I suppose I cannot do this with a single sed command. Is this possible with awk or some other tool?

2 answers


With GNU awk for multi-char RS and null FS, breaking the record into characters:



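$ cat tst.awk
# Split the input at every "\footnote"; FS="" makes each character a field (GNU awk).
BEGIN { RS="[\\\\]footnote"; ORS=""; FS="" }
# Every record after the first starts with the footnote's "{...}" argument.
NR>1 {
    braceCnt=0
    for (charPos=1; charPos<=NF; charPos++) {
        if ($charPos == "{") { ++braceCnt }
        if ($charPos == "}") { --braceCnt }
        if (braceCnt == 0)   { break }   # reached the brace that closes the footnote
    }
    $0 = substr($0,charPos+1)            # keep only the text after the footnote
}
{ print }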

$ awk -f tst.awk  file
Some text with a short footnote.

Some text with a longer footnote.%

      
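If GNU awk is not available, the same brace-counting idea can probably be expressed in POSIX awk without a multi-character RS or an empty FS. The following is only a sketch, not the script from the answer above. The function name `strip_footnotes` is made up for illustration. It assumes the opening brace follows `\footnote` directly (no optional `[...]` argument), and if a footnote's braces never balance, everything after it is dropped.

```shell
#!/bin/sh
# strip_footnotes (hypothetical name): read LaTeX on stdin and write it to
# stdout with every \footnote{...} removed, nested braces included.
strip_footnotes() {
    awk '
    { text = text $0 "\n" }                 # slurp the whole input into one string
    END {
        out = ""
        # Find the next literal "\footnote{" (index() does plain string search).
        while ((start = index(text, "\\footnote{")) > 0) {
            out  = out substr(text, 1, start - 1)              # keep text before the macro
            text = substr(text, start + length("\\footnote"))  # now begins with "{"
            depth = 0
            n = length(text)
            for (i = 1; i <= n; i++) {
                c = substr(text, i, 1)
                if (c == "{") {
                    depth++
                } else if (c == "}") {
                    depth--
                    if (depth == 0) break   # brace that closes the footnote
                }
            }
            text = substr(text, i + 1)      # continue after the footnote
        }
        printf "%s", out text
    }'
}
```

It could be used as a filter, for example `strip_footnotes < paper.tex > stripped.tex` (both file names are placeholders). Like the GNU awk script, it leaves the `%` and any indentation in front of a removed footnote in place.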


Using a recursive regex with perl on the command line, you can match balanced braces like this:

perl -00pe 's/%?\s*\\footnote({(?:[^{}]*|(?-1))*})//g' file

Some text with a short footnote.

Some text with a longer footnote.

      



