Quote string for TeX input

I am writing a Python script that takes plain text as input and produces LaTeX code as output. At some point the script must quote all the characters that have a special meaning in TeX, such as %, &, \, etc.

This is more difficult than I expected. I currently have this:

import re

def ltx_quote(s):
    s = re.sub(r'[\\]', r'\\textbackslash{}', s)
    # s = re.sub(r'[{]', r'\\{{}', s)
    # s = re.sub(r'[}]', r'\\}{}', s)
    s = re.sub(r'[&]', r'\\&{}', s)
    s = re.sub(r'[$]', r'\\${}', s)
    s = re.sub(r'[%]', r'\\%{}', s)
    s = re.sub(r'[_]', r'\\_{}', s)
    s = re.sub(r'[\^]', r'\\^{}', s)
    s = re.sub(r'[~]', r'\\~{}', s)
    s = re.sub(r'[|]', r'\\textbar{}', s)
    s = re.sub(r'[#]', r'\\#{}', s)
    s = re.sub(r'[<]', r'\\textless{}', s)
    s = re.sub(r'[>]', r'\\textgreater{}', s)
    return s

      

The problem lies in the symbols { and }, because they are potentially created by an earlier substitution (\ → \textbackslash{}), in which case they should not be substituted. I think the solution would be to do all the replacements in one step, but I don't know how.



1 answer


Maybe try using the undocumented re.Scanner:

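import re

# Unlike re.sub, re.Scanner returns a string action verbatim, so each
# replacement needs only a single backslash.
scanner = re.Scanner([
    (r"[\\]", r'\textbackslash{}'),
    (r"[{]", r'\{{}'),
    (r"[}]", r'\}{}'),
    (r".", lambda s, t: t)  # any other character passes through unchanged
])

tokens, remainder = scanner.scan("\\foo\\{bar}")
print(''.join(tokens))

gives

\textbackslash{}foo\textbackslash{}\{{}bar\}{}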

      



Unlike the code you posted, re.Scanner.scan makes only one pass through the string, as you can see if you look at the source code. Once a match is made, the next match starts from where the last match ended.
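To make the difference concrete, here is a minimal sketch that contrasts two sequential substitutions with a single scan. The names two_passes, single_pass and one_pass are made up for this illustration, and the behaviour of re.Scanner is assumed from the CPython implementation, since the class is undocumented.

```python
import re

def two_passes(s):
    # Pass 1 emits braces as part of \textbackslash{} ...
    s = re.sub(r'[\\]', r'\\textbackslash{}', s)
    # ... and pass 2 then escapes those freshly created braces as well.
    s = re.sub(r'[{}]', r'\\\g<0>', s)
    return s

# String actions are returned verbatim by re.Scanner (no re.sub-style
# template processing), hence the single backslash in the replacement.
single_pass = re.Scanner([
    (r'\\', r'\textbackslash{}'),
    (r'[{}]', lambda scanner, token: '\\' + token),
    (r'.', lambda scanner, token: token),
])

def one_pass(s):
    # Each input character is examined once; emitted text is never rescanned.
    tokens, remainder = single_pass.scan(s)
    return ''.join(tokens) + remainder

print(two_passes('\\'))  # \textbackslash\{\}  (braces wrongly escaped)
print(one_pass('\\'))    # \textbackslash{}
```

The second result is what the question asks for: the braces that belong to \textbackslash{} are left alone because they were never part of the input.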

The first argument to re.Scanner is the lexicon, a list of 2-tuples. Each 2-tuple consists of a regex pattern and an action. The action can be a string, a callable (function), or None (no action).
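The three kinds of action can be seen side by side in a small sketch. The patterns and the input string below are invented for illustration, and the callable signature (scanner, token) is taken from the CPython source rather than from any official documentation.

```python
import re

scanner = re.Scanner([
    # String action: appended to the token list verbatim.
    (r'%', r'\%'),
    # Callable action: called with (scanner, matched_text); its return
    # value is appended to the token list.
    (r'[a-z]+', lambda scanner, token: token.upper()),
    # None: the matched text is consumed and produces no token.
    (r'\s+', None),
])

tokens, remainder = scanner.scan('ab % cd')
print(tokens)     # ['AB', '\\%', 'CD']
print(remainder)  # empty string: the whole input was consumed
```

If no pattern matches at some position, scanning stops there and the unscanned tail is returned as remainder.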

All the patterns are compiled into one compound pattern. Thus, the order in which the patterns are listed in the lexicon is important. The first pattern to match wins.
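A short sketch of why the order matters: the same two patterns are listed in both orders, and only one ordering ever reaches the backslash rule. The scanner names are hypothetical.

```python
import re

# Specific pattern first: the backslash rule gets a chance to match.
specific_first = re.Scanner([
    (r'\\', r'\textbackslash{}'),
    (r'.', lambda scanner, token: token),
])

# Catch-all first: '.' also matches a backslash, so the second rule
# is never reached.
catch_all_first = re.Scanner([
    (r'.', lambda scanner, token: token),
    (r'\\', r'\textbackslash{}'),
])

print(''.join(specific_first.scan('a\\b')[0]))   # a\textbackslash{}b
print(''.join(catch_all_first.scan('a\\b')[0]))  # a\b
```

For this reason the catch-all pattern belongs at the end of the lexicon.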

If a match is made, the action is called if it is callable, or simply returned if it is a string. The return values are collected in the list tokens.
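Putting this together, a single-pass version of the question's ltx_quote might look like the sketch below. The replacement table is copied from the question (with single backslashes, because re.Scanner returns string actions verbatim). The names _REPLACEMENTS and _scanner, and the choice of re.DOTALL so that "." also consumes newlines, are assumptions of this sketch, not part of the original answer.

```python
import re

# Replacements taken from the question's ltx_quote.
_REPLACEMENTS = {
    '\\': r'\textbackslash{}',
    '{': r'\{{}',
    '}': r'\}{}',
    '&': r'\&{}',
    '$': r'\${}',
    '%': r'\%{}',
    '_': r'\_{}',
    '^': r'\^{}',
    '~': r'\~{}',
    '|': r'\textbar{}',
    '#': r'\#{}',
    '<': r'\textless{}',
    '>': r'\textgreater{}',
}

_scanner = re.Scanner(
    # One (pattern, string action) pair per special character ...
    [(re.escape(char), repl) for char, repl in _REPLACEMENTS.items()]
    # ... followed by a catch-all that passes everything else through.
    + [(r'.', lambda scanner, token: token)],
    flags=re.DOTALL,  # let '.' match newlines so multi-line input is consumed
)

def ltx_quote(s):
    tokens, remainder = _scanner.scan(s)
    # With the catch-all and DOTALL, remainder is expected to be empty;
    # it is appended anyway so that no input is silently dropped.
    return ''.join(tokens) + remainder

print(ltx_quote('\\foo\\{bar}'))  # \textbackslash{}foo\textbackslash{}\{{}bar\}{}
```

Because the input is scanned once, the braces produced by \textbackslash{} are never escaped a second time, so the { and } rules no longer need to be commented out.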
