Regexp: how to match a word that is not preceded or followed by other characters

I want to replace the units mm with the units cm in my code. Because there are a large number of such replacements, I use a regexp.

I made an expression like this:

(?!a-zA-Z)mm(?!a-zA-Z)

      

But it still matches words like summa, gamma and dummy.

How do I write the regular expression correctly?

+3




2 answers


Use character classes (without square brackets, a-zA-Z is treated as literal text, not as a range of letters) and change the first (?!...) lookahead to a lookbehind:

(?<![a-zA-Z])mm(?![a-zA-Z])
^^^^^^^^^^^^^  ^^^^^^^^^^^^

      

See regex demo



Pattern details:

  • (?<![a-zA-Z]) - a negative lookbehind that fails the match if there is an ASCII letter immediately to the left of the current location
  • mm - a literal substring
  • (?![a-zA-Z]) - a negative lookahead that fails the match if there is an ASCII letter immediately to the right of the current location
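As a minimal sketch of how this pattern could be applied in Python (the sample string and the variable names are made up for illustration and do not come from the question), re.sub replaces only the standalone mm:

```python
import re

# Pattern from the answer: "mm" with no ASCII letter directly before or after it.
UNIT_RE = re.compile(r"(?<![a-zA-Z])mm(?![a-zA-Z])")

# Hypothetical sample text, chosen to include the words from the question.
text = "width = 10mm; height = 25 mm  # summa, gamma, dummy"

# Swap the unit label; words that merely contain "mm" are left alone.
result = UNIT_RE.sub("cm", text)
print(result)  # width = 10cm; height = 25 cm  # summa, gamma, dummy
```

Note that this only changes the unit label. The numeric values are left as they are.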

NOTE: If you need to make your pattern Unicode-aware, replace [a-zA-Z] with [^\W\d_] (and use the re.U flag if you are using Python 2.x).
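The difference between the two character classes can be seen in a small sketch, assuming Python 3, where str patterns are Unicode-aware by default. The sample text and the variable names are illustrative only:

```python
import re

# ASCII-only version: only a-z and A-Z count as letters.
ascii_re = re.compile(r"(?<![a-zA-Z])mm(?![a-zA-Z])")
# Unicode-aware version: [^\W\d_] matches any word character
# that is neither a digit nor an underscore.
unicode_re = re.compile(r"(?<![^\W\d_])mm(?![^\W\d_])")

# Hypothetical sample; \u00e4 is the letter "a" with a diaeresis.
text = "5mm \u00e4mm\u00e4"

# The ASCII pattern does not treat \u00e4 as a letter, so it also
# rewrites the "mm" inside the second word.
ascii_result = ascii_re.sub("cm", text)
# The Unicode-aware pattern leaves that word untouched.
unicode_result = unicode_re.sub("cm", text)

print(ascii_result)
print(unicode_result)
```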

+4




There is no need to use lookaheads and lookbehinds, so if you want to simplify your pattern you can try something like this:

\d+\s?(mm)\b

      

This assumes that your millimeter symbol will always follow a number, with an optional space in between, which I think is a reasonable assumption in this case.



\b checks for a word boundary to make sure that mm is not part of a longer word such as dummy.
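To perform the replacement with this approach, the number has to be kept in the output. One possible sketch in Python, assuming it is acceptable to move the capture group onto the number and the optional space (a small variation on the expression above, not the exact pattern from this answer; the sample text is made up):

```python
import re

# Variation on \d+\s?(mm)\b: capture the number and the optional space
# so that they can be put back with \1 in the replacement.
NUM_UNIT_RE = re.compile(r"(\d+\s?)mm\b")

# Hypothetical sample text.
text = "gap: 3mm, margin: 12 mm, dummy, 5mmHg"

result = NUM_UNIT_RE.sub(r"\1cm", text)
print(result)  # gap: 3cm, margin: 12 cm, dummy, 5mmHg
```

The trade-off is that an mm that does not directly follow a number (for example in a comment such as "units: mm") is not matched at all.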

Demo here

+2

