Regexp: how to match a word that is not preceded or followed by other characters

Question

I want to replace the unit mm with the unit cm in my code. Since there are many such replacements, I am using a regexp.

I made an expression like this:

(?!a-zA-Z)mm(?!a-zA-Z)

But it still matches words like summa, gamma and dummy.

How do I write the regular expression correctly?

+3

regex

Roma Karageorgievich Jul 26 17 at 13:39

2 answers

There is no need to use lookaheads and lookbehinds; if you want to simplify your pattern, you can try something like this:

\d+\s?(mm)\b

This assumes that the millimeter symbol always follows a number, with at most one whitespace character in between, which I think is a reasonable assumption in this case.

\b checks for a word boundary to make sure that mm is not part of a word like dummy, etc.
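
As a rough sketch of how this could be applied in Python: the sample text and the helper name replace_mm_after_number are made up for illustration, and the capture group is moved from mm to the number (an assumption, not part of the pattern above) so that the number can be kept in the replacement.

```python
import re

# Variant of the pattern above: the group captures the number and the
# optional whitespace character so they can be re-inserted with \1.
NUMBER_MM = re.compile(r"(\d+\s?)mm\b")

def replace_mm_after_number(text):
    """Swap the unit label mm for cm after a number (values are not converted)."""
    return NUMBER_MM.sub(r"\1cm", text)

sample = "width = 10mm, height = 25 mm, see dummy and summa"
print(replace_mm_after_number(sample))
# prints: width = 10cm, height = 25 cm, see dummy and summa
```

Note that this variant leaves an mm that is not preceded by a number untouched, for example in "units: mm".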

Demo here

+2

Tom wyllie Jul 26 17 at 13:51

Wiktor Stribiżew · Accepted Answer · 2017-07-26T13:40:45+0000

Use character classes and change the first (?!...) lookahead to a lookbehind:

(?<![a-zA-Z])mm(?![a-zA-Z])
^^^^^^^^^^^^^   ^^^^^^^^^^^

See regex demo

Pattern details:

(?<![a-zA-Z]) - negative lookbehind that fails the match if there is an ASCII letter immediately to the left of the current location
mm - literal substring
(?![a-zA-Z]) - negative lookahead that fails the match if there is an ASCII letter immediately to the right of the current location
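
The lookaround pattern can be tried out in Python with a minimal sketch like the one below. The sample string and the helper name replace_standalone_mm are illustrative assumptions.

```python
import re

# The lookbehind and lookahead reject an ASCII letter on either side of mm.
STANDALONE_MM = re.compile(r"(?<![a-zA-Z])mm(?![a-zA-Z])")

def replace_standalone_mm(text):
    """Replace mm with cm unless it is glued to an ASCII letter."""
    return STANDALONE_MM.sub("cm", text)

sample = "10mm, 5 mm, (mm), summa, gamma, dummy"
print(replace_standalone_mm(sample))
# prints: 10cm, 5 cm, (cm), summa, gamma, dummy
```

Unlike \bmm\b, this also matches mm directly next to digits or underscores, as in 10mm or mm2, because only letters are excluded.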

NOTE. If you need to make your pattern Unicode-aware, replace [a-zA-Z] with [^\W\d_] (and use the re.U flag if you are using Python 2.x).
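
To illustrate the difference, here is a small sketch assuming Python 3, where str patterns are Unicode-aware by default and no flag is needed. The sample string and the names UNICODE_MM and ASCII_MM are made up for the comparison.

```python
import re

# [^\W\d_] is a word character that is neither a digit nor an underscore,
# i.e. any letter, including non-ASCII ones.
UNICODE_MM = re.compile(r"(?<![^\W\d_])mm(?![^\W\d_])")
ASCII_MM = re.compile(r"(?<![a-zA-Z])mm(?![a-zA-Z])")

# \u00e9 is the letter e with an acute accent.
sample = "5mm, \u00e9mm, 10 mm"

# The Unicode-aware version treats the accented letter as a letter
# and leaves that mm alone.
print(UNICODE_MM.sub("cm", sample))

# The ASCII-only version does not see it as a letter and replaces that mm too.
print(ASCII_MM.sub("cm", sample))
```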
