Match in specific locations

Question


This is a follow-up to this question (not asked by me). While trying to answer it, I ran into several problems.

Consider the string strings123[abc789<span>123</span>def<span>456</span>000]strings456. How would I match the numbers inside the square brackets that are not wrapped in <span> tags, in Python (using the newer regex module)?
In the example string, these would be 789 and 000.

I experimented with \G (demo):

(?:\G(?!\A)|\[)
[^\d\]]*
\K
\d+

and with (*SKIP)(*FAIL) (demo):

<span>.*?</span>(*SKIP)(*FAIL)
|
\d+

But I failed to combine the two:

<span>.*?</span>(*SKIP)(*FAIL)
|
(?:
    (?:\G(?!\A)|\[)
    [^\d\]]*
    (\d+)
    [^\d\]]*
    \K
)

How can I do that?

+3

python regex

Jan 03 May '17 at 11:37


2 answers

I was thinking of an algorithm along these lines:

- Find the square brackets and their content, and store the result in a variable. The regex would be \[[^]]*\].
- Find the <span> tags together with their content and replace them with -, just to simplify the next step. The regex would be <span>(.*?)</span>.
- What is left is the content of the square brackets without the <span> tags. Just search for \d+ to match the numbers.
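The three steps above can be sketched with the standard-library re module. This is only an illustration of the idea, not code from the answer: the function name numbers_in_brackets_outside_span is made up for the example, and the sample string is the one from the question.

```python
import re

def numbers_in_brackets_outside_span(text):
    """Hypothetical helper: digits inside [...] that are not within <span>...</span>."""
    results = []
    # Step 1: find every [...] group and work on each one separately.
    for bracketed in re.findall(r'\[[^]]*\]', text):
        # Step 2: replace each <span>...</span> (tags and content) with '-'.
        cleaned = re.sub(r'<span>.*?</span>', '-', bracketed)
        # Step 3: whatever digits remain were outside the <span> tags.
        results.extend(re.findall(r'\d+', cleaned))
    return results

s = 'strings123[abc789<span>123</span>def<span>456</span>000]strings456'
print(numbers_in_brackets_outside_span(s))  # ['789', '000']
```

Replacing the tags with - rather than an empty string keeps digits on either side of a tag from being glued together into one number.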

+1

Rahul 03 May '17 at 11:50


Wiktor Stribiżew · Accepted Answer · 2017-05-03T12:03:20+0000

One of the things I love about the PyPI regex module is that it supports infinite-width lookbehind:

Variable-length lookbehind
Lookbehind can match a variable length string.

>>> import regex
>>> s = 'strings123[abc789<span>123</span>def<span>456</span>000]strings456'
>>> rx = r'(?<=\[[^][]*)(?:<span>[^<]*</span>(*SKIP)(?!)|\d+)(?=[^][]*])'
>>> regex.findall(rx, s)
['789', '000']
>>>

Pattern details:

(?<=\[[^][]*) - there must be a [ followed by zero or more characters other than ] and [ immediately to the left of the current location
(?: - start of a non-capturing group
- <span>[^<]*</span>(*SKIP)(?!) - match <span>, then 0+ characters other than < (with the [^<]* negated character class) and then </span>, then discard the match, stay at the position where it ended, and keep looking for the next match
- | - or
- \d+ - 1+ digits
) - end of the non-capturing group
(?=[^][]*]) - there must be a ] after zero or more characters other than ] and [ immediately to the right of the current location.
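For readers who cannot install the regex module, roughly the same result can be approximated with the standard-library re module. This is a different technique from the pattern above, shown only for comparison: the (*SKIP)(?!) branch is emulated by matching the <span> blocks without capturing them, and the lookbehind is replaced by a flag that tracks whether the scan is currently inside square brackets. The names TOKEN and scan_numbers are invented for this sketch. Unlike the lookahead in the original pattern, it does not check that a closing ] actually follows.

```python
import re

# One alternative per kind of token; only the last three are captured.
TOKEN = re.compile(r'<span>[^<]*</span>|(\[)|(\])|(\d+)')

def scan_numbers(text):
    """Hypothetical helper: digits inside [...] and outside <span>...</span>."""
    inside = False   # True while the scan is between '[' and ']'
    found = []
    for m in TOKEN.finditer(text):
        if m.group(1):            # opening bracket
            inside = True
        elif m.group(2):          # closing bracket
            inside = False
        elif m.group(3) and inside:
            found.append(m.group(3))
        # A <span>...</span> match captures nothing, so it is consumed and
        # skipped, much like (*SKIP)(?!) moves past it.
    return found

s = 'strings123[abc789<span>123</span>def<span>456</span>000]strings456'
print(scan_numbers(s))  # ['789', '000']
```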
