Match in specific locations

This is a follow-up to this question (not asked by me). While trying to answer it, I ran into several problems.

Consider the string strings123[abc789<span>123</span>def<span>456</span>000]strings456. How would I match the numbers inside the square brackets that are not wrapped in span tags, in Python (using the newer regex module)? In the example string, these would be 789 and 000.


I experimented with \G (demo):
(?:\G(?!\A)|\[)
[^\d\]]*
\K
\d+

      

and with (*SKIP)(*FAIL) (demo):

<span>.*?</span>(*SKIP)(*FAIL)
|
\d+

      

But I failed to combine the two expressions:

<span>.*?</span>(*SKIP)(*FAIL)
|
(?:
    (?:\G(?!\A)|\[)
    [^\d\]]*
    (\d+)
    [^\d\]]*
    \K
)

      

How can I do that?



2 answers


One of the things I love about the PyPI regex module is that it supports infinite-width lookbehind:

  • Variable-length lookbehind

          Lookbehind can match a variable length string.

>>> import regex
>>> s = 'strings123[abc789<span>123</span>def<span>456</span>000]strings456'
>>> rx = r'(?<=\[[^][]*)(?:<span>[^<]*</span>(*SKIP)(?!)|\d+)(?=[^][]*])'
>>> regex.findall(rx, s)
['789', '000']
>>> 

      



Pattern details:

  • (?<=\[[^][]*) - there must be a [ followed by zero or more characters other than ] and [ immediately to the left of the current location
  • (?: - start of a non-capturing group
    • <span>[^<]*</span>(*SKIP)(?!) - match <span>, then 0+ characters other than < (with the [^<]* negated character class), then </span>, and discard the match, staying at the end position of the match and continuing to look for the next match from there
    • | - or
    • \d+ - 1+ digits
  • ) - end of the non-capturing group
  • (?=[^][]*]) - there must be a ] after zero or more characters other than ] and [ immediately to the right of the current location.
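
As a cross-check on what the pattern above is meant to select, here is a rough sketch that gets the same result with only the standard re module, in three plain steps instead of one expression. The helper name numbers_outside_spans is made up for illustration, and the sketch assumes that brackets are not nested and that span elements contain no < character.

```python
import re


def numbers_outside_spans(text):
    """Return digit runs inside [...] that are not wrapped in <span>...</span>.

    Illustrative helper (not part of any library). Assumes brackets are
    not nested and span contents contain no '<'.
    """
    results = []
    # Step 1: isolate the content of each bracketed region.
    for bracketed in re.findall(r'\[([^\[\]]*)\]', text):
        # Step 2: drop every <span>...</span> element. A space is used as
        # the replacement so digits on either side of a span do not merge.
        without_spans = re.sub(r'<span>[^<]*</span>', ' ', bracketed)
        # Step 3: collect the digit runs that are left.
        results.extend(re.findall(r'\d+', without_spans))
    return results


s = 'strings123[abc789<span>123</span>def<span>456</span>000]strings456'
print(numbers_outside_spans(s))  # ['789', '000']
```

This returns the values only. If you need match positions in the original string, the single-pattern approach with the regex module is the better fit.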


I was thinking about an algorithm that looks like this.


