Make sure the regex subpattern doesn't contain the previous subpattern?

I am wondering if there is a way to check if the subpattern matches a given sequence so that I can block it.

For example, let's say that I wanted to capture everything except a repeat of an earlier capture. So if I had the sentence "word plus word", the match would have to capture everything ("word plus ") up to the second occurrence of "word".

(\w+)[^\1]+

      

The first group, (\w+), captures the initial word. The second part, [^...], tries to exclude it (it was previously captured as \1), but a character class only works on individual characters, not on subpattern captures.
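To make the problem concrete, here is a minimal sketch using Python's re module. The question does not name a regex flavor, so Python is an assumption, and the helper name naive_match is hypothetical. In Python, a numeric escape inside [...] is read as a character code, so [^\1] means "any character except the one with code 1", not "anything but group 1".

```python
import re

# Inside a character class, Python treats \1 as the character with code 1,
# not as a backreference to capture group 1.
naive = re.compile(r"(\w+)[^\1]+")

def naive_match(text):
    """Return the full match of the naive pattern at the start of text, or None."""
    m = naive.match(text)
    return m.group(0) if m else None

# The second "word" is not excluded: the whole string is consumed.
print(naive_match("word plus word"))  # word plus word
```

Other flavors may treat \1 inside a class differently, but none of the common ones interpret it as "not this captured substring".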

Is there any way to do this?


2 answers


You can use a pattern like this:

(\w+)(?:(?!\1).)*

      



This uses a negative lookahead to assert, before consuming each character, that the previously captured word does not start at that position. As a result, the matched text after the first word never contains that word again.
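As a rough illustration, the pattern can be tried in Python's re module (an assumed flavor; the helper name match_until_repeat is hypothetical). The sketch shows that the match stops right before the repeated word, and runs to the end of the line when the word never repeats.

```python
import re

# (?!\1). consumes one character only if the captured word does not start there.
tempered = re.compile(r"(\w+)(?:(?!\1).)*")

def match_until_repeat(text):
    """Match the first word plus everything up to (not including) its repeat."""
    m = tempered.match(text)
    return m.group(0) if m else None

print(repr(match_until_repeat("word plus word")))  # 'word plus '
print(repr(match_until_repeat("word plus more")))  # 'word plus more'
```

Note that this version still succeeds when there is no second occurrence, which may or may not be what you want.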


You can use a lazy quantifier and a lookahead, for example:

(\w+).*?(?=\1)

      

You can also surround \w+ with word boundaries like this:



\b(\w+)\b.*?(?=\1)

      

so that you don't match inside a word such as "hello", where the repeated "l" would otherwise produce a match.
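The difference between the two patterns can be checked with a short sketch in Python's re module (an assumed flavor; the helper name first_match is hypothetical). It compares both patterns on the example sentence and on a single word with a repeated letter.

```python
import re

lazy = re.compile(r"(\w+).*?(?=\1)")
bounded = re.compile(r"\b(\w+)\b.*?(?=\1)")

def first_match(pattern, text):
    """Return the text of the first match anywhere in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

# Both patterns stop just before the second "word".
print(repr(first_match(lazy, "word plus word")))     # 'word plus '
print(repr(first_match(bounded, "word plus word")))  # 'word plus '

# Without boundaries, the first "l" in "hello" matches because another "l" follows.
print(repr(first_match(lazy, "hello")))              # 'l'
print(repr(first_match(bounded, "hello")))           # None

# The lookahead is mandatory, so there is no match when no whole word repeats.
print(repr(first_match(bounded, "word plus more")))  # None
```

Unlike the negative-lookahead approach, these patterns require a second occurrence to exist, so they fail rather than matching to the end of the line.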
