Regex to Match if the string is not sandwiched between two tokens

I am having problems with a regex I am working on. Basically, I want to match a string only if a specific substring does not appear between the start and end tokens. Let me clarify with an example, where the excluded string is 123, the start token is hello, and the end token is abc:

hello123abc   ==> no match
helloa123abc  ==> no match
hello123aabc  ==> no match
helloa123aabc ==> no match
hello1abc     ==> match
hello23abc    ==> match
helloaabc     ==> match
helloabc      ==> match

      

I have a skeleton:

=~ m/hello___abc/

      

and tried filling in the gap like this:

(?!123).*?
.*?(?!123)
.*?(?!123).*?
(?!123)
(?!123)*?
.*?[^1][^2][^3].*?

      

and a few other combinations that I don't remember, but none of them worked. Does anyone have a way to do this?



4 answers


You can use the PCRE verbs (*SKIP)(*F) here:

(?:hello.*?123.*?abc)(*SKIP)(*F)|hello.*?abc

      




OR

(?:hello(?:(?!hello).)*123.*?abc)(*SKIP)(*F)|hello.*?abc
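
As a quick check, here is a minimal sketch that runs the first pattern above against the eight sample strings from the question. It assumes Perl 5.10 or later, which is where the backtracking verbs became available; the %matched hash is only an illustrative name, not part of the original answer.

```perl
use strict;
use warnings;

# First pattern from the answer above; (*SKIP)(*F) needs Perl 5.10 or later.
my $re = qr/(?:hello.*?123.*?abc)(*SKIP)(*F)|hello.*?abc/;

# The eight sample strings from the question.
my @samples = qw(
    hello123abc helloa123abc hello123aabc helloa123aabc
    hello1abc hello23abc helloaabc helloabc
);

# %matched is an illustrative map: sample => 1 if the pattern matched, else 0.
my %matched;
$matched{$_} = ($_ =~ $re) ? 1 : 0 for @samples;

printf "%-14s %s\n", $_, $matched{$_} ? 'match' : 'no match' for @samples;
```

The first alternative consumes any hello...123...abc run and then forces a failure, so only the runs without 123 reach the second alternative.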

      



I think you are making it too difficult.

Instead of focusing on what you want to match (which is hard to pin down), focus on what you don't want to match and then invert the logic.

Assuming line by line processing would work:

use strict;
use warnings;

while (<DATA>) {
    if (! /hello.*123.*abc/) {
        print "matches  - $_";
    } else {
        print "no match - $_";
    }
}

__DATA__
hello123abc
helloa123abc
hello123aabc
helloa123aabc
hello1abc
hello23abc
helloaabc
helloabc

      

Outputs:

no match - hello123abc
no match - helloa123abc
no match - hello123aabc
no match - helloa123aabc
matches  - hello1abc
matches  - hello23abc
matches  - helloaabc
matches  - helloabc
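
One caveat about the inverted test: it also reports "matches" for a line that contains no hello...abc pair at all. If both delimiters must be present, a positive and a negative check can be combined. The following is only a sketch under that assumption; is_wanted is a hypothetical helper name and "goodbye" is made-up input, neither comes from the original answer.

```perl
use strict;
use warnings;

# is_wanted() is a hypothetical helper, not part of the original answer.
sub is_wanted {
    my ($line) = @_;
    return 0 unless $line =~ /hello.*abc/;       # both delimiters, in order
    return 0 if     $line =~ /hello.*123.*abc/;  # banned token between them
    return 1;
}

# "goodbye" is an invented line with no delimiters at all.
for my $line (qw(hello123abc hello1abc goodbye)) {
    printf "%-12s %s\n", $line, is_wanted($line) ? 'matches' : 'no match';
}
```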

      



Extended answer: capturing instead of just matching

If you want to not just match but also capture the strings delimited by hello and abc that do not contain 123, the following will work for you:

use strict;
use warnings;

my $data = do {local $/; <DATA>};

while ($data =~ m/(hello(?:(?!123).)*?abc)/g) {
    print "matches - $1\n";
}

__DATA__
hello123abc hello1abc helloa123abchello123aabc
hello23abc helloaabc helloa123aabc helloabc

      

Outputs:

matches - hello1abc
matches - hello23abc
matches - helloaabc
matches - helloabc

      



One way is to describe only the characters that are allowed between the two strings ("hello" and "abc"). To do this, exclude the first character of the banned string (1) and the first character of the closing string (a) from the general character class, then allow each of them back only when it does not start the banned string or the closing string:

^hello(?>[^1a]+|1(?!23)|a(?!bc$))*abc$

      

To do the same on a longer string (containing multiple "hello" ... "abc" parts), remove the anchors:

hello(?>[^1a]+|1(?!23)|a(?!bc))*abc
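
To see the unanchored pattern at work, here is a minimal sketch that collects every match from one longer string. The sample string is assembled from the question's examples and is an assumption made for illustration; @found is likewise just an illustrative name.

```perl
use strict;
use warnings;

# Unanchored pattern from the answer above.
my $re = qr/hello(?>[^1a]+|1(?!23)|a(?!bc))*abc/;

# Illustrative input built from the question's sample strings.
my $data = 'hello123abc hello1abc helloa123abchello123aabc '
         . 'hello23abc helloaabc helloa123aabc helloabc';

# With no capture groups, a list-context /g match returns every full match.
my @found = $data =~ /$re/g;

print "$_\n" for @found;
```

On this input the four parts without 123 (hello1abc, hello23abc, helloaabc, helloabc) are the ones that end up in @found.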

      



   (?!^hello.*?123)(^.*$)

      

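This should work: the lookahead rejects any line that starts with hello and later contains 123, and every other line is matched whole.

See the demo: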

http://regex101.com/r/uU0hL0/1
