Regex lookbehind - excluding words from search

I need to search my corpus for words like play or shame, but I would like the search to exclude three kinds of string: a play / a shame, A play / A shame, and a / an / A / An WORD play or a / an / A / An WORD shame, where WORD is a modifier, as in a great game or a big shame.

If anyone can help me that would be great, thanks!

In my corpus, the optional word between the indefinite article a / an and play, or between a / an and shame, is most often "large" or "real". So even excluding just these two would already help me a lot.

The pattern below works fine for excluding a / A:

    (?<!a\s|A\s)\bshame\b

To exclude the variable WORD as well, I tried using ? with \w in the lookbehind, but it just didn't work. The pattern below, without the ?, still excludes examples like a shame, but it still returns unwanted examples like a big shame or a crying shame; see matching lines (3) and (4) in the sample text below:

    (?<!a\s|A\s|a\b\w\b|A\b\w\b)\bshame\b

      

The tool I use to run the regex is AntConc, which supports Perl regex.

Sample text with two irrelevant examples (3 and 4) after using the search string below:

    (?<!a\s|A\s)\bshame\b


1 (match for shame)

people move from sides. If you need a closer look, you need to call to enter and wait to be received. I guess Saul and I just aren't a shame (or just know that our bank accounts are in hard currencies) because we've wandered around in abundance. There are many, many small boutiques and luxuriously decorated clothing stores with a musical mouth. & abbutterflie.txt 47 1

2 (match for shame)

the last twenty years and I have experienced all kinds of great competition, but I seriously thought that anti black Nazism in football was a thing of the past. You should all hang your heads in shame, a bunch of [badword] s. adamdphillips.txt 57 1

3 (should not match shame)

me is monetary as I wasn't that close to her, but she was really nice with the other girl and it messed them up a bit for them, which is a big shame. Anyway, Holly and I have since found somewhere to move in with just the two of us. It will be worth absolute luck and I will have the beans basics at aderyn.txt 60 1

4 (should not match shame)

there are loads of amazingly good bands going back and forth across the country that will never get signed because no one can figure out how to sell them, and that's a crying shame. There are artists here like <a href = "http://www.angelsintheabattoir.com/" rel = "nofollow"> Thea Gilmore </a> and <a href = "http://blog.amandapalmer.net/ "rel =" nofollow "> Amanda Palmer & aderyn.txt 60 2

5 (match for shame)

/> "There is no better time to show these terrorists that we are not afraid of them. Instead, we are forced to hide from shame through the cowardly actions of our bosses." <Herb Wiseman, a counselor for Lee County High School, Florida, pointed to the July 7 bombing in London. "What happens if children get on aggy91.txt 64 1

1 answer


Since variable-length negative lookbehinds are not allowed, the approach from the answer to your previous question will not carry over to this one.
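To illustrate why the second pattern in the question changes nothing, here is a minimal sketch using Python's built-in `re` module. That choice is my assumption for demonstration only: AntConc is not involved, and Python's lookbehind rules are similar to Perl's but not identical. The sample sentence is made up. The alternative `a\b\w\b` can never succeed, because `\b` cannot hold between two word characters, and a genuinely variable-length alternative is rejected when the pattern is compiled.

```python
import re

# Hypothetical sample sentence, not taken from the corpus.
sample = "which is a big shame. You should hang your heads in shame. What a shame."

# Fixed-width lookbehind from the question: skips only "a shame" / "A shame".
fixed = re.compile(r"(?<!a\s|A\s)\bshame\b")
fixed_hits = [m.start() for m in fixed.finditer(sample)]

# Extended attempt: "a\b\w\b" demands a word boundary between "a" and a
# word character, which is impossible, so the extra alternatives never fire.
extended = re.compile(r"(?<!a\s|A\s|a\b\w\b|A\b\w\b)\bshame\b")
extended_hits = [m.start() for m in extended.finditer(sample)]

# A variable-length lookbehind is refused at compile time by Python's re.
try:
    re.compile(r"(?<!a\s\w+\s)\bshame\b")
    variable_ok = True
except re.error:
    variable_ok = False

print(fixed_hits == extended_hits, variable_ok)
```

Both patterns return the same hits, including the unwanted "a big shame", which is the behaviour described in the question.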

I went with a (*SKIP)(*FAIL) pattern. It matches and then discards the disqualified matches, keeping only the qualifying ones:

/[Aa]n?( \w+)? shame(*SKIP)(*FAIL)|shame/

3844 steps (Demo)



Or, if you want to include word boundary metacharacters:

/\b[Aa]n?( \w+)? shame\b(*SKIP)(*FAIL)|\bshame\b/

4762 steps (Demo)
