How to constrain a regex to a smaller capture

Here is my text:

"A popular resource for the Christian community in the Asheville area."
"I love the acting community in the Orange County area."

      

I would like to capture "Asheville" and "Orange County". How do I start the capture from the "the" closest to "area"?

Here's my regex:

/the (.+?) area/

      

It captures:

"Christian community in the Asheville"
"acting community in the Orange County"

      
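The result follows from how the engine searches: it tries start positions from left to right, so the match begins at the first "the " that can lead to a match, and the lazy quantifier only shortens the right end. A minimal Ruby sketch reproducing this (the constant names are only for illustration):

```ruby
# The two sample sentences from the question.
TEXTS = [
  'A popular resource for the Christian community in the Asheville area.',
  'I love the acting community in the Orange County area.'
].freeze

# The original pattern: lazy, but the match still starts at the FIRST "the ".
LAZY_PATTERN = /the (.+?) area/

# String#[] with a regex and a group index returns that group (or nil).
LAZY_CAPTURES = TEXTS.map { |text| text[LAZY_PATTERN, 1] }

puts LAZY_CAPTURES
# Christian community in the Asheville
# acting community in the Orange County
```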



3 answers


Use a tempered greedy token, (?:(?!the).)+?:

/the ((?:(?!the).)+?) area/

      

See the regex demo. This is almost the same as /the ([^t]*(?:t(?!he)[^t]*)*?) area/, but the latter is slightly more efficient because it is an unrolled pattern.

(?:(?!the).)+? matches one or more characters (as few as possible), none of which starts the character sequence the.
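As a quick check, here is a small sketch that runs both the tempered pattern and the unrolled one against the two sentences from the question. The constant names are made up for this example, and it says nothing about speed; it only shows that the captures agree on this input.

```ruby
SENTENCES = [
  'A popular resource for the Christian community in the Asheville area.',
  'I love the acting community in the Orange County area.'
].freeze

# Tempered greedy token: each character is consumed only if "the" does not start there.
TEMPERED = /the ((?:(?!the).)+?) area/

# Unrolled variant: runs of non-"t" characters, plus any "t" not followed by "he".
UNROLLED = /the ([^t]*(?:t(?!he)[^t]*)*?) area/

TEMPERED_CAPTURES = SENTENCES.map { |s| s[TEMPERED, 1] }
UNROLLED_CAPTURES = SENTENCES.map { |s| s[UNROLLED, 1] }

p TEMPERED_CAPTURES # => ["Asheville", "Orange County"]
p UNROLLED_CAPTURES # => ["Asheville", "Orange County"]
```

Note that the unrolled pattern can also match an empty capture, while the tempered token with + needs at least one character, which is why the two are only "almost" the same.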

To make it safer, add word boundaries to match whole words:



/\bthe ((?:(?!\bthe\b).)+?) area\b/
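To see why the boundaries matter, consider a place name that happens to contain the letters "the", such as "Southern". The sentence below is invented for this sketch and is not from the question:

```ruby
# "Southern" contains the letters "the", which trips the version without boundaries.
INNER_THE_TEXT = 'I work in the Southern area.'

PLAIN   = /the ((?:(?!the).)+?) area/
BOUNDED = /\bthe ((?:(?!\bthe\b).)+?) area\b/

PLAIN_RESULT   = INNER_THE_TEXT[PLAIN, 1]
BOUNDED_RESULT = INNER_THE_TEXT[BOUNDED, 1]

p PLAIN_RESULT   # => nil (the token refuses to cross the "the" inside "Southern")
p BOUNDED_RESULT # => "Southern"
```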

      

Ruby demo:

s = 'I love the acting community in the Orange County area.'
puts s[/the ((?:(?!the).)+?) area/,1]
# => Orange County

      

NOTE: If you expect the match to span multiple lines, remember to add the /m modifier:

/the ((?:(?!the).)+?) area/m
                           ^
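In Ruby, /m is what makes the dot match a newline. A short sketch of the difference, using a variant of the second sentence with a line break added purely for illustration:

```ruby
# Same sentence as in the question, but with a line break inside the place name.
MULTILINE_TEXT = "I love the acting community in the Orange\nCounty area."

WITHOUT_M = /the ((?:(?!the).)+?) area/
WITH_M    = /the ((?:(?!the).)+?) area/m

WITHOUT_M_RESULT = MULTILINE_TEXT[WITHOUT_M, 1]
WITH_M_RESULT    = MULTILINE_TEXT[WITH_M, 1]

p WITHOUT_M_RESULT # => nil (the dot stops at the newline)
p WITH_M_RESULT    # => "Orange\nCounty"
```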

      



Use a tempered greedy solution so that the matched text contains no other the. This way it will always match from the last the.

/the (?:(?!the).)+? area/

      

  • (?:(?!the).)+? is a tempered greedy dot that matches any character except one that starts the text the. This is expressed with the negative lookahead (?!the), which asserts that the text the does not follow. This ensures that the match never contains the text the.

  • This can be improved by using a capture group to extract just the text between the and area. Another way would be to turn the and area into a lookbehind and a lookahead, although that will be slightly slower than the group capture.
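Both variants mentioned in the second bullet can be sketched in Ruby as follows. The constant names are illustrative, and the example only compares results, not speed:

```ruby
SAMPLES = [
  'A popular resource for the Christian community in the Asheville area.',
  'I love the acting community in the Orange County area.'
].freeze

# Variant 1: a capture group around the tempered greedy token.
WITH_GROUP = /the ((?:(?!the).)+?) area/

# Variant 2: "the " and " area" become lookarounds, so the whole match is the text itself.
WITH_LOOKAROUND = /(?<=the )(?:(?!the).)+?(?= area)/

GROUP_RESULTS      = SAMPLES.map { |s| s[WITH_GROUP, 1] }
LOOKAROUND_RESULTS = SAMPLES.map { |s| s[WITH_LOOKAROUND] }

p GROUP_RESULTS      # => ["Asheville", "Orange County"]
p LOOKAROUND_RESULTS # => ["Asheville", "Orange County"]
```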


Regex101 Demo

Rubular Demo

Learn more about the tempered greedy solution and when to use it.



(?<=in the)(.*)(?=area)

      

(?<=...) is a lookbehind and (?=...) is a lookahead: the text you put after the = sign must be present, but it is left out of the result. In this case "in the" and "area" will be excluded from the result.

(.*) is used here, which is "greedy", but you can use (.*?) to stop at the first occurrence of the text given in the lookahead.
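A small sketch of how the greedy and lazy forms differ. It uses an invented sentence with two occurrences of "area", and it relies on the fixed prefix "in the", which both sentences in the question happen to share:

```ruby
# Invented sentence with two occurrences of "area".
TWO_AREAS = 'Shops in the Asheville area and the Boone area.'

GREEDY_LOOK  = /(?<=in the)(.*)(?=area)/
LAZY_LOOK    = /(?<=in the)(.*?)(?=area)/
# Moving the spaces into the lookarounds avoids trimming the result afterwards.
TRIMMED_LOOK = /(?<=in the ).*?(?= area)/

GREEDY_RESULT  = TWO_AREAS[GREEDY_LOOK, 1]
LAZY_RESULT    = TWO_AREAS[LAZY_LOOK, 1]
TRIMMED_RESULT = TWO_AREAS[TRIMMED_LOOK]

p GREEDY_RESULT  # => " Asheville area and the Boone "
p LAZY_RESULT    # => " Asheville "
p TRIMMED_RESULT # => "Asheville"
```

As written in the answer, the capture includes the surrounding spaces, so either strip the result or put the spaces inside the lookarounds.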
