How to constrain a regex to the smallest capture
Here is my text:
"A popular resource for the Christian community in the Asheville area."
"I love the acting community in the Orange County area."
I would like to capture "Asheville" and "Orange County". How can I start the capture from the "the" closest to "area"?
Here's my regex:
/the (.+?) area/
It captures:
"Christian community in the Asheville"
"acting community in the Orange County"
Use a tempered greedy token, (?:(?!the).)+?:
/the ((?:(?!the).)+?) area/
This is almost the same as /the ([^t]*(?:t(?!he)[^t]*)*?) area/, but the latter is slightly more efficient because it is an unrolled pattern.
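As a quick sanity check, here is a minimal Ruby sketch that runs both patterns against the two sentences from the question. The constant names (TEMPERED, UNROLLED, SAMPLES) are made up for this illustration; it only shows that the two patterns agree on these inputs, not that they are equivalent in general.

```ruby
# Tempered greedy token vs. its unrolled form (names are illustrative only).
TEMPERED = /the ((?:(?!the).)+?) area/
UNROLLED = /the ([^t]*(?:t(?!he)[^t]*)*?) area/

SAMPLES = [
  'A popular resource for the Christian community in the Asheville area.',
  'I love the acting community in the Orange County area.'
]

SAMPLES.each do |text|
  # String#[] with a regex and a group index returns that group, or nil.
  puts "#{text[TEMPERED, 1].inspect} / #{text[UNROLLED, 1].inspect}"
end
# "Asheville" / "Asheville"
# "Orange County" / "Orange County"
```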
(?:(?!the).)+? matches one or more characters (as few as possible), none of which starts the character sequence the.
To make it safer, add word boundaries to match whole words:
/\bthe ((?:(?!\bthe\b).)+?) area\b/
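To see why the word boundaries matter, here is a small sketch with a made-up sentence in which the place name itself contains the letters "the" ("Rutherford"). The sentence and the constant names are assumptions for illustration, not part of the question.

```ruby
# Without word boundaries, the "the" inside "Rutherford" blocks the token.
PLAIN   = /the ((?:(?!the).)+?) area/
BOUNDED = /\bthe ((?:(?!\bthe\b).)+?) area\b/

# Hypothetical input: the wanted name contains "the" as a substring.
TRICKY = 'I love the acting community in the Rutherford County area.'

p TRICKY[PLAIN, 1]    # => nil
p TRICKY[BOUNDED, 1]  # => "Rutherford County"
```

With \bthe\b inside the lookahead, only the standalone word "the" stops the token, so names such as "Rutherford" pass through.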
Ruby demo:
s = 'I love the acting community in the Orange County area.'
puts s[/the ((?:(?!the).)+?) area/,1]
# => Orange County
NOTE: If you expect the match to span multiple lines, remember to add the /m modifier:
/the ((?:(?!the).)+?) area/m
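A short sketch of the difference, assuming a variant of the second sentence with a line break inside the place name (the input and constant names are invented for this example). In Ruby, /m is the flag that lets the dot match a newline.

```ruby
# Hypothetical input with a newline inside the text to capture.
WRAPPED = "I love the acting community in the Orange\nCounty area."

NO_M   = /the ((?:(?!the).)+?) area/
WITH_M = /the ((?:(?!the).)+?) area/m

p WRAPPED[NO_M, 1]    # => nil, because . does not match "\n" without /m
p WRAPPED[WITH_M, 1]  # => "Orange\nCounty"
```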
Use a tempered greedy solution so that the matched text contains no other "the". This way it will always match from the last "the":
/the (?:(?!the).)+? area/
- (?:(?!the).)+? is a tempered greedy dot that matches any character except one that starts the text "the". This is expressed by the negative lookahead (?!the), which asserts that the text "the" does not begin at the current position. This way it makes sure that the match never contains the text "the".
- This can be improved by using a capture group to extract only the text between "the" and "area". Another way would be to turn "the" and "area" into a lookbehind and a lookahead, although that will be slightly slower than the capture group.
Learn more about the tempered greedy solution and when to use it.
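Both variants can be sketched in Ruby roughly as follows. The constant names are made up for this illustration, and the lookaround version assumes a single space after "the" and before "area", as in the sample sentences.

```ruby
# Variant 1: capture group around the tempered token.
WITH_GROUP = /the ((?:(?!the).)+?) area/
# Variant 2: lookbehind and lookahead, so the whole match is the wanted text.
WITH_LOOKAROUND = /(?<=the )(?:(?!the).)+?(?= area)/

SENTENCES = [
  'A popular resource for the Christian community in the Asheville area.',
  'I love the acting community in the Orange County area.'
]

SENTENCES.each do |text|
  # Group 1 for the first variant, the full match for the second.
  puts "#{text[WITH_GROUP, 1]} | #{text[WITH_LOOKAROUND]}"
end
# Asheville | Asheville
# Orange County | Orange County
```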
(?<=in the)(.*)(?=area)
(?<=...) is a lookbehind and (?=...) is a lookahead: the text you put after the = sign must be present, but it is left out of the match. In this case "in the" and "area" will be excluded from the result.
(.*) is used here, which is "greedy", but you can use (.*?) to stop at the first occurrence of the text given in the lookahead.
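The greedy/lazy difference only shows up when "area" occurs more than once, so this sketch uses an invented sentence with two occurrences (the sentence and constant names are assumptions). Note that the captured text keeps the surrounding spaces, hence the strip.

```ruby
GREEDY = /(?<=in the)(.*)(?=area)/
LAZY   = /(?<=in the)(.*?)(?=area)/

# Hypothetical input containing "area" twice.
TWO_AREAS = 'A shop in the Asheville area near the Biltmore area.'

p TWO_AREAS[GREEDY, 1].strip  # => "Asheville area near the Biltmore"
p TWO_AREAS[LAZY, 1].strip    # => "Asheville"
```

This pattern relies on the literal "in the" preceding the text, so it would not help for a sentence where the name follows some other word plus "the".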