What is a regex to match one element in a group of nearly equivalent elements?
In the following content:
<page1 ...>
...
</page>
<page2 ...>
...
</page>
<page3 ...>
...
<queue>...</queue>
...
</page>
How can I match only the very last element (the one that contains the queue tag)?
I tried
(?s)<page.*?<queue>.*?</page>
But this matches the entire content. I tried playing with lookarounds, but I can't figure it out.
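A small Java sketch shows why the lazy attempt still spans everything. The class name LazyDemo and the sample input are made up for illustration. A lazy quantifier only keeps the match as short as possible from where it starts, but the engine still starts at the leftmost <page. So the match runs from <page1 all the way to the </page> that follows <queue>.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyDemo {
    // Hypothetical sample input mirroring the structure in the question
    static final String INPUT =
        "<page1 a='1'>\nfoo\n</page>\n"
        + "<page2 a='2'>\nbar\n</page>\n"
        + "<page3 a='3'>\nbaz\n<queue>item</queue>\nqux\n</page>";

    // Returns the text matched by the pattern from the question, or null
    static String lazyMatch(String input) {
        Matcher m = Pattern.compile("(?s)<page.*?<queue>.*?</page>").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String match = lazyMatch(INPUT);
        // The match starts at the very first "<page", not at "<page3"
        System.out.println(match.startsWith("<page1")); // true
        // ...and covers the whole input
        System.out.println(match.equals(INPUT));        // true
    }
}
```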
You can use the following monster for your specific use case:
<page(?:[^/]+/(?!page))+queue>(?:[^/]+|/(?!page))+/page>
Not sure if this is the best example for learning regex, and it is definitely not a good idea for real-life XML. But it is possible. Don't forget to escape / as \/ in languages that quote regular expressions with /.../.
See the technical explanation at http://regex101.com/r/qZ0yR1/2.
The logic is as follows:
- <page.../queue>.../page> - match the content of the page element that contains the closing tag for queue
- [^/]+/(?!page) - match all text up to the next closing tag, but make sure it is not the closing tag for page
- (?:[^/]+/(?!page))+queue> - repeat the above as many times as needed until the closing tag for queue
- (?:[^/]+|/(?!page))+/page> - repeat as many times as necessary until the closing tag for page (I used | as a shortcut for (?:[^/]+/(?!page))+[^/]+/page>, because the expression in step 2 only matches text when the next closing tag is not for page, but here we must match exactly that text at the end)
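As a quick sanity check, here is a minimal Java sketch that applies the regex above to a made-up input shaped like the one in the question. The class name, method name and sample text are hypothetical. Java string literals do not need / escaped, and the pattern contains no backslashes, so it can be pasted as-is.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MonsterRegexDemo {
    // The regex from this answer, unchanged
    static final Pattern PAGE_WITH_QUEUE = Pattern.compile(
        "<page(?:[^/]+/(?!page))+queue>(?:[^/]+|/(?!page))+/page>");

    // Returns the first page element that contains a </queue> tag, or null
    static String findPageWithQueue(String input) {
        Matcher m = PAGE_WITH_QUEUE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<page1 a='1'>\nfoo\n</page>\n"
            + "<page2 a='2'>\nbar\n</page>\n"
            + "<page3 a='3'>\nbaz\n<queue>item</queue>\nqux\n</page>";
        // Prints only the page3 element, from "<page3" through its "</page>"
        System.out.println(findPageWithQueue(input));
    }
}
```

Attempts starting at <page1 and <page2 fail because the first / they reach belongs to </page>, which the (?!page) lookahead rejects.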
You can use a greedy match (.*) to match the very last tag.
Here's an example (sorry, it's Java):
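import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String str = "<page1 foo='bar'>apple</page> <page2 foo='bar'>orange</page> <page3 foo='bar'>pear</page>";
// The greedy .* consumes as much as possible, so (\\w+) captures the text of the last page
final Pattern p = Pattern.compile(".*<page[^>]+>(\\w+)</page>$");
final Matcher matcher = p.matcher(str);
if (matcher.find()) {
    // Prints pear
    System.out.println(matcher.group(1));
}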
Also, +1 to the 'why pick regex' comment; regex is not well suited to this problem.
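The same greedy idea can be adapted to the multi-line input from the question. This is a sketch that assumes the page containing <queue> is the last page, as in the example. The (?s) flag lets . cross newlines. The greedy .* then backtracks to the last <page, and the lazy .*? stops at the following </page>. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyLastPage {
    // (?s) lets "." cross newlines; the greedy ".*" runs to the end of the input
    // and then backtracks to the LAST "<page" that is followed by a "</page>"
    static final Pattern LAST_PAGE = Pattern.compile("(?s).*(<page.*?</page>)");

    // Returns the last page element, or null if there is none
    static String lastPage(String input) {
        Matcher m = LAST_PAGE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<page1 a='1'>\nfoo\n</page>\n"
            + "<page2 a='2'>\nbar\n</page>\n"
            + "<page3 a='3'>\nbaz\n<queue>item</queue>\nqux\n</page>";
        // Prints the page3 element, including its <queue> child
        System.out.println(lastPage(input));
    }
}
```

Note that this returns the last page whatever it contains; it does not check for <queue>.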
Assuming the inner tag is not necessarily "queue" and could be something else, try this:
(?<=[>]).*(?=\<\/[\w]+\>([\n]?)(.*[\n])?\<\/page\>$)
Example here: http://regex101.com/r/sN6aC5/1
This uses a lookahead to find the last closing tag </...>, followed by anything, and then the closing tag </page> at the end of the line. Then, using a lookbehind, it matches everything between that final closing tag and the first > before it (which should close the last opening tag).
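A minimal Java sketch of the lookaround regex above, with every backslash doubled for the string literal. The class name, method name and sample input are hypothetical. Two Java behaviours matter here. Without the MULTILINE flag, $ matches only at the end of the input. Also, . does not cross newlines. So in the sample the match is item, the content of the last inner element before </page>.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastInnerElement {
    // The regex from this answer as a Java string literal (backslashes doubled).
    // No flags: "." stops at newlines and "$" anchors to the end of the input.
    static final Pattern LAST_INNER = Pattern.compile(
        "(?<=[>]).*(?=\\<\\/[\\w]+\\>([\\n]?)(.*[\\n])?\\<\\/page\\>$)");

    // Returns the content of the last inner element, or null if nothing matches
    static String lastInnerContent(String input) {
        Matcher m = LAST_INNER.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<page1 a='1'>\nfoo\n</page>\n"
            + "<page2 a='2'>\nbar\n</page>\n"
            + "<page3 a='3'>\nbaz\n<queue>item</queue>\nqux\n</page>";
        System.out.println(lastInnerContent(input)); // item
    }
}
```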