What is a regex to match one element in a group of nearly equivalent elements?

In the following content:

<page1 ...>
   ...
</page>

<page2 ...>
   ...
</page>

<page3 ...>
   ...
   <queue>...</queue>
   ...
</page>

      

How do you only find a match for the very last element (the one that contains the queue tag)?

I tried

(?s)<page.*?<queue>.*?</page>

      

But this matches the ENTIRE content. I tried playing with lookarounds, but I can't figure it out.

+3




5 answers


You can use the following monster for your specific use case:

<page(?:[^/]+/(?!page))+queue>(?:[^/]+|/(?!page))+/page>

      

I'm not sure this is the best example for learning regex, and it is definitely not a good idea for real-life XML processing, but it is possible. Do not forget to escape / as \/ in languages that delimit regular expression literals with /.../.



See technical explanation at http://regex101.com/r/qZ0yR1/2 .

The logic is as follows:

  • <page.../queue>.../page>

    - match a page element whose content includes the closing tag for queue

  • [^/]+/(?!page)

    - match all text up to and including the next /, but make sure that / does not start the closing tag for page

  • (?:[^/]+/(?!page))+queue>

    - repeat the above as many times as needed until the closing tag for queue is reached

  • (?:[^/]+|/(?!page))+/page>

    - repeat as many times as necessary until the closing tag for page is reached (I used | as a shortcut for (?:[^/]+/(?!page))+[^/]+/page>, because the expression in point 2 only matches text when the next closing tag is not for page, but we must also match the text right before the final closing tag)
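As a rough illustration of the logic above, here is a minimal Java sketch that applies the pattern to a small sample. The class name `QueuePageDemo`, the helper `findQueuePage`, and the sample attributes are made up for this example and are not part of the answer. Because `[^/]` also matches newlines, no DOTALL flag should be needed here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class QueuePageDemo {
    // The pattern from the answer, unchanged. No "/" escaping is needed
    // because Java patterns are plain strings, not /.../ literals.
    static final Pattern QUEUE_PAGE = Pattern.compile(
        "<page(?:[^/]+/(?!page))+queue>(?:[^/]+|/(?!page))+/page>");

    // Returns the first page element that contains a queue closing tag,
    // or null if there is none.
    static String findQueuePage(String input) {
        Matcher m = QUEUE_PAGE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input =
            "<page1 a='1'>\n   x\n</page>\n\n"
          + "<page2 a='2'>\n   y\n</page>\n\n"
          + "<page3 a='3'>\n   z\n   <queue>q</queue>\n   w\n</page>\n";
        System.out.println(findQueuePage(input));
    }
}
```

Note that the alternation `(?:[^/]+|/(?!page))+` nests one quantifier inside another, so it may backtrack heavily on long inputs that almost match. That is one more reason to prefer an XML parser for real documents.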

+2




You can use this pattern (with the DOTALL flag, so that . also matches newlines):

(?:<page[^>]*>(?:(?!<queue>).)*?<\/page>)|(<page[^>]*>.*?<\/page>)  

      




The idea is to consume page elements that do not contain queue, and then consume and capture those that do.
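The consume-or-capture idea can be sketched in Java as follows. This is only an illustration: the class name `CaptureQueuePage` and the helper `firstQueuePage` are hypothetical, and the `\/` escapes are dropped because Java patterns are ordinary strings. The loop skips every match where group 1 is unset, since those are the page elements without a queue.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class CaptureQueuePage {
    // First alternative: a page element with no <queue> before its </page> (not captured).
    // Second alternative: any other page element, captured as group 1.
    static final Pattern PAGES = Pattern.compile(
        "(?:<page[^>]*>(?:(?!<queue>).)*?</page>)|(<page[^>]*>.*?</page>)",
        Pattern.DOTALL);

    // Returns the first captured page element (one containing <queue>), or null.
    static String firstQueuePage(String input) {
        Matcher m = PAGES.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String input =
            "<page1 a='1'>\n   x\n</page>\n\n"
          + "<page3 a='3'>\n   z\n   <queue>q</queue>\n   w\n</page>\n";
        System.out.println(firstQueuePage(input));
    }
}
```

The design choice here is that the overall match is not the answer. Only group 1 is, so the caller has to check it on every match.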

+2




This is the most concise one I could put together:

<page(.(?!page))*<queue.*<\/page>

      

You need the DOTALL flag, and the whole match will be the element you are looking for.
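For illustration, this is roughly how the pattern might be used from Java with `Pattern.DOTALL`. The class name `ConciseQueuePage` and the helper `match` are invented for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class ConciseQueuePage {
    // (.(?!page))* advances one character at a time and stops before any
    // character that is immediately followed by "page", so it cannot step
    // past the "/" of a closing </page> tag before <queue is found.
    static final Pattern P = Pattern.compile(
        "<page(.(?!page))*<queue.*</page>", Pattern.DOTALL);

    // Returns the whole match, or null if nothing matches.
    static String match(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input =
            "<page1 a='1'>\n   x\n</page>\n\n"
          + "<page3 a='3'>\n   z\n   <queue>q</queue>\n   w\n</page>\n";
        System.out.println(match(input));
    }
}
```

One caveat: the trailing `.*` is greedy, so if more page elements followed the one containing queue, the match would run on to the final </page>. Changing it to `.*?` would stop at the first closing tag instead.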


+1




You can use a greedy match (.*) at the start of the pattern to skip ahead to the very last tag.

Here's an example (sorry, it's in Java):

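import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String str = "<page1 foo='bar'>apple</page> <page2 foo='bar'>orange</page> <page3 foo='bar'>pear</page>";
// The leading greedy .* runs to the end and backtracks, so the group captures the last page element.
final Pattern p = Pattern.compile(".*<page[^>]+>(\\w+)</page>$");
final Matcher matcher = p.matcher(str);
// Prints pear
if (matcher.find()) {
    System.out.println(matcher.group(1));
}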

      

Also, +1 for 'why pick regex'; regex is not suitable for this problem.

0




Assuming the tag might not be "queue" and could be something else, try this:

(?<=[>]).*(?=\<\/[\w]+\>([\n]?)(.*[\n])?\<\/page\>$)

example here:

http://regex101.com/r/sN6aC5/1

This uses a lookahead to find the last closing tag </...> followed by anything, and then the closing tag </page> at the end of the line. Then, using a lookbehind, it matches everything between that final closing tag and the first > before it (which should be the end of the last opening tag).

0

