Finding the last occurrence of a word

I have the following line:

<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind

      

I want to find the last "SEM" start tag (not an end tag) before the "PARTITION" tag. The result should be:

<SEM>is<I>love</I>, <PARTITION />

      

I tried this regex:

<SEM>[^<]*<PARTITION[ ]/>

      

but it only works if the "SEM" and "PARTITION" tags have no other tag in between. Any ideas?

+1



6 answers


And here is your regex:

(?=[\s\S]*?\<PARTITION)(?![\s\S]+?\<SEM\>)\<SEM\>

      

What it means: "Somewhere ahead there is a PARTITION tag ... but so far there is no other SEM tag ahead ... match the SEM tag."



Enjoy!

Here's this regex broken down:

(?=[\s\S]*?\<PARTITION) means "a PARTITION tag occurs somewhere ahead"
(?![\s\S]+?\<SEM\>) means "no other SEM start tag occurs anywhere ahead"
\<SEM\> means "match a SEM tag"
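
A minimal sketch of how this pattern could be used from C#. The class name `LastSemLookahead` and the method `Find` are made up for illustration, and the backslashes before `<` and `>` are dropped because those characters need no escaping in .NET. Note that the negative lookahead scans to the end of the input, so a `<SEM>` tag that appears after the PARTITION tag would prevent any match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastSemLookahead
{
    // Same pattern as in the answer, minus the unnecessary escapes.
    private static readonly Regex LastSem = new Regex(
        @"(?=[\s\S]*?<PARTITION)(?![\s\S]+?<SEM>)<SEM>");

    // Hypothetical helper: returns the index of the matched <SEM> start tag,
    // or -1 when the pattern does not match.
    public static int Find(string text)
    {
        Match m = LastSem.Match(text);
        return m.Success ? m.Index : -1;
    }
}
```

For the line in the question, `LastSemLookahead.Find(line)` returns the position of the third `<SEM>`, and `line.Substring(index)` then starts at that tag.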

      

+3



Use String.IndexOf to find PARTITION and String.LastIndexOf to find SEM?



// Locate the PARTITION tag, then search backward from it for the last SEM start tag.
int partitionIndex = text.IndexOf("<PARTITION");
int semIndex = text.LastIndexOf("<SEM>", partitionIndex);
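
The two calls above can be wrapped into a small helper that also handles missing tags and extracts the text the question asks for. This is only a sketch: the class name `LastSemIndexOf`, the method `Extract`, the choice to return null on failure, and the use of `StringComparison.Ordinal` are assumptions, not part of the answer.

```csharp
using System;

public static class LastSemIndexOf
{
    // Hypothetical helper: returns the text from the last "<SEM>" before
    // "<PARTITION" through the end of the PARTITION tag, or null if either
    // tag is missing.
    public static string Extract(string text)
    {
        int partitionIndex = text.IndexOf("<PARTITION", StringComparison.Ordinal);
        if (partitionIndex < 0) return null;

        // Search backward, starting at the PARTITION tag.
        int semIndex = text.LastIndexOf("<SEM>", partitionIndex, StringComparison.Ordinal);
        if (semIndex < 0) return null;

        // Find the '>' that closes the PARTITION tag.
        int end = text.IndexOf('>', partitionIndex);
        if (end < 0) return null;

        return text.Substring(semIndex, end - semIndex + 1);
    }
}
```

Checking for a negative `partitionIndex` before calling `LastIndexOf` matters, because a negative start index is not a valid argument there.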

      

+7



If you are going to use a regular expression to find the last occurrence of something, you can also use the right-to-left matching option:

new Regex("...", RegexOptions.RightToLeft);
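
As a rough illustration of that option, here is one way it might be applied to the line in the question. The answer leaves the pattern elided, so the pattern below is an assumption chosen for this example, as are the names `LastSemRightToLeft` and `Extract`. It also assumes the tags of interest sit on a single line, since `.` does not match a newline by default.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastSemRightToLeft
{
    // With RightToLeft the engine starts at the end of the input and works
    // backward, so the lazy .*? grows leftward from the PARTITION tag and
    // stops at the nearest <SEM> start tag.
    private static readonly Regex Pattern = new Regex(
        @"<SEM>.*?<PARTITION\s*/>", RegexOptions.RightToLeft);

    // Hypothetical helper: returns the matched text, or null on no match.
    public static string Extract(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Value : null;
    }
}
```

Because the search runs from the right, an input with several PARTITION tags would yield the match for the last one.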

      

+2



This is the solution; I checked it at http://regexlib.com/RETester.aspx

<\s*SEM\s*>(?!.*</SEM>.*).*<\s*PARTITION\s*/> 

      

Since you want the last one, the way to identify it is to match only text that does not contain </SEM>.

I've included "\s*" in case there are spaces inside <SEM> or <PARTITION/>.

Basically, we exclude the text </SEM>:

(?!.*</SEM>.*)

      

+1



Have you tried this:

<SEM>.*<PARTITION\s*/>

Your regex matches anything except "<" after the "SEM" tag, so it stops matching as soon as it reaches another tag, such as the "SEM" end tag.

0



Bit quick and dirty, but try this:

(<SEM>.*?</SEM>.*?)*(<SEM>.*?<PARTITION)

      

and see what ends up in the C#/.NET equivalent of $2.

The secret lies in the lazy matching construct (.*?); I assume/hope C# supports this.

Obviously Jon Skeet's solution will work better, but you can use a regex to make it easier to split out the bits you're interested in.

(Disclaimer: I'm a Perl/Python/Ruby person myself...)
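
In .NET the counterpart of $2 is `Groups[2]` on the `Match` object, and lazy quantifiers such as `.*?` are supported. A short sketch, with the class name `LastSemGroup` and the method `Extract` invented for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastSemGroup
{
    // Group 1 consumes every complete <SEM>...</SEM> pair; group 2 then
    // captures the remaining <SEM> start tag up to "<PARTITION".
    private static readonly Regex Pattern = new Regex(
        @"(<SEM>.*?</SEM>.*?)*(<SEM>.*?<PARTITION)");

    // Hypothetical helper: returns capture group 2, or null on no match.
    public static string Extract(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Groups[2].Value : null;
    }
}
```

Since the pattern ends at `<PARTITION`, the captured text stops there and does not include the rest of the PARTITION tag.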

0

