Capture group multiple times

I have been playing around with regex in Java lately, and I have a problem that is (in theory) easy to solve, but I wondered whether there is an easier way to do it (yes, yes, I am lazy). The problem is to capture a group multiple times, that is:

public static void main(String[] args) {
    Pattern p = Pattern.compile("A (IvI(.*?)IvI)*? A");
    Matcher m = p.matcher("A IvI asd IvI IvI qwe IvI A"); //ANY NUMBER of IvI x IvI
    //Matcher m = p.matcher("A  A");
    int loi = 0; //last Occurrence Index
    String storage;
    while (loi >= 0 && m.find(loi)) {
        System.out.println(m.group(1));
        if ((storage = m.group(2)) != null) {
            System.out.println(storage);
        }
        //System.out.println(m.group(1));
        loi = m.end(1);
    }
    m.find();
    System.out.println("2 opt");
    Pattern p2 = Pattern.compile("IvI(.*?)IvI");
    Matcher m2 = p2.matcher(m.group(1)); //m.group(1) = "IvI asd IvI IvI qwe IvI"
    loi = 0;
    while (loi >= 0 && m2.find(loi)) {
        if ((storage = m2.group(1)) != null) {
            System.out.println(storage);
        }
        loi = m2.end(0);
    }
}

      

Using ONLY Pattern p, is there a way to get what is inside the IvI's? (For the test line that would be "asd" and "qwe".) There could be any number of IvI sections, so I am after something similar to what I attempt in the first loop: find the first occurrence of the group, then move the index and find the next occurrence, and so on.

With the code I wrote, group 2 comes back as asd IvI IvI qwe, not as asd and then qwe. I suppose this is partly because of the (.*?) part: it should not be greedy, yet it still runs all the way to qwe, consuming two of the IvI's. I mention this because otherwise I could feed the end index of the group to matcher.find(anInt), but that doesn't work. I don't think there is anything wrong with the lazy quantifier itself, as the following code, which drops the leading IvI, works.

public static void main(String[] args) {
    Pattern p = Pattern.compile("(.*?)IvI");
    Matcher m = p.matcher("bla bla blaIvI");
    m.find();
    System.out.println(m.group(1));
}

      

Prints: bla bla bla

THE SOLUTION I KNOW (but I am lazy, remember)

(It is also in the first code listing above, after the "2 opt" message.) The solution captures the whole repeated part as one group and then runs a different regex over that group, processing the sections one at a time...

By the way, I did my homework. This page mentions:

Since a capture group with a quantifier holds on to its number, what value does the engine return when you inspect the group? All engines return the last value captured. For example, if you match the string A_B_C_D_ with ([A-Z]_)+, when you inspect the match, Group 1 will be D_. With the exception of the .NET engine, all intermediate values are lost. In essence, Group 1 gets overwritten each time its pattern is matched.
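The behaviour described in the quote can be checked in Java with a minimal sketch. It assumes the quoted pattern is ([A-Z]_)+; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // Hypothetical helper: returns what group 1 holds after one full match,
    // or null when the input does not match at all.
    static String lastCapture(String input) {
        // Assumed pattern from the quote: one capital letter plus underscore, repeated.
        Matcher m = Pattern.compile("([A-Z]_)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Only the final repetition survives in group 1.
        System.out.println(lastCapture("A_B_C_D_")); // prints D_
    }
}
```

The earlier repetitions (A_, B_, C_) are matched but cannot be read back from the Matcher afterwards.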

But I still hope that you will give me good news ...



1 answer


No, unfortunately. As your quote says, the java.util.regex implementation does not support retrieving the earlier values of a repeated capture group after a single match. The only way to get them, as your code shows, is to call find() repeatedly with a pattern for just the repeated part of your regex.
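A minimal sketch of that find() approach, using the test line from the question. The class and method names are hypothetical, and the trim() call is an added assumption that the surrounding spaces are not wanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedSections {
    // Matches one "IvI ... IvI" section; group 1 is the text between the delimiters.
    private static final Pattern SECTION = Pattern.compile("IvI(.*?)IvI");

    // Hypothetical helper: collects the content of every section in the input.
    static List<String> sections(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = SECTION.matcher(input);
        // Each find() resumes where the previous match ended,
        // so no manual index bookkeeping is needed.
        while (m.find()) {
            result.add(m.group(1).trim()); // trim() is an assumption, see above
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sections("A IvI asd IvI IvI qwe IvI A")); // prints [asd, qwe]
    }
}
```

This does not check the surrounding "A ... A" frame. If that matters, match the whole line first and run the loop on the captured middle part, as in the "2 opt" half of the question's code.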

I've also looked at other regex implementations in Java, for example:



but I couldn't find any that supports it (only the Microsoft .NET engine does). If I understand correctly, a regex implementation based on state machines cannot easily provide this feature. However, java.util.regex does not use state machines.

If anyone knows a Java regex library that supports this behavior, please share it, because it would be a powerful feature.

P.S. It took me a while to figure out your question. The title is good, but the body left me unsure whether I had understood you correctly.
