Java regex to match a word with optional comments (possibly several) between any two letters (reusing a regex subexpression via a backreference)
I need a Java regex that matches a word while allowing for a comment between any two consecutive letters, e.g. "W/*comment1*/OR/*comment2*/D". I tried using a named group and a backreference:
(?<comment>\s*/\*.*\*/\s*)W\k<comment>*O\k<comment>*R\k<comment>*D
But that doesn't work, because a backreference refers to the text matched by the named group, not to the group's subexpression. So I had to repeat the comment subexpression (\s*/\*.*\*/\s*) in every place where a comment may appear:
W(\s*/\*.*\*/\s*)*O(\s*/\*.*\*/\s*)*R(\s*/\*.*\*/\s*)*D
This works, but is there a more elegant solution that does not repeat the "comment" subpattern so many times?
You can do this by capturing a letter (or several) at a time and discarding any comments that follow, for example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String toBeParsed = "W/* this is comment 1 */OR/*this is comment 2*/D";
        // Group 1: one or more letters; group 2: optional comment(s) that follow
        String regexp = "(\\w+)(/\\*.*?\\*/)*";
        Pattern pattern = Pattern.compile(regexp);
        Matcher matcher = pattern.matcher(toBeParsed);
        String word = "";
        while (matcher.find()) {
            String letter = matcher.group(1);
            String comment = matcher.group(2); // null when no comment follows
            System.out.println("found letter(s) " + letter);
            word += letter;
            if (comment != null) System.out.println("discarding comment " + comment);
        }
        System.out.println(word);
    }
}
Output:
found letter(s) W
discarding comment /* this is comment 1 */
found letter(s) OR
discarding comment /*this is comment 2*/
found letter(s) D
WORD
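If the goal is to test for one specific word rather than to collect letters, another option is to write the comment subpattern once as a Java string and build the full pattern from it. The following is only a sketch: the names `CommentTolerantWord`, `COMMENT` and `forWord` are made up for illustration, and it assumes that `java.util.regex` offers no way to re-use a group's subpattern inside the regex itself, so the repetition is moved into string-building code rather than removed from the compiled pattern.

```java
import java.util.regex.Pattern;

public class CommentTolerantWord {
    // Written once: zero or more /* ... */ comments, each optionally
    // surrounded by whitespace. [\s\S]*? is reluctant, so a single
    // comment stops at the first closing */.
    static final String COMMENT = "(?:\\s*/\\*[\\s\\S]*?\\*/\\s*)*";

    // Hypothetical helper: quotes each letter literally and joins the
    // letters with the COMMENT subpattern.
    static Pattern forWord(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            if (i > 0) {
                sb.append(COMMENT);
            }
            sb.append(Pattern.quote(String.valueOf(word.charAt(i))));
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = forWord("WORD");
        System.out.println(p.matcher("W/*comment1*/OR/*comment2*/D").matches()); // true
        System.out.println(p.matcher("WORD").matches());                         // true
        System.out.println(p.matcher("W/*c*/ORX").matches());                    // false
    }
}
```

The non-capturing group keeps the comment text out of the match groups; use a capturing group instead if the comments themselves are needed.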
"how to return a reference to a regular expression subexpression"
Do you mean this?
"(.*)\\1"
This matches any repeated text, for example a duplicated word. \1 refers to the first group, i.e. the first pair of parentheses, and matches the same text that group captured.
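The distinction behind the original question can be checked directly: a backreference such as \1 or \k<name> re-matches the exact text the group captured, not the group's subpattern. A small sketch, where the class name `BackreferenceDemo`, the group name `c` and the sample strings are made up for illustration:

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // \1 must repeat the exact text captured by group 1.
    static final Pattern REPEATED = Pattern.compile("(.+)\\1");

    // Named variant: \k<c> must repeat the same comment text
    // that the group named "c" captured.
    static final Pattern SAME_COMMENT =
            Pattern.compile("(?<c>/\\*.*?\\*/)W\\k<c>O");

    public static void main(String[] args) {
        System.out.println(REPEATED.matcher("abcabc").matches());            // true
        System.out.println(REPEATED.matcher("abcabd").matches());            // false
        System.out.println(SAME_COMMENT.matcher("/*x*/W/*x*/O").matches());  // true
        System.out.println(SAME_COMMENT.matcher("/*x*/W/*y*/O").matches());  // false
    }
}
```

The last line prints false because the second comment differs from the captured one, which is why the named-group attempt in the question fails for comments with different contents.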