Java find value in string using regex

I am wondering how to use Matcher in Java.

I have a pattern that I compiled, and when I loop over the matcher results, I don't understand why a specific value is missing.

My code:

String str = "star wars";
Pattern p = Pattern.compile("star war|Star War|Starwars|star wars|star wars|pirates of the caribbean|long strage trip|drone|snatched (2017)");
Matcher matcher = p.matcher(str);
while (matcher.find()) {
    System.out.println("\nRegex : " + matcher.group());
}

      

I get a hit for "star war", which is correct, since it is in my pattern.

But I don't get a hit for "star wars", and I don't understand why, since it is also part of my pattern.



3 answers


This behavior is expected: in an NFA regex engine, alternation is "eager", which means that the first alternative that matches wins and the remaining alternatives are not even tested. Also note that once the regex engine finds a match with a consuming pattern (and yours is a consuming pattern, not a zero-width assertion such as a lookahead, lookbehind, word boundary, or anchor), the index is advanced to the end of the match, and the next match is searched for from that position.

So, once your first alternative, star war, matches, there is no way to match star wars, because the regex index is already right before the final s.
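To make the eager alternation described above concrete, here is a minimal sketch. The class name AlternationOrderDemo and the helper findAll are made up for this illustration; only Pattern and Matcher come from java.util.regex. It runs the same input against two orderings of the alternatives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    // Hypothetical helper: collects every match that find() reports.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shorter alternative first: "star war" wins and the trailing "s" is left over.
        System.out.println(findAll("star war|star wars", "star wars")); // [star war]
        // Longer alternative first: the whole input is matched.
        System.out.println(findAll("star wars|star war", "star wars")); // [star wars]
    }
}
```

Putting the longer alternative first is one way around the problem, but it only helps if the whole list is kept in that order.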

Instead, just check whether the string contains each of the strings you are looking for. The simplest approach is a loop:



String str = "star wars";
String[] arr = {"star war","Star War","Starwars","star wars","pirates of the caribbean","long strage trip","drone","snatched (2017)"};
for(String s: arr){
    if(str.contains(s))
        System.out.println(s);
}

      

See Java demo

By the way, your regex contains snatched (2017), and it does not match the literal ( and ): as written, it only matches snatched 2017. To match literal parentheses, ( and ) must be escaped. I also removed the duplicate entry for star wars.
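A small sketch of the parentheses issue, assuming the titles are meant to be matched literally. The class name LiteralParenthesesDemo and the helper found are invented for the example; Pattern.quote is the standard java.util.regex method for turning a string into a literal pattern.

```java
import java.util.regex.Pattern;

class LiteralParenthesesDemo {
    // Unescaped: "(2017)" is a capturing group, so this expects the text "snatched 2017".
    static final Pattern UNESCAPED = Pattern.compile("snatched (2017)");
    // Escaped by hand: "\\(" and "\\)" match literal parentheses.
    static final Pattern ESCAPED = Pattern.compile("snatched \\(2017\\)");
    // Pattern.quote() treats the whole string as a literal.
    static final Pattern QUOTED = Pattern.compile(Pattern.quote("snatched (2017)"));

    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String title = "snatched (2017)";
        System.out.println(found(UNESCAPED, title)); // false
        System.out.println(found(ESCAPED, title));   // true
        System.out.println(found(QUOTED, title));    // true
    }
}
```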



The best way to create your regex would be like this:

String pattern = "[Ss]tar[\\s]{0,1}[Ww]ar[s]{0,1}";

      

Breakdown:

  • [Ss] : matches S or s in the first position
  • \\s : matches a whitespace character
  • {0,1} : the preceding character (or set) matches zero or one time

An alternative is:



String pattern = "[Ss]tar[\\s]?[Ww]ar[s]?";

      

  • ? : the previous character (or set) will match once or not at all
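As a rough check of what this character-class pattern accepts, the sketch below tests a few inputs with matches(). The class name StarWarsPatternDemo and the method accepts are assumptions made for the example. Note that the pattern is broader than the original list of alternatives: it also accepts spellings such as starWars.

```java
import java.util.regex.Pattern;

class StarWarsPatternDemo {
    // The pattern from the answer above, compiled once.
    static final Pattern STAR_WARS = Pattern.compile("[Ss]tar[\\s]?[Ww]ar[s]?");

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean accepts(String input) {
        return STAR_WARS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("star wars"));  // true
        System.out.println(accepts("Star War"));   // true
        System.out.println(accepts("Starwars"));   // true
        System.out.println(accepts("star  wars")); // false: two spaces, but \s? allows at most one
    }
}
```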

For more information see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Edit 1: Fixed a typo (\s → \\s). Thanks @eugene.



You want to match the entire input sequence, so you must use Matcher.matches() or add ^ and $:

Pattern p = Pattern.compile("^(star war|Star War|Starwars|star wars|"
        + "star wars|pirates of the caribbean)$");

      

will print

Regex : star wars

      

But I agree with @NAMS: don't create your regex like this.
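For comparison, here is a minimal sketch of both options, using a shortened list of alternatives. The class name WholeInputMatchDemo and its two helper methods are hypothetical. With matches(), the engine must consume the whole input, so after star war fails to reach the end it backtracks and tries the later alternatives.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeInputMatchDemo {
    // Shortened list of alternatives, with the shorter "star war" deliberately first.
    static final String ALTERNATIVES =
            "star war|Star War|Starwars|star wars|pirates of the caribbean";

    // Option 1: matches() succeeds only if the entire input matches.
    static boolean matchesWhole(String input) {
        return Pattern.compile(ALTERNATIVES).matcher(input).matches();
    }

    // Option 2: anchor the alternation with ^ and $ and keep using find().
    static String anchoredHit(String input) {
        Matcher m = Pattern.compile("^(" + ALTERNATIVES + ")$").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("star wars"));         // true
        System.out.println(matchesWhole("star wars episode")); // false
        System.out.println(anchoredHit("star wars"));          // star wars
    }
}
```

Both options only help when the input is exactly one title; they will not find a title embedded in a longer string.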
