Java find value in string using regex
I am wondering how to use Matcher in Java.
I have a pattern that I have compiled, and when I look at the matcher's results I don't understand why a specific value is missing.
My code:
String str = "star wars";
Pattern p = Pattern.compile("star war|Star War|Starwars|star wars|star wars|pirates of the caribbean|long strage trip|drone|snatched (2017)");
Matcher matcher = p.matcher(str);
while (matcher.find()) {
System.out.println("\nRegex : " + matcher.group());
}
I get a hit for "star war", which is correct, since it is in my pattern.
But I don't get a hit for "star wars" and I don't understand why, as it is also part of my pattern.
This behavior is expected. It is caused by NFA regex alternation being "eager", which means that the first alternative that matches wins and the remaining alternatives are not even tested. Also note that once the regex engine finds a match with a consuming pattern (and yours is a consuming pattern, not a zero-width assertion like a lookahead / lookbehind / word boundary / anchor), the index is advanced to the end of the match and the next match is searched for from that position.
So, once your first alternative branch star war matches, there is no way to match star wars, because the regex index is now right before the final s.
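The effect described above can be reproduced with a minimal sketch. The class and method names (EagerAlternationDemo, allMatches) are made up for illustration, and the pattern is reduced to the two alternatives in question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EagerAlternationDemo {
    // Hypothetical helper: collects every match that find() reports.
    static List<String> allMatches(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shorter alternative first: it wins, and only the trailing "s" is left to search.
        System.out.println(allMatches("star war|star wars", "star wars")); // [star war]
        // Longer alternative first: the whole input is matched.
        System.out.println(allMatches("star wars|star war", "star wars")); // [star wars]
    }
}
```

Putting the longer alternative first is one way to get the longer hit, but it still reports only one match per position, so it is not a substitute for checking each string separately.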
Just check whether the input contains each of the strings you are looking for. The simplest approach is a loop:
String str = "star wars";
String[] arr = {"star war","Star War","Starwars","star wars","pirates of the caribbean","long strage trip","drone","snatched (2017)"};
for(String s: arr){
if(str.contains(s))
System.out.println(s);
}
See Java demo
By the way, your regex contains snatched (2017), and it does not match the literal ( and ): it only matches snatched 2017. To match literal parentheses, ( and ) must be escaped. I also removed the duplicate entry for star wars.
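The point about parentheses can be checked with a small sketch. The class and helper names here are hypothetical. Pattern.quote is a standard java.util.regex method that turns a whole string into a literal pattern, which is an alternative to escaping each parenthesis by hand:

```java
import java.util.regex.Pattern;

class LiteralParenthesesDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String title = "snatched (2017)";
        // Unescaped: ( ) form a capturing group, so the regex expects "snatched 2017".
        System.out.println(found("snatched (2017)", title));                // false
        System.out.println(found("snatched (2017)", "snatched 2017"));      // true
        // Escaped parentheses match the literal characters.
        System.out.println(found("snatched \\(2017\\)", title));            // true
        // Pattern.quote treats the whole string as a literal.
        System.out.println(found(Pattern.quote("snatched (2017)"), title)); // true
    }
}
```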
The best way to create your regex would be like this:
String pattern = "[Ss]tar[\\s]{0,1}[Ww]ar[s]{0,1}";
Breakdown:
- [Ss] : it will match S or s in the first position
- \\s : a whitespace character
- {0,1} : the previous character (or set) will match 0 to 1 times
An alternative is
String pattern = "[Ss]tar[\\s]?[Ww]ar[s]?";
- ? : the previous character (or set) will match once or not at all
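As a rough check of the pattern above, the sketch below runs it against a few spellings with Matcher.matches(), so the whole input has to fit the pattern. The class and method names are invented for this example:

```java
import java.util.regex.Pattern;

class OptionalPartsDemo {
    // Same pattern as in the answer: optional whitespace and optional trailing "s".
    static final Pattern STAR_WARS = Pattern.compile("[Ss]tar[\\s]?[Ww]ar[s]?");

    // Hypothetical helper: true if the whole input is one of the accepted spellings.
    static boolean isStarWars(String input) {
        return STAR_WARS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"star wars", "Star War", "Starwars", "star war", "drone"};
        for (String s : samples) {
            System.out.println(s + " -> " + isStarWars(s));
        }
    }
}
```

The first four samples are accepted and "drone" is rejected. Note that the pattern also accepts mixed forms such as "StarWars", which the original list of alternatives did not contain.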
For more information see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Edit 1: Fixed a typo (\s → \\s). Thanks @eugene.
You want to match the entire input sequence, so you must use Matcher.matches() or add ^ and $:
Pattern p = Pattern.compile("^(star war|Star War|Starwars|star wars|"
+ "star wars|pirates of the caribbean)$");
will print
Regex : star wars
But I agree with @NAMS: don't create your regex like this.
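To illustrate the difference between find() and a whole-input match, here is a minimal sketch with the pattern cut down to two alternatives. The class and helper names are assumptions made for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeInputMatchDemo {
    // Hypothetical helper: true only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Hypothetical helper: the first substring that find() reports, or null.
    static String firstFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String str = "star wars";
        // find() stops at the first alternative that matches a substring.
        System.out.println(firstFind("star war|star wars", str));      // star war
        // matches() must consume everything, so the engine backtracks to "star wars".
        System.out.println(matchesWhole("star war|star wars", str));   // true
        // Anchors force the same whole-input behavior when using find().
        System.out.println(firstFind("^(star war|star wars)$", str));  // star wars
    }
}
```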