Different behavior of the same regex in Python and Java
First, my apologies: I'm not very good with regexes.
I am using regex to match a string. I tested it in the Python command line interface, but when I ran it in Java, it gave a different result.
Python execution:
re.search("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US", "9.5 D(M) US");
gives the result as:
<_sre.SRE_Match object; span=(0, 11), match='9.5 D(M) US'>
But Java code
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTest {
    private static final Pattern FALLBACK_MEN_SIZE_PATTERN =
            Pattern.compile("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US");

    public static void main(String[] args) {
        String strTest = "9.5 D(M) US";
        Matcher matcher = FALLBACK_MEN_SIZE_PATTERN.matcher(strTest);
        if (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
gives the result as:
5 D(M) US
I don't understand why it behaves differently.
Here's a pattern that will work the same in Java and Python:
"[0-9]*(?:\\.[0-9]+)?[^0-9]*D\\([MW]\\)\\s*US"
In Python, [\\.[0-9]+]? is read as two subpatterns: [\.[0-9]+ (one or more characters, each of them a dot, a [ or a digit) and ]? (zero or one ]). That is why .5 is consumed at that point and the whole string matches.
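To make the Python reading concrete, here is a small sketch that wraps the first four pieces of the asker's pattern in capture groups. The groups and the odd-looking test string are my own additions for illustration; the pattern itself is otherwise unchanged.

```python
import re

# The asker's pattern with capture groups added around the first four pieces.
# The groups are for illustration only and do not change what is matched.
PIECES = re.compile(r"([0-9]*)([\.[0-9]+)(\]?)([^0-9]*)D\([M|W]\)\s*US")

m = PIECES.search("9.5 D(M) US")
print(m.groups())  # ('9', '.5', '', ' ')

# Since [ and ] end up as literal characters in this reading,
# an input containing them matches in full as well.
ORIGINAL = re.compile(r"[0-9]*[\.[0-9]+]?[^0-9]*D\([M|W]\)\s*US")
odd = ORIGINAL.fullmatch("9.[5] D(M) US")
print(odd is not None)  # True
```

The second group shows the class-plus-quantifier swallowing .5, and the third shows the optional ] matching nothing.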
In Java, [\\.[0-9]+]? is read as a single character class: the inner [0-9] is treated as a nested class (a union) rather than as literal brackets, so the whole subpattern stands for zero or one character that is a digit, a dot or a +. One character cannot cover .5, so no match can start at the 9, and the match that is found starts at the 5, where this optional part matches nothing. (For a visual hint, try Visual Regex Tester with 123.+[] as the input and [\.[0-9]+]? as the regex.)
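The Java result can be reproduced in Python, as a rough sketch, by writing the class the way Java effectively reads it. The flattened class [.0-9+] below is my hand-written stand-in for Java's [\.[0-9]+], not something either engine prints.

```python
import re

# Assumption: [.0-9+] is a hand-flattened equivalent of how Java reads
# [\.[0-9]+] (a dot, a digit or a plus sign), followed by the same ?.
JAVA_LIKE = re.compile(r"[0-9]*[.0-9+]?[^0-9]*D\([M|W]\)\s*US")

m = JAVA_LIKE.search("9.5 D(M) US")
print(m.span(), m.group(0))  # (2, 11) 5 D(M) US
```

The match starts at index 2, which is the 5, and that lines up with what the Java program printed.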
And one last touch: [M|W] means M, | or W, whereas I think you meant [MW], which is M or W.
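A quick check of both points, sketched in Python with raw strings (so \. is written with a single backslash instead of the \\. needed in a Java string literal). The extra test strings are made up for illustration, and this only exercises the Python side.

```python
import re

# Inside [...] the | is a literal character, not alternation.
pipe_in_class = re.fullmatch(r"[M|W]", "|") is not None
pipe_in_fixed = re.fullmatch(r"[MW]", "|") is not None
print(pipe_in_class, pipe_in_fixed)  # True False

# The corrected pattern from this answer, written as a Python raw string.
FIXED = re.compile(r"[0-9]*(?:\.[0-9]+)?[^0-9]*D\([MW]\)\s*US")

for text in ("9.5 D(M) US", "10 D(W) US", "9.5 D(|) US"):
    m = FIXED.search(text)
    print(repr(text), "->", m.group(0) if m else None)
```

The first two strings should match in full, and the last one should not match at all because | is no longer accepted between the parentheses.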
I'm not a Python expert, so I can't tell why it worked in Python, but in Java your problem is the [\\.[0-9]+]? part. You probably meant (\\.[0-9]+)? instead.
As written, it is a list of characters within [], followed by ?. That is, this part of the expression matches only one character or none, so it cannot match .5.
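The one-character limit is easy to see on the fragment alone. This is only a sketch in Python: the class [.0-9+] is my stand-in for how Java treats the bracketed version, and the variable names are made up.

```python
import re

# Assumption: [.0-9+] stands in for Java's reading of [\.[0-9]+].
# A class followed by ? can take at most one character, so ".5" is too long.
one_char = re.fullmatch(r"[.0-9+]?", ".5")
print(one_char)  # None

# A group followed by ? takes the whole "dot plus digits" sequence or nothing.
grouped = re.fullmatch(r"(\.[0-9]+)?", ".5")
print(grouped.group(1))  # .5
```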
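With [], the match cannot start at the 9: the optional class takes at most one character, .5 needs two, and the engine ends up matching from the 5 onward.
If your pattern used () instead of [], the whole string 9.5 D(M) US would match.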