Confusion in Java regex
According to the java.util.regex.Pattern documentation, ^ means negation as well as the beginning of a line. How can I determine which of the two ^ is being used for in a given program?
This program is from Thinking in Java (not relevant to the above question)
import java.util.regex.*;

public class ReFlags {
    public static void main(String[] args) {
        // [^java] is a negated character class: any single character except j, a or v.
        Pattern p = Pattern.compile("[^java]", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher(
            "java has regex\nJava has regex\n" +
            "JAVA has pretty good regular expressions\n" +
            "Regular expressions are in Java");
        // Print every match back to back; newlines match too, so the line breaks survive.
        while (m.find()) {
            System.out.print(m.group());
        }
    }
}
It outputs:
hs regex
hs regex
hs pretty good regulr expressions
Regulr expressions re in
The pattern ^[java] gives the result jJJ.
The patterns (^java), ^(java) and ^java all print the result javaJavaJAVA.
I get the point of [^java], but what do the other four patterns mean? What could I have done to get everything except the word java (case insensitive) in the output?
[^java] matches any single character that is not j, a or v. [^...] is called a negated character class. It matches any character except those listed inside the class.
^java matches the string java at the beginning of a line (with Pattern.MULTILINE; without that flag, only at the beginning of the input). You can get the match with m.group(0).
^(java) also matches the string java at the beginning of a line, and additionally captures it. You can get the whole match with m.group(0) and the characters inside the first capturing group with m.group(1), where m is a Matcher object. Here both m.group(0) and m.group(1) return the string java.
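To make the two group calls concrete, here is a minimal sketch using ^(java) with the same flags as in the question. The class name AnchorGroups and the helper describeMatches are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorGroups {
    // Hypothetical helper: lists "group(0)/group(1)" for every match of ^(java).
    static String describeMatches(String input) {
        Pattern p = Pattern.compile("^(java)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // group(0) is the whole match, group(1) is the first capturing group.
            sb.append(m.group(0)).append('/').append(m.group(1)).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String input = "java has regex\nJava has regex\nRegular expressions are in Java";
        // Only the occurrences at the start of a line match; the trailing "Java" does not.
        System.out.println(describeMatches(input)); // prints: java/java Java/Java
    }
}
```

Because the parentheses wrap the entire match, the two groups are identical here; they would differ for a pattern such as ^(ja)va.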
(^java) is the same as above: it captures the string java at the beginning of a line.
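As for the last part of the question (everything except the word java, ignoring case), one possible approach is to match the word itself and replace it with nothing, rather than trying to negate it with a character class. This is only a sketch: the class StripJava and its helper are hypothetical names, and the \b word boundaries are my assumption about what "the word java" should mean.

```java
import java.util.regex.Pattern;

public class StripJava {
    // Hypothetical helper: removes every standalone occurrence of "java", ignoring case.
    static String stripJava(String input) {
        return Pattern.compile("\\bjava\\b", Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        String input = "java has regex\nJava has regex\n"
                     + "JAVA has pretty good regular expressions\n"
                     + "Regular expressions are in Java";
        // Every other character, including the letters j, a and v elsewhere, is kept.
        System.out.print(stripJava(input));
    }
}
```

Unlike [^java], this keeps the a in "has" and "regular", because it removes the four-letter word as a unit and not the individual letters.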
In regular expressions, […] denotes a character class. Character classes have their own mini-language: a different set of special characters is used, and they have different meanings.
It is best to think of ^ in regular expressions as the start-of-line anchor. However, as the first character inside a character class, as in [^abc], it negates the class, i.e. the class matches any single character except a, b or c.
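A short sketch of how position decides the meaning of ^. The class CaretContexts and the helper findAll are illustrative names of my own, not anything from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaretContexts {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Outside a class: anchor, so only the abc at the start matches.
        System.out.println(findAll("^abc", "abcabc"));   // prints: [abc]
        // First character inside a class: negation.
        System.out.println(findAll("[^abc]", "abcdef")); // prints: [d, e, f]
        // Inside a class but not first: a literal caret.
        System.out.println(findAll("[a^b]", "a^bc"));    // prints: [a, ^, b]
    }
}
```

So the rule of thumb is: look at whether ^ sits directly after an opening [. If it does, it negates the class; otherwise it is an anchor (or, later inside a class, a plain character).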
Another example of the difference is -. In general, it is just a literal - character. However, within a character class, it defines a range. (For example, [a-z] matches all lowercase ASCII letters.)
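The same contrast for the hyphen, as a small sketch. RangeHyphen and mask are made-up names; the only library call is String.replaceAll.

```java
public class RangeHyphen {
    // Hypothetical helper: replaces every match of regex in input with '#'.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        // Outside a class, a-z is the literal three-character sequence.
        System.out.println(mask("a-z", "a-z A-Z"));   // prints: # A-Z
        // Inside a class, a-z is a range: any one lowercase ASCII letter.
        System.out.println(mask("[a-z]", "a-z A-Z")); // prints: #-# A-Z
        // At the end of a class, the hyphen is literal again.
        System.out.println(mask("[az-]", "a-z b"));   // prints: ### b
    }
}
```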