Confusion in Java regex

According to the java.util.regex.Pattern documentation, ^ means negation as well as the beginning of a line. How can I determine which of the two ^ is being used for in a given program?

This program is from Thinking in Java (the program itself is not directly relevant to the question above):

import java.util.regex.*;
public class ReFlags {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("[^java]", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher(
        "java has regex\nJava has regex\n" +
        "JAVA has pretty good regular expressions\n" +
        "Regular expressions are in Java");
        while(m.find())
            System.out.print(m.group());
    }
}

      

It outputs:

 hs regex
 hs regex
 hs pretty good regulr expressions
Regulr expressions re in

      

The pattern ^[java] gives the result jJJ.

The patterns (^java), ^(java) and ^java all print the result javaJavaJAVA.

I get the point of [^java], but what do the other four patterns mean? What could I have done to get everything except the word java (case insensitive) in the output?



2 answers


[^java] -> matches any single character except j, v or a. [^..] is called a negated character class. It matches any character except those listed inside the class.

^java -> matches the string java when it is present at the beginning of a line. You can print the match with m.group(0).



^(java) -> captures the string java when it is present at the beginning of a line. You can print the whole match with m.group(0) and the characters inside the first capturing group with m.group(1), where m is an object of the Matcher class. In this case you get the string java from both m.group(0) and m.group(1).

(^java) -> same as above: it captures the string java when it is present at the beginning of a line. The only difference is that the anchor is written inside the capturing group.
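The patterns discussed above can be tried side by side with a small sketch like the following. It reuses the input and flags from the question; the class name CaretPatterns and the helper methods findAll, groups and withoutJava are made up for illustration. The last helper is one possible way to print everything except the word java, assuming "word" means a whole word (hence the \b word boundaries); it is a suggestion, not something taken from the answers here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class CaretPatterns {
    static final String INPUT =
        "java has regex\nJava has regex\n" +
        "JAVA has pretty good regular expressions\n" +
        "Regular expressions are in Java";

    static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;

    // Concatenates every match of regex in INPUT, like the loop in the question.
    static String findAll(String regex) {
        Matcher m = Pattern.compile(regex, FLAGS).matcher(INPUT);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    // Returns "group(0)|group(1)" for the first match of ^(java), or "" if none.
    static String groups() {
        Matcher m = Pattern.compile("^(java)", FLAGS).matcher(INPUT);
        return m.find() ? m.group(0) + "|" + m.group(1) : "";
    }

    // Assumption: "everything except the word java" means deleting whole-word
    // occurrences of java, ignoring case.
    static String withoutJava() {
        return Pattern.compile("\\bjava\\b", Pattern.CASE_INSENSITIVE)
                      .matcher(INPUT)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(findAll("^[java]")); // jJJ
        System.out.println(findAll("^java"));   // javaJavaJAVA
        System.out.println(findAll("^(java)")); // javaJavaJAVA
        System.out.println(findAll("(^java)")); // javaJavaJAVA
        System.out.println(groups());           // java|java
        System.out.println(withoutJava());
    }
}
```

Note that ^[java] is an anchor followed by an ordinary (non-negated) character class: it matches a single j, a or v at the start of a line, which is why only the first letter of each matching line is printed.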



In regular expressions, […] denotes a character class. Character classes have their own mini-language: a different set of special characters is used, and those characters have different meanings.

It is best to think of ^ in regular expressions as the start-of-line anchor. However, in the context [^abc] it forms a negated character class, i.e. it matches any single character except a, b or c.
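To make the position rule concrete, here is a minimal sketch of how the role of ^ depends on where it appears. The class name CaretContext and its helper methods are hypothetical, chosen only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class CaretContext {
    // Outside brackets, ^ is an anchor: counts lines that start with 'a'.
    static int countLinesStartingWithA(String text) {
        Matcher m = Pattern.compile("^a", Pattern.MULTILINE).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // First inside brackets, ^ negates the class: removes everything except a, b, c.
    static String keepOnlyAbc(String text) {
        return text.replaceAll("[^abc]", "");
    }

    // Anywhere else inside brackets, ^ is a literal caret: removes 'a' and '^'.
    static String stripAAndCaret(String text) {
        return text.replaceAll("[a^]", "");
    }

    public static void main(String[] args) {
        System.out.println(countLinesStartingWithA("ab\nab\nba")); // 2
        System.out.println(keepOnlyAbc("xaybzc"));                 // abc
        System.out.println(stripAAndCaret("a^b"));                 // b
    }
}
```

So the quick check when reading a pattern is: a ^ immediately after an opening [ means negation, a ^ outside any brackets is the anchor.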



Another example of the difference is -. In general, it is just a literal - character. However, within a character class, it defines a range. (For example, [a-z] matches all lowercase ASCII letters.)
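The same kind of check works for the hyphen. The following sketch uses a hypothetical class HyphenContext with made-up helper names to show the three cases: a hyphen outside a class, a hyphen between two characters inside a class, and a hyphen at the end of a class, where Java treats it as a literal.

```java
// Hypothetical demo class; names are illustrative only.
class HyphenContext {
    // Outside brackets, - is a literal hyphen.
    static String stripHyphens(String text) {
        return text.replaceAll("-", "");
    }

    // Between two characters inside brackets, - defines a range (a through z).
    static String stripLowercase(String text) {
        return text.replaceAll("[a-z]", "");
    }

    // At the end of the class, - is literal: removes only 'a', 'z' and '-'.
    static String stripAZAndHyphen(String text) {
        return text.replaceAll("[az-]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripHyphens("a-m-z"));     // amz
        System.out.println(stripLowercase("a-m-z"));   // --
        System.out.println(stripAZAndHyphen("a-m-z")); // m
    }
}
```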
