Java regex keeps processing and never returns a result

I am learning regex in Java and wondered why, when I execute this code:

String xxx = "(\\s+)?(c:/|c:\\\\|C:\\\\|C:/|c:\\|C:\\))?(\\w+(/|\\\\)?)+(/|\\\\)\\w+.[a-z]+";

String x = "C:\\Users\\esteban\\Desktop\\Java_file_testing\\file3.txt";

if (x.matches(xxx)) {
    System.out.println("matches");
} else {
    System.out.println("no match found ");
}

      

This prints "matches", but when I delete the .txt from the string, the program keeps running and never prints anything. Am I doing something wrong?


2 answers


The regular expression uses an unescaped dot character . which matches any character (except a line terminator), not just a literal dot.

You need to escape the dot, like this:

(\\s+)?(c:/|c:\\\\|C:\\\\|C:/|c:\\|C:\\))?(\\w+(/|\\\\)?)+(/|\\\\)\\w+\\.[a-z]+
                                                          here --------^
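To make the difference concrete, here is a minimal sketch that compares only the tail of the pattern with and without the escape. The class name DotDemo, the constant names, and the sample strings are made up for this illustration.

```java
public class DotDemo {
    // Tail of the original pattern: the dot is a wildcard here.
    static final String UNESCAPED = "\\w+.[a-z]+";
    // Same tail with the dot escaped: only a literal '.' is accepted.
    static final String ESCAPED = "\\w+\\.[a-z]+";

    public static void main(String[] args) {
        System.out.println("file3.txt".matches(UNESCAPED)); // true
        System.out.println("file3Xtxt".matches(UNESCAPED)); // true, 'X' satisfies the wildcard
        System.out.println("file3.txt".matches(ESCAPED));   // true
        System.out.println("file3Xtxt".matches(ESCAPED));   // false, no literal dot
    }
}
```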

      

Btw, you can shorten your regex like this:



\s*[Cc]:(?:(?:\/|\\{1,2})\w+)+\.\w+

      

Remember to escape the backslashes when you put it in a Java string literal:

\\s*[Cc]:(?:(?:\\/|\\\\{1,2})\\w+)+\\.\\w+
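As a quick check, the shortened pattern can be tried against the string from the question, with and without the extension. This is only a sketch; the class name ShortRegexDemo and the helper isPath are hypothetical names chosen for the example.

```java
public class ShortRegexDemo {
    // The shortened pattern from above, written as a Java string literal.
    static final String SHORT = "\\s*[Cc]:(?:(?:\\/|\\\\{1,2})\\w+)+\\.\\w+";

    static boolean isPath(String s) {
        // String.matches requires the whole input to match the pattern.
        return s.matches(SHORT);
    }

    public static void main(String[] args) {
        String withExt = "C:\\Users\\esteban\\Desktop\\Java_file_testing\\file3.txt";
        String withoutExt = "C:\\Users\\esteban\\Desktop\\Java_file_testing\\file3";
        System.out.println(isPath(withExt));    // true
        System.out.println(isPath(withoutExt)); // false, and it returns immediately
    }
}
```

Every \w+ in this version must be preceded by a separator, so the engine has only one way to divide the path into segments and fails quickly when the extension is missing.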

      


You've stumbled upon catastrophic backtracking!

When you write (\\w+(/|\\\\)?)+, you are basically putting the pattern (\\w+)+ into your regex, because the separator after \\w+ is optional. This allows the regex engine to match the same string in multiple ways (using the inner or the outer +). The number of possible paths increases exponentially, and since the engine has to try all of them before declaring a failure, it takes forever to return a value.
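To get a feel for that growth, here is a small sketch that counts how many ways a run of n word characters can be divided between the inner and the outer + of (\\w+)+. It does not run the regex engine; it only counts the possible divisions, and SplitCounter and countSplits are names made up for this illustration.

```java
public class SplitCounter {
    // Counts the ways to cut a run of n characters into one or more
    // non-empty consecutive chunks. Each chunk stands for one iteration
    // of the outer +, with the inner \w+ consuming that chunk.
    static long countSplits(int n) {
        long[] ways = new long[n + 1];
        ways[0] = 1; // nothing left to split: exactly one way to stop
        for (int len = 1; len <= n; len++) {
            // Choose the size of the first chunk, then split the rest.
            for (int first = 1; first <= len; first++) {
                ways[len] += ways[len - first];
            }
        }
        return ways[n];
    }

    public static void main(String[] args) {
        System.out.println(countSplits(5));  // 16
        System.out.println(countSplits(17)); // 65536, e.g. for "Java_file_testing"
    }
}
```

The count doubles with every extra character (2^(n-1) for a run of n characters), which is why a path with a few long segments is enough to keep the engine busy for a very long time once the overall match fails.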

Also, a few general comments on your regex:

  • c:\\| will match the literal string c:| (the backslash escapes the |)
  • /|\\\\ is simply [/\\\\]
  • (\s+)? is simply \s*
  • . is a wildcard ("anything but a newline") that must be escaped
  • for the options c / C, either use [cC] or make your whole regex case-insensitive
  • when you don't need to capture the values, using non-capturing groups (?:...) frees the engine of some work
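The first three points can be checked directly. The sketch below is only an illustration; the class name EscapeDemo and its helper methods are hypothetical.

```java
public class EscapeDemo {
    // In a Java string, "c:\\|" is the regex c:\| : the backslash escapes the pipe.
    static boolean literalPipe(String s) { return s.matches("c:\\|"); }

    // Alternation and character class accept the same separators.
    static boolean sepAlternation(String s) { return s.matches("/|\\\\"); }
    static boolean sepClass(String s) { return s.matches("[/\\\\]"); }

    // Optional group of one-or-more versus zero-or-more whitespace.
    static boolean wsGroup(String s) { return s.matches("(\\s+)?"); }
    static boolean wsStar(String s) { return s.matches("\\s*"); }

    public static void main(String[] args) {
        System.out.println(literalPipe("c:|"));    // true
        System.out.println(literalPipe("c:\\"));   // false, "c:\" has no pipe
        System.out.println(sepAlternation("\\") && sepClass("\\")); // true
        System.out.println(sepAlternation("/") && sepClass("/"));   // true
        System.out.println(wsGroup("") && wsStar(""));              // true
        System.out.println(wsGroup("  ") && wsStar("  "));          // true
    }
}
```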


With that in mind, a regex in the spirit of your first try might be:

\\s*(?:[cC]:[/\\\\])?(?:\\w+[/\\\\])*\\w+\\.[a-z]+

      

In (?:\\w+[/\\\\]) the character class [/\\\\] is no longer optional, which avoids the (\\w+)+ pattern: see the demo here.
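A minimal sketch of how this pattern behaves on the strings from the question follows. The class name FixedRegexDemo and the helper isFilePath are assumptions made for the example, not part of the original code.

```java
public class FixedRegexDemo {
    // Rewritten pattern: every \w+ inside the repeated group must be
    // followed by a separator, so there is only one way to split the path.
    static final String FIXED =
            "\\s*(?:[cC]:[/\\\\])?(?:\\w+[/\\\\])*\\w+\\.[a-z]+";

    static boolean isFilePath(String s) {
        return s.matches(FIXED);
    }

    public static void main(String[] args) {
        String withExt = "C:\\Users\\esteban\\Desktop\\Java_file_testing\\file3.txt";
        String withoutExt = "C:\\Users\\esteban\\Desktop\\Java_file_testing\\file3";
        System.out.println(isFilePath(withExt));             // true
        System.out.println(isFilePath(withoutExt));          // false, returned right away
        System.out.println(isFilePath("c:/Users/file.txt")); // true
        System.out.println(isFilePath("file3.txt"));         // true, the drive part is optional
    }
}
```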

For more information on catastrophic backtracking, I would recommend Friedl's (fun!) article on the subject in The Perl Journal.
