Is it safe to combine multiple regexes with "or"?

We have a config file that lists a series of regexps used to exclude files for the tool we are building (it scans .class files). The developer combined all the individual regular expressions into one using the OR "|" operator, as follows:

RX1 | RX2 | RX3 | RX4

My gut reaction is that some expression will break it and give us the wrong answer. He claims it won't, because the expressions are simply ORed together. I can't think of a case that breaks this, but I'm still uneasy about the implementation.

Is it safe to do this?

+2




6 answers


Not only is it safe, it will likely perform better than matching each regex separately.



Take the individual regex patterns and test them. If each works as expected on its own, it will still behave the same way once they are merged. This way you cover all the cases with a single regex, rather than with multiple patterns that have to be matched one at a time.
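A minimal Java sketch of that check. The patterns, paths, and class name are hypothetical (the question does not show the real config), and it assumes each file path must match an exclusion pattern in full. It joins the patterns with `|` and lets you compare the combined result with testing each pattern one by one.

```java
import java.util.List;
import java.util.regex.Pattern;

class CombinedExcludes {
    // Hypothetical exclusion patterns; the real ones would come from the config file.
    static final List<String> PATTERNS =
            List.of(".*Test\\.class", "com/example/gen/.*", ".*\\$\\d+\\.class");

    // One regex built by joining the individual patterns with the OR operator.
    static final Pattern COMBINED = Pattern.compile(String.join("|", PATTERNS));

    // True if the whole path matches the combined regex.
    static boolean excludedCombined(String path) {
        return COMBINED.matcher(path).matches();
    }

    // True if the whole path matches at least one of the individual patterns.
    static boolean excludedSeparately(String path) {
        return PATTERNS.stream().anyMatch(p -> Pattern.matches(p, path));
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "com/example/FooTest.class",
                "com/example/gen/Bar.class",
                "com/example/Foo$1.class",
                "com/example/Foo.class");
        for (String path : samples) {
            // The two columns should always agree.
            System.out.println(path + " -> " + excludedCombined(path)
                    + " / " + excludedSeparately(path));
        }
    }
}
```

Running the same comparison over a larger set of real file names is a cheap way to settle the argument for your actual patterns.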

+3




As long as each one is a valid regular expression, they should be safe. Unescaped or unbalanced brackets, parentheses, curly braces, etc. will be a problem. You could try to parse each part before adding it to the main regex to make sure it is complete.
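One way to do that per-part check in Java is simply to compile each fragment on its own and catch the syntax error. This is only a sketch; the class name and the sample fragments are made up for illustration. It also shows why checking only the combined string is not enough: two broken fragments can join into a regex that compiles but means something different.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class FragmentCheck {
    // True if the fragment compiles as a regex on its own.
    static boolean isValid(String fragment) {
        try {
            Pattern.compile(fragment);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String first = "(a";   // unclosed group
        String second = "b)";  // unmatched closing parenthesis
        System.out.println(isValid(first));
        System.out.println(isValid(second));
        // Joined with |, the two broken fragments form the valid regex (a|b).
        System.out.println(isValid(first + "|" + second));
    }
}
```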



In addition, some engines support inline sequences that set regex flags from inside the expression (for example, case-insensitivity). I don't have enough experience to say whether such a flag carries over into the next ORed alternative or not. Thinking of it as a state machine, I would have guessed that it does not.
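For what it's worth, in java.util.regex an inline flag such as `(?i)` stays in effect until the end of its enclosing group, so with a plain join it does carry over into the alternatives that follow; other engines may behave differently. The sketch below uses made-up patterns (`foo`, `bar`, `baz`) and shows one possible guard: wrapping each part in a non-capturing group `(?:...)` so the flag stays inside the part that declared it.

```java
import java.util.regex.Pattern;

class InlineFlags {
    // Plain join: the (?i) in the second part is not closed off by any group.
    static final Pattern PLAIN = Pattern.compile("foo|(?i)bar|baz");

    // Each part wrapped in a non-capturing group, which limits the flag's scope.
    static final Pattern WRAPPED = Pattern.compile("(?:foo)|(?:(?i)bar)|(?:baz)");

    // True if the whole input matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"FOO", "BAR", "BAZ"}) {
            System.out.println(s + " plain=" + matches(PLAIN, s)
                    + " wrapped=" + matches(WRAPPED, s));
        }
    }
}
```

With the plain join, "BAZ" matches even though only the `bar` part asked for case-insensitivity; with the wrapped version it does not.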

+2




It's safe, just like everything else in regular expressions!

+1




Regarding regex safety: Google Code Search lets users search with regular expressions, so it is evidently possible to use regexes safely.

0




I don't see any problem either.

I assume that by "safe" you mean that it will suit your needs (I've never heard of a regex security hole). Whether it is safe or not, we cannot tell from this alone. You should give us more details, such as the complete regex. Do you wrap it in a group and allow multiple matches? Do you wrap it with start and end anchors?

If you want to match multiple class file names, use start and end anchors so that the match has to cover the whole name from beginning to end. Like this: "^(file1|file2)\.class$". Without the start and end anchors, you might end up matching "my_file1.class" too.
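A small Java sketch of the difference, using the file names from this answer (the class and method names are just for illustration). It uses `find()`, which looks for a match anywhere in the input, so the anchors are what decide the result.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    static final Pattern UNANCHORED = Pattern.compile("(file1|file2)\\.class");
    static final Pattern ANCHORED = Pattern.compile("^(file1|file2)\\.class$");

    // find() reports a match anywhere in the input, unlike matches().
    static boolean found(Pattern p, String name) {
        return p.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(found(UNANCHORED, "my_file1.class")); // matches the tail of the name
        System.out.println(found(ANCHORED, "my_file1.class"));   // rejected by the ^ anchor
        System.out.println(found(ANCHORED, "file1.class"));      // exact name is accepted
    }
}
```

Note that the group matters here: since `|` binds loosest, writing `^file1|file2\.class$` without parentheses would anchor only the first alternative at the start and only the second at the end.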

0




The answer is yes, it is safe, and the reason is that '|' has the lowest precedence of all the operators in regular expressions.

That is:

regexpa|regexpb|regexpc

is equivalent to

(regexpa)|(regexpb)|(regexpc)

The obvious difference is that the second form creates capturing groups (positional matches), whereas the first does not. Both will match exactly the same input, however. Or, to put it another way, using the Java language:

input.matches("regexpa|regexpb|regexpc")

is equivalent to

input.matches("regexpa") || input.matches("regexpb") || input.matches("regexpc")

where input is any String (matches is an instance method, not a static one).
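The precedence argument can be checked with a short Java sketch. The three sub-patterns below are placeholders chosen for illustration, not anything from the question. It compares the bare alternation with the parenthesized form and reports the only difference, the number of capturing groups.

```java
import java.util.regex.Pattern;

class Precedence {
    // Bare alternation: | binds loosest, so each side is a complete sub-pattern.
    static final Pattern BARE = Pattern.compile("ab+|cd?|e");

    // The same alternatives with explicit parentheses around each one.
    static final Pattern GROUPED = Pattern.compile("(ab+)|(cd?)|(e)");

    // True if both forms agree on whether the whole input matches.
    static boolean agree(String input) {
        return BARE.matcher(input).matches() == GROUPED.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abbb", "c", "cd", "e", "ae", ""}) {
            System.out.println("\"" + s + "\" agree=" + agree(s));
        }
        // The parenthesized form adds one capturing group per alternative.
        System.out.println(BARE.matcher("").groupCount());
        System.out.println(GROUPED.matcher("").groupCount());
    }
}
```

If the extra capturing groups are unwanted, `(?:...)` gives the same grouping without them.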

0

