Regex matches one or two digits

If this

(°[0-5])

matches °4

and this one

((°[0-5][0-9]))

matches °44

why does this

((°[0-5])|(°[0-5][0-9]))

match °4 but not °44?



3 answers


Because when you use alternation (logical OR) in a regex, the engine tries the alternatives from left to right and accepts the first one that matches (here °[0-5]). Since °[0-5] matches the °4 in °44, the engine returns °4 and never tries the second alternative (here °[0-5][0-9]):

((°[0-5])|(°[0-5][0-9]))

      



A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
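The left-to-right rule can be checked directly. The following is a minimal sketch using Python's re module; the names ORIGINAL, REORDERED and first_match are made up for this illustration and do not come from the question.

```python
import re

# Pattern from the question: the shorter alternative comes first.
ORIGINAL = re.compile(r"((°[0-5])|(°[0-5][0-9]))")
# Same alternatives, but with the longer one first (illustrative variant).
REORDERED = re.compile(r"((°[0-5][0-9])|(°[0-5]))")


def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match(ORIGINAL, "°44"))   # °4  (first alternative wins)
print(first_match(REORDERED, "°44"))  # °44 (longer alternative is tried first)
print(first_match(REORDERED, "°4"))   # °4  (falls through to the second alternative)
```

Putting the longer alternative first is one way to get the longer match, since the engine stops at the first alternative that succeeds.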



Your alternation lists the shorter alternative first, so the engine stops at the shorter match. It is better to use this regex, which matches both strings:

°[0-5][0-9]?
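As a quick check of the optional-digit form, here is a small sketch in Python; DEGREES and degree_tokens are hypothetical names chosen for this example. The quantifier ? is greedy, so the second digit is consumed whenever it is present.

```python
import re

# One digit in 0-5, optionally followed by any digit.
DEGREES = re.compile(r"°[0-5][0-9]?")


def degree_tokens(text):
    """Return every non-overlapping match of the pattern in text."""
    return DEGREES.findall(text)


print(degree_tokens("°4 and °44"))  # ['°4', '°44']
print(degree_tokens("°07"))         # ['°07']
print(degree_tokens("°60"))         # [] because 6 is outside [0-5]
```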

      






Because the alternation operator | tries the alternatives in the order given and selects the first one that succeeds. The other alternatives are never tried unless something later in the regex forces the engine to backtrack. For example, this regex

(a|ab|abc)

when given this input:

abcdefghi

will match only a. However, if the regex is changed to

(a|ab|abc)d

it first matches a. Then, since the next character is not d, it backtracks and tries the next alternative, which matches ab. Since the next character is still not d, it backtracks again and matches abc. Now the next character is d, so the match succeeds.
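The backtracking described here can be observed with Python's re module. This is only a sketch; the variable names are illustrative. The last line applies the same idea to the pattern from the question, assuming the whole string is required to match (re.fullmatch), which forces the engine past the first alternative.

```python
import re

text = "abcdefghi"

# Without anything after the group, the first alternative is accepted.
plain = re.search(r"(a|ab|abc)", text)
print(plain.group(1))  # a

# The trailing d fails after "a" and after "ab", so the engine backtracks
# until the "abc" alternative lets the d match.
forced = re.search(r"(a|ab|abc)d", text)
print(forced.group(1))  # abc
print(forced.group(0))  # abcd

# Requiring a full-string match has the same effect on the question's pattern.
whole = re.fullmatch(r"((°[0-5])|(°[0-5][0-9]))", "°44")
print(whole.group(0))  # °44
```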

Why not simplify the regex from

((°[0-5])|(°[0-5][0-9]))

to this?

°[0-5][0-9]?

      

It's simpler and clearer.
