Decode regex to see if only numbers are allowed

I receive a regex as a string. Is it possible to find out whether that regex allows only numbers? The regexes I get are basically of the form:

  • ^[0-9]+$
  • ^[0-9]{5,10}$
  • ^[0-9]{6}$
  • ^[0-9]{1,13}\w$

But I may receive other regexes as well.

+3




2 answers


As most people have said, this is very difficult to achieve with a single simple regular expression, because there are many ways to write the same thing, including cases where digits are hidden inside character classes, or character classes are negated, etc. However, I gave it a shot and tested it a bit; it works for basic scenarios.

The regex below matches any regex that only matches digits and no other characters. Whether that regex allows one or more digits, restricts itself to certain digits, etc. does not really matter. The check only ensures that the candidate regex cannot match any non-digit characters.

  • The regex recognizes the different ways of representing digits, including \d, [0-9], \p{N}, [123] and even a literal 4, but rejects character classes such as [^\WA-Za-z_] or [.-:]
  • The regex matches patterns with or without anchors
  • The regex supports all quantifiers, including *, +, ? and even {x,y}. It also works with lazy and possessive quantifiers, i.e. \d*? and \d*+
  • The regex works with both positive and negative lookbehinds and lookaheads
  • The regex supports alternation with |, as in \d?|[34]?|123

Limitations:

  • The regex does not support capturing or non-capturing groups, as that would make it considerably more complex. Any regex containing a capturing group (..) or a non-capturing group (?:..) will therefore fail, even if it can only match digits
  • The regex does not support negated character classes. For example, [^\WA-Za-z_] only matches digits, but it will not be accepted
  • Although this is not really a limitation, note that the candidate regex is NOT validated, i.e. it is not checked for being a well-formed regular expression
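
Because the check does not validate its input, the candidate could be validated separately first. A minimal sketch in Python, assuming the candidate is meant for Python's own `re` flavor (the helper name `is_valid_regex` is hypothetical; other engines accept different syntax, and Python's `re` rejects `\p{N}`, for instance):

```python
import re

def is_valid_regex(candidate: str) -> bool:
    """Return True if `candidate` compiles under Python's `re` module.

    Assumption: validity is judged by Python's regex flavor only,
    so a pattern that is valid in PCRE or JavaScript may be rejected.
    """
    try:
        re.compile(candidate)
    except re.error:
        return False
    return True
```

With this, a well-formed pattern such as `^[0-9]+$` passes, while an unterminated class such as `[0-9` is rejected before the digits-only check is even attempted.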

Regex:

^\^?((\(\?\<[=!][^\(\)]*?\))?(\[\d*(?:\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(?:\|\d+)*)(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?)+(?:\|(?:(?:\(\?\<[=!][^\(\)]*?\))?(\[\d*(\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(\|\d+)*)(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?))*\$?$

      

Regex101 Demo
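
For illustration, here is a sketch of how the regex above could be applied in Python. The function name `allows_only_digits` is made up for this example; the pattern itself is copied verbatim from above and only uses syntax that Python's `re` module accepts.

```python
import re

# The checker pattern from above, unchanged. It is matched against the
# *text* of a candidate regex, not against the data that regex would match.
CHECKER = re.compile(
    r'^\^?((\(\?\<[=!][^\(\)]*?\))?(\[\d*(?:\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(?:\|\d+)*)(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?)+(?:\|(?:(?:\(\?\<[=!][^\(\)]*?\))?(\[\d*(\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(\|\d+)*)(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?))*\$?$'
)

def allows_only_digits(candidate: str) -> bool:
    """Return True if the checker accepts `candidate` as digits-only."""
    return CHECKER.match(candidate) is not None

if __name__ == "__main__":
    for candidate in (r"^[0-9]+$", r"^[0-9]{1,13}\w$", r"(\d)*"):
        print(candidate, allows_only_digits(candidate))
```

Keep the candidates short: the nested quantifiers in the checker can backtrack heavily on long inputs that fail to match.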

An easier way to visualize the solution:



^(lookbehind)?(digit_classes)+(quantifier)?(quantifier_type)?(lookahead)?

lookbehind = (?<=.. or (?<!..
digit_classes = \d or [0-9] or \p{N} etc.
quantifier = * or + or ? or {,}
quantifier_type = ? or +
lookahead = (?=.. or (?!..

// Repeat the above to support 'OR' i.e |

      

((\(\?\<[=!][^\(\)]*?\))?(\[\d*(?:\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(?:\|\d+)*)(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?)+

The first capture group, shown above, includes support for all the digit representations detailed below.

  • The first inner capture group (\(\?\<[=!][^\(\)]*?\))? matches an optional positive or negative lookbehind
    • \(\?\< matches the start of the lookbehind, i.e. (?<, followed by [=!] because it can be positive or negative
    • [^\(\)]*? lazily allows any characters other than ( or ) inside the lookbehind
  • The next capture group (\[\d*(?:\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(?:\|\d+)*) matches the various digit representations such as \d, [0-9] or \p{N}
    • \[\d*(?:\d-\d)?\d*\] matches [0-9] or [1234] or even [1-3567]
    • \\d matches a literal \d
    • \\p\{N\} matches a literal \p{N}
    • \d+(?:\|\d+)* matches digit literals, e.g. 4, and supports alternation between literals, like 4|6|8

  • The next capture group (\*|\+|\?|\{\d*,?\d*\})? matches all the quantifiers, i.e. *, +, ? and {,}
    • \*|\+|\? covers the basic quantifiers
    • \{\d*,?\d*\} covers quantifiers that specify minimum and maximum counts, such as \d{5,} or [0-9]{3,6}
  • The following capture group (\?|\+)? supports quantifier modifiers such as lazy, i.e. \d*?, or possessive, i.e. \d*+
  • The last capture group (\(\?[=!][^\(\)]*?\))? matches an optional positive or negative lookahead

Thereafter, the first capture group is repeated once more to support | between multiple digit representations: the groups above form (..)+, and to add support for | this is extended to (..)+(\|(..))* to create the final regex.
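
The breakdown above can also be written as a composition of named parts. This is only a sketch in Python: the constant names are invented for illustration, and every group is made non-capturing, so the assembled pattern is equivalent in what it accepts but not character-for-character identical to the regex given earlier.

```python
import re

# Each fragment mirrors one of the groups described above.
LOOKBEHIND = r"(?:\(\?\<[=!][^\(\)]*?\))?"            # (?<=..) or (?<!..)
DIGITS = r"(?:\[\d*(?:\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(?:\|\d+)*)"
QUANTIFIER = r"(?:\*|\+|\?|\{\d*,?\d*\})?"            # *, +, ?, {m,n}
QUANTIFIER_TYPE = r"(?:\?|\+)?"                       # lazy or possessive
LOOKAHEAD = r"(?:\(\?[=!][^\(\)]*?\))?"               # (?=..) or (?!..)

# One "digit unit": lookbehind, digit class, quantifier, modifier, lookahead.
UNIT = f"(?:{LOOKBEHIND}{DIGITS}{QUANTIFIER}{QUANTIFIER_TYPE}{LOOKAHEAD})"

# Final shape: ^ (unit)+ ( | unit )* $ with both anchors optional.
FULL = re.compile(rf"^\^?{UNIT}+(?:\|{UNIT})*\$?$")

def allows_only_digits(candidate: str) -> bool:
    """Return True if the composed checker accepts `candidate`."""
    return FULL.match(candidate) is not None
```

Building the pattern from fragments makes it easier to see where support for groups or negated classes would have to be added, although the sketch does not attempt that.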

Works for:

^[0-9]{6}$
^[0-9]+$
^[0-9]{5,10}$
\d[0][3-9]*?\d[0-7]*?$
\d*|[0-9]+|123
\d+(?!\s)
(?<=\w)[0-9]

      

Doesn't work (but should work):

(\d)*          # Capturing groups don't work
(?:\d+)        # Non-capturing groups don't work
^[^\WA-Za-z_]  # Negated character classes don't work

      

Note: All groups are written as capturing groups to make them easier to visualize. Any of them can be converted to non-capturing groups.

+1




^(\d|(?<!\^)\d-\d|\\d|\^|\$|\[|\]|{\d+(,\d+)?}|\+|\*|\\b|\\B|\\\d|\(\?[:=!<][^]+\)|\?|\||\((\d|(?<!\^)\d-\d|\\d|\^|\$|\[|\]|{\d+(,\d+)?}|\+|\*|\\b|\\B|\\\d|\(\?[:=!<][^]+\)|\?|\|)+\))+$

      

I know, I know

This only matches what can appear in a regex that matches numbers. That includes (?<=My phone number is: )[\d-]+ which matches 123-4567-890 in My phone number is: 123-4567-890.



To check whether a regex only matches numbers, try matching the regex (as a string) against this pattern. If it matches, the regex is okay.

It doesn't catch invalid regexes, for example \d^\d$\d

If you notice any errors in it, then please let me know and I will fix it.
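
The pattern above uses `[^]`, which matches any character in JavaScript but is not read the same way by other engines. As a rough sketch, this is how the check could look in Python, with `[^]` swapped for `[\s\S]` (a substitution made for this sketch, not part of the original pattern) and a hypothetical helper name `looks_numeric_only`:

```python
import re

# The token alternatives from the pattern above, written once and reused.
# Assumption: `[^]` (JavaScript "any character") is replaced by `[\s\S]`.
TOKENS = (
    r"\d|(?<!\^)\d-\d|\\d|\^|\$|\[|\]|{\d+(,\d+)?}|\+|\*|\\b|\\B|\\\d"
    r"|\(\?[:=!<][\s\S]+\)|\?|\|"
)

# Same shape as the original: tokens, or one parenthesized run of tokens.
CHECKER = re.compile(rf"^({TOKENS}|\(({TOKENS})+\))+$")

def looks_numeric_only(candidate: str) -> bool:
    """Return True if `candidate` consists only of the allowed tokens."""
    return CHECKER.match(candidate) is not None

if __name__ == "__main__":
    for candidate in (r"^[0-9]+$", r"(\d)*", r"[a-z]+", r"\d^\d$\d"):
        print(candidate, looks_numeric_only(candidate))
```

As noted, a nonsensical pattern like `\d^\d$\d` is still accepted, because only the individual tokens are checked and not how they are combined.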

0

