Need help understanding how to use a less complex regular expression in Python

I am trying to learn more about regular expressions. The first one below, I believe, finds cases where a closing parenthesis is missing on amounts up to 999 billion. I thought the second one should do the same, but I am not getting the same results.

   missingParenReg=re.compile(r"^\([$]*[0-9]{1,3}[,]?[0-9]{0,3}[,]?[0-9]{0,3}[,]?[0-9]{0,3}[.]*[0-9]*[^)]$")
   missingParenReg2=re.compile(r"^\([$]?([0-9]{1,3}[,]?)+[.]*[0-9]*[^)]$")

      

I think the second one says:
There must be an open parenthesis to start.
There may or may not be one dollar sign.
The next group must exist at least once, but can exist an unlimited number of times.
The group must have at least one digit, but can have up to three.
The group may have zero or one comma.
After this group there may or may not be a decimal point.
If there is a decimal point, it will be followed by zero or more digits, with no upper limit.

I'm trying to figure out this magic stuff, so I'd appreciate a fix for my regex (if it can be fixed) in addition to a more elegant solution if you have one.

+1




4 answers


The trickier part of regexes is not making them accept valid input, it is making them reject invalid input. For example, the second expression accepts input that is clearly wrong, including:

  • (1,2,3,4 - one digit between each comma
  • (12,34,56 - two digits between each comma
  • (1234......5 - unlimited number of decimal points
  • (1234,.5 - comma before the decimal point
  • (123,456789,012 - if there are any commas, they must separate every group of three digits
  • (01234 - a leading zero is not conventional
  • (123.4X - the last character is not a closing parenthesis, but it is not a digit either
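A quick way to confirm this list is to run each string through the second pattern from the question. This is a minimal sketch; the names `missing_paren_2`, `bad_inputs` and `accepted` are only illustrative and do not come from the original post.

```python
import re

# Second pattern from the question, copied unchanged.
missing_paren_2 = re.compile(r"^\([$]?([0-9]{1,3}[,]?)+[.]*[0-9]*[^)]$")

# The malformed inputs listed above.
bad_inputs = [
    "(1,2,3,4",
    "(12,34,56",
    "(1234......5",
    "(1234,.5",
    "(123,456789,012",
    "(01234",
    "(123.4X",
]

# Collect every input the pattern accepts even though it is malformed.
accepted = [s for s in bad_inputs if missing_paren_2.match(s)]
for s in accepted:
    print("accepted:", s)
```

Every string in the list ends up in `accepted`, which is the problem being described.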

Here's an alternative regex that should reject the above examples:

[-+]?[$]?(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})*)(\.\d+)?



  • Optional plus or minus sign.
  • Optional dollar sign.
  • Three alternatives, separated by |:
    • A single zero digit (for numbers like 0.5 or just 0).
    • Any number of digits without commas. The first digit must not be zero.
    • Comma-separated numbers. The first digit must not be zero. Up to three digits before the first comma. Each comma must be followed by exactly three digits.
  • An optional single decimal point followed by one or more digits.
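The pattern above has no anchors and no parenthesis, so how it is applied matters. The sketch below assumes the number has already been separated from any surrounding parenthesis and uses `fullmatch` so that the whole string must fit; the helper name `is_valid_number` and the sample lists are made up for illustration.

```python
import re

# Pattern from the answer, copied unchanged.
number = re.compile(r"[-+]?[$]?(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})*)(\.\d+)?")

def is_valid_number(text):
    """Return True only if the entire string is a well-formed amount."""
    return number.fullmatch(text) is not None

# The earlier malformed examples, without the leading parenthesis.
rejected = ["1,2,3,4", "12,34,56", "1234......5", "1234,.5",
            "123,456789,012", "01234", "123.4X"]
# A few well-formed amounts (assumed examples, not from the original post).
accepted = ["0", "0.5", "1234", "1,234,567.89", "-$12", "+$1,000"]

for text in rejected + accepted:
    print(text, is_valid_number(text))
```

Using `match` or `search` instead of `fullmatch` would let a valid prefix such as the `1` in `1,2,3,4` count as a match, so the anchoring is an essential part of the check.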

As for the parentheses, if all you care about is whether they are balanced, then you can skip the number parsing and simply trust that any combination of digits, decimal points and commas between the parentheses is valid. Then use a (?!...) construct (a negative lookahead), which succeeds only if the input does not match the regex inside it.

(?!\([$\d.,]+\))
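One possible way to use this lookahead is sketched below. Two details are assumptions added here, not part of the answer: a `$` inside the lookahead so that the closing parenthesis must be the last character, and a literal `\(` after the lookahead so that only strings that actually open a parenthesis are reported. The function name `missing_close_paren` is hypothetical.

```python
import re

# Negative lookahead: fail if a complete "(...)" group spans the whole string,
# then require a real opening parenthesis.
unbalanced = re.compile(r"(?!\([$\d.,]+\)$)\(")

def missing_close_paren(text):
    """True when text starts with '(' but is not a balanced '(number)'."""
    return unbalanced.match(text) is not None

for text in ["(1,234.50", "(1,234.50)", "($5", "($5)", "1,234.50"]:
    print(text, missing_close_paren(text))
```

A lookahead consumes no characters, so without the trailing `\(` the pattern would also succeed on strings that contain no parenthesis at all.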

+3




Are there nested parentheses (your regexes assume there are none)? If not:

whether_paren_is_missing = astring.startswith('(') and not astring.endswith(')')

      

To validate the dollar amount part:

import re

cents = r"(?:\.\d\d)"  # cents: a decimal point followed by exactly two digits
re_dollar_amount = re.compile(r"""(?x)
    ^               # match at the very beginning of the string
    \$?             # optional dollar sign
    (?:               # followed by
        (?:             # integer part
        0               # zero
        |               # or
        [1-9]\d{,2}     # 1 to 3 digits (no leading zero)
        (?:               # followed by
            (?:,\d{3})*     # zero or more three-digit groups with commas
            |               # or
            \d*             # zero or more digits without commas
            )
        )
        (?:\.|%(cents)s)?   # optional fractional part: a bare '.' or cents
    |               # or
    %(cents)s       # fractional part only, e.g. '$.01'
    )
    $               # match end of string
    """ % {"cents": cents})

      



Allowed:

    $0
    0
    $234
    22
    $0.01
    10000.12
    $99.90
    2,010,123
    1.00
    2,103.45
    $.10
    $1.

Forbidden:

    01234
    00
    123.4X
    1.001
    ...
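The two lists can be checked mechanically. The sketch below condenses the verbose pattern into a single line, which is a rewrite made here for brevity and is assumed to be equivalent; the names `CENTS`, `DOLLAR_AMOUNT`, `allowed` and `forbidden` are illustrative only.

```python
import re

CENTS = r"(?:\.\d\d)"  # a decimal point followed by exactly two digits

# One-line condensation of the verbose pattern (assumed equivalent).
DOLLAR_AMOUNT = re.compile(
    r"^\$?(?:(?:0|[1-9]\d{,2}(?:(?:,\d{3})*|\d*))(?:\.|"
    + CENTS + r")?|" + CENTS + r")$"
)

allowed = ["$0", "0", "$234", "22", "$0.01", "10000.12", "$99.90",
           "2,010,123", "1.00", "2,103.45", "$.10", "$1."]
forbidden = ["01234", "00", "123.4X", "1.001"]

for text in allowed:
    assert DOLLAR_AMOUNT.match(text), text
for text in forbidden:
    assert not DOLLAR_AMOUNT.match(text), text
print("all checks passed")
```

Note that the pattern leaves no room for a space between the dollar sign and the digits, so `$ 0` would be rejected.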
+4




I found Kiki very helpful when building a regex. It shows visually what your regular expression is doing, which is a huge time-saver.

0




One difference that I see at first glance is that your second regex will not match a string like:

(123,,,

      

This is because the revised version requires at least one digit between the commas. (A reasonable requirement, I would say.)
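This difference is easy to reproduce with the two patterns from the question. A minimal sketch, where the names `first`, `second` and `sample` are chosen here for illustration:

```python
import re

# Both patterns copied unchanged from the question.
first = re.compile(
    r"^\([$]*[0-9]{1,3}[,]?[0-9]{0,3}[,]?[0-9]{0,3}[,]?[0-9]{0,3}[.]*[0-9]*[^)]$"
)
second = re.compile(r"^\([$]?([0-9]{1,3}[,]?)+[.]*[0-9]*[^)]$")

sample = "(123,,,"

# The first pattern lets every digit group between commas be empty;
# the second requires at least one digit before each optional comma.
print("first :", bool(first.match(sample)))
print("second:", bool(second.match(sample)))
```

In the first pattern the final `[^)]` can absorb the last comma, which is why the whole string still fits.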

-1

