CAtlRegExp for regex that matches 4 characters max

Short version:

How do I get a regex that matches a@a.aaaa but not a@a.aaaaa using CAtlRegExp?


Long version:

I am using CAtlRegExp http://msdn.microsoft.com/en-us/library/k3zs4axe(VS.80).aspx to match email addresses. I want to use this regex:

^[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$

      

retrieved from here. But the syntax that CAtlRegExp accepts is different from the one used there. This regex returns a REPARSE_ERROR_BRACKET_EXPECTED error; you can check for yourself using this app: http://www.codeproject.com/KB/string/mfcregex.aspx

Using the above application, I created this regex:

^[a-zA-Z0-9\._%\+\-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]$

      

But the problem is that this accepts a@a.aaaaa as valid; I need it to match at most 4 characters for the top-level domain.

So how can I get a regex that matches a@a.aaaa but not a@a.aaaaa?



2 answers


Try:

^[a-zA-Z0-9\._%\+\-]+@([a-zA-Z0-9-]+\.)+\c\c\c?\c?$

This expression replaces the sequence [A-Z]{2,4}, which CAtlRegExp does not support, with \c\c\c?\c?.

\c serves as an abbreviation for [a-zA-Z]. The question marks after the 3rd and 4th \c indicate that each of them can match zero or one character. As a result, this part of the expression matches 2, 3, or 4 characters, but nothing more and nothing less.
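The 2-to-4 letter behaviour described above can be checked with a small sketch. Since ATL is not available everywhere, this assumes std::regex (ECMAScript syntax) as a stand-in for CAtlRegExp, and it expands \c to [a-zA-Z] by hand because std::regex gives \c a different meaning. The helpers expand_atl_alpha, matches_answer_pattern and check_examples are hypothetical names introduced for illustration; they are not part of ATL.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of ATL): rewrites the CAtlRegExp abbreviation
// \c as [a-zA-Z] and copies every other escape through unchanged.
// Assumption: \c never appears inside a bracket expression.
std::string expand_atl_alpha(const std::string& atl_pattern) {
    std::string out;
    for (std::size_t i = 0; i < atl_pattern.size(); ++i) {
        const char ch = atl_pattern[i];
        if (ch == '\\' && i + 1 < atl_pattern.size()) {
            const char next = atl_pattern[++i];
            if (next == 'c') {
                out += "[a-zA-Z]";
            } else {
                out += ch;    // keep the backslash
                out += next;  // keep the escaped character
            }
        } else {
            out += ch;
        }
    }
    return out;
}

// Hypothetical helper: tests an address against the pattern from the answer,
// with std::regex standing in for CAtlRegExp.
bool matches_answer_pattern(const std::string& address) {
    static const std::regex re(expand_atl_alpha(
        R"(^[a-zA-Z0-9\._%\+\-]+@([a-zA-Z0-9-]+\.)+\c\c\c?\c?$)"));
    return std::regex_match(address, re);
}

// Usage: the last label must be 2, 3, or 4 letters long.
void check_examples() {
    assert(matches_answer_pattern("a@a.aa"));
    assert(matches_answer_pattern("a@a.aaaa"));
    assert(!matches_answer_pattern("a@a.a"));
    assert(!matches_answer_pattern("a@a.aaaaa"));
}
```

With CAtlRegExp itself, the unexpanded pattern would be passed to Parse and then Match; that part is not shown here because it cannot be compiled without ATL.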



You are trying to match email addresses, a very widely used, critical element of Internet communication.

That is why I would say this job is best done with the most widely used, most correct regular expression.

Since the rules for the format of email addresses are described in RFC822, it seems useful to do an internet search for something like "RFC822 email regex".

For Perl, the answer seems easy: use Mail::RFC822::Address, which does regex-based address validation.

For PHP, there is the RFC 822 Email Address Parser in PHP.

Thus, to achieve the most correct handling of email addresses, you either need to find the most accurate regex that is available for your particular toolkit (ATL in your case), or, if there is no suitable existing regex, adapt a very precise regex from another toolkit (the Perl one above seems to be a very complete, albeit complex, candidate).

If you are trying to match only a specific subset of email addresses (as your question seems to suggest), then it probably still makes sense to start with the most current, correct, generic regex and then restrict it to the subset you need.

I may have stated the obvious, but I hope this helps.
