RegEx to match end of line
I am writing a regex to match email addresses in a text document. As a first attempt, I came up with this:
((?:[a-zA-Z]+[\w+\.\-]+[\-a-zA-Z]+))[ ]*((?:@|at))[ ]*(?:[a-zA-Z\.]+)
I want to make sure the end of the email address is "edu" or "com". How should I do it? I am using Python.
Some examples of email addresses from my text document:
alice @ so.edu
alice at sm.so.edu
alice @ sm.com
Edit: I only want to modify this regex, since it already matches the other examples in my data.
((?:[a-zA-Z]+[\w+\.\-]+[\-a-zA-Z]+))[ ]*((?:@|at))[ ]*(?:[a-zA-Z\.]+)\.(com|edu)
EDIT: to also accept "dot" in place of ".":
((?:[a-zA-Z]+[\w+\.\-]+[\-a-zA-Z]+))[ ]*((?:@|at))[ ]*(?:[a-zA-Z\.]+) *(\.|dot) *(com|edu)
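As a rough sketch of how the pattern above could be used from Python, the snippet below compiles it and collects every match in a block of text. The names EMAIL_RE, find_addresses and sample are illustrative and not from the original post, and the "bob @ example.org" line is an invented negative example.

```python
import re

# Pattern copied from the answer above, split over two lines for readability.
# EMAIL_RE and find_addresses are hypothetical names used only for this sketch.
EMAIL_RE = re.compile(
    r"((?:[a-zA-Z]+[\w+\.\-]+[\-a-zA-Z]+))[ ]*((?:@|at))[ ]*"
    r"(?:[a-zA-Z\.]+) *(\.|dot) *(com|edu)"
)


def find_addresses(text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in EMAIL_RE.finditer(text)]


# The first three lines are the examples from the question;
# the last one is an assumed address that should be rejected.
sample = "alice @ so.edu\nalice at sm.so.edu\nalice @ sm.com\nbob @ example.org"
print(find_addresses(sample))
```

Note that the pattern is not anchored, so it only requires "com" or "edu" to appear right after the final separator. If the address must end the line, appending $ and compiling with re.MULTILINE is one possible adjustment.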
First of all, see this answer for an explanation of how to match all valid email addresses as per RFC822.
I personally would not change the regexp, but instead run email.utils.parseaddr() on the regexp matches and check that the resulting address .endswith(".edu") or .endswith(".com"). For example (the module is spelled email.Utils in old Python 2 releases):
>>> import email.utils
>>> email.utils.parseaddr("kimvais@mailinator.com")[1].endswith(".com")
True
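The same check can be wrapped in a small helper. This is only a sketch: ALLOWED_SUFFIXES and has_allowed_suffix are made-up names, and it assumes the candidate string is already in the usual user@host form (the spaced "at" and "dot" variants from the question are not handled here).

```python
from email.utils import parseaddr

# Hypothetical constant: the suffixes the question cares about.
ALLOWED_SUFFIXES = (".com", ".edu")


def has_allowed_suffix(candidate):
    """Return True if the parsed address ends in one of ALLOWED_SUFFIXES."""
    # parseaddr returns a (realname, address) pair; only the address is needed.
    address = parseaddr(candidate)[1]
    # str.endswith accepts a tuple, so one call covers both suffixes.
    return address.lower().endswith(ALLOWED_SUFFIXES)


print(has_allowed_suffix("kimvais@mailinator.com"))
print(has_allowed_suffix("bob@example.org"))
```

Passing a tuple to endswith keeps the list of accepted endings in one place, which makes it easy to extend later.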