Extract all email addresses from some .txt documents using Ruby

I need to extract all email addresses from some TXT documents. The addresses can be in the following formats:

  • a@abc.com

  • {a, b, c}@abc.edu

  • some other formats that include the @ character.

I'm choosing Ruby as my first language to write this program, but I don't know how to write a regex. Can anyone help me? Thanks!

+1




3 answers


Depending on the nature of your .txt documents, you don't need one of the complex regular expressions that try to validate email addresses. You are not trying to validate anything. You are just trying to capture what is already there. Generally speaking, a regex that captures what is already there can be much simpler than one that has to validate input.

An important question is whether your .txt documents contain @ signs that are not part of the email address you want to extract.

This regex handles your first two requirements:



\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+

      

Or, if you want to allow any sequence of non-whitespace characters that contains the @ sign, plus your second requirement (which has spaces):

\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+
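As a minimal sketch of how the first pattern above could be used from Ruby: `String#scan` returns every match in a string. The helper names `expand_group` and `extract_emails` are hypothetical, and expanding `{a, b, c}@abc.edu` into three separate addresses is an assumption about what the question wants.

```ruby
# First pattern from this answer: plain addresses or {a, b, c}@domain groups.
EMAIL_PATTERN = /\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical helper: turns "{a, b, c}@abc.edu" into one address per name.
# Plain matches are returned unchanged, wrapped in an array.
def expand_group(match)
  return [match] unless match.start_with?("{")

  names, domain = match.split("}@", 2)
  names.delete_prefix("{").split(/, */).map { |name| "#{name}@#{domain}" }
end

# Hypothetical helper: collects every address found in a string.
def extract_emails(text)
  text.scan(EMAIL_PATTERN).flat_map { |match| expand_group(match) }
end

sample = "Contact a@abc.com or {a, b, c}@abc.edu for details."
p extract_emails(sample)
# prints ["a@abc.com", "a@abc.edu", "b@abc.edu", "c@abc.edu"]
```

Note that `[\w.-]+` also accepts a trailing period, so an address that ends a sentence would be captured with the final dot attached.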

      

+5




Take a look at this rather in-depth analysis:

The upshot is to use this regex:



/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i

      

+2




Found this at https://www.shellhacks.com/regex-find-email-addresses-file-grep/ which suited my needs:

\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b
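A rough sketch of applying this pattern to a set of files in Ruby, assuming the documents sit in a single directory and are small enough to read into memory. The helper name `emails_in_directory` and the directory name `docs` are hypothetical.

```ruby
# Pattern from this answer, written as a Ruby regex literal.
WORD_BOUNDED_EMAIL = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b/

# Hypothetical helper: returns the unique addresses found in every .txt file
# directly inside dir, visiting the files in sorted order.
def emails_in_directory(dir)
  Dir.glob(File.join(dir, "*.txt")).sort.flat_map do |path|
    File.read(path).scan(WORD_BOUNDED_EMAIL)
  end.uniq
end

# "docs" is a placeholder directory name; nothing is printed if it has no .txt files.
puts emails_in_directory("docs")
```

Because of `{2,6}`, this pattern skips addresses whose top-level domain is longer than six letters, and it does not cover the `{a, b, c}@abc.edu` form from the question.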

      

0

