Extract all email addresses from some .txt documents using ruby

Question

I need to extract all email addresses from some TXT documents. The addresses can be in the following formats:

a@abc.com
{a, b, c}@abc.edu
some other formats that include the @ character

I'm choosing Ruby as my first language to write this program, but I don't know how to write a regex. Can anyone help me? Thanks!

+1

ruby regex

Ikbear 07 jul. At 11:54

3 answers

Take a look at this rather in-depth analysis:

The upshot is to use this regex:

/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i

+2

Jonathan 07 jul. 10 at 12:03

Found this at https://www.shellhacks.com/regex-find-email-addresses-file-grep/ which suited my needs:

\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b
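As a rough sketch, the grep-style pattern above could be used from Ruby with String#scan. The constant and method names are made up for illustration. Note that this pattern does not cover the {a, b, c}@abc.edu form from the question, and its {2,6} limit skips top-level domains longer than six letters.

```ruby
# Pattern quoted in the answer above, written as a Ruby Regexp literal.
GREP_STYLE_PATTERN = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b/

# Hypothetical helper name: collect every match found in a string.
def grep_style_emails(text)
  text.scan(GREP_STYLE_PATTERN)
end

p grep_style_emails("Write to first.last@abc.com. Or {a, b, c}@abc.edu")
# => ["first.last@abc.com"]
```

The trailing full stop is left out because the pattern must end on a run of letters followed by a word boundary.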

0

Eskim0 June 29. 17 at 12:27 am

Jan Goyvaerts · Accepted Answer · 2010-07-10T01:42:27+0000

Depending on the nature of your .txt documents, you don't need one of the complex regular expressions that try to validate email addresses. You are not trying to validate anything. You are just trying to capture what is already there. Generally speaking, a regex that captures what is already there can be much simpler than a regex that has to validate input.

An important question is whether your .txt documents contain @ signs that are not part of the email address you want to extract.

This regex handles your first two requirements:

\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+
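A minimal Ruby sketch of how this pattern might be applied, assuming the .txt files are passed as command-line arguments. The constant and method names are hypothetical, not part of the answer.

```ruby
# Pattern from the answer: a plain address, or a {a, b, c}@domain group.
EMAIL_PATTERN = /\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical helper: returns every match found in a string.
# The pattern has no capturing groups, so scan returns whole matches.
def extract_emails(text)
  text.scan(EMAIL_PATTERN)
end

# Assumed usage: ruby extract.rb notes.txt letters.txt
ARGV.each do |path|
  extract_emails(File.read(path)).each { |match| puts match }
end
```

Because [\w.-]+ accepts dots, an address at the end of a sentence is captured together with the full stop that follows it, so some trimming may still be needed afterwards.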

Or, if you want to allow any sequence of non-whitespace characters that contains an @ sign, plus your second requirement (which has spaces):

\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+
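To make the trade-off between the two patterns concrete, here is a small Ruby comparison on a made-up sample string. The constant names and the trim_punctuation helper are assumptions added for illustration. The cleanup step in particular is not part of the answer.

```ruby
# Both patterns are copied from the answer; the constant names are made up.
WORD_PATTERN  = /\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/
LOOSE_PATTERN = /\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical cleanup step (not from the answer): drop brackets, quotes
# and sentence punctuation stuck to either end of a loose match.
def trim_punctuation(match)
  match.gsub(/\A[<(\["']+|[>)\]"'.,;:!?]+\z/, "")
end

sample = "Mail <first.last@abc.com>, or {a, b, c}@abc.edu"

p sample.scan(WORD_PATTERN)
# => ["last@abc.com", "{a, b, c}@abc.edu"]  (\w stops at the dot in "first.last")

p sample.scan(LOOSE_PATTERN)
# => ["<first.last@abc.com>,", "{a, b, c}@abc.edu"]

p sample.scan(LOOSE_PATTERN).map { |m| trim_punctuation(m) }
# => ["first.last@abc.com", "{a, b, c}@abc.edu"]
```

The word-based pattern loses the part of the local name before a dot, while the loose pattern keeps the whole address but also picks up surrounding punctuation.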
