Regular expression in base R regex to identify email addresses

I am trying to use the stringr library to extract emails from a large messy file.

str_match doesn't allow perl = TRUE and I can't figure out the escape characters to get it to work.

Can anyone recommend a relatively reliable regex that will work in the context below?

library(stringr)
emails <- c("larry@gmail.com", "larry-sally@sally.com", "larry@sally.larry.com")
regex <- "SomeRegex"  # placeholder for the pattern being asked about
str_match(emails, regex)

      



2 answers


> "^[[:alnum:].-_]+@[[:alnum:].-]+$"->regex
> str_match(emails, regex)
     [,1]                   
[1,] "larry@gmail.com"      
[2,] "larry-sally@sally.com"
[3,] "larry@sally.larry.com"

      

The @ sign doesn't need to be escaped in a regex. And "." is not special inside a character class; "-" is literal there only when it is the first or last character. If you want to require an ending such as ".com", ".co", ".edu", or ".org", then you have to spell out the complete list of allowed endings.
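To illustrate the point about spelling out the allowed endings, here is a minimal sketch in base R. The `tlds` vector is a hypothetical allow-list, the fourth address is an extra test case added for illustration, and the local-part class puts the dash last so it is read literally; none of this is meant as a complete email validator.

```r
# Sketch: require the address to end in one of a fixed list of endings.
# `tlds` is a hypothetical allow-list; extend it to whatever you accept.
emails <- c("larry@gmail.com", "larry-sally@sally.com",
            "larry@sally.larry.com", "larry@sally.xyz")
tlds <- c("com", "co", "edu", "org")

# Build "...\\.(com|co|edu|org)$" from the list.
tld_regex <- paste0("^[[:alnum:]._-]+@[[:alnum:].-]+\\.(",
                    paste(tlds, collapse = "|"), ")$")

ok <- grepl(tld_regex, emails)
print(data.frame(email = emails, accepted = ok))
```

With stringr, `str_detect(emails, tld_regex)` should play the same role as `grepl` here.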



As M42 pointed out, this is not the correct method. In fact, it is argued that there is no foolproof method at all: see "Using a regex to validate an email address".



I found this regex works better for me:

^[[:alnum:]._-]+@[[:alnum:].-]+$
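Because of the `^` and `$` anchors, this pattern only matches strings that are nothing but an address. For pulling addresses out of surrounding text, as in the original question, a rough sketch might drop the anchors. The `messy` sample lines are made up for illustration, and the domain part is rewritten here (an assumption, not part of the answer) so that a sentence-ending period is not swallowed into the match.

```r
# Sketch: extract addresses embedded in free text using base R.
# `messy` is invented sample input.
messy <- c("Contact larry@gmail.com or larry-sally@sally.com.",
           "no address on this line",
           "cc: <larry@sally.larry.com>")

# Unanchored variant: the domain is one or more dot-separated labels,
# so a trailing "." after the address is left out of the match.
extract_regex <- "[[:alnum:]._-]+@[[:alnum:]-]+(\\.[[:alnum:]-]+)+"

# gregexpr finds every match per line; regmatches pulls the matched text.
found <- unlist(regmatches(messy, gregexpr(extract_regex, messy)))
print(found)
```

In stringr, `str_extract_all(messy, extract_regex)` would be the closer equivalent than `str_match`, which returns only the first match per string.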

      



A dash has a special meaning in a character class unless it is the first or last character: it is the range operator, as in "A-Z".
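A small sketch of what the dash placement changes. It uses base R with `perl = TRUE` so that ranges follow code point order; other engines or locales may interpret ranges differently. The probe characters are arbitrary choices for illustration.

```r
# Sketch: the same three characters, with the dash in two positions.
range_class   <- "^[.-_]+$"  # "-" between "." and "_": range from "." (46) to "_" (95)
literal_class <- "^[._-]+$"  # "-" last: a literal dash

probes <- c("A", "5", "@", "-")

# "A", "5" and "@" fall inside the 46..95 range; "-" (45) does not.
in_range   <- grepl(range_class, probes, perl = TRUE)
in_literal <- grepl(literal_class, probes, perl = TRUE)

print(data.frame(probe = probes, range = in_range, literal = in_literal))
```

Under these assumptions the range form lets "@" and uppercase letters through while rejecting the dash itself, which is the opposite of what an email local part needs.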
