Is it possible to create a regex pattern for a comma separated list without repeating the pattern for a single element?

I'm new to regex :)

I need a regex that will match one email item or a number of comma-separated emails.

To match a single email, I wrote \b[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b. Let's call it pattern1.

To match the list of emails I wrote something like this

"(" + pattern1 + ")([,]\\s*" + pattern1 + ")*"

But since I cannot use variables in Java annotations, I have to write something like this

(\b[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b)([,]\\s*\b[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b)*

which looks pretty scary.

Is it possible to rewrite my regex so I don't have to copy the single-email pattern?

Thanks.

+3




3 answers


You can shorten it and improve readability by using \w for "letters + underscores + digits", \d for digits, and turning on case insensitivity so you can just use a-z for letters:

(\b[\w.%-]+@[a-z\d.-]+\.[a-z]{2,4}\b)([,]\\s*\b[\w.%-]+@[a-z\d.-]+\.[a-z]{2,4}\b)*
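Since the question is about Java, one way to turn on case insensitivity without a separate flag argument is the inline flag (?i) at the start of the pattern. The following is only a minimal sketch using java.util.regex; the class and method names (EmailListCheck, isEmailList) are made up for illustration, and every backslash is doubled because the pattern is written as a Java string literal.

```java
import java.util.regex.Pattern;

class EmailListCheck {
    // Hypothetical constant: the shortened list pattern, with (?i) making
    // a-z match upper-case letters as well.
    static final String EMAIL_LIST =
        "(?i)(\\b[\\w.%-]+@[a-z\\d.-]+\\.[a-z]{2,4}\\b)"
        + "(,\\s*\\b[\\w.%-]+@[a-z\\d.-]+\\.[a-z]{2,4}\\b)*";

    static boolean isEmailList(String input) {
        // Pattern.matches() requires the whole input to match the pattern.
        return Pattern.matches(EMAIL_LIST, input);
    }

    public static void main(String[] args) {
        System.out.println(isEmailList("Joe@Example.COM"));       // true
        System.out.println(isEmailList("a@b.com, c.d@e-f.org"));  // true
        System.out.println(isEmailList("a@b.com; c@d.org"));      // false
    }
}
```

The string is split across two lines here only for readability; in an annotation it would normally be a single literal.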

      

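You can also shorten it further by making the comma and whitespace an optional prefix of each email, so the email pattern appears only once. Note that this form also accepts a leading comma: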



((,\\s*)?\b[\w.%-]+@[a-z\d.-]+\.[a-z]{2,4}\b)+

      

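Some code for demonstration (using regex capture groups):

    // C#
    using System;
    using System.Text.RegularExpressions;

    string input = @"sdf.an@dfgdfg.com, sdfsdf@fdfd.erff";
    // Group 1 is the whole item (with its optional ", " prefix); group 2 is the email itself.
    // IgnoreCase is needed because the pattern only lists lower-case a-z.
    var matches = Regex.Matches(input, @"((?:,\s*)?(\b[\w.%-]+@[a-z\d.-]+\.[a-z]{2,4}\b))", RegexOptions.IgnoreCase);
    string result = "matches:\n";
    for (int i = 0; i < matches.Count; i++)
    {
        result += "match " + i + ",value:" + matches[i].Groups[2].Value + "\n";
    }
    Console.WriteLine(result);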
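For readers working in Java, as in the question, a rough equivalent of the demonstration above might look like the sketch below. It assumes only java.util.regex; the names EmailExtractor and extract are hypothetical, and the outer capture group is dropped so that the email is group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailExtractor {
    // Optional ", " prefix (non-capturing), then the email itself in group 1.
    private static final Pattern ITEM = Pattern.compile(
        "(?:,\\s*)?(\\b[\\w.%-]+@[a-z\\d.-]+\\.[a-z]{2,4}\\b)",
        Pattern.CASE_INSENSITIVE);

    static List<String> extract(String input) {
        List<String> emails = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        // find() walks through the input one match at a time.
        while (m.find()) {
            emails.add(m.group(1));
        }
        return emails;
    }

    public static void main(String[] args) {
        // prints [sdf.an@dfgdfg.com, sdfsdf@fdfd.erff]
        System.out.println(extract("sdf.an@dfgdfg.com, sdfsdf@fdfd.erff"));
    }
}
```

Because find() scans for matches rather than validating the whole string, this is suited to pulling emails out of a list, not to checking that the list is well formed.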

      

+2




There's a great overview of how to match an email address with a regex here. This may be where you got the regex in your question.

Beyond the trade-off between "perfect" and "practical" matching, you may also need to consider valid comment parts in addresses. For example, the following are equivalent:

  • user@example.com
  • "User Joe" user@example.com
  • <user@example.com> User Joe


That is, these are all valid entries for the To / From / CC / BCC / Reply-To line in the message. If you are confident that your comma-separated list will not contain comment parts, you do not have to worry about this.

However, your "scary" regular expression looks fairly straightforward to me. And trust me, once you become more comfortable with regular expressions, it won't look so scary. Add support for the comment parts of addresses and it can get a little tricky ... :-)

+1




If your regex flavor supports them, you can use subroutine calls (available in PCRE and Perl, for example, but not in java.util.regex), like:

(foobar)(?:,(?-1))*

      

Or, more verbosely:

(?x)
(?(DEFINE)
    (?<foo> foobar )
)
(?&foo) (?: , (?&foo) )*

      

If the match is anchored to the beginning and end of the string, you can also use:

^(?:foobar(?:$|,(?!$)))+$

      

or

^(?:(?:^|(?!^),)foobar)+$
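The anchored forms use only constructs that java.util.regex supports, so they can be tried directly in Java. Below is a minimal sketch of the first anchored form with the question's email pattern substituted for foobar. The class and method names (AnchoredEmailList, isValid) are hypothetical, and the \s* after the comma is an addition for this sketch so that a space after the comma is accepted.

```java
import java.util.regex.Pattern;

class AnchoredEmailList {
    // The single-email pattern is written only once. Each email must be
    // followed either by the end of the input or by a comma (plus optional
    // whitespace, an assumption of this sketch) that is not itself at the end.
    static final Pattern LIST = Pattern.compile(
        "^(?:\\b[\\w.%-]+@[a-z\\d.-]+\\.[a-z]{2,4}\\b(?:$|,\\s*(?!$)))+$",
        Pattern.CASE_INSENSITIVE);

    static boolean isValid(String input) {
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("a@b.com"));           // true
        System.out.println(isValid("a@b.com, c@d.org"));  // true
        System.out.println(isValid("a@b.com,"));          // false: trailing comma
        System.out.println(isValid(",a@b.com"));          // false: leading comma
    }
}
```

Unlike the optional-prefix variant, this form rejects both leading and trailing commas, at the cost of requiring the match to span the whole string.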

      

0








