How do I extract a set of images from the srcset attribute?

Using the official HTML5 spec for the srcset image candidate string, I created the following regex:

/<img[^\>]*[^\>\S]+srcset=['"](?:([^"'\s,]+)\s*(?:\s+\d+[wx])(?:,\s*)?)+["']/gm

The regex should match the following tag:

<img srcset="image@2x.png 2x, image@4x.png 4x, image@6x.png 6x">

It should also return these three file names: image@2x.png, image@4x.png and image@6x.png.

However, although the regex matches, the capturing group only returns the last file name (image@6x.png). See the Regex101 demo.

What am I doing wrong?

Answer


As the visualization below shows, the capturing group's parentheses sit inside the repeated group (?:...)+. On every repetition the engine overwrites the group's previous value, so once the match completes, only the last file name is left in it.

<img[^\>]*[^\>\S]+srcset=['"](?:([^"'\s,]+)\s*(?:\s+\d+[wx])(?:,\s*)?)+["']
[Regular expression visualization (Debuggex demo)]
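A quick way to see this behavior is to run the question's regex once and inspect the result. The sketch below assumes JavaScript (the /.../gm literal in the question suggests it, but the question does not say). It drops the g and m flags so that exec() returns the first match together with its groups.

```javascript
// The regex from the question: the capturing group sits inside the repeated (?:...)+ group.
const original = /<img[^\>]*[^\>\S]+srcset=['"](?:([^"'\s,]+)\s*(?:\s+\d+[wx])(?:,\s*)?)+["']/;

const html = '<img srcset="image@2x.png 2x, image@4x.png 4x, image@6x.png 6x">';
const m = original.exec(html);

// The match itself spans all three candidates (up to the closing quote)...
console.log(m[0]); // '<img srcset="image@2x.png 2x, image@4x.png 4x, image@6x.png 6x"'
// ...but group 1 only keeps what it captured on the final repetition.
console.log(m[1]); // 'image@6x.png'
// There is exactly one group slot: the full match plus group 1.
console.log(m.length); // 2
```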



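A regex cannot return multiple values for the same capture group. In engines such as JavaScript's and PCRE, a repeated group holds only the text from its last repetition. Instead, capture the whole srcset value and then examine it further (for example, by splitting it on commas) to get the individual file names: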

<img[^\>]*[^\>\S]+srcset=['"]((?:[^"'\s,]+\s*(?:\s+\d+[wx])(?:,\s*)?)+)["']
[Regular expression visualization (Debuggex demo)]
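Here is a minimal sketch of that two-step approach, again assuming JavaScript. It adds the g flag to the fixed regex so that String.prototype.matchAll can visit every <img> tag in a larger string, and it uses a hypothetical helper, extractUrls, to split the captured value. Like the regex itself, it assumes URLs contain no commas or whitespace. The spec does permit commas inside a URL (just not at its start or end), which a plain split would mishandle.

```javascript
// The answer's fixed regex, with the g flag added (required by matchAll).
// Group 1 now captures the entire srcset value.
const imgSrcset = /<img[^\>]*[^\>\S]+srcset=['"]((?:[^"'\s,]+\s*(?:\s+\d+[wx])(?:,\s*)?)+)["']/g;

// Hypothetical helper: split a srcset value into candidates and keep only the URL part.
// Assumes URLs contain no commas, matching the [^"'\s,] class in the regex.
function extractUrls(srcset) {
  return srcset
    .split(",")                                      // one entry per image candidate
    .map((candidate) => candidate.trim())            // drop the space after each comma
    .filter((candidate) => candidate.length > 0)     // ignore empty entries
    .map((candidate) => candidate.split(/\s+/)[0]);  // the URL comes before the descriptor
}

const html = '<img srcset="image@2x.png 2x, image@4x.png 4x, image@6x.png 6x">';

const files = [];
for (const match of html.matchAll(imgSrcset)) {
  // Step 1: match[1] is the whole srcset value, e.g. "image@2x.png 2x, image@4x.png 4x, ..."
  // Step 2: split it into individual file names.
  files.push(...extractUrls(match[1]));
}

console.log(files); // [ 'image@2x.png', 'image@4x.png', 'image@6x.png' ]
```

If the HTML is already loaded in a browser, reading each element's srcset attribute through the DOM (for example, via document.querySelectorAll("img[srcset]")) avoids matching the tag with a regex at all; the splitting step stays the same.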
