How do I extract a set of images from the srcset attribute?
Using the official spec for the HTML5 srcset image candidate string, I created the following regex:
/<img[^\>]*[^\>\S]+srcset=['"](?:([^"'\s,]+)\s*(?:\s+\d+[wx])(?:,\s*)?)+["']/gm
... which must match the following tag:
<img srcset="image@2x.png 2x, image@4x.png 4x, image@6x.png 6x">
... and return these three file names (image@2x.png, image@4x.png, image@6x.png).
However, while it matches, it only returns the last one (see the Regex101 demo).
What am I doing wrong?
As you can see in this visualization, the capturing group sits inside the repeated group. Each repetition overwrites what the group captured before, so the regex only returns the last file name.
<img[^\>]*[^\>\S]+srcset=['"](?:([^"'\s,]+)\s*(?:\s+\d+[wx])(?:,\s*)?)+["']
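To make the behaviour concrete, here is a minimal JavaScript sketch that runs the regex from the question against the sample tag. The variable names (`tag`, `original`, `m`) are only illustrative, and the `g` flag is left out so that `match` returns the capture groups.

```javascript
// Sample tag from the question.
const tag = '<img srcset="image@2x.png 2x, image@4x.png 4x, image@6x.png 6x">';

// Regex from the question: capture group 1 is inside the repeated (?: ... )+ group.
const original = /<img[^\>]*[^\>\S]+srcset=['"](?:([^"'\s,]+)\s*(?:\s+\d+[wx])(?:,\s*)?)+["']/;

const m = tag.match(original);

// Group 1 was matched three times, but only the last repetition is kept.
console.log(m[1]); // "image@6x.png"
console.log(m.length); // 2: the full match plus a single group
```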
In most regex flavors, including JavaScript, a repeated capture group keeps only its last match, so a single group cannot hand back every repetition. What you need to do is capture the whole attribute value and then examine it further to get the individual filenames:
<img[^\>]*[^\>\S]+srcset=['"]((?:[^"'\s,]+\s*(?:\s+\d+[wx])(?:,\s*)?)+)["']
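The second step could look roughly like this in JavaScript. This is a sketch rather than a complete srcset parser: `extractSrcsetFiles` is a hypothetical helper name, and splitting on commas is only safe here because the regex above already rejects URLs that contain commas.

```javascript
// Step 1 regex: group 1 now wraps the whole repeated part, i.e. the full srcset value.
const srcsetRe = /<img[^\>]*[^\>\S]+srcset=['"]((?:[^"'\s,]+\s*(?:\s+\d+[wx])(?:,\s*)?)+)["']/;

// Hypothetical helper: returns the file names from the first matching <img> tag.
function extractSrcsetFiles(html) {
  const match = html.match(srcsetRe);
  if (!match) return [];

  // Step 2: split the value into candidates and keep the URL part of each,
  // dropping the width/density descriptor ("2x", "400w", ...).
  return match[1]
    .split(',')
    .map(candidate => candidate.trim().split(/\s+/)[0])
    .filter(Boolean);
}

const html = '<img srcset="image@2x.png 2x, image@4x.png 4x, image@6x.png 6x">';
console.log(extractSrcsetFiles(html));
// [ 'image@2x.png', 'image@4x.png', 'image@6x.png' ]
```

If the page contains several `<img>` tags, the same second step can be applied to each result of `html.matchAll(...)` with a `g`-flagged copy of the regex.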