RegEx to retrieve HTML image properties

I need a RegEx pattern to retrieve all attributes of an image tag.

As we all know, there is a lot of malformed HTML out there, so the pattern should cover these possibilities.

I looked at a solution on Stack Overflow, but it didn't quite do the job.

I ended up with something like:

(alt|title|src|height|width)\s*=\s*["'][\W\w]+?["']

      

Is there anything I'm missing, or is there a simpler, more efficient pattern?

EDIT:
Sorry, I'll be more specific: I'm doing this with .NET, so it's server-side.
I already have a list of img tags; now I just need to parse their attributes.
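
For reference, here is a minimal sketch of how the pattern above behaves on two awkward tags. It is written in Python purely for illustration (the pattern itself should carry over to .NET largely unchanged, but that is an assumption). The names ORIGINAL, BALANCED, pairs_original and pairs_balanced are made up for this example, and the capture group around the value is an addition. The variant remembers which quote opened the value and requires the same quote to close it, and it also allows empty values.

```python
import re

# The pattern from the question, with a capture group added around the value.
ORIGINAL = re.compile(
    r"""(alt|title|src|height|width)\s*=\s*["']([\W\w]+?)["']""", re.I)

# Variant: capture the opening quote and require the same quote to close (\2).
# .*? (with re.S) also accepts an empty value such as alt="".
BALANCED = re.compile(
    r"""(alt|title|src|height|width)\s*=\s*(["'])(.*?)\2""", re.I | re.S)


def pairs_original(tag):
    """Return (name, value) pairs found by the question's pattern."""
    return [(m.group(1), m.group(2)) for m in ORIGINAL.finditer(tag)]


def pairs_balanced(tag):
    """Return (name, value) pairs found by the quote-balanced variant."""
    return [(m.group(1), m.group(3)) for m in BALANCED.finditer(tag)]


apostrophe = """<img src='cat.png' alt="it's a cat" width="100">"""
empty_alt = '<img alt="" src="a.png">'

print(pairs_original(apostrophe))  # alt is cut off at the apostrophe
print(pairs_balanced(apostrophe))  # alt is kept whole
print(pairs_original(empty_alt))   # the empty alt swallows part of src
print(pairs_balanced(empty_alt))   # alt is '' and src is found
```

Neither version handles unquoted values such as width=100, which is one more reason the answers below point towards a parser.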

0




6 answers


"As we all know, there is a lot of malformed HTML out there, so the pattern should cover these possibilities."

This is not true: a regex should not be expected to cope with malformed HTML. Use an HTML parser if you need to parse "evil" HTML (that is, HTML from an unknown source).

+5




If performance isn't a big issue, I would go with an HTML parser (like BeautifulSoup in Python) if you're doing this server-side, or jQuery or plain JavaScript if you're doing it client-side. Sure, it's overkill, but it's much faster to write, less likely to have bugs (since the parser authors have already thought about the corner cases), and it will handle potential ugliness.
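
As a rough sketch of the parser route, the example below collects the attributes of every img tag. To stay self-contained it uses Python's standard-library html.parser instead of BeautifulSoup (a swap made for this example, not what the answer names). ImgAttributeCollector and extract_img_attributes are hypothetical names.

```python
from html.parser import HTMLParser


class ImgAttributeCollector(HTMLParser):
    """Collect one dict of attributes per <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; quotes are already
        # stripped from values, and valueless attributes map to None.
        if tag == "img":
            self.images.append(dict(attrs))


def extract_img_attributes(html):
    """Return a list of attribute dicts, one per <img> in the markup."""
    parser = ImgAttributeCollector()
    parser.feed(html)
    parser.close()
    return parser.images


messy = """<IMG SRC=cat.png alt='it is a cat' width="100" ismap><img src="b.png"/>"""
for attributes in extract_img_attributes(messy):
    print(attributes)
```

Upper-case names, single quotes, unquoted values and self-closing tags all go through the same code path here, which is the main argument for a parser over a hand-rolled pattern.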



+1




Your best bet is to use something like HTML Agility Pack instead of regex. It's designed to handle many such cases and can save you more than a few headaches over edge cases.

+1




If you want all attribute values, may I suggest using the DOM? Something like

element.attributes

will work well.

If you insist on regex,

/\b\w+="[^"]+"/

should get every double-quoted name="value" pair.
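
To show what that regex does and does not pick up, here is a small Python sketch. ANY_ATTRIBUTE and attribute_pairs are hypothetical additions, not part of the answer: they widen the idea to single-quoted, unquoted and empty values, on the assumption that those should be caught too.

```python
import re

# The pattern from this answer: name="value" with double quotes only.
DOUBLE_QUOTED = re.compile(r'\b\w+="[^"]+"')

# Hypothetical broader variant: double-quoted, single-quoted, or unquoted.
ANY_ATTRIBUTE = re.compile(
    r"""(\w[\w-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""")


def attribute_pairs(tag):
    """Return (name, value) pairs, whichever quoting style was used."""
    pairs = []
    for m in ANY_ATTRIBUTE.finditer(tag):
        # Exactly one of the three value groups takes part in each match.
        value = next(g for g in m.groups()[1:] if g is not None)
        pairs.append((m.group(1), value))
    return pairs


tag = """<img src="cat.png" alt='a cat' width=100 title="">"""

print(DOUBLE_QUOTED.findall(tag))  # only the src attribute is found
print(attribute_pairs(tag))        # all four attributes, title is ''
```

Valueless attributes (such as a bare ismap) are still skipped by both patterns.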

0




Before you get started with regex, see what it can do: "RegEx match open tags except XHTML self-contained tags"

0




/<img(\s+([a-z]{3,})=(["']([^"']*)["']|[^\s>]+))+\s*\/?>/i

      

A match_all call returns the following (the exact format depends on your library, but these are the key indices):

0 -> image tag
1 -> attribute
2 -> attribute name
3 -> attribute value (with enclosing quotes, if any)
4 -> attribute value (without enclosing quotes if it has them; otherwise empty, so use 3)
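
One caveat with the indices above: in many engines a group inside a repeated (...)+ only keeps its last iteration, so a single match on the whole tag may report just the final attribute. The Python sketch below shows this and then works around it with a two-step approach. IMG_TAG, ATTRIBUTE and parse_attributes are hypothetical names, and the unquoted branch is written as [^\s>]+ on the assumption that unquoted values of any length should match. In .NET, Group.Captures should expose every iteration instead, so the second step may be unnecessary there.

```python
import re

# Whole-tag pattern, following the group layout listed above.
IMG_TAG = re.compile(
    r"""<img(\s+([a-z]{3,})=(["']([^"']*)["']|[^\s>]+))+\s*/?>""", re.I)

# The same attribute sub-pattern on its own, applied repeatedly.
ATTRIBUTE = re.compile(
    r"""\s+([a-z]{3,})=(["']([^"']*)["']|[^\s>]+)""", re.I)


def parse_attributes(tag):
    """Return (name, value) pairs for one img tag, or [] if it is no img."""
    if IMG_TAG.match(tag) is None:
        return []
    pairs = []
    for m in ATTRIBUTE.finditer(tag):
        # Group 3 is None for unquoted values; fall back to the raw group 2.
        value = m.group(3) if m.group(3) is not None else m.group(2)
        pairs.append((m.group(1), value))
    return pairs


tag = '<img src="cat.png" width=100 alt="a cat">'

whole = IMG_TAG.match(tag)
# Python's re keeps only the last iteration of the repeated group.
last_only = (whole.group(2), whole.group(3), whole.group(4))
print(last_only)

print(parse_attributes(tag))
```

Note that [a-z]{3,} skips shorter names such as id, which is fine for alt, title, src, height and width but worth knowing about.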

      

0

