RegEx to retrieve HTML image attributes
I need a regex pattern to retrieve all attributes of an image tag.
As we all know, there is a lot of malformed HTML out there, so the pattern should cover those cases.
I looked at this solution on Stack Overflow, but it didn't quite do what I need:
I ended up with something like:
(alt|title|src|height|width)\s*=\s*["'][\W\w]+?["']
Are there any cases I'm missing, or is there a simpler, more efficient pattern?
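For illustration, here is a small Python sketch (Python's re standing in for .NET's Regex; the constant names are hypothetical) that runs the pattern above against a tag with an empty alt. Because [\W\w]+? must consume at least one character, and either quote character may close the value, the alt match runs past the empty value and swallows src, which is then never reported. The pattern also has no group around the value. The variation below captures the opening quote, requires the same quote to close the value via a backreference (\2), and uses *? so empty values work:

```python
import re

# The pattern from the question, unchanged. It has no group around the
# value, so findall() only reports attribute names.
QUESTION_RE = re.compile(r"""(alt|title|src|height|width)\s*=\s*["'][\W\w]+?["']""")

# Hypothetical variation: group 2 remembers the opening quote, \2 requires
# the same quote to close the value, and .*? (with DOTALL, so values may
# span lines as [\W\w] allowed) also accepts an empty value.
BACKREF_RE = re.compile(
    r"""(alt|title|src|height|width)\s*=\s*(["'])(.*?)\2""", re.DOTALL
)

tag = '<img alt="" src="a.png" title="it\'s">'

print(QUESTION_RE.findall(tag))
# ['alt', 'title']  (src was swallowed by the alt match)

pairs = [(name, value) for name, _quote, value in BACKREF_RE.findall(tag)]
print(pairs)
# [('alt', ''), ('src', 'a.png'), ('title', "it's")]
```

Both patterns still only look for the five listed names and ignore unquoted values such as width=100.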
EDIT:
Sorry, I should have been more specific: I'm doing this in .NET, so it's server-side.
I already have a list of img tags; now I just need to parse their attributes.
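Since the tags are already isolated, one option is to scan each tag for name=value pairs with finditer instead of listing attribute names up front. Below is a minimal Python sketch under these assumptions: values are double-quoted, single-quoted or unquoted; valueless attributes (such as ismap) are ignored; and when a name repeats, the first occurrence is kept, which is how HTML parsers treat duplicate attributes. ATTR_RE and parse_img_attributes are hypothetical names, not part of any library.

```python
import re

# Hypothetical attribute pattern: a name, "=", then exactly one of three
# value forms. Only the group of the alternative that matched is not None.
ATTR_RE = re.compile(
    r"""([\w:-]+)\s*=\s*      # attribute name and equals sign
        (?:"([^"]*)"          # double-quoted value
          |'([^']*)'          # single-quoted value
          |([^\s"'>]+))       # unquoted value
    """,
    re.VERBOSE,
)


def parse_img_attributes(tag):
    """Return a dict mapping lowercased attribute names to values for one tag."""
    attrs = {}
    for m in ATTR_RE.finditer(tag):
        name = m.group(1).lower()
        value = next(v for v in m.group(2, 3, 4) if v is not None)
        # Keep the first occurrence of a duplicated attribute.
        attrs.setdefault(name, value)
    return attrs


img_tags = [
    '<img src="a.png" alt="">',
    "<IMG SRC=b.png WIDTH=100>",
    "<img alt='a \"quoted\" word' src = \"c.png\" />",
]
for t in img_tags:
    print(parse_img_attributes(t))
```

This is still a heuristic: a tag that is broken badly enough (for example an unclosed quote) can produce surprising pairs, which is the trade-off the answers below discuss.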
If performance isn't a big issue, I would go with an HTML parser (like BeautifulSoup in Python) if you're doing this server-side, or jQuery or plain JavaScript if you're doing it client-side. Of course, it's overkill, but it's much quicker to get right, less likely to have bugs (since the parser's authors have already thought about the corner cases), and it will handle potential ugliness.
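BeautifulSoup is a third-party package; as a dependency-free stand-in, here is a minimal sketch using Python's standard-library html.parser.HTMLParser, which is a different and simpler parser than BeautifulSoup. ImgAttributeCollector and img_attributes are hypothetical names. The parser does the tolerant tokenizing (case, quoting, entities), so the code only has to pick out img tags:

```python
from html.parser import HTMLParser


class ImgAttributeCollector(HTMLParser):
    """Collect the attributes of every <img> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, strips the quotes
        # around values and decodes character references such as &amp;.
        # A valueless attribute (e.g. ismap) arrives with the value None.
        if tag == "img":
            self.images.append(dict(attrs))

    # A self-closing <img ... /> goes through handle_startendtag, whose
    # default implementation calls handle_starttag, so it is collected too.


def img_attributes(html):
    parser = ImgAttributeCollector()
    parser.feed(html)
    parser.close()
    return parser.images


print(img_attributes('<p><IMG SRC=a.png ALT="A &amp; B"><img src="b.png"/></p>'))
# [{'src': 'a.png', 'alt': 'A & B'}, {'src': 'b.png'}]
```

In .NET, the HTML Agility Pack mentioned in the next answer plays the same role.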
Your best bet is to use something like the HTML Agility Pack instead of regex. It's designed to handle lots of malformed HTML and can save you more than a few headaches chasing down edge cases.
Before you get started with regex, read this first: RegEx match open tags except XHTML self-contained tags
/<img(\s+([\w-]+)\s*=\s*(["']([^"']*)["']|[^\s>]+))+\s*\/?>/i
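A match-all call returns the following (the format depends on your library, but these are the key indices):
0 -> the whole image tag
1 -> the whole attribute
2 -> attribute name
3 -> attribute value (including the enclosing quotes, if any)
4 -> attribute value without the enclosing quotes (empty or unset for an unquoted value; use 3 in that case)
Because groups 1 to 4 sit inside a repeated group, most engines keep only the captures of the last attribute in the tag. In .NET, every repetition is available through the group's Captures collection.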
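To see what those indices hold in practice, here is a Python sketch (Python's re standing in for your library; the variable names are illustrative) using the pattern written with [\w-]+ for attribute names and [^\s>]+ for unquoted values. Python's re keeps only the last repetition of a repeated group, so groups 2 to 4 describe only the final attribute. To get every attribute, the inner attribute sub-pattern can be run with finditer over the matched tag:

```python
import re

# Whole-tag pattern: group 1 = one attribute, 2 = name,
# 3 = value with quotes (if any), 4 = value without quotes.
IMG_RE = re.compile(
    r"""<img(\s+([\w-]+)\s*=\s*(["']([^"']*)["']|[^\s>]+))+\s*/?>""",
    re.IGNORECASE,
)
# The same attribute sub-pattern on its own (group numbers shift down by one).
ATTR_RE = re.compile(r"""\s+([\w-]+)\s*=\s*(["']([^"']*)["']|[^\s>]+)""")

tag = """<img src="a.png" width=100 alt='hi'>"""

m = IMG_RE.search(tag)
print(m.group(2), m.group(3), m.group(4))
# alt 'hi' hi   (only the last attribute survives)

attrs = []
for a in ATTR_RE.finditer(m.group(0)):
    # Quoted values come from the inner group; unquoted ones from the outer.
    value = a.group(3) if a.group(3) is not None else a.group(2)
    attrs.append((a.group(1), value))
print(attrs)
# [('src', 'a.png'), ('width', '100'), ('alt', 'hi')]
```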