Regular expression to remove attributes and values from HTML tags
Hi guys, I am very new to regex, so maybe you can help me with this.
I have a string like this: "<input attribute='value' >"
where attribute='value' could be anything, and I want to use preg_replace
to get only <input />
How do I specify a pattern to replace any number of any characters in a string?
Like this? preg_replace("/<input.*>/",$replacement,$string);
Many thanks
What you have:
.*
will match "any character, and as many as possible".
What you mean is
[^>]+
which translates to "any character that is not a >, and there must be at least one"
or alternatively,
.*?
which means "any character, but only as many as needed for this rule to work"
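To make the difference concrete, here is a minimal sketch that runs all three patterns over the same string. The input with two tags on one line and the variable names are my own assumptions, chosen so the greedy behaviour is visible.

```php
<?php
// Hypothetical input: two tags on one line, so greedy matching is visible.
$string = "<input name='a' > and <input name='b' >";

// Greedy: .* runs to the LAST '>' in the line, so both tags and the
// text between them are swallowed by a single match.
$greedy = preg_replace('/<input.*>/', '<input />', $string);

// Negated class: [^>]+ cannot cross a '>', so each tag matches separately.
$negated = preg_replace('/<input[^>]+>/', '<input />', $string);

// Lazy: .*? takes as few characters as possible before the next '>'.
$lazy = preg_replace('/<input.*?>/', '<input />', $string);

echo $greedy, "\n";   // <input />
echo $negated, "\n";  // <input /> and <input />
echo $lazy, "\n";     // <input /> and <input />
```

Note that [^>]+ needs at least one character after "<input", so a bare "<input>" would not match; [^>]* would cover that case too.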
BUT DON'T.
Parsing HTML with regular expressions is a bad idea.
Use any of the existing HTML parsers, DOM libraries, whatever, just NOT A NAÏVE REGEX.
For example:
<foo attr=">">
will be mistakenly captured by the regex as
'<foo attr=" ' with following text of '">'
Which will lead you to a regex like this:
<[a-zA-Z]+( [a-zA-Z]+=['"][^"']*['"])*> etc etc
At that moment you will discover this beautiful gem:
<foo attr="'>\'\"">
and your head will explode.
(The syntax highlighter is proving my point: it wrongly thinks the tag has already ended.)
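For comparison, here is a rough sketch of the parser route using PHP's bundled DOM extension. The sample fragment and variable names are assumptions for illustration, and the two LIBXML_* flags are only there to keep loadHTML() from wrapping the fragment in html/body tags.

```php
<?php
// Hypothetical fragment: the attribute value contains a '>'.
$html = '<p>before <input type="text" value="a > b"> after</p>';

// The regex stops at the '>' inside the attribute value and leaves debris.
$viaRegex = preg_replace('/<input[^>]*>/', '<input />', $html);

// A parser knows where the tag really ends.
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

foreach ($doc->getElementsByTagName('input') as $input) {
    // The attribute map is live, so keep removing the first entry
    // until none are left.
    while ($input->attributes->length > 0) {
        $input->removeAttribute($input->attributes->item(0)->nodeName);
    }
}

$viaDom = trim($doc->saveHTML());

echo $viaRegex, "\n"; // <p>before <input /> b"> after</p>
echo $viaDom, "\n";
```

saveHTML() serializes in HTML style, so expect the cleaned tag to come out as <input> rather than <input />.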
preg_replace("/<input[^>]*>/", $replacement, $string);
// [^>] means "any character except the greater than symbol / right tag bracket"
This is really basic stuff, you should catch up with some reading. :-)
If I understood the question correctly, you have the code:
preg_replace("/<input.*>/",$replacement,$string);
and you want us to tell you what to use for $replacement to delete what was matched by .*
You have to go about it the other way around. Use capture groups to capture what you want to keep, and reinsert them in the replacement. For example:
preg_replace("/(<input).*(>)/","$1$2",$string);
Of course, you don't really need the capture groups here, since you are only reinserting literal text. The example above shows the technique in case you want to do this in a situation where the tag might vary. This is a better solution:
preg_replace("/<input [^>]*>/","<input />",$string);
A negated character class is more specific than the dot. This regex will still work if there are two HTML tags in the string. Your original regex won't.
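If the tag name does vary, the capture-group technique and the negated class can be combined. This is only a sketch: the sample string and the (\w+) group for the tag name are my assumptions, and it still breaks on attribute values that contain a >.

```php
<?php
// Hypothetical input with two different tags.
$string = "<input type='text' ><img src='a.png' >";

// (\w+) captures the tag name; [^>]* consumes the attributes up to the
// first '>'. Closing tags such as </p> are left alone, because '/' is
// not a word character and so (\w+) cannot match right after '<'.
$stripped = preg_replace('/<(\w+)[^>]*>/', '<$1 />', $string);

echo $stripped, "\n"; // <input /><img />
```

Be aware that this turns every opening tag into a self-closing one, which is only sensible for void elements like input or img.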