Regular expression to remove attributes and values from HTML tags

Hi guys, I am very new to regex; maybe you can help me with this.

I have a string like this: "<input attribute='value' >"

where attribute='value' could be anything, and I want to use preg_replace to get only <input />.

How do I specify a pattern to replace any number of any characters in a string?

Like this? preg_replace("/<input.*>/",$replacement,$string);

Many thanks

0




4 answers


What you have:

.*

will match "any character, and as many as possible".

What you mean is

[^>]+

which translates to "any character that is not '>', and there must be at least one"

or alternatively,

.*?

which means "any character, but only as many as needed for the rule to match".
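To make the difference concrete, here is a minimal sketch comparing the three quantifiers on a line that contains two tags. The sample string is made up for illustration; only preg_match and the patterns discussed above are used.

```php
<?php
// Hypothetical sample input: two tags on one line.
$string = "<input type='text' name='a'> and <input type='checkbox'>";

// Greedy dot: runs on to the LAST '>' in the string.
preg_match('/<input.*>/', $string, $greedy);

// Negated class: cannot step over a '>', so it stops at the first one.
preg_match('/<input[^>]+>/', $string, $negated);

// Lazy dot: expands only until the first '>' lets the pattern match.
preg_match('/<input.*?>/', $string, $lazy);

echo $greedy[0], "\n";   // the whole line, both tags included
echo $negated[0], "\n";  // <input type='text' name='a'>
echo $lazy[0], "\n";     // <input type='text' name='a'>
```

With a single tag in the string all three behave the same, which is probably why the original pattern looked fine at first.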

BUT DON'T

Parsing HTML with regular expressions is bad.

Use any of the existing HTML parsers, DOM libraries, whatever, just NOT NAÏVE REGEX.
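As a rough sketch of the parser route, PHP's bundled DOM extension can strip the attributes without any regex. The sample markup and attribute names below are assumptions for illustration, and the exact serialization (for instance `<input>` rather than `<input />`) depends on libxml.

```php
<?php
// Hypothetical sample input; note the '>' inside an attribute value.
$html = '<form><input type="text" name="q" value="a > b"></form>';

$doc = new DOMDocument();
// These flags ask libxml not to wrap the fragment in <html><body>
// and not to add a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

foreach ($doc->getElementsByTagName('input') as $input) {
    // Collect the attribute names first, since removing attributes
    // while iterating over the live attribute map is fragile.
    $names = [];
    foreach ($input->attributes as $attr) {
        $names[] = $attr->nodeName;
    }
    foreach ($names as $name) {
        $input->removeAttribute($name);
    }
}

$out = $doc->saveHTML();
echo $out;
```

The parser handles quoting for you, so the '>' inside the value is not a problem here.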

For example:

 <foo attr=">"> 

      

will be mistakenly captured by the regex as

'<foo attr=">' with following text of '">'
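A small sketch of that failure, using the negated-class pattern on the tag above (the trailing " text" and the replacement string are only there for illustration):

```php
<?php
// The attribute value itself contains a '>'.
$string = '<foo attr=">"> text';

// [^>]* stops at the '>' inside the quotes, not at the end of the tag.
$result = preg_replace('/<foo[^>]*>/', '<foo />', $string);

echo $result, "\n";  // <foo />"> text
```

The leftover `">` is the tail of the original tag that the pattern never saw.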

      

Which will lead you to a regex like this:

<[a-zA-Z]+( [a-zA-Z]+=['"][^"']*['"])*> etc etc

      

At which point you will discover this beautiful gem:

 <foo attr="'>\'\"">

      

and your head will explode.

(The syntax highlighter proves my point: it wrongly thinks I have ended the tag.)

+10




Some people were close ... but not 100%:

This:

preg_replace("/<input[^>]*>/", $replacement, $string);

should be as follows:

preg_replace("/<input[^>]*?>/", $replacement, $string);

      

You don't want this to be a greedy match.
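It may be worth checking how much the lazy quantifier changes in this particular pattern. In the sketch below (sample string and replacement are made up), the greedy and lazy forms give the same result, because [^>] can never run past the first '>' anyway; laziness matters more with a plain dot.

```php
<?php
// Hypothetical input: two tags plus a stray '>' later in the text.
$string = "<input a='1'><input b='2'> tail >";

$greedy = preg_replace('/<input[^>]*>/', '<input />', $string);
$lazy   = preg_replace('/<input[^>]*?>/', '<input />', $string);

echo $greedy, "\n";
echo $lazy, "\n";
var_dump($greedy === $lazy);
```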

+1




preg_replace("/<input[^>]*>/", $replacement, $string);
// [^>] means "any character except the greater-than symbol / closing tag bracket"

This is really basic stuff; you should catch up with some reading. :-)

0




If I understood the question correctly, you have the code:

preg_replace("/<input.*>/",$replacement,$string);

      

and you want us to tell you what to use for $replacement to delete what was matched by .*

You have to go about it the other way around. Use capture groups to capture what you want to keep and insert them in the replacement. For example:

preg_replace("/(<input).*(>)/","$1$2",$string);

      

Of course, you don't really need the capture groups here, since you are only inserting literal text. The example above shows the technique in case you want to do this in a situation where the tag might change. This is the best solution:

preg_replace("/<input [^>]*>/","<input />",$string);

      

A negated character class is more specific than the dot. This regex will work if there are two HTML tags in the string. Your original regex won't.
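For the case where the tag name can vary, one possible sketch combines a capture group with the negated class. The sample string is invented, and the same caveat applies as with any regex on HTML: a '>' inside an attribute value would still break it.

```php
<?php
// Hypothetical input with several different tags.
$string = "<input type='text'> <img src='a.png'> <br>";

// (\w+) captures the tag name to keep; [^>]* swallows the attributes.
// Single quotes keep PHP from touching $1 before preg_replace sees it.
$result = preg_replace('/<(\w+)[^>]*>/', '<$1 />', $string);

echo $result, "\n";  // <input /> <img /> <br />
```

Each tag is matched separately because [^>]* cannot cross a tag boundary.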

0

