Regex find last body tag

I know a parser is the best tool for this, but in my current situation it has to be plain JavaScript.

I have a regex to find the closing body tag of an HTML document.

var closing_body_tag = /(<\/body>)/i;

      

However, this fails if the source has more than one set of tags. So I thought about going with something like this.

var last_closing_body_tag = /(<\/body>)$/gmi;

      

This works for the case where multiple tags are found, but for some reason it fails in cases with only one set of tags.

Am I making a mistake that might cause mixed results for single tag cases?

Yes, I understand that more than one body tag is wrong, but we have to handle all kinds of bad source.

+3




4 answers


You can use this regex:

  /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i

      

(?![\s\S]*<\/body>[\s\S]*$) is a negative lookahead that ensures no other closing body tag appears between the match and the end of the string.



Here is a demo.

Example code for adding a tag:

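var re = /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i;
var str = '<html>\n<body>\n</body>\n</html>\n<html>\n<body>\n</body>\n</html>';
// '$&' re-inserts the matched </body>, so the tag is added in front of it instead of replacing it.
var subst = '<tag/>$&';
var result = str.replace(re, subst);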
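As a rough sketch of how the lookahead version behaves on both single-tag and multi-tag input, the same regex can be wrapped in a small function. The name insertBeforeLastBodyClose and the sample strings are made up for illustration and are not from the question. The last lines also show one likely reason the /(<\/body>)$/gmi attempt gives mixed results: with the m flag, $ only matches at the end of a line, so that pattern needs </body> to be the last thing on its line.

```javascript
// Hypothetical helper (name chosen for illustration): inserts `snippet`
// immediately before the last closing body tag in `html`.
function insertBeforeLastBodyClose(html, snippet) {
  var lastClose = /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i;
  // A replacer function is used so that any `$` in `snippet`
  // is not interpreted as a replacement pattern.
  return html.replace(lastClose, function (match) {
    return snippet + match;
  });
}

var twoDocs = '<html>\n<body>\n</body>\n</html>\n<html>\n<body>\n</body>\n</html>';
var oneDoc = '<html><body><p>Hi</p></body></html>';

var patchedTwo = insertBeforeLastBodyClose(twoDocs, '<tag/>');
var patchedOne = insertBeforeLastBodyClose(oneDoc, '<tag/>');

// The multiline `$` pattern from the question needs </body> at the end of a
// line, so it finds nothing when </html> follows on the same line.
var questionPattern = /(<\/body>)$/gmi;
var matchesOneDoc = questionPattern.test(oneDoc);
```

If the input contains no closing body tag at all, replace leaves the string unchanged, which may or may not be what you want for bad source.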

      

+1




RegExp

As I suggested in the comments, use:

/^[\S\s]+(<\/body>)/i

      

How

This greedily matches all text up to the last </body>. The i flag makes the match case-insensitive, so it works no matter how many body tags you have:

</body>
</BODY>
</BoDY>
</body><!--This one selected-->

      

Since you said you are using JavaScript, you can use it like this:

yourString.match(/^[\S\s]+(<\/body>)/i)[1];

      

.match works well here because the regex has no g flag; with g it would return only the full matches, without the capture group.

Explanation



^ matches at the beginning of the entire string, because the m flag is not set.

[\S\s]+ matches every character, line breaks included, as far as it can before the next part of the pattern. The + can be replaced with *.

(<\/body>) matches the closing body tag that follows the greedy part (that is, the last one) and captures it as group 1.

i makes the match case-insensitive (remove it if you want case-sensitive matching).
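The one-liner above throws a TypeError when the string contains no closing body tag, because match returns null. A minimal sketch of a guarded version follows, assuming you want the position of the last tag; the helper name indexOfLastBodyClose is hypothetical, and it uses * rather than + so that a string starting with </body> is still handled.

```javascript
// Hypothetical helper (name chosen for illustration): returns the index of the
// last closing body tag in `html`, or -1 when there is none.
function indexOfLastBodyClose(html) {
  // `*` instead of `+` so a string that starts with </body> still matches.
  var match = html.match(/^[\S\s]*(<\/body>)/i);
  if (match === null) {
    return -1; // avoids a TypeError from indexing into null
  }
  // match[0] is everything up to and including the last tag,
  // match[1] is the captured tag itself.
  return match[0].length - match[1].length;
}

var sample = '</body>\n</BODY>\n</BoDY>\n</body><!--This one selected-->';
var lastIndex = indexOfLastBodyClose(sample);
var lastTag = sample.slice(lastIndex, lastIndex + '</body>'.length);
var missing = indexOfLastBodyClose('<p>no body here</p>');
```

With the index in hand, the source can be split with slice and the new markup placed in between, without running a second regex.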

JavaScript appendChild

If you have multiple body tags and the source is loaded as a document, you can also add an element at the end of the last body through the DOM.

var elem = document.createElement('div');
elem.setAttribute('id', 'mydiv');
elem.innerHTML = 'Foo';

      

Now you can add elem in several ways:

1

window.document.body.appendChild(elem);

      

2

var body_elems = document.getElementsByTagName('body');
body_elems[body_elems.length - 1].appendChild(elem);

      

+1




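Use

/(.|[\r\n])*(<\/body>)/mi

as the regular expression. The closing tag ends up in capture group 2 ($2).

The match is greedy because of the * quantifier, so it runs up to the last </body>. The m flag has no effect here, since the pattern uses neither ^ nor $. Note that the "any char" character . does not match newline or carriage-return characters, which is why they need an explicit [\r\n] alternative.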
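To illustrate the point about . and line breaks, here is a small sketch comparing the pattern with and without the [\r\n] alternative. The sample string and variable names are assumptions made up for this example.

```javascript
var html = '<body>\nfirst\n</body>\n<body>\nsecond\n</body>\n';

// `.` stops at line breaks, so `.*` alone can only reach a </body>
// that sits on the same line as the start of the match.
var dotOnly = /.*(<\/body>)/i.exec(html);

// Adding [\r\n] as an alternative lets the greedy `*` run across lines
// and then backtrack to the last </body>.
var dotOrNewline = /(.|[\r\n])*(<\/body>)/i.exec(html);

// Start position of the tag each pattern captured.
var dotOnlyIndex = dotOnly.index + dotOnly[0].length - dotOnly[1].length;
var lastTagIndex = dotOrNewline.index + dotOrNewline[0].length - dotOrNewline[2].length;
```

Here dotOnlyIndex points at the first closing tag and lastTagIndex at the last one. On very large inputs the (.|[\r\n])* form can be slower than a character class such as [\s\S]*, because it goes through an alternation and a capture for every character.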

0




The regex matching the last body tag is pretty simple:

/[\s\S]*(<\/body>)/i

      

This matches as many characters as possible (more precisely, any whitespace or non-whitespace character) before </body>.

The i flag means it matches </body> in any letter case, so all of these:

</body>
</BODY>
</BodY>

will match.

I used [\s\S] instead of . because . matches everything except newline characters, which is probably not what you want. \s matches all whitespace (spaces, tabs, all newline types) and \S is equivalent to [^\s], so it matches anything that is not whitespace. Together they match every possible character. I would expect the same to be possible with [\w\W], [\d\D], etc., but [\s\S] is my preference.
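As a quick check of that last claim, the sketch below runs the three variants against the same input; the sample string and variable names are made up for illustration.

```javascript
var html = '<body>\n</body>\r\n<body>\t</BODY>\n';

// Each character class pairs a shorthand with its negation,
// so each one accepts any character, line breaks included.
var patterns = [
  /[\s\S]*(<\/body>)/i,
  /[\w\W]*(<\/body>)/i,
  /[\d\D]*(<\/body>)/i
];

// Collect the tag captured by each pattern (null if a pattern does not match).
var captured = patterns.map(function (re) {
  var m = re.exec(html);
  return m === null ? null : m[1];
});
```

All three patterns capture the same final tag here, so the choice between them comes down to readability.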

0

