How do you prefer to handle <script> elements?

Imagine the following.

  • HTML is parsed into a DOM tree
  • DOM nodes become available programmatically
  • DOM nodes may or may not be modified by code
  • The modified tree is re-serialized to HTML

My first question is how the contents of the "script" tag should be maintained.

my $tree = someparser( $source ); 
....
print $somenode->text(); 
$somenode->text('arbitraryjavascript');
....
print $tree->serialize(); 

      

Or something to that effect.

The problem comes when deciding how to treat the contents of this field sensibly, in terms of ease of use and the portability / usability of its output.

What I want to do myself is:

 $somenode->text("verbatim"); 

      

->

  <script>
  // <!-- <![CDATA[ 
  verbatim
  // ]]> -->
  </script>

      

So what I produce is both safe and reliable.
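As a rough sketch of the output side only, the wrapping could look something like the following. The name wrap_script_text is made up for illustration and does not come from any existing parser module; the guard lines are the ones shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any real module): wrap raw script
# text in the comment/CDATA guard lines used in the example above.
# Note: text that itself contains "]]>" or "</script>" would still
# break out of the guard; this sketch does not try to handle that.
sub wrap_script_text {
    my ($text) = @_;
    return "<script>\n// <!-- <![CDATA[\n$text\n// ]]> -->\n</script>\n";
}

print wrap_script_text("verbatim");
```

This only covers serialization. Whether the same guard should be detected and stripped on input is the open question.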

But I am hesitant about whether this magic is a good idea, and whether I should have code that tries to detect existing copies of these "safety blocks" and strip them during the parsing phase.

If I don't strip them from the input, I will probably end up doubling them in the output phase, which is especially problematic if the output of this code is subsequently re-parsed.

If I do strip them from the input, there is a useful side effect: code that programmatically reads the content of the script element won't see the safety blocks at either end.
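To make the doubling problem concrete, here is a small sketch. Both strip_guard and add_guard are hypothetical names invented for this example, and the regexes only recognise the exact guard style used in this question.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one leading and one trailing guard line,
# if present. Whitespace between the guard tokens is tolerated.
sub strip_guard {
    my ($text) = @_;
    $text =~ s{\A\s*//\s*<!--\s*<!\[CDATA\[[ \t]*\n?}{};
    $text =~ s{\n?[ \t]*//\s*\]\]>\s*-->\s*\z}{};
    return $text;
}

# Hypothetical helper: add the guard lines unconditionally.
sub add_guard {
    my ($text) = @_;
    return "//<!--<![CDATA[\n$text\n//]]>-->";
}

# Script content as a parser might hand it over, guard included.
my $parsed = "\n//<!--<![CDATA[\nfoo\n//]]>-->\n";

my $naive = add_guard($parsed);               # guard is now nested twice
my $clean = add_guard(strip_guard($parsed));  # guard appears exactly once

my $naive_count = () = $naive =~ /<!\[CDATA\[/g;
my $clean_count = () = $clean =~ /<!\[CDATA\[/g;
print "without stripping: $naive_count guard(s)\n";
print "with stripping:    $clean_count guard(s)\n";
```

Each further parse/serialize round trip without stripping would add another layer, which is why the strip step matters if the output is ever re-parsed.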

There will eventually be a way to change some of this behavior, but the question is what the default way of handling this should be, and why.

Perhaps my whole reasoning here is flawed, and the text content should be left completely unprocessed unless processing is explicitly requested.

What kind of behavior would you look for in such a tool? Please point out anything in my reasoning that I may have overlooked.


TL;DR summary: How should I programmatically handle the escaping mechanism in these scenarios, namely "//<!--<![CDATA[" and its closing counterpart padded onto either end of the script content, with regard to input and output?





2 answers


I'm adding my own answer here so that it's more obvious what I'm trying to figure out. The current idea I have settled on works like this:

my $html = <<'EOF';
<script>
//<!--<![CDATA[
foo
//]]>-->
</script>
EOF
#/# this line is here for the syntax highlighter
my $obj = parse($html); 
print $obj->text(); 
# foo
$obj->text("bar");
print $obj->text(); 
# bar
print $obj->html(); 
# <script>
# //<!--<![CDATA[
# bar
# //]]>-->
# </script>

      

The important points are:

  • The XML / HTML / legacy-browser / bot protection mechanisms are stripped from the internal representation of the code.
  • This way, the inline code can be manipulated as if they were never there.
  • Re-exporting the modified code applies the protection mechanisms again.


If the input had:

  • no protection mechanisms, or
  • differing ones (i.e. no //, no <!--, or no <![CDATA[),

the existing protections will be removed and replaced with the bog-standard ones above.
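A minimal sketch of that normalization, assuming the guards always sit on their own first and last non-blank lines. The function names normalize_script_text and script_html are hypothetical, and a real implementation would probably need to cover more variants.

```perl
use strict;
use warnings;

# Guard lines in any of the partial forms: "//" optional, and either
# or both of the comment and CDATA markers present.
my $OPEN  = qr{\A\s*(?://)?\s*(?:<!--\s*(?:<!\[CDATA\[)?|<!\[CDATA\[)\s*\z};
my $CLOSE = qr{\A\s*(?://)?\s*(?:\]\]>\s*(?:-->)?|-->)\s*\z};

# Hypothetical helper: return the script text with any guard removed.
sub normalize_script_text {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    # Ignore blank lines at both ends before looking for guards.
    shift @lines while @lines && $lines[0]  !~ /\S/;
    pop   @lines while @lines && $lines[-1] !~ /\S/;
    shift @lines if @lines && $lines[0]  =~ $OPEN;
    pop   @lines if @lines && $lines[-1] =~ $CLOSE;
    return join "\n", @lines;
}

# Hypothetical helper: always emit the one standard guard.
sub script_html {
    my ($text) = @_;
    my $body = normalize_script_text($text);
    return "<script>\n//<!--<![CDATA[\n$body\n//]]>-->\n</script>\n";
}

# Full guard, comment only, CDATA only, and no guard at all.
my @inputs = (
    "//<!--<![CDATA[\nfoo\n//]]>-->",
    "<!--\nfoo\n-->",
    "// <![CDATA[\nfoo\n// ]]>",
    "foo",
);
print script_html($_) for @inputs;
```

All four inputs serialize to the same guarded block, so repeated round trips should be stable. An ordinary first-line comment such as "// setup" is left alone because it contains neither marker.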



The only comparable thing I can think of is the ASP.NET functions for registering script blocks. They all have an overload that takes a bool indicating whether the script tags need to be added or not.

Here's a docs link for one:



http://msdn.microsoft.com/en-us/library/bahh2fef.aspx
