Imagine the following.
- HTML is parsed into a DOM tree
- DOM nodes become available programmatically
- DOM nodes may or may not be modified programmatically
- The modified nodes are re-serialized to HTML.
My main question is how the "script" tag should be handled.
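my $tree = someparser( $source );         # parse the HTML into a DOM tree
....
print $somenode->text();                  # read the contents of a script node
$somenode->text('arbitraryjavascript');   # replace the contents
....
print $tree->serialize();                 # write the tree back out as HTML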
Or something to that effect.
The problem arises when deciding how to treat the contents of this element with regard to ease of use and the portability and usability of its output.
What I want to do myself is:
$somenode->text("verbatim");
->
<script>
//<!--<![CDATA[
verbatim
//]]>-->
</script>
This way, what I produce is both safe and reliable.
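The wrapping step described above could be sketched roughly as follows. This is a Python illustration rather than Perl, and `wrap_script_text` is a hypothetical helper name, not part of any existing parser. It assumes the guard pair is `//<!--<![CDATA[` and `//]]>-->`.

```python
# Hypothetical helper, for illustration only: wrap raw script text in
# comment/CDATA guard lines when serializing a <script> element.
GUARD_OPEN = "//<!--<![CDATA["   # assumed opening guard
GUARD_CLOSE = "//]]>-->"         # assumed closing guard


def wrap_script_text(text: str) -> str:
    """Return a serialized <script> element containing the guarded text."""
    body = text.strip("\n")  # avoid stray blank lines next to the guards
    return f"<script>\n{GUARD_OPEN}\n{body}\n{GUARD_CLOSE}\n</script>"


print(wrap_script_text("verbatim"))
```

Note that this sketch does nothing about text that itself contains `</script>` or `]]>`, which would still end the element or the CDATA section early.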
But I am unsure whether this magic is a good idea, and whether I should have code that tries to detect existing copies of these "safety blocks" and strip them in the "parsing" phase.
If I don't strip them from the input, I will probably end up doubling them in the output phase, which is especially problematic if the output of this code is subsequently re-parsed.
If I do strip them from the input, there is the useful side effect that code reading the contents of the script element programmatically won't see the safety blocks at either end.
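To make the trade-off concrete, here is a minimal Python sketch of the strip-on-parse, wrap-on-serialize option. The names `strip_guards` and `add_guards` are hypothetical, and the guard strings `//<!--<![CDATA[` and `//]]>-->` are assumed. It only illustrates that stripping on input keeps a parse/serialize round trip stable, while skipping it doubles the guards.

```python
import re

GUARD_OPEN = "//<!--<![CDATA["   # assumed opening guard
GUARD_CLOSE = "//]]>-->"         # assumed closing guard

# One guard pair surrounding the script body, each guard on its own line.
_GUARDED = re.compile(
    r"\A\s*" + re.escape(GUARD_OPEN) + r"[ \t]*\n"
    r"(.*?)"
    r"\n[ \t]*" + re.escape(GUARD_CLOSE) + r"\s*\Z",
    re.DOTALL,
)


def strip_guards(raw: str) -> str:
    """Parse phase: return the body without guards (unguarded input is unchanged)."""
    match = _GUARDED.match(raw)
    return match.group(1) if match else raw


def add_guards(text: str) -> str:
    """Serialize phase: wrap the body in exactly one guard pair."""
    return f"\n{GUARD_OPEN}\n{text}\n{GUARD_CLOSE}\n"


once = add_guards("alert(1);")
round_trip = add_guards(strip_guards(once))   # strip on parse, wrap on output
doubled = add_guards(once)                    # no stripping on parse

print(strip_guards(once))          # the caller sees only the script body
print(once == round_trip)          # stable across re-parsing
print(doubled.count(GUARD_OPEN))   # guards pile up without stripping
```

With stripping, user code that calls `strip_guards` on parsed content never sees the guard lines. Without it, every re-parse and re-serialize adds another pair.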
There will eventually be a way to toggle some of this behavior, but the question is what the /default/ way to handle this should be, and why.
Perhaps my whole reasoning is flawed here, and the text content should be left completely unprocessed unless processing is explicitly requested.
What kind of behavior would you expect from such a tool? Please point out anything in my reasoning that I may have overlooked.
TL;DR summary:
How should I programmatically handle the escaping mechanism in these scenarios, namely the "//<!--<![CDATA[" padding at either end of the script contents, on both input and output?
javascript
dom
html
serialization
parsing
Kent Fredric