Is there an HTML/CSS normalizer that works?

A long time ago I wrote a style "normalizer" program to scan the ASP/HTML code of a large set of classic ASP pages (most of which were originally generated from MS Word documents, so naturally they were littered with style tags and massive one-off styles). The normalizer generated a minimal set of styles and stylesheets and a new "processed" ASP/HTML document, such that the processed document produced exactly the same rendered result as the original (verified by comparing screenshots).

From time to time I run into the need for such a program, and I am toying with the idea of writing one for commercial release.

I haven't found anything like it (the HTML::Normalize Perl module and the HTML Tidy project just clean up the tags).

So my questions are:

  • Is there such a tool already, commercial or otherwise?
  • If not, is there really a need for it?
  • If so, what features would make it really useful?

Re #3, for example: gathering a base stylesheet for a set of pages, or adjusting all pages to use a given base stylesheet; preserving classic ASP commands and #includes; preserving inline ASP.NET scripts; etc. The more specific and numerous, the better.

Example:
Old HTML with inline styles

<html><head>
<title>title</title>
<style type='text/css'>
.cls1 { font-family: arial; font-size: 10px; font-weight: bold; }
</style>
</head>
<body>
<% somefunction() %>
<div class='cls1' style='font-size:10px;'>test div</div>
</body>
</html>

      

New HTML

<html><head>
<title>title</title>
<style type='text/css'>
.cls1 { font-family: arial; font-size: 10px; font-weight: bold; }
</style>
</head>
<body>
<% somefunction() %>
<div class='cls1'>test div</div>
</body>
</html>

      

Note that the inline style on the div is gone, since it was redundant with the cls1 class.
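The redundancy check in this example can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the stylesheet contains nothing but simple single-class rules such as `.cls1 { ... }`, values match textually (no shorthand expansion, no `!important`), and no other selector affects the element. The names `parse_declarations`, `parse_class_rules`, and `prune_inline_style` are hypothetical, not part of any existing tool.

```python
import re

# Assumption: only simple ".classname { ... }" rules appear in the stylesheet.
RULE_RE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")


def parse_declarations(text):
    """Turn 'a: b; c: d' into a dict of property -> whitespace-normalized value."""
    decls = {}
    for part in text.split(";"):
        prop, sep, value = part.partition(":")
        if sep and prop.strip():
            decls[prop.strip().lower()] = " ".join(value.split())
    return decls


def parse_class_rules(css):
    """Return a list of (class_name, declarations) pairs in stylesheet order."""
    return [(name, parse_declarations(body)) for name, body in RULE_RE.findall(css)]


def prune_inline_style(class_attr, style_attr, rules):
    """Drop inline declarations whose value the element's classes already supply."""
    classes = set(class_attr.split())
    from_classes = {}
    for name, decls in rules:  # later rules override earlier ones
        if name in classes:
            from_classes.update(decls)
    kept = [
        f"{prop}: {value}"
        for prop, value in parse_declarations(style_attr).items()
        if from_classes.get(prop) != value
    ]
    return "; ".join(kept)


rules = parse_class_rules(
    ".cls1 { font-family: arial; font-size: 10px; font-weight: bold; }"
)
print(repr(prune_inline_style("cls1", "font-size:10px;", rules)))
```

For the div above, the pruned style is empty, so the whole `style` attribute can be dropped. A declaration that differs from the class rule (say `font-size:12px`) would be kept, because the inline value overrides the class.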

EDIT: Removed the term "sanitizer", since I'm not focusing on XSS attacks or filtering input in comments, just on consolidating lots of custom styles and random CSS classes into a minimal, consistent set of stylesheets.

+1




4 answers


Well, I can't say definitively that it "works" for all of this, but Tidy does a little more than clean up tags.



See the HTML Tidy settings, especially those specific to Microsoft Word (e.g. word-2000).
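As a rough sketch of how those settings might be applied in a batch job, the Python below builds a `tidy` command line with `word-2000` and `clean` switched on. The helper name `tidy_command` and the file names are made up for illustration, and the snippet assumes the `tidy` executable is installed and on the PATH before it tries to run anything.

```python
import shutil
import subprocess


def tidy_command(src, dst):
    """Build a tidy invocation that enables the Word-specific cleanup options."""
    return [
        "tidy",
        "-q",                  # quiet: suppress non-essential output
        "--word-2000", "yes",  # strip surplus markup inserted by Word 2000
        "--clean", "yes",      # replace presentational markup with style rules
        "-o", dst,             # write the result to dst
        src,
    ]


cmd = tidy_command("page.html", "page.tidy.html")  # hypothetical file names
print(" ".join(cmd))

# Only run it when tidy is actually available; tidy exits non-zero on
# warnings, so the return code is deliberately not treated as fatal here.
if shutil.which("tidy"):
    subprocess.run(cmd, check=False)
```

Whether the output is close enough to the original rendering still has to be checked page by page, for example with the screenshot comparison described in the question.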

+3




If you want to know whether you've done a reasonable job, you should try these tests (with something like Tidy, you probably haven't done a reasonable job).

Some parameters:



Anything that uses regular expressions instead of parsing the markup would be suspect in my mind (and just too hard to implement).
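To illustrate the parser-based approach, here is a minimal Python sketch that uses the standard library's `html.parser` to collect inline `style` attributes and count how often each normalized style occurs, which hints at one-off styles that could become shared classes. The class name `InlineStyleCollector` and the normalization rules are assumptions made for this example; splitting declarations on `;` is itself a simplification that would break on values containing semicolons (such as some `url(...)` or quoted strings).

```python
from collections import Counter
from html.parser import HTMLParser


def normalize_style(value):
    """Normalize an inline style so equivalent declarations compare equal."""
    decls = []
    for part in value.split(";"):
        prop, sep, val = part.partition(":")
        if sep and prop.strip():
            decls.append(f"{prop.strip().lower()}: {' '.join(val.split())}")
    return "; ".join(sorted(decls))


class InlineStyleCollector(HTMLParser):
    """Count occurrences of each normalized inline style in a document."""

    def __init__(self):
        super().__init__()
        self.styles = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "style" and value:
                self.styles[normalize_style(value)] += 1


collector = InlineStyleCollector()
collector.feed(
    "<p style='font-size:10px; color: red'>a</p>"
    "<div style='COLOR:red;font-size: 10px'>b</div>"
)
for style, count in collector.styles.most_common():
    print(count, style)
```

Because the parser hands over attributes already split and unquoted, differences in quoting, attribute order, or tag case do not matter, which is where regex-only approaches tend to go wrong.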

+2




Old question, but some people may still find this useful. Check out http://necolas.github.com/normalize.css/ . It works well!

+1



