How to validate HTML that includes RSS content?

I'm trying to get up to speed on HTML / CSS / PHP development and was wondering how I should validate my code when it contains content that I cannot control, such as an RSS feed.

For example, my home page is a .php doc that contains HTML and PHP code. I am using PHP to create a simple RSS reader (using SimpleXML) to grab some feeds from another blog and display them on my web page.

Now, as far as possible, I would like to write valid HTML. I assume the way to do this is to view the page in a browser (I am using NetBeans, so I click Preview Page), copy the source (using View Source) and paste it into the W3C validator. When I do this, I get all sorts of validation errors (e.g. "cannot generate system identifier for general entity" and "general entity "blogId" not defined and no default entity") coming from the RSS feed.

Am I following the correct process for this? Should I just ignore all errors flagged in the RSS feed?

Thanks.



3 answers


In this case, where you are dealing with untrusted content from an outside feed, you have limited options for making it safe.

Two that come to mind are the following:

  • use something like strip_tags() to remove all formatting from the RSS feed content.
  • use a library, for example HTMLPurifier, to check and sanitize the content prior to output.
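As a rough sketch of the first option, the snippet below strips tags from each item's description and escapes every value before writing it into the page. The sample feed, the URL, and the function name renderFeedItems() are made up for illustration; a real feed would be fetched from the remote blog. Validator errors mentioning a name such as "blogId" are commonly caused by a raw & in a URL, which htmlspecialchars() turns into &amp;amp;.

```php
<?php
// Hypothetical sample feed; in practice this string would come from the remote blog.
$rss = <<<XML
<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item>
    <title>Hello &amp; welcome</title>
    <link>http://example.com/post?blogId=1&amp;postId=2</link>
    <description>&lt;p onclick="x()"&gt;Some &lt;b&gt;bold&lt;/b&gt; text&lt;/p&gt;</description>
  </item>
</channel></rss>
XML;

// Turn feed items into list items, treating every feed value as untrusted text.
function renderFeedItems(string $rss): string
{
    $feed = simplexml_load_string($rss);
    if ($feed === false) {
        return ''; // feed was not well-formed XML
    }
    $html = '';
    foreach ($feed->channel->item as $item) {
        // Escape &, <, > and quotes so the values are safe in text and attributes.
        $title = htmlspecialchars((string) $item->title, ENT_QUOTES, 'UTF-8');
        $link  = htmlspecialchars((string) $item->link, ENT_QUOTES, 'UTF-8');
        // Drop any markup from the description, then escape what is left.
        $text  = htmlspecialchars(strip_tags((string) $item->description), ENT_QUOTES, 'UTF-8');
        $html .= "<li><a href=\"$link\">$title</a>: $text</li>\n";
    }
    return $html;
}

echo renderFeedItems($rss);
```

This loses all formatting from the feed. If some markup should survive, a sanitizing library such as HTMLPurifier is the better fit.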

For performance, you should cache the output-ready content, FYI.

-

Regarding caching



There are many ways to do this. If you are using a framework, chances are it already has a way to do it. Zend_Cache is a class provided by the Zend Framework, for example.

If you have access to memcached, this is very easy. But if you don't, there are many other ways.

The general concept is to prepare the output once and then store it, ready to be served over and over again. This way you avoid the overhead of fetching and preparing the output when it's the same every time.

Consider this code, which fetches and formats an RSS feed at most once every 5 minutes. All other requests are just a quick readfile() call.

# When called, regenerates the cache file
function GenCache1()
{
    $output = '';
    // Get RSS feed
    // Parse it
    // Purify it
    // Format your output into $output
    file_put_contents('/tmp/cache1', $output, LOCK_EX);
}

# Regenerate if the file is missing or older than 5 minutes (300 seconds)
clearstatcache();
if (!file_exists('/tmp/cache1') || filemtime('/tmp/cache1') + 300 < time())
{
    GenCache1();
}

# Now, simply use this code to output
readfile('/tmp/cache1');
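
The same idea can be wrapped in a small reusable helper. This is only a sketch: the function name cachedOutput(), the demo file name, and the closure standing in for the fetch/parse/purify/format steps are all assumptions, not part of any library. It writes to a temporary file and renames it, so a concurrent request should not read a half-written cache.

```php
<?php
/**
 * Return cached output, regenerating it when the file is missing or stale.
 * $generate is any callable that returns the finished HTML string.
 */
function cachedOutput(string $path, int $ttl, callable $generate): string
{
    clearstatcache(true, $path);
    if (is_file($path) && filemtime($path) + $ttl >= time()) {
        $cached = file_get_contents($path);
        if ($cached !== false) {
            return $cached; // cache hit: no fetching or parsing needed
        }
    }
    $output = $generate();
    // Write to a temporary file first, then rename it over the cache file.
    $tmp = $path . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $output);
    rename($tmp, $path);
    return $output;
}

// Hypothetical usage: the closure stands in for fetch + parse + purify + format.
$cacheFile = sys_get_temp_dir() . '/rss_cache_demo.html';
echo cachedOutput($cacheFile, 300, function (): string {
    return "<ul><li>Feed item</li></ul>\n";
});
```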

      



I usually use HTML Tidy to clean up data from outside the system.
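
A minimal sketch of that approach, assuming PHP's optional tidy extension is installed: tidy_repair_string() closes unclosed tags and returns repaired markup. The wrapper name tidyFragment() and the sample input are hypothetical. Note that Tidy repairs markup but does not strip scripts or event handlers, so it helps with validity rather than security.

```php
<?php
/**
 * Repair an HTML fragment with the tidy extension.
 * Returns null when the extension is not installed, so the caller can fall back.
 */
function tidyFragment(string $html): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null;
    }
    $config = [
        'show-body-only' => true, // return just the fragment, not a full document
        'output-xhtml'   => true, // emit XHTML-style markup
    ];
    $clean = tidy_repair_string($html, $config, 'utf8');
    return $clean === false ? null : $clean;
}

// Hypothetical input with unclosed tags, as might arrive from a feed.
$result = tidyFragment('<p>Unclosed <b>bold text');
echo $result ?? "tidy extension not available\n";
```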





RSS should always be well-formed XML. Therefore, I suggest you use XHTML for your site. Since XHTML is also XML, you shouldn't get any errors when validating an XHTML page that includes RSS content.

EDIT: Of course, this only holds if the content you receive is indeed valid XML ...
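
One way to check that caveat before using a feed is to see whether it parses at all. The sketch below assumes the feed body is already in a string; the function name loadFeedOrNull() is made up for illustration. It only tests well-formedness of the feed, not whether HTML embedded in an item's description is valid.

```php
<?php
/**
 * Parse a feed string, returning null instead of emitting warnings
 * when the XML is not well-formed.
 */
function loadFeedOrNull(string $xml): ?SimpleXMLElement
{
    // Collect parse errors internally instead of printing PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $feed === false ? null : $feed;
}

// A well-formed sample and a broken one (mismatched closing tag).
var_dump(loadFeedOrNull('<rss><channel><title>ok</title></channel></rss>') !== null);
var_dump(loadFeedOrNull('<rss><channel><title>broken</channel></rss>') !== null);
```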
