Validating XML from Twitter

When I fetch XML data (via a Twitter API call in this case), is it best to validate it before I start working with it? Lately my application has had a lot of unexplained problems, and I want to rule out invalid XML data as the cause.

Is XML ever "bad"? Will an overloaded server like Twitter's ever send back only half of what I should have received?

My real question is twofold: Should I validate XML data before I work with it, and how would I do it? (I already know the intended XML data structure)

Thanks!

One final clarification before I choose an answer (and thanks for your efforts): if I only want 5 predictable fields from a fixed-length XML file, does something like this leave loopholes that creating an XSD would close?

if(!isset($xml->id, $xml->text, $xml->created_at, $xml->sender, $xml->recipient)) throw...

      

+2




3 answers


The most obvious method for validating your XML would be:

  • Try to load the XML into your favorite DOM, or parse it using some other mechanism (I'm not completely familiar with XML processing in PHP). This lets you check whether the XML is well-formed. If it is not (i.e. you only got half of the XML response back), you will catch the problem at this point.

  • Once you've successfully loaded / parsed the XML, the next step is to validate it against an XML schema. Unfortunately, Twitter does not publish XML schemas for its XML, so you will need to roll your own.
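The first step above can be sketched in PHP roughly as follows. This is only a minimal sketch: `parseXmlOrFail` is a hypothetical helper name, and the `<message>` root element in the usage example is a placeholder rather than Twitter's actual format. It relies on PHP's bundled SimpleXML and libxml functions.

```php
<?php
// Hypothetical helper (not from the answers above): returns the parsed
// document, or throws if the payload is not well-formed XML.
function parseXmlOrFail(string $payload): SimpleXMLElement
{
    // Collect libxml errors in memory instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($payload);
    $errors = libxml_get_errors();

    // Restore the previous error-handling mode for the rest of the app.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $first = $errors ? trim($errors[0]->message) : 'unknown parse error';
        throw new RuntimeException("XML is not well-formed: $first");
    }

    return $xml;
}
```

A truncated response such as `parseXmlOrFail('<message><id>1</id><te')` should end up in the exception branch, while a complete document comes back as a `SimpleXMLElement`.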

You can create your own XML schema manually. Here's a link to help you get started:

XML Schema Tutorial (W3Schools)



You can also get tools like Altova XMLSpy that can "infer" a schema from your XML. That is, the tool makes a best guess at how to define the schema, and you might have to tweak it after generation. There are free tools out there as well, but I've only used XMLSpy. As Alan says, if Twitter ever changes its XML format, you will need to update your schemas to accommodate those changes.

Generating XML schemas can be tricky at first, but once you get it, you'll find it pretty easy. I found this book to be excellent when I first started:

XML Schema: The W3C's Object-Oriented Descriptions for XML (O'Reilly)

+2




To answer your question:

Input validation is one of the main parts of error handling. You should always assume that you can get bad data, and then protect against it as best you can.



To validate XML, you validate it against a schema (usually stored in an XSD file).
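As a rough illustration of schema validation in PHP, the sketch below uses `DOMDocument::schemaValidateSource` from the bundled DOM extension. The schema is hand-written for the five fields mentioned in the question and is an assumption, not an official Twitter schema: the `<message>` root name, the element order, and the simple text types are all placeholders to adapt to the real response. `validateAgainstSchema` is a hypothetical helper name.

```php
<?php
// Illustrative, hand-written schema. NOT published by Twitter; the root
// name, element order and types are assumptions made for this example.
const MESSAGE_XSD = <<<'XSD'
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:positiveInteger"/>
        <xs:element name="text" type="xs:string"/>
        <xs:element name="created_at" type="xs:string"/>
        <xs:element name="sender" type="xs:string"/>
        <xs:element name="recipient" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Hypothetical helper: returns [isValid, list of libxml error messages].
function validateAgainstSchema(string $payload, string $xsd): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    // Guard against an empty payload, then check well-formedness,
    // then check the document against the schema.
    $valid = $payload !== ''
        && $doc->loadXML($payload)
        && $doc->schemaValidateSource($xsd);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$valid, $messages];
}
```

Calling `[$valid, $errors] = validateAgainstSchema($response, MESSAGE_XSD);` gives a boolean plus any parser or schema messages to log. Note that `xs:sequence` also enforces element order; if the real feed does not guarantee order, `xs:all` may be a better fit.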

You can generate the schema from an XML file. Microsoft has a free tool that can do this, XSD.exe (it ships with Visual Studio), or you can use a third-party tool. The downside is that you will need to update the schema if Twitter ever changes its format. Without a schema, you make sure the XML is well-formed (usually by trying to parse it), assume that any expected data may be missing, and code defensively around that.
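The schema-less approach might look something like the sketch below, which generalizes the `isset` check from the question. `extractFields` is a hypothetical helper, and it only confirms that each element exists; it does not check types, order, or unexpected extra elements the way a schema would.

```php
<?php
// Hypothetical helper: pulls the expected child elements out of an already
// parsed document and fails loudly if any of them is missing.
function extractFields(SimpleXMLElement $xml, array $required): array
{
    $fields = [];
    foreach ($required as $name) {
        // isset() on a SimpleXMLElement property is true when the child exists.
        if (!isset($xml->$name)) {
            throw new UnexpectedValueException("Missing expected element <$name>");
        }
        // Cast to string so callers get plain values, not XML nodes.
        $fields[$name] = trim((string) $xml->$name);
    }

    return $fields;
}
```

For the question's case, this would be called as `extractFields($xml, ['id', 'text', 'created_at', 'sender', 'recipient'])` after the response has been parsed successfully.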

0




It's unfortunate that Twitter publishes an XML API but not the schemas.

The advantage of writing your own schema is that you can code your program to handle only messages that are valid according to that schema. Then, if Twitter changes its API, or if an undocumented feature emits a message format you don't expect, or if you misunderstood the documentation, you get a validation error immediately instead of having to dig into your program to figure out why it failed. You won't necessarily know why the message is in a form you didn't expect, but at least you will know where the problem is.

0

