What's the best way to convert a Microsoft Word document to XHTML?

I would like to programmatically convert a Microsoft Word document to XHTML. My language of choice is PHP, so I would appreciate PHP-based suggestions.

My initial idea is to convert the .doc file to ODT and then use the Odt2Xhtml PHP class to produce XHTML.

What's the best way to do this?



4 answers

If you are using Linux, one way is to install OpenOffice on the server.

Instructions for installing it "headless" (i.e. with no UI) are available here.

Then you can use a nice CLI application like unoconv executed via shell_exec to do your transformations via PHP.
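As a rough sketch of that approach, the snippet below builds a unoconv command line and runs it through shell_exec. It assumes unoconv is on the PATH and that the installed office suite offers an "xhtml" export format (`unoconv --show` lists the formats it knows). The function names and the guess that the output file ends in `.html` are my own assumptions, not part of unoconv.

```php
<?php
// Build the unoconv command line. Every argument is shell-escaped because
// file names may contain spaces or other shell metacharacters.
function buildUnoconvCommand(string $input, string $outputDir, string $format = 'xhtml'): string
{
    return sprintf(
        'unoconv -f %s -o %s %s 2>&1',
        escapeshellarg($format),
        escapeshellarg($outputDir),
        escapeshellarg($input)
    );
}

// Run the conversion and return the path of the converted file,
// or null if the input is unreadable or no output file appeared.
function convertWithUnoconv(string $input, string $outputDir): ?string
{
    if (!is_readable($input)) {
        return null;
    }

    shell_exec(buildUnoconvCommand($input, $outputDir));

    // Assumption: unoconv writes <basename>.html into the output directory.
    $expected = rtrim($outputDir, '/') . '/' . pathinfo($input, PATHINFO_FILENAME) . '.html';

    return is_file($expected) ? $expected : null;
}
```

A call might look like `convertWithUnoconv('/var/uploads/report.doc', '/var/converted')`. Checking for the output file, rather than trusting the shell output, gives a simple success test.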



The most reliable way is to use COM to have Word save the document as HTML.

I don't know if Word can generate XHTML directly; if not, Google shows many options for performing this conversion.
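A minimal sketch of the COM route follows. It assumes a Windows server with Word installed and PHP's COM extension enabled. The function name is hypothetical. The value 8 is Word's wdFormatHTML save format. The result is HTML, not XHTML, so a separate clean-up step would still be needed.

```php
<?php
// Word's WdSaveFormat value for "Web Page" (wdFormatHTML).
const WD_FORMAT_HTML = 8;

// Ask Word to open $docPath and save it as HTML at $htmlPath.
// Both paths should be absolute, since Word resolves relative
// paths against its own working directory.
function wordToHtml(string $docPath, string $htmlPath): void
{
    // The COM class exists only on Windows builds with the extension enabled.
    if (!class_exists('COM')) {
        throw new RuntimeException('The COM extension is not available.');
    }

    $word = new COM('Word.Application');
    try {
        $word->Visible = false;
        $doc = $word->Documents->Open($docPath);
        $doc->SaveAs($htmlPath, WD_FORMAT_HTML);
        $doc->Close(false); // close without saving changes to the source
    } finally {
        // Always shut Word down so no orphaned process is left behind.
        $word->Quit();
    }
}
```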



See http://www.codeplex.com/OpenXMLViewer, which includes an XSLT stylesheet that you could adapt, as I did in docx4j. Note, however, that XSLT is not for the faint of heart!
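For anyone wanting to try the XSLT route from PHP, here is a hedged sketch that applies a stylesheet to the main part of a .docx file. It only works for .docx (a zip of XML parts), not for binary .doc files, and it assumes the zip and xsl extensions are loaded. The function name is hypothetical, and the stylesheet path is whatever adapted stylesheet you supply.

```php
<?php
// Apply an XSLT stylesheet to word/document.xml inside a .docx archive
// and return the transformed markup as a string.
function transformDocx(string $docxPath, string $xslPath): string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive.");
    }
    // The document body lives in this part of the package.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found in archive.');
    }

    $source = new DOMDocument();
    $source->loadXML($xml);

    $stylesheet = new DOMDocument();
    $stylesheet->load($xslPath);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    $result = $processor->transformToXml($source);
    if (!is_string($result)) {
        throw new RuntimeException('XSLT transformation failed.');
    }

    return $result;
}
```

A real stylesheet would also have to deal with styles, images and numbering, which are stored in other parts of the package.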



phpLiveDocx offers an easy way to convert Microsoft Word documents.

Find out more on the project website:


You can also use phpLiveDocx to combine text data with MS Word templates and save the resulting document to DOC, DOCX, RTF, PDF or TXT.

The component is enterprise ready and was written for the Zend Framework.


