File Format Conversion Library Suggestions

  • conversion from several non-graphic document formats to HTML and vice versa (e.g. DOC ↔ HTML, PDF ↔ HTML, ODT ↔ HTML, etc.)
  • command line or API (Java API preferred)
  • cross platform
  • commercial or open source

Are there any known solutions that meet / exceed these requirements?



3 answers


OpenOffice has a rich API that supports conversion between the various formats it handles. See this question, which recommends using JODConverter.
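As a rough illustration of driving the office engine from Java, here is a minimal sketch that shells out to LibreOffice's command-line converter instead of using JODConverter itself. It assumes a LibreOffice installation whose `soffice` binary is on the PATH and supports the `--headless`, `--convert-to`, and `--outdir` options. The class and method names (`HeadlessConverter`, `buildCommand`, `convert`) are made up for this example and are not part of any library.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: wraps the LibreOffice command line, not the JODConverter API.
final class HeadlessConverter {

    // Builds the argument list for one headless conversion.
    // Assumption: "soffice" resolves to a LibreOffice binary on the PATH.
    static List<String> buildCommand(Path input, String targetFormat, Path outDir) {
        return Arrays.asList(
                "soffice",
                "--headless",
                "--convert-to", targetFormat,
                "--outdir", outDir.toString(),
                input.toString());
    }

    // Runs the conversion and returns the exit code of the soffice process.
    static int convert(Path input, String targetFormat, Path outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, targetFormat, outDir))
                .inheritIO() // forward soffice's output to this process's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: HeadlessConverter <input> <format> <outdir>");
            System.exit(2);
        }
        System.exit(convert(Paths.get(args[0]), args[1], Paths.get(args[2])));
    }
}
```

For example, `java HeadlessConverter report.doc html out` would ask LibreOffice to write an HTML version of `report.doc` into the `out` directory. Starting a new process per document is slow, which is one reason a wrapper such as JODConverter that manages a long-running office instance may be preferable for batch work.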





With DocBook you can export to a variety of output formats, but converting back is always hard. For PDF, you can try iText.





Having written an all-in-one TeX/LaTeX to HTML, ASCII, and RTF converter, I would say this is a pretty serious undertaking.

The problem is that these document formats are meant for completely different purposes. While conversion tools do exist between some of them, there is often a conceptual mismatch in the structure, meaning, and implementation of a "document", and very often you have to compromise on features supported by one format to hack out an acceptable output in another. For example, PDF is very strong in presentation, exact placement, and font support, whereas HTML is more concerned with structure and barely takes these things into account (without CSS).

I'm wondering how you intend to use an API like this, when normally one just needs a conversion program?
