Extract equations and pictures from Word

Is there a programmatic way to extract equations (and possibly images) from an MS Word document? I've googled all over the place but haven't yet found anything that I can sink my teeth into and work from. If possible, I would like to do this with VB.NET or C#, but I can pick up enough of any language to crank out a DLL. Thanks!

EDIT: I'm currently looking to extract equations from Word 2003, but if converting it to 2007 / Open XML is required, that's fine.

+4



3 answers


I don't know if this helps, but the object model in Word 2000/2003 has an InlineShapes collection, part of the Document object, that represents inline images and possibly similar objects such as equations.

Here is some VBA code that copies the first item to the clipboard, which may help you extract them:

    ' Select the first inline shape in the document, then copy it to the clipboard.
    ThisDocument.InlineShapes(1).Select
    Selection.Copy

The same collection is also available from .NET (MSDN link).

+5




What Word format are your documents in? If they are in Open XML (.docx file extension), you can use the Open XML SDK, available from Microsoft, to extract images and embedded content.

An Open XML file is nothing more than a zip archive with a defined internal structure. The SDK includes examples of how to access the parts of this archive. In fact, you can use any zip-compatible library to extract content from a document package.
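To illustrate the zip approach, here is a minimal Python sketch that uses only the standard zipfile module. It assumes the usual package layout, where Word stores pictures under word/media/ and embedded OLE objects under word/embeddings/. The function name extract_embedded_parts and the output folder are made up for this example.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Folders inside a .docx package where Word conventionally stores pictures
# and embedded OLE objects (assumption: the default package layout).
PART_PREFIXES = ("word/media/", "word/embeddings/")


def extract_embedded_parts(docx_path, out_dir):
    """Copy every media/embedding part of a .docx into out_dir; return the saved paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            if name.startswith(PART_PREFIXES) and not name.endswith("/"):
                # Keep only the file name so archive paths cannot escape out_dir.
                target = out_dir / PurePosixPath(name).name
                target.write_bytes(package.read(name))
                saved.append(target)
    return saved
```

Calling extract_embedded_parts("report.docx", "extracted") would write files such as image1.png into the extracted folder. Note that equations created natively in Word 2007 are not separate parts: they are stored as Office Math markup (m:oMath elements) inside word/document.xml. Equations from older documents are more likely to show up as embedded objects with a preview picture, though that depends on how they were created.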

If the documents are still using the older binary format, things are a little more complicated. I think the easiest way is to convert documents to Open XML format. There are several ways to do this:



  • Get the free and open source b2xtranslator from SourceForge, which offers C# DLLs for file conversion.
  • Install Microsoft Compatibility Pack and use the following command to convert:

    "C:\Program Files\Microsoft Office\Office12\wordconv.exe" -oice -nme input_file output_file

where input_file and output_file must be fully qualified path names.
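If you want to script the second option, something like the following Python sketch could wrap the command. It reuses the executable path and the two switches exactly as shown above and makes both file paths absolute before calling the converter. The function names are invented for this example, and the default path is an assumption about where the Compatibility Pack is installed.

```python
import subprocess
from pathlib import Path

# Path as given in the command above; adjust it if Office is installed elsewhere.
WORDCONV = r"C:\Program Files\Microsoft Office\Office12\wordconv.exe"


def build_wordconv_command(input_file, output_file, wordconv=WORDCONV):
    """Return the wordconv argument list with both file paths made absolute."""
    return [
        wordconv,
        "-oice",
        "-nme",
        str(Path(input_file).resolve()),
        str(Path(output_file).resolve()),
    ]


def convert_to_docx(input_file, output_file):
    """Run the converter; raises CalledProcessError if it exits with a non-zero code."""
    subprocess.run(build_wordconv_command(input_file, output_file), check=True)
```

Passing the arguments as a list avoids having to quote the spaces in the Program Files path by hand.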

+5




Try looking at Word-to-LaTeX. It requires the .NET Framework, and while the source is not open yet, the author does take questions about it.

0

