Convert Word table to HTML in memory

Because of the odd way Word handles merged cells, it is very difficult to determine which cells in a Word table are merged (technically, a cell has no "merged" property; Merge is only a method, so we have to "guess" what is merged).

Although it is not perfect, I have found one way to help with cell merging. If you save the .docx file as a filtered HTML file (.htm), you can look at the markup in the .htm file and see the colspan attribute for each cell.

I want to avoid the costly route of first saving the .docx as .htm (using the Document.SaveAs method) and then parsing the .htm to determine the colspan values for each cell.
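For reference, the parsing half of that route is small. Here is a minimal sketch in Python, assuming the filtered .htm has already been read into a string. `ColspanParser` and `colspans_from_html` are hypothetical names used only for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class ColspanParser(HTMLParser):
    """Collect the colspan of every cell, grouped by row, for each table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # the row currently being filled

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
            self.tables[-1].append(self._row)
        elif tag in ("td", "th") and self._row is not None:
            span = dict(attrs).get("colspan")
            # A cell without a colspan attribute spans one column.
            self._row.append(int(span) if span and span.isdigit() else 1)


def colspans_from_html(html_text):
    """Return, per table, a list of rows, each a list of colspan values."""
    parser = ColspanParser()
    parser.feed(html_text)
    return parser.tables
```

When reading the file from disk, open it with the encoding Word used (the charset declared in the file's meta tag) before passing the text to `colspans_from_html`.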

Is there any way to save the table directly to memory in .htm format and then pull the values out of it?

An alternative might be to use the same algorithm that the conversion uses to determine the colspan, but I haven't found anything on the internet about this, and I want to avoid writing a complicated algorithm if possible.
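One possible shortcut, sketched here under assumptions: this is not the algorithm Word's HTML export uses, and it only applies if reading the .docx package directly (rather than through the Word object model) is acceptable. In WordprocessingML, a horizontally merged cell carries a `w:gridSpan` element inside its `w:tcPr`, so the span can be read straight from `word/document.xml`. The function names are hypothetical. Legacy `w:hMerge` markup, vertical merges (`w:vMerge`), and cells wrapped in content controls are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def colspans_from_document_xml(xml_data):
    """Return, per table, a list of rows, each a list of gridSpan values."""
    root = ET.fromstring(xml_data)
    tables = []
    # iter() also visits nested tables; each is reported as its own entry.
    for tbl in root.iter(f"{{{W}}}tbl"):
        rows = []
        for tr in tbl.findall("w:tr", NS):
            row = []
            for tc in tr.findall("w:tc", NS):
                span = tc.find("w:tcPr/w:gridSpan", NS)
                # No gridSpan element means the cell spans one grid column.
                row.append(int(span.get(f"{{{W}}}val")) if span is not None else 1)
            rows.append(row)
        tables.append(rows)
    return tables


def colspans_from_docx(path):
    """Read word/document.xml from the .docx zip package without starting Word."""
    with zipfile.ZipFile(path) as docx:
        return colspans_from_document_xml(docx.read("word/document.xml"))
```

Because a .docx is a zip archive, this avoids both the SaveAs round trip and the HTML parsing step, at the cost of working outside the Word object model.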

