How to populate DataTable from Excel worksheet in memory

Our client receives Excel files in Excel's HTML format. (That is how the files arrive; nobody has any control over it.) We then need to run a report based on the data in the file. Reading it with ADO.NET (OleDbReader) results in an "External file not in expected format" exception.

If the file is converted to the normal Excel format, it reads in fine. However, this is not a solution, because it imposes an extra step on the users, and they are not too computer literate at the best of times.

The only solution I could think of was to use Excel Automation to create a new spreadsheet, populate it with the same data, and read that instead. But ADO.NET apparently can only read from a file on disk. I could of course save the file and delete it when I'm done with it (which I have verified will work), but I'm uneasy about the idea of messing with the filesystem. So my first question is: is there a way to populate a DataTable from an Excel worksheet in memory?

Also, I don't like the whole business of using Automation; it's incredibly slow. The operation takes over 30 seconds even without filling the DataTable, so any solution that makes it slower is not going to work. This brings me to my second question: is there a better way to accomplish what I'm trying to do here?

+1




2 answers


Try the HTML Agility Pack: http://www.codeplex.com/htmlagilitypack

I am using it in a similar scenario. In my case:

  • someone copies a table from Excel to the clipboard
  • I get the HTML text from the clipboard
  • I use the HTML Agility Pack to find the TABLE, TR, TH and TD tags
  • and then I build the DataTable from them

At no point in my case is the HTML saved to disk.
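To illustrate the steps above, here is a minimal sketch of the same idea. It is written in Python with the standard library's `html.parser` as a stand-in, not with the HTML Agility Pack, and the names `TableParser` and `html_table_to_records` are made up for this example. It assumes the HTML contains one simple table whose first row holds the column names and whose cells are properly closed.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every TH/TD cell, grouped by TR row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_records(html_text):
    """Parse the HTML entirely in memory; the first row supplies column names."""
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

The HTML text never touches the disk here: it goes straight from a string into a list of row dictionaries, which plays the role the DataTable plays in the .NET version. Nested tables and unclosed cells are not handled in this sketch.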

+1




I'm not sure what you mean by "Excel HTML format". Recent versions of Excel have an XML file format, and I know Excel can open an HTML file containing a table and convert it to a worksheet, but I don't know of any specific Excel HTML format.

As for the solution using Excel Automation, once you have the worksheet in memory, you can get the values as a 2D array of objects using the Value2 property and then use that to build the DataTable. I don't think this will add significant overhead on top of the original cost of using Automation (which requires starting an Excel process).
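The array-to-table step might look roughly like this. It is only a sketch in Python: a plain nested list stands in for the 2D object array that Value2 returns, the function name `grid_to_table` is hypothetical, and it assumes the first row of the range holds the column names. Empty cells are represented as `None`.

```python
def grid_to_table(grid):
    """Split a 2D grid into (column_names, data_rows), using row 0 as headers."""
    if not grid:
        return [], []
    header, *body = grid
    # Empty header cells get a placeholder name so every column is addressable.
    columns = [
        str(name) if name is not None else f"Column{index + 1}"
        for index, name in enumerate(header)
    ]
    width = len(columns)
    # Trim or pad each data row so it has exactly one value per column.
    rows = [list(row[:width]) + [None] * (width - len(row)) for row in body]
    return columns, rows
```

In the .NET version, the column names would become DataTable columns and each data row would be added as a DataRow; the loop itself is cheap compared with starting Excel.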



Is there a better way? Parsing arbitrary HTML is not trivial, but if the files you receive are in a consistent format, it should be feasible to parse them directly.

0

