What is the best way to generate an xlsx file on a website? Perhaps with millions of rows?

I have been tasked with writing a replacement for a poorly performing, outdated Excel file generator.

The files that I need to create can be very large: possibly up to a million rows with 40-50 columns. I guess I will stream the file to the user if possible, but I may just have to save it to disk first and then create a link for the user.

I am trying to do a performance test, checking whether I can create an xlsx file with 1,500,000 rows and 50 columns, each cell containing a random 10 letter string ... Would it even handle large files?

Note: In reality, most generated files will never be larger than 300,000 rows, and the absolute maximum is around 950,000 rows, but I like to play it safe, hence the stress test with 1.5M rows.

Do you have any suggestions on how I should tackle this problem? Are there any components I should be aware of? Any limitations in Excel?

PS: I would appreciate it if I didn't have to install Excel on the server.

+2



8 answers


There is a limit on the number of rows you can have in a spreadsheet (about 1M for Office 2007). Instead, I would generate a file



+5



While I cannot say how much data Excel can handle, if you are using the new .xlsx format you are using the MS Open XML format. An .xlsx file is actually a zip compressed file with all the documents stored inside as XML. That XML can be written just like any other XML, but you will have to read up on the standard. There are several commercial components for this. You don't need Excel to write the format.

Here are some useful links:



  • Office OpenXML - Wikipedia
  • Office Open XML C# Library - This looks like an open source library to read/write OpenXML
  • Reading and Writing Open XML Files - CodeProject - Another implementation of a read/write library
  • GemBox.Spreadsheet is a commercial .NET component for reading/writing office spreadsheets. It has a free version with limits on the number of rows you can read and write, if you want to try it out.
  • NPOI Library is a .NET port of the Java POI library for reading and writing office documents.
  • Simple OOXML - "A set of helper classes to simplify the creation of Open Office XML documents. Uses the Open Office SDK v 2.0. Modify or create any .docx or .xlsx document without Microsoft Word or Microsoft Excel."
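To make the "zip file full of XML" point concrete, here is a minimal sketch that builds a one-sheet .xlsx using only System.IO.Compression, with no Excel and no third-party library. The class name MinimalXlsx, the Write method and its parameters are made up for this illustration, and it takes several shortcuts that a real generator would need to revisit: every cell is written as an inline string, cell references are omitted (the schema marks them optional), and there are no styles or shared strings. Treat it as a starting point under those assumptions, not as a tested production writer.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security;

// Hypothetical helper for illustration only; not part of any library named in this thread.
public static class MinimalXlsx
{
    const string Header = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";
    const string PkgRel = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string DocRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    const string Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    const string CtBase = "application/vnd.openxmlformats-officedocument.spreadsheetml.";

    // Writes a single-sheet workbook to any writable stream (file, memory, HTTP response).
    // The cell callback supplies the text for each (row, column) position.
    public static void Write(Stream output, int rows, int cols, Func<int, int, string> cell)
    {
        using var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true);

        // The four small "plumbing" parts every workbook package needs.
        AddPart(zip, "[Content_Types].xml",
            "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
            "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
            "<Default Extension=\"xml\" ContentType=\"application/xml\"/>" +
            "<Override PartName=\"/xl/workbook.xml\" ContentType=\"" + CtBase + "sheet.main+xml\"/>" +
            "<Override PartName=\"/xl/worksheets/sheet1.xml\" ContentType=\"" + CtBase + "worksheet+xml\"/>" +
            "</Types>");
        AddPart(zip, "_rels/.rels",
            "<Relationships xmlns=\"" + PkgRel + "\">" +
            "<Relationship Id=\"rId1\" Type=\"" + DocRel + "/officeDocument\" Target=\"xl/workbook.xml\"/>" +
            "</Relationships>");
        AddPart(zip, "xl/workbook.xml",
            "<workbook xmlns=\"" + Main + "\" xmlns:r=\"" + DocRel + "\">" +
            "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets></workbook>");
        AddPart(zip, "xl/_rels/workbook.xml.rels",
            "<Relationships xmlns=\"" + PkgRel + "\">" +
            "<Relationship Id=\"rId1\" Type=\"" + DocRel + "/worksheet\" Target=\"worksheets/sheet1.xml\"/>" +
            "</Relationships>");

        // The sheet itself is streamed row by row, so it is never held in memory as a whole.
        var entry = zip.CreateEntry("xl/worksheets/sheet1.xml", CompressionLevel.Fastest);
        using var writer = new StreamWriter(entry.Open());
        writer.Write(Header + "<worksheet xmlns=\"" + Main + "\"><sheetData>");
        for (int r = 0; r < rows; r++)
        {
            writer.Write("<row r=\"" + (r + 1) + "\">");
            for (int c = 0; c < cols; c++)
            {
                writer.Write("<c t=\"inlineStr\"><is><t>");
                writer.Write(SecurityElement.Escape(cell(r, c))); // escape &, <, >, quotes
                writer.Write("</t></is></c>");
            }
            writer.Write("</row>");
        }
        writer.Write("</sheetData></worksheet>");
    }

    static void AddPart(ZipArchive zip, string name, string xml)
    {
        using var w = new StreamWriter(zip.CreateEntry(name).Open());
        w.Write(Header + xml);
    }
}
```

Calling MinimalXlsx.Write(File.Create("test.xlsx"), 1000, 50, (r, c) => "r" + r + "c" + c) should produce a file of the expected shape; whether Excel opens it without complaint, and how fast it is at 300,000 rows, is something to measure rather than assume.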
+2



Excel 2007 supports a maximum worksheet size of 1,048,576 rows by 16,384 columns, so your test with 1.5 million rows will not fit on a single sheet. Source

Edit: Excel 2003 supports fewer rows: 65,536 rows by 256 columns. Source

If you can require your users to open Excel 2007 (xlsx) documents, this might be your best bet, as the format is just zipped XML and can be generated without having Excel on the server.

If you need to support "all" versions of Excel and other office suite programs, you should probably use CSV or another delimited format.

Open Document Format might also be of interest, but Excel users will need an ODF add-in to consume the documents.

Edit 2: If you're looking at using CSV, you can look into the FileHelpers library.
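If CSV turns out to be acceptable, the writer itself is small. The sketch below is a hand-rolled illustration, not the FileHelpers API; the CsvExport class and its method names are invented for this example. It quotes a field only when it contains a comma, a double quote or a line break, and writes one row at a time so that memory use stays flat regardless of row count.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only.
public static class CsvExport
{
    static readonly char[] Special = { ',', '"', '\r', '\n' };

    // Wraps the field in quotes (doubling embedded quotes) only when necessary.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(Special) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Streams rows to the writer one at a time; nothing is buffered beyond a single line.
    public static void WriteRows(TextWriter writer, IEnumerable<IEnumerable<string>> rows)
    {
        foreach (var row in rows)
        {
            writer.Write(string.Join(",", row.Select(Escape)));
            writer.Write("\r\n");
        }
    }
}
```

One caveat to check with your users: Excel takes its default list separator from the regional settings, so in some locales a comma-delimited file opens as a single column unless it is imported explicitly.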

+2



Make sure your tests are representative of the actual data. Excel handles simple number cells much more efficiently than simple text cells, especially when all the text cells are unique. So if your data really is unique 10 character strings, be sure to use that as your test case. If it will in fact be mostly numbers, make sure your tests reflect that.

For example, I built a simple test using SpreadsheetGear for .NET to create a 300,000 row by 50 column Open XML (.xlsx) workbook. With unique numbers, it took 13.62 seconds to create and save to disk on my nearly two year old overclocked QX6850 CPU. Creating and saving a 300,000 row by 50 column .xlsx workbook with unique 10 character strings took 78 seconds, about 6 times longer for text than for numbers. I'll paste the code below, and you can run it with the free SpreadsheetGear trial, which you can download here.

It is important to note that Open XML (.xlsx) is compressed, so if your data has a lot of redundancy, you will likely end up with smaller .xlsx files than .csv files. This can have a big performance impact if you are creating workbooks on a web server for consumption over the network.

SpreadsheetGear (via the IWorkbook.SaveToStream method) and most other third-party Excel compatible libraries allow you to save directly to the response stream in an ASP.NET application, so you can avoid saving to disk on the server.

Disclaimer: I own SpreadsheetGear LLC

Here's the test code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using SpreadsheetGear;

namespace ConsoleApplication11
{
    class Program
    {
        static void Main(string[] args)
        {
            var timer = System.Diagnostics.Stopwatch.StartNew();
            int rows = 300000;
            int sheets = 1;
            var workbook = Factory.GetWorkbook();
            var sb = new System.Text.StringBuilder();
            int counter = 0;
            bool numeric = true;
            for (int sheet = 0; sheet < sheets; sheet++)
            {
                // Use the SpreadsheetGear Advanced API which is faster than the IRange API.
                var worksheet = (sheet == 0) ? workbook.Worksheets[0] : workbook.Worksheets.Add();
                var values = (SpreadsheetGear.Advanced.Cells.IValues)worksheet;
                for (int row = 0; row < rows; row++)
                {
                    for (int col = 0; col < 50; col++)
                    {
                        if (numeric)
                            values.SetNumber(row, col, ++counter);
                        else
                        {
                            sb.Length = 0;
                            // Make a 10 character unique string.
                            sb.Append(++counter);
                            System.Diagnostics.Debug.Assert(sb.Length <= 10);
                            // Make it 10 characters long.
                            while (sb.Length < 10)
                                sb.Append((char)('A' + (char)sb.Length));
                            values.SetText(row, col, sb);
                        }
                    }
                }
            }
            Console.WriteLine("Created {0} cells in {1} seconds.", counter, timer.Elapsed.TotalSeconds);
            workbook.SaveAs(@"C:\tmp\BigWorkbook.xlsx", FileFormat.OpenXMLWorkbook);
            Console.WriteLine("Created and saved {0} cells in {1} seconds.", counter, timer.Elapsed.TotalSeconds);
        }
    }
}

      

+2



You might want to check out the NPOI library for reading and writing Excel files at http://npoi.codeplex.com/ . Saving to the server is an option, but remember that you will have to clean up the files after they have been downloaded.
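If you do save to disk, the cleanup can be a small periodic job. The following is only a sketch of one possible approach (age-based deletion); the ExportCleanup class, the "*.xlsx" pattern and the idea of passing the current time in as a parameter are assumptions made for this example, not something NPOI provides.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
public static class ExportCleanup
{
    // Deletes .xlsx files in the directory whose last write time is older than maxAge.
    // Returns the number of files deleted. utcNow is a parameter so the logic is easy to test.
    public static int DeleteOlderThan(string directory, TimeSpan maxAge, DateTime utcNow)
    {
        int deleted = 0;
        // GetFiles takes a snapshot, so deleting inside the loop is safe.
        foreach (var path in Directory.GetFiles(directory, "*.xlsx"))
        {
            if (utcNow - File.GetLastWriteTimeUtc(path) > maxAge)
            {
                File.Delete(path);
                deleted++;
            }
        }
        return deleted;
    }
}
```

A call such as ExportCleanup.DeleteOlderThan(exportDir, TimeSpan.FromHours(24), DateTime.UtcNow) could run from a scheduled task; a file that is still being downloaded may fail to delete on Windows, so a real job would want to catch IOException and retry later.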

+1



Take a look at the Simple OOXML project at Codeplex.

This might be what you are looking for.

PS. Excel is basically spreadsheet software, not a replacement for a database. Are you sure you want to dump a million rows on the end user?

+1



Excel cannot handle millions of rows; try creating a CSV output file instead, which Excel can read.

Also, generating a huge amount of Excel data in response to a user request is not recommended: the user will have to wait until the file has loaded.

+1



Assuming you can avoid exceeding the row limits in Excel 2007 (by splitting the data across other sheets or files), the Excel xlsx format should work fine.
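The splitting arithmetic is simple enough to sketch. The SheetPlanner class below is hypothetical and assumes the Excel 2007 limit of 1,048,576 rows per sheet, with an optional number of header rows repeated on every sheet.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class SheetPlanner
{
    public const int MaxRowsPerSheet = 1_048_576; // Excel 2007 (.xlsx) row limit per sheet

    // Number of sheets needed for dataRows rows when each sheet also carries headerRows rows.
    public static int SheetsNeeded(long dataRows, int headerRows = 0)
    {
        long capacity = MaxRowsPerSheet - headerRows;
        return (int)((dataRows + capacity - 1) / capacity); // ceiling division
    }

    // Yields, for each sheet, the index of its first data row and how many data rows it holds.
    public static IEnumerable<(int Sheet, long FirstRow, long Count)> Plan(long dataRows, int headerRows = 0)
    {
        long capacity = MaxRowsPerSheet - headerRows;
        int sheet = 0;
        for (long first = 0; first < dataRows; first += capacity)
        {
            long count = dataRows - first < capacity ? dataRows - first : capacity;
            yield return (sheet++, first, count);
        }
    }
}
```

With these numbers, the 1.5 million row stress test needs two sheets, while the stated real-world maximum of about 950,000 rows still fits on one.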

Since XLSX is a zipped format, rather than building the workbook in memory or writing it to disk, you should consider writing directly to an in-memory zip stream. The compression will keep memory usage low, and not writing to the file system will help performance.

Another potential solution, depending on your circumstances: create a blank Access template, copy it, write to the copy, and send that instead of an Excel file. Granted, this would be a shift for your application, but Access does not have the same row limit.

+1







