Open Excel, Parse Data?

(Alternative title: ReadAllLines Analogue for Excel: What's the best way to load and manipulate Excel data?)

I would like to quickly hack an excel sheet and do text manipulations. I want the operation to work like ReadAllLines ( https://msdn.microsoft.com/en-us/library/s2tte0y1(v=vs.110).aspx ) but for Excel.

I found the following question that stands still, but seven years old. Reading Excel files from C # (Also, this is a historically important question that is frozen. Moreover, I do not have 50 points, so I cannot comment on this open.) I cut and pasted Robin Robinson in Visual Studio, changing only the path:

var fileName = string.Format("{0}\\fileNameHere", Directory.GetCurrentDirectory());
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);

var adapter = new OleDbDataAdapter("SELECT * FROM [workSheetNameHere$]", connectionString);
var ds = new DataSet();

adapter.Fill(ds, "anyNameHere");

var data = ds.Tables["anyNameHere"].AsEnumerable();

var query = data.Where(x => x.Field<string>("id") != string.Empty).Select(x =>
            new 
            {
                id= x.Field<string>("id"),
            });

      

This does not work when using the Fill method with "External table not in expected format".

Are good programmers doing this today? that is, should I research Jet and see if there are updates for the latest Excel, or is there a new and improved way?

-1


source to share


2 answers


It's lightning fast; I cannot say that I am not doing text manipulation with ReadAllLines and Regex. (Removed text processing details).

How I got it to work is explained in the comments on the original question.

Not sure why this is so faster than Interop.Excel. Inefficient coding? Better API?



Any insight is appreciated!

        var path = string.Format(@"C:\Users\jlambert\Desktop\encryptedSSNs.xlsx");
        var connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties=Excel 12.0;";


        var adapter = new OleDbDataAdapter("SELECT * FROM [sheetName$]", connStr);
        var ds = new DataSet();

        adapter.Fill(ds, "anyNameHere");

        var data = ds.Tables["anyNameHere"].AsEnumerable();

        var query = data.Where(x => x.Field<string>("MRN") != string.Empty).Select(x =>
            new 
            {
                mrn = x.Field<string>("MRN"),
                ssn = x.Field<string>("ssn"),
            });

        foreach (var q in query)
        {
            Console.WriteLine(q);    
        }
        Console.ReadLine();

      

0


source


I combined a prototype based on Jon Skeet's recommendation to use Interop and juliusz's answer on How do I import from Excel to DataSet using Microsoft.Office.Interop.Excel?

However, there are two things that I really don't like: First, I feel like I have to enter all the column names. Second, it loads the entire spreadsheet into memory rather than reading it as a stream (takes a minute even for relatively small spreadsheets.



private static void Main(string[] args)
{
    var dataT = Import(@"C:\Users\jlambert\Desktop\dSmall_encrypted.xlsx");

    var data = dataT.AsEnumerable();
...
}
public static System.Data.DataTable Import(String path)
{
    var app = new Application();
    Workbook workBook = app.Workbooks.Open(path, 0, true, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);

    Worksheet workSheet = (Worksheet)workBook.ActiveSheet;

    int index = 0;
    object rowIndex = 2;

    System.Data.DataTable dt = new System.Data.DataTable();
    dt.Columns.Add("Facility_code");                    
    dt.Columns.Add("MRN");
    dt.Columns.Add("first_name");
    dt.Columns.Add("middle_name");
    dt.Columns.Add("last_name");
    dt.Columns.Add("address_line_1");
    dt.Columns.Add("address_line_2");
    dt.Columns.Add("city");
    dt.Columns.Add("state");
    dt.Columns.Add("zip");
    dt.Columns.Add("date_of_birth");
    dt.Columns.Add("gender");
    dt.Columns.Add("ssn");
    dt.Columns.Add("home_phone");
    dt.Columns.Add("work_phone");
    dt.Columns.Add("cell_phone");
    dt.Columns.Add("PCP");
    dt.Columns.Add("Practice Location");

    DataRow row;

    while (((Range)workSheet.Cells[rowIndex, 1]).Value2 != null)
    {
        rowIndex = 2 + index;
        row = dt.NewRow();
        row[0] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 1]).Value2);
        row[1] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 2]).Value2);
        row[2] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 3]).Value2);
        row[3] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 4]).Value2);
        row[4] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 5]).Value2);
        row[5] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 6]).Value2);
        row[6] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 7]).Value2);
        row[7] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 8]).Value2);
        row[8] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 9]).Value2);
        row[9] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 10]).Value2);
        row[10] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 11]).Value2);
        row[11] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 12]).Value2);
        row[12] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 13]).Value2);
        row[13] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 14]).Value2);
        row[14] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 15]).Value2);
        row[15] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 16]).Value2);
        row[16] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 17]).Value2);
        row[17] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 18]).Value2);
        index++;
        dt.Rows.Add(row);
    }
    app.Workbooks.Close();
    return dt;
}

      

0


source







All Articles