Extracting rows from multiple Excel files into one table or file

I have over 100 .xlsx files. Each file has two sheets. The first sheet (always called sts) usually contains 15-20 thousand rows with a "Code" column. The second sheet (always called cps) contains about 85k rows and has the same Code column.

I need to extract all rows for a specific code from the sts sheet into one table/sheet, and all rows for the same code from the cps sheet into a second table/sheet. I need to do this for all files.

I have experimented with two approaches:

1) Use Excel VBA to open each file and use AutoFilter to copy the rows for the required code into the master workbook. The following code collects the files from the predefined root directory (including its subfolders) and is started by running SearchFiles():


Public Sub SearchFiles()

'Macro to start the file extraction by drilling down from the myDir path specified
Dim code As String
Dim time1 As Double
Dim time2 As Double

code = InputBox("Please type code to extract")
If Len(code) = 0 Then Exit Sub
time1 = Timer

Application.ScreenUpdating = False
myFileSearch _
myDir:="C:\Data\Dashboard\2014\New Files Excel Loop", _
FileNameLike:="Reporting", _
FileTypeLike:=".xlsx", _
SearchSubFol:=True, _
myCounter:=0, _
code:=code
Application.ScreenUpdating = True

time2 = Timer
MsgBox time2 - time1 & " seconds"

End Sub


Private Sub myFileSearch(myDir As String, FileNameLike As String, FileTypeLike As String, _
  SearchSubFol As Boolean, myCounter As Long, code As String)
Dim fso As Object, myFolder As Object, myFile As Object
Dim srcBook As Workbook
Dim masterbook As Workbook
Set masterbook = ThisWorkbook
Set fso = CreateObject("Scripting.FileSystemObject")

For Each myFile In fso.GetFolder(myDir).Files
    'only open workbooks whose name matches *Reporting*.xlsx (case-insensitive)
    If LCase$(myFile.Name) Like LCase$("*" & FileNameLike & "*" & FileTypeLike) Then
        Set srcBook = Workbooks.Open(myFile.Path, ReadOnly:=True)
        myCounter = myCounter + 1

        'sts: code is in column 3; append the visible rows to master sheet 1
        CopyFilteredRows srcBook.Worksheets("sts"), 3, code, masterbook.Sheets(1)

        'cps: code is in column 6; append the visible rows to master sheet 2
        CopyFilteredRows srcBook.Worksheets("cps"), 6, code, masterbook.Sheets(2)

        srcBook.Close SaveChanges:=False
    End If
Next

If SearchSubFol Then
    For Each myFolder In fso.GetFolder(myDir).SubFolders
        myFileSearch myDir & "\" & myFolder.Name, FileNameLike, FileTypeLike, True, myCounter, code
    Next
End If

End Sub


Private Sub CopyFilteredRows(ByVal src As Worksheet, ByVal codeField As Long, _
  ByVal code As String, ByVal dest As Worksheet)
Dim nextRow As Long
'first empty row below the data already in column A of the destination sheet
nextRow = dest.Cells(dest.Rows.Count, "A").End(xlUp).Row + 1
'filter on the code column, then copy only the visible rows (header included)
src.Range("A1").AutoFilter Field:=codeField, Criteria1:=code
src.Range("A1").CurrentRegion.SpecialCells(xlCellTypeVisible).Copy _
    Destination:=dest.Range("A" & nextRow)
End Sub

      

Each workbook takes 5-10 seconds to open, so the whole process is very slow (and it currently throws errors).

2) Import everything into two Access tables and then keep only the rows for the code I need. This is even slower than the Excel approach because of the number of rows.

Sub pulloop()

Dim rs As DAO.Recordset
Dim strSql As String
Dim importfile As String

'empty the target tables (Execute avoids the RunSQL confirmation prompts)
CurrentDb.Execute "DELETE * FROM sts", dbFailOnError
CurrentDb.Execute "DELETE * FROM cps", dbFailOnError

'PathMap holds one workbook path per row in its Path field
strSql = "PathMap"
Set rs = CurrentDb.OpenRecordset(strSql)

With rs
    Do Until .EOF
        importfile = .Fields("Path")

        'append every row of Sts!A:G and CPs!A:Q to the Access tables
        DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "Sts", importfile, True, "Sts!A:G"
        DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "CPs", importfile, True, "CPs!A:Q"

        .MoveNext
    Loop

    .Close
End With

End Sub

      

I tried adapting this to use acLink, but I am struggling to implement it. Would it be possible to use acLink rather than acImport, so that each file is linked and only the rows for the required code are queried into Access? And if so, would that be faster?
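For reference, a minimal sketch of the acLink idea, assuming the header of the code column is literally "Code", that codes are stored as text, and that the linked columns match the sts table's field names. The procedure name ImportLinkedSts and the link name tmpSts are hypothetical. The sheet is linked rather than imported, only the matching rows are appended with a WHERE clause, and the link is deleted afterwards so the same name can be reused for the next file.

```vba
'Hypothetical sketch: link one workbook's Sts sheet, append only the rows
'for one code, then drop the link. Assumes a text column headed "Code".
Public Sub ImportLinkedSts(ByVal strPath As String, ByVal strCode As String)
    Const LINK_NAME As String = "tmpSts"   'hypothetical temporary link name
    Dim db As DAO.Database

    'link (not import) the sheet; Access reads it in place
    DoCmd.TransferSpreadsheet acLink, acSpreadsheetTypeExcel12, LINK_NAME, strPath, True, "Sts!A:G"

    Set db = CurrentDb
    'double any single quotes so the code is a safe SQL text literal
    db.Execute "INSERT INTO sts SELECT * FROM [" & LINK_NAME & "]" & _
               " WHERE [Code] = '" & Replace(strCode, "'", "''") & "'", dbFailOnError

    'delete the link only (the workbook itself is untouched)
    DoCmd.DeleteObject acTable, LINK_NAME
End Sub
```

The same pattern would apply to the CPs sheet with "CPs!A:Q" and the cps table, called once per path in PathMap.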



2 answers


It looks like one of the problems with your second option (the one I would lean towards) is that you are importing ALL rows from each Excel file. Try using the Excel object model to define a named range on both sheets, and then call DoCmd.TransferSpreadsheet in a loop. You will need to change the last column reference (G or Q) for each sheet. HTH.

Code to find the last row actually used, define a named range, and import it into Access:



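'Requires a reference to the Microsoft Excel Object Library (early binding)
Dim xlApp As Excel.Application
Dim xlWB As Excel.Workbook
Dim xlWS As Excel.Worksheet
Dim lngLastRow As Long
Dim myImportRange As Excel.Range
Dim strRangeName As String
Dim strPath As String

strPath = "PATH"                      'full path of the workbook to import
Set xlApp = New Excel.Application
xlApp.Visible = False                 'keep Excel hidden
Set xlWB = xlApp.Workbooks.Open(strPath)
Set xlWS = xlWB.Sheets("sts")

'last used row in column A, so only the rows actually in use are imported
lngLastRow = xlWS.Range("A" & xlWS.Rows.Count).End(xlUp).Row
Set myImportRange = xlWS.Range("A1:G" & lngLastRow)
strRangeName = "myData_2014MMDD"      'or any name that makes sense to you
myImportRange.Name = strRangeName

'save the named range and release the file before Access reads it
xlApp.DisplayAlerts = False           'suppress save changes prompts
xlWB.Save
xlWB.Close False
xlApp.Quit
Set xlApp = Nothing

'"sts" is the destination table
DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "sts", strPath, True, strRangeName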
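A minimal sketch of the loop described above, covering both sheets. It assumes the PathMap table from the question (one workbook path per row in its Path field) and a reference to the Excel object library; ImportNamedRanges, NameUsedBlock, stsImport and cpsImport are hypothetical names. Note that it saves the defined names back into each source workbook.

```vba
'Hypothetical sketch: for every workbook in PathMap, name the used block on
'both sheets and import each named range into its Access table.
'Requires a reference to the Microsoft Excel Object Library.
Public Sub ImportNamedRanges()
    Dim xlApp As Excel.Application
    Dim xlWB As Excel.Workbook
    Dim rs As DAO.Recordset
    Dim strPath As String

    Set xlApp = New Excel.Application
    xlApp.Visible = False
    xlApp.DisplayAlerts = False

    Set rs = CurrentDb.OpenRecordset("PathMap", dbOpenSnapshot)
    Do Until rs.EOF
        strPath = rs.Fields("Path").Value

        'one named range per sheet; only the last column differs (G vs Q)
        Set xlWB = xlApp.Workbooks.Open(strPath)
        NameUsedBlock xlWB.Worksheets("sts"), "G", "stsImport"
        NameUsedBlock xlWB.Worksheets("cps"), "Q", "cpsImport"
        xlWB.Close SaveChanges:=True

        'import after closing so Access reads the saved names
        DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "sts", strPath, True, "stsImport"
        DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "cps", strPath, True, "cpsImport"

        rs.MoveNext
    Loop
    rs.Close

    xlApp.Quit
    Set xlApp = Nothing
End Sub

'Name the block A1:<lastCol><last used row in column A> on one sheet
Private Sub NameUsedBlock(ByVal xlWS As Excel.Worksheet, ByVal lastCol As String, _
                          ByVal rangeName As String)
    Dim lngLastRow As Long
    lngLastRow = xlWS.Range("A" & xlWS.Rows.Count).End(xlUp).Row
    xlWS.Range("A1:" & lastCol & lngLastRow).Name = rangeName
End Sub
```

Keeping a single Excel instance open for the whole loop avoids paying the Excel start-up cost once per file.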

      



Consider a third approach, which queries the workbooks directly in an append (INSERT INTO ... SELECT) query, so no intermediate import or link is needed:



'rs is the PathMap recordset opened as in the question's pulloop()
With rs
   Do Until .EOF
      importfile = .Fields("Path")
      Debug.Print importfile

      'read the Sts sheet straight from the workbook and append it to sts
      sql = "INSERT INTO sts " _
          & " SELECT * FROM [Excel 12.0 Xml;HDR=Yes;Database=" & importfile & "].[Sts$A:G]"

      CurrentDb.Execute sql, dbFailOnError

      .MoveNext
   Loop

   .Close
End With
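A minimal sketch extending this to both sheets and filtering on the code inside the query itself, so only the matching rows are ever written to Access. It assumes the PathMap table from the question, a header literally named "Code" on both sheets holding text values, and sheet headers that match the sts/cps field names (which SELECT * above already requires). AppendRowsForCode is a hypothetical name.

```vba
'Hypothetical sketch: append only the rows for one code from every workbook
'listed in PathMap. Assumes a text column headed "Code" on both sheets.
Public Sub AppendRowsForCode(ByVal strCode As String)
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim strSrc As String
    Dim strWhere As String

    Set db = CurrentDb
    'double any single quotes so the code is a safe SQL text literal
    strWhere = " WHERE [Code] = '" & Replace(strCode, "'", "''") & "'"

    Set rs = db.OpenRecordset("PathMap", dbOpenSnapshot)
    Do Until rs.EOF
        'ACE connection prefix pointing at this workbook
        strSrc = "[Excel 12.0 Xml;HDR=Yes;Database=" & rs.Fields("Path").Value & "]"

        db.Execute "INSERT INTO sts SELECT * FROM " & strSrc & ".[Sts$A:G]" & strWhere, dbFailOnError
        db.Execute "INSERT INTO cps SELECT * FROM " & strSrc & ".[CPs$A:Q]" & strWhere, dbFailOnError

        rs.MoveNext
    Loop
    rs.Close
End Sub
```

Usage: AppendRowsForCode "XYZ", where "XYZ" stands for whatever code you need. If the Code column holds numbers, drop the quotes around the literal or the query will fail with a type mismatch.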

      







