Extract from multiple Excel files into one data table or file
I have over 100 .xlsx files. Each file has two sheets. The first sheet (always called sts) usually contains 15-20 thousand rows with a "Code" column. The second sheet (always called cps) contains about 85 thousand rows, also with a "Code" column.
I need to extract all rows for a specific code from the sts sheet into one table / sheet, and all rows for that code from the cps sheet into a second table / sheet. I need to do this for all files.
I have experimented with two approaches:
1) Use Excel VBA to open each file, then use AutoFilter to copy the rows for the required code into a master workbook. The code below walks the files under a predefined root directory, starting from Public Sub SearchFiles():
Public Sub SearchFiles()
    'Macro to start the file extraction by drilling down from the mydir path specified
    Dim code As String
    Dim time1 As Double
    Dim time2 As Double

    Range("a1").Value = InputBox("Please type code to extract", code)
    time1 = Timer
    myFileSearch _
        myDir:="C:\Data\Dashboard\2014\New Files Excel Loop", _
        FileNameLike:="Reporting", _
        FileTypeLike:=".xlsx", _
        SearchSubFol:=True, _
        myCounter:=0
    time2 = Timer
    MsgBox time2 - time1 & " seconds"
End Sub

Private Sub myFileSearch(myDir As String, FileNameLike As String, FileTypeLike As String, _
                         SearchSubFol As Boolean, myCounter As Long)
    Dim fso As Object, myFolder As Object, myFile As Object
    Dim Rowcount As Long
    Dim rowcount2 As Long
    Dim masterbook As Workbook
    Dim code As String

    Set masterbook = ThisWorkbook
    Set fso = CreateObject("Scripting.FileSystemObject")
    code = Range("a1").Value
    Application.ScreenUpdating = False

    For Each myFile In fso.GetFolder(myDir).Files
        Workbooks.Open (myDir & "\" & myFile.Name)
        myCounter = myCounter + 1
        ReDim Preserve myList(1 To myCounter)
        myList(myCounter) = myDir & "\" & myFile.Name

        ''loop to pull out all code rows in your directories into new file
        Workbooks(Workbooks.Count).Worksheets(1).Range("d2").Activate
        Rowcount = Workbooks(1).Sheets(1).Range("a1").CurrentRegion.Rows.Count + 1
        Rows(1).AutoFilter
        Range("A1").AutoFilter Field:=3, Criteria1:=code, Operator:=xlAnd
        Range("A1").CurrentRegion.SpecialCells(xlCellTypeVisible).Copy _
            Destination:=Workbooks(1).Sheets(1).Range("a" & Rowcount)

        'filter out the code data
        Workbooks(Workbooks.Count).Worksheets(2).Activate
        Range("d2").Activate
        rowcount2 = Workbooks(1).Sheets(2).Range("a1").CurrentRegion.Rows.Count + 1
        Rows(1).AutoFilter
        Range("A1").AutoFilter Field:=6, Criteria1:=code, Operator:=xlAnd
        'paste below the existing rows of the second sheet (rowcount2, not Rowcount)
        Range("A1").CurrentRegion.SpecialCells(xlCellTypeVisible).Copy _
            Destination:=Workbooks(1).Sheets(2).Range("a" & rowcount2)

        Workbooks(myFile.Name).Close savechanges:=False
    Next

    If SearchSubFol Then
        For Each myFolder In fso.GetFolder(myDir).SubFolders
            myFileSearch myDir & "\" & myFolder.Name, FileNameLike, FileTypeLike, True, myCounter
        Next
    End If
End Sub
Each workbook takes 5-10 seconds to open, so the whole process is very slow (and it currently throws errors).
2) Import everything into two Access tables and then keep only the rows for the code I need. This is slower than the Excel approach because of the number of rows.
Sub pulloop()
    DoCmd.RunSQL "delete * from sts"
    DoCmd.RunSQL "delete * from cps"
    strSql = "PathMap"
    Set rs = CurrentDb.OpenRecordset(strSql)
    With rs
        If Not .BOF And Not .EOF Then
            .MoveLast
            .MoveFirst
            While (Not .EOF)
                importfile = rs.Fields("Path")
                DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "Sts", importfile, True, "Sts!A:G"
                DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "CPs", importfile, True, "CPs!A:Q"
                'Debug.Print rs.Fields("Path")
                .MoveNext
            Wend
        End If
        .Close
    End With
End Sub
I tried adapting this to use acLink, but I am struggling to implement it. Would it be possible to use acLink rather than acImport, so that only the required rows of each file are queried as they come into Access? If so, would that be faster?
One of the problems with the second option, which is the one I would lean towards, is that you are importing ALL rows from each Excel sheet. Try using the Excel Object Model to define a named range on both sheets, and then use DoCmd.TransferSpreadsheet in a loop. You will need to change the column reference for the other sheet. HTH.
Code to find the rows actually used, define a named range, and import it into Access:
'Requires a reference to the Microsoft Excel Object Library
Dim xlApp As Excel.Application
Dim xlWB As Excel.Workbook
Dim xlWS As Excel.Worksheet
Dim lngLastRow As Long
Dim myImportRange As Excel.Range
Dim strRangeName As String

Set xlApp = New Excel.Application
xlApp.Visible = False 'make it go faster
xlApp.DisplayAlerts = False 'suppress save changes prompts
Set xlWB = xlApp.Workbooks.Open("PATH")
Set xlWS = xlWB.Sheets("sts")

'last used row in column A
lngLastRow = xlWS.Range("A" & xlWS.Rows.Count).End(xlUp).Row
Set myImportRange = xlWS.Range("A1:G" & lngLastRow)
strRangeName = "myData_2014MMDD" 'or any name that makes sense to you
myImportRange.Name = strRangeName
xlWB.Save

'replace <Dest Table> with the name of the destination table
DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, <Dest Table>, xlWB.FullName, True, strRangeName

xlWB.Close False
xlApp.Quit 'release the hidden Excel instance
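To run this in a loop over both sheets, the steps above could be wrapped roughly as follows. This is only a sketch under some assumptions: the file paths come from the PathMap table with a Path field, as in the question; the columns are A:G on sts and A:Q on cps, as in the question's import ranges; the destination tables are called Sts and CPs; and the project has references to the Excel and DAO object libraries. The names UsedRangeAddress, NameUsedRange and ImportAllWorkbooks are hypothetical helpers, not existing functions.

```vba
' Hypothetical helper: address of the block from A1 down to the last used row.
Public Function UsedRangeAddress(ByVal lastCol As String, ByVal lastRow As Long) As String
    UsedRangeAddress = "A1:" & lastCol & lastRow
End Function

' Hypothetical helper: names the used block on one sheet and returns that name.
Private Function NameUsedRange(ByVal xlWS As Excel.Worksheet, ByVal lastCol As String) As String
    Dim lngLastRow As Long
    Dim strRangeName As String

    ' Last used row, measured in column A as in the code above.
    lngLastRow = xlWS.Range("A" & xlWS.Rows.Count).End(xlUp).Row
    strRangeName = "myData_" & xlWS.Name
    xlWS.Range(UsedRangeAddress(lastCol, lngLastRow)).Name = strRangeName
    NameUsedRange = strRangeName
End Function

' Hypothetical driver: one hidden Excel instance is reused for every workbook.
Public Sub ImportAllWorkbooks()
    Dim xlApp As Excel.Application
    Dim xlWB As Excel.Workbook
    Dim rs As DAO.Recordset
    Dim stsName As String, cpsName As String

    Set xlApp = New Excel.Application
    xlApp.Visible = False
    xlApp.DisplayAlerts = False

    Set rs = CurrentDb.OpenRecordset("PathMap")
    Do While Not rs.EOF
        Set xlWB = xlApp.Workbooks.Open(rs.Fields("Path").Value)
        stsName = NameUsedRange(xlWB.Sheets("sts"), "G")
        cpsName = NameUsedRange(xlWB.Sheets("cps"), "Q")
        xlWB.Save   ' the named ranges must be saved so Access can see them

        DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "Sts", xlWB.FullName, True, stsName
        DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "CPs", xlWB.FullName, True, cpsName

        xlWB.Close False
        rs.MoveNext
    Loop

    rs.Close
    xlApp.Quit
End Sub
```

Note that this saves a named range into every source workbook, so it is only appropriate if modifying the source files is acceptable.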
Consider a third approach, which queries the workbooks directly from an append (INSERT INTO ... SELECT) SQL query:
With rs
    .MoveLast
    .MoveFirst
    While (Not .EOF)
        importfile = rs.Fields("Path")
        Debug.Print importfile
        sql = "INSERT INTO sts " _
            & " SELECT * FROM [Excel 12.0 Xml;HDR=Yes;Database=" & importfile & "].[Sts$A:G]"
        CurrentDb.Execute sql, dbFailOnError
        .MoveNext
    Wend
    .Close
End With
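Since only the rows for one code are needed, the same kind of append query can also filter while it reads, so the unwanted rows never reach Access. A minimal sketch, assuming the column header is literally Code on both sheets, the code values are text, the destination tables sts and cps already exist with matching column names, and PathMap holds the file paths as in the question. BuildAppendSql and AppendCodeRows are hypothetical names used for illustration.

```vba
' Hypothetical helper: builds an append query that reads one sheet range of a
' workbook and keeps only the rows whose Code column equals codeValue.
Public Function BuildAppendSql(ByVal destTable As String, ByVal filePath As String, _
                               ByVal sheetRange As String, ByVal codeValue As String) As String
    Dim safeCode As String

    ' Double any single quotes so the value is safe inside the SQL string literal.
    safeCode = Replace(codeValue, "'", "''")

    BuildAppendSql = "INSERT INTO [" & destTable & "] " & _
        "SELECT * FROM [Excel 12.0 Xml;HDR=Yes;Database=" & filePath & "]." & _
        "[" & sheetRange & "] WHERE [Code] = '" & safeCode & "'"
End Function

' Hypothetical driver: loops over the paths in PathMap and appends the
' matching rows from both sheets of every workbook.
Public Sub AppendCodeRows(ByVal codeValue As String)
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim importfile As String

    Set db = CurrentDb
    Set rs = db.OpenRecordset("PathMap")

    Do While Not rs.EOF
        importfile = rs.Fields("Path").Value
        db.Execute BuildAppendSql("sts", importfile, "Sts$A:G", codeValue), dbFailOnError
        db.Execute BuildAppendSql("cps", importfile, "CPs$A:Q", codeValue), dbFailOnError
        rs.MoveNext
    Loop

    rs.Close
End Sub
```

Calling AppendCodeRows with the code to extract would then fill both tables in one pass, without opening Excel at all.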