Excel VBA - load blog text without loading images

I am trying to get the text (only) of a blog post using the code below:

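' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
Function extractPostBody(myURL As String) As String
    Dim IE As New InternetExplorer
    Dim Doc As HTMLDocument
    Dim paragraphs As IHTMLElementCollection
    Dim PostBody As String
    Dim i As Long

    IE.Visible = True
    IE.navigate myURL

    ' Wait until the whole page, images included, has finished loading
    Do
        DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE

    Set Doc = IE.Document
    Set paragraphs = Doc.getElementsByTagName("p")

    ' Collect paragraph text, stopping at the paragraph that contains "Tags: "
    For i = 0 To paragraphs.Length - 1
        If InStr(1, paragraphs.Item(i).innerText, "Tags: ") > 0 Then Exit For
        PostBody = PostBody & vbNewLine & paragraphs.Item(i).innerText
    Next i

    IE.Quit
    extractPostBody = PostBody
End Function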

      

After getting the text, I assign it to a cell and then use the Split function to count the number of words in the extracted text. The code works, but on websites with a lot of images it waits until all of those images have loaded, which slows execution down dramatically.
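For reference, the word-counting step described above could look roughly like the sketch below. The name `CountWords` and its parameter are hypothetical (they are not part of the code in this post), and it assumes that a "word" is any run of characters separated by spaces, tabs, or line breaks.

```vba
' Hypothetical helper, not part of the original code:
' counts whitespace-separated words in a string.
Function CountWords(ByVal bodyText As String) As Long
    Dim normalized As String
    Dim parts() As String

    ' Treat line breaks and tabs as ordinary spaces
    normalized = Replace(bodyText, vbCr, " ")
    normalized = Replace(normalized, vbLf, " ")
    normalized = Replace(normalized, vbTab, " ")

    ' Trim the ends and collapse runs of spaces so Split returns no empty entries
    normalized = Trim$(normalized)
    Do While InStr(normalized, "  ") > 0
        normalized = Replace(normalized, "  ", " ")
    Loop

    If Len(normalized) = 0 Then
        CountWords = 0
    Else
        parts = Split(normalized, " ")
        CountWords = UBound(parts) - LBound(parts) + 1
    End If
End Function
```

It can be called directly on the function's return value, e.g. `CountWords(extractPostBody(myURL))`, which avoids writing the text to a cell first.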

Is there another way to extract the text from the blog without waiting for the images to load?

EDIT:

Following Jeeped's suggestion, I am now using the code below, which I took from another Stack Overflow post. Unfortunately, I can't seem to find that post again to give credit to its author:

' Requires a reference to "Microsoft HTML Object Library"
Function ScrapeWebPage(ByVal URL As String) As String
    Dim HTMLDoc As New HTMLDocument
    Dim XMLHttpRequest As Object
    Dim paragraphs As Object
    Dim p As Object
    Dim PostBody As String

    ' Synchronous request (third argument False): send returns once the response has arrived
    Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")
    XMLHttpRequest.Open "GET", URL, False
    XMLHttpRequest.send

    With HTMLDoc.body
        ' Load the response text into the in-memory document
        .innerHTML = XMLHttpRequest.responseText

        ' Append the text of every paragraph
        Set paragraphs = .getElementsByTagName("p")
        For Each p In paragraphs
            PostBody = PostBody & vbNewLine & p.innerText
        Next p
    End With

    ScrapeWebPage = PostBody
End Function

      

This works, but the response is now a CAPTCHA page, which I obviously cannot fill in because the page is no longer rendered in IE. Or can I?
