How to find the number of image results for a given search query using VBA

I've been fiddling with scraping HTML from Excel to estimate how common images of different resolutions are. I'm hoping to get something dynamic: the user enters a search term, and the code runs through a set of predefined image resolutions, estimating how common images are for that search term at each resolution.

Step one, though, is to get a reliable (and fast) way to return the number of images at a specific resolution. I wrote this code:

Sub GoogleWithURL() 'requires references: Microsoft HTML Object Library and Microsoft Internet Controls

    Dim url As String, searchTerm As String
    Dim objIE As InternetExplorer 'special object variable representing the IE browser
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets("sheet1")
    Dim currPage As HTMLDocument
    Dim xRes As Integer, yRes As Integer
    Dim valueResult As IHTMLElementCollection 'collection of <img> elements in the results
    With ws
        xRes = .Range("XRes")
        yRes = .Range("YRes")
        searchTerm = .Range("search")
    End With

    'create URL to page with these image criteria
    url = WorksheetFunction.Concat("https://www.google.com/search?q=", searchTerm, _
                        "&tbm=isch&source=lnt&tbs=isz:ex,iszw:", xRes, ",iszh:", yRes)

    'create a new instance of Internet Explorer and assign it to objIE
    Set objIE = New InternetExplorer
    'objIE.Visible = True 'for debugging purposes

    'Google images search
    objIE.navigate url
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Set currPage = objIE.document

    'Count image results
    Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
    MsgBox WorksheetFunction.Concat("'", searchTerm, "' returns ", valueResult.Length _
    , " images @ ", xRes, "x", yRes, "px.") 'returns number of loaded images on page

    'close the browser
    On Error Resume Next 'required when the browser is visible and I close it manually half way
    objIE.Quit

End Sub

      

It navigates the Internet Explorer object to a specific Google image search at the given resolution and counts the number of images inside the element with id rg_s (these are the image results, as opposed to banner images, etc.). It then returns this count in a message box. (When I eventually implement this, I will write the values back to a column on the sheet, iterating through 30 or so different resolutions.)

Problems

The main problems with this code:

  • This does not provide a very useful count. The data resolution is low because it only counts loaded images, which means that most searches at common resolutions like 1920x1080 or 1366x768 will return 100 images.

  • It's slow. To me, navigating to the page and counting image tags is similar to using .Select in VBA. It is the manual approach a human would take, and is therefore inefficient.

Possible solutions

I can think of some approaches to solve these problems:

  • Data resolution / getting a more useful count

    • Scroll down. If I can load more images, chances are I can differentiate a little better. I found that scrolling as far as I can (to the Load More Results button) yields a cap of 400 rather than 100. If there are at least that many images for a given resolution, I'm happy and will give it top rank. This does not help with problem 2, though. And how would I do it?

    • Narrow the results. If 100 is returned, I can add a filetype: filter to the submitted URL, for example filetype:png, to possibly halve the number of images returned, giving me a better spread in the 0-100 range. Not ideal, though, as I would have to repeat the search with multiple file types for some resolutions, slowing down the code and, even then, not necessarily giving me what I want.

    • Use Google's (or another search engine's) own values. I have asked this on different sites and in different forms: is there any data on the number of images available directly from Google, i.e. without the laborious (and slow) loading of the images themselves? Something like the "About 1,300,500 results in 0.03 seconds" line of a normal search, but for images? If I could use a pre-computed value whenever the count is greater than 100, I could get a more detailed picture.

  • Slowness

    • Try a different kind of HTTP request. Right now I'm opening an instance of Internet Explorer and navigating to the page. That is a very human way of doing it; I'd prefer a more computer-style request. I mean, instead of using my laptop to trawl through the images one by one, I'd like Google's supercomputers to do the legwork, and ask only for a count rather than the images themselves. I don't know how to do that. I know of two other ways to search the web from Excel: web queries and CreateObject("MSXML2.serverXMLHTTP"). I don't know either of them well, but if you think they would be better, I will study them carefully.
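As a rough illustration of the MSXML2 route mentioned in the last bullet, here is a minimal sketch that fetches the raw HTML of a results page without opening a browser. FetchHtml and TestFetch are hypothetical names, and the User-Agent value is only a placeholder. It is an assumption that the returned markup is useful at all: no scripts run, so the page may not contain the rg_s container or the same number of images that Internet Explorer would show.

```vba
' Sketch only: request a page over HTTP instead of driving Internet Explorer.
' FetchHtml is a hypothetical helper, not part of the original code.
Public Function FetchHtml(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0") 'late bound, no reference needed

    http.Open "GET", url, False                        'False = synchronous request
    http.setRequestHeader "User-Agent", "Mozilla/5.0"  'placeholder value (assumption)
    http.send

    If http.Status = 200 Then
        FetchHtml = http.responseText
    End If                                             'otherwise an empty string is returned
End Function

Sub TestFetch()
    Dim html As String
    html = FetchHtml("https://www.google.com/search?q=St+Mary" & _
                     "&tbm=isch&tbs=isz:ex,iszw:1366,iszh:768")
    Debug.Print Len(html) & " characters received"
End Sub
```

Whether this is faster in practice, and whether the response can be parsed for a count, would need to be checked against the actual response.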

Summary

Hopefully that's not too much, and I think my train of thought should be clear enough. Actual answers on how to scroll down / load more images / get Google to return a count rather than the images themselves would be best, but tips on what to look into would be helpful too.


2 answers


Your bottleneck is not in the For loop; it is in opening the browser and navigating it to the page. If you are worried about time, you should hook into a browser that is already open on this page and not close it until you have run all your searches. That should save at least 2 seconds per search. I ran the following code and got this:

Time to open and set Explorer:  2.41 seconds.

Time to Count 100 Photos(1):  0.1 seconds.

Time to Count 100 Photos(2):  0.11 seconds.

The difference between our approaches is 1/100 of a second.



Also, Google Images requires the user to scroll down the page to trigger loading of the next 100 images. If you can find an AJAX or JavaScript call that makes this happen, you should be able to make the page think it has been scrolled. This is why you only get 100 images.

Or you can open a browser, enter a search term, and scroll down until 299 images are on the screen, at which point you reach a button that says Show More Images. Then hook into this open web page.

If you use multiple search terms, then your bottleneck is in opening and closing browsers rather than in counting images.

Sub GoogleWithURL() 'requires references: Microsoft HTML Object Library and Microsoft Internet Controls
' https://www.google.com/search?q=St+Mary&source=lnms&tbm=isch&sa=X&ved=0ahUKEwj99ay14aPSAhWDMSYKHadiCjkQ_AUICSgC&biw=1600&bih=840
    Dim url As String
    Dim objIE As InternetExplorer 'special object variable representing the IE browser
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets("Sheet1")
    Dim currPage As HTMLDocument
    Dim StartTime As Double, SecondsElapsed As Double
    Dim valueResult As Object, pic As Object, counter As Long

    '****************************************
    '   Hard code url to search images of St Mary
    url = "https://www.google.com/search?q=St+Mary&source=lnms&tbm=" & _
            "isch&sa=X&ved=0ahUKEwj99ay14aPSAhWDMSYKHadiCjkQ_AUICSgC&biw=1600&bih=840"

    StartTime = Timer
    Set objIE = New InternetExplorer
    objIE.Visible = True
    objIE.navigate url
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Set currPage = objIE.document
    SecondsElapsed = Round(Timer - StartTime, 2)
    Debug.Print "Time to open and set Explorer:  " & SecondsElapsed & " seconds."


    StartTime = Timer
    Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
    For Each pic In valueResult
        counter = counter + 1
    Next pic
    SecondsElapsed = Round(Timer - StartTime, 2)

    Debug.Print "Time to Count " & counter & " Photos(1):  " & SecondsElapsed & " seconds."

    counter = 0
    StartTime = Timer
    Set valueResult = currPage.getElementsByTagName("IMG")
    For Each pic In valueResult
        If InStr(1, pic.className, "rg") > 0 Then
            counter = counter + 1
        End If
    Next pic
    SecondsElapsed = Round(Timer - StartTime, 2)

    Debug.Print "Time to Count " & counter & " Photos(2):  " & SecondsElapsed & " seconds."

    On Error Resume Next 'required when the browser is visible and I close it manually half way
    objIE.Quit

End Sub

      



After a few more questions, and now feeling somewhat wiser, I made a UDF for this:

Public Function GOOGLE_COUNT(searchTerm As String, xRes As Long, yRes As Long, Optional timeout As Long = 10) As Long

    Dim url As String
    Dim objIE As InternetExplorer
    Dim currPage As HTMLDocument
    Dim stTimer As Double, tElapsed As Single
    Dim valueResult As IHTMLElementCollection

    'create URL to page with these image criteria
    url = "https://www.google.com/search?q=" & searchTerm & _
                        "&tbm=isch&source=lnt&tbs=isz:ex,iszw:" & xRes & ",iszh:" & yRes

    'create a new instance of Internet Explorer and assign it to objIE
    Set objIE = New InternetExplorer

    'Google images search
    objIE.navigate url
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Set currPage = objIE.document
    Dim myDiv As HTMLDivElement: Set myDiv = currPage.getElementById("fbar")
    Dim elemRect As IHTMLRect: Set elemRect = myDiv.getBoundingClientRect
    stTimer = Timer
    'Scroll until bottom of page is in view
    Do Until elemRect.bottom > 0 Or tElapsed > timeout 'timeout after n seconds
        currPage.parentWindow.scrollBy 0, 10000
        Set elemRect = myDiv.getBoundingClientRect
        tElapsed = Timer - stTimer
    Loop
    myDiv.ScrollIntoView
    'Count the images
    Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
    GOOGLE_COUNT = valueResult.Length
    objIE.Quit

End Function

      

It works like this: to count images for "St. Mary" at an image size of 1366x768, enter

=GOOGLE_COUNT("St. Mary", 1366, 768)

      

Or with an explicit 10 second timeout (the function stops scrolling once 10 seconds have passed and just counts the images loaded so far)

=GOOGLE_COUNT("St. Mary", 1366, 768, 10)
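The UDF above concatenates the search term into the URL exactly as typed, so spaces and symbols are left for the browser to handle. A minimal sketch of escaping the term first, assuming Excel 2013 or later (where WorksheetFunction.EncodeURL is available); BuildImageSearchUrl is a hypothetical helper name, not part of the original answer:

```vba
' Sketch only: build the image-search URL with an escaped search term.
' BuildImageSearchUrl is a hypothetical helper; EncodeURL needs Excel 2013+.
Public Function BuildImageSearchUrl(searchTerm As String, _
                                    xRes As Long, yRes As Long) As String
    BuildImageSearchUrl = "https://www.google.com/search?q=" & _
        WorksheetFunction.EncodeURL(searchTerm) & _
        "&tbm=isch&source=lnt&tbs=isz:ex,iszw:" & xRes & ",iszh:" & yRes
End Function
```

Inside GOOGLE_COUNT, the url assignment could then become url = BuildImageSearchUrl(searchTerm, xRes, yRes).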

      



I explain how the scrolling works in another question; for now it's dirty but functional.

Important:

As @John Muggins points out, the significant time cost is in loading, not counting; in particular, in opening and closing InternetExplorer. Therefore, to avoid huge recalculation times, if (like me) you want to check more than one term / resolution, put this code in a macro, not a function (comment if you think I should post this). This UDF is intended for one-off searches only.
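A minimal sketch of what such a macro might look like, opening Internet Explorer once and reusing it for every resolution. The sheet layout is an assumption: a named range "Resolutions" whose first column holds widths, with heights in the next column and results written to the column after that; the "search" named range is taken from the question. To keep it short, this version counts only the images loaded initially and does not include the scrolling step.

```vba
' Sketch only: one browser instance reused across many resolutions.
' Requires references: Microsoft Internet Controls, Microsoft HTML Object Library.
' The "Resolutions" named range and the column layout are assumptions.
Sub CountForAllResolutions()
    Dim objIE As InternetExplorer
    Dim ws As Worksheet
    Dim cell As Range
    Dim searchTerm As String, url As String
    Dim container As Object

    Set ws = ThisWorkbook.Sheets("Sheet1")
    searchTerm = ws.Range("search").Value

    Set objIE = New InternetExplorer      'opened once, reused for every search

    For Each cell In ws.Range("Resolutions").Columns(1).Cells
        url = "https://www.google.com/search?q=" & searchTerm & _
              "&tbm=isch&source=lnt&tbs=isz:ex,iszw:" & cell.Value & _
              ",iszh:" & cell.Offset(0, 1).Value
        objIE.navigate url
        Do While objIE.Busy Or objIE.readyState <> 4: DoEvents: Loop

        Set container = objIE.document.getElementById("rg_s")
        If container Is Nothing Then
            cell.Offset(0, 2).Value = 0   'results container not found
        Else
            cell.Offset(0, 2).Value = container.getElementsByTagName("IMG").Length
        End If
    Next cell

    objIE.Quit                            'closed once, after the last search
End Sub
```

With 30 resolutions, this pays the browser start-up cost once instead of 30 times, which is where the timings above suggest most of the time goes.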

Hope this is helpful, I thought I should revisit the question to post the answer I got.

Final note:

  • Your computer is (probably) not crashing; the function is just taking a while to evaluate.

  • For the search term, enter whatever you would type in the Google search bar, e.g. "Jaguar -car" returns images of the animal, not the car company.

  • The result is a number from 0 to 400. Values 0-399 are the actual number of images counted (as long as the timeout is long enough; the default is 10 s). 400 is the cap, so a result of 400 means there may be more than 400 images for that term.
