Can't grab HAR with Python Selenium Script with BrowserMob-Proxy

Purpose: I want to run a Selenium Python script through BrowserMob-Proxy, capture the traffic, and write the capture out as a HAR file.

Problem: I have a functional (very simple) Python script (shown below). When I change it to use BrowserMob-Proxy for HAR capture, it stops working. Below are two scenarios that both fail, but for different reasons (see the details after each code snippet).

BrowserMob-Proxy Explanation: As listed in the specifications below, I am using both 0.6.0 and 2.0-beta-8. The reasons are: A) LightBody (the BMP lead developer) recently pointed out that the most recent version (2.0-beta-9) is not working and advises users to use 2.0-beta-8, and B) from what I can tell after reading various sites and Stack Overflow, 0.6.0 (obtained via pip) is used to make the calls through Client.py/Server.py, whereas 2.0-beta-8 is used to start the server itself. To be honest, this confuses me. Starting a BMP server from Python requires a batch (.bat) file, which is not included in 0.6.0 but is included in 2.0-beta-8. If anyone can shed some light on this (I suspect it is the root of the problems described below), I would be very grateful.

Software Specifications:

  • Operating System: Windows 7 (x64), running in VirtualBox
  • Browser: Firefox (32.0.2)
  • Script Language: Python (2.7.8)
  • Browser automation: Selenium (2.43.0), installed via pip
  • BrowserMob-Proxy: 0.6.0 and 2.0-beta-8 (see explanation above)

Selenium script (this script works):

"""This script utilizes Selenium to obtain the Google homepage"""
from selenium import webdriver

driver = webdriver.Firefox()       # Opens FireFox browser.
driver.get('https://google.com/')  # Gets google.com and loads page in browser.

driver.quit()                      # Closes Firefox browser

      

This script runs successfully and throws no errors. It is included to show that everything works before the BMP logic is added.

Script ALPHA with BMP (doesn't work):

"""Using the same functional Selenium script, produce ALPHA_HAR.har output"""
from browsermobproxy import Server
server = Server('C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy')
server.start()
proxy = server.create_proxy()

from selenium import webdriver
driver = webdriver.Firefox()           # Opens FireFox browser.

proxy.new_har("ALPHA_HAR")             # Creates a new HAR
driver.get("https://www.google.com/")  # Gets google.com and loads page in browser.
proxy.har                              # Returns a HAR JSON blob

server.stop()

      

This script runs to completion without errors. However, even after searching my entire hard drive, I cannot find ALPHA_HAR.har.

Script BETA with BMP (doesn't work):

"""Using the same functional Selenium script, produce BETA_HAR.har output"""
from browsermobproxy import Server
server = Server("C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy")
server.start()    
proxy = server.create_proxy()

from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)

proxy.new_har("BETA_HAR")             # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har                             # Returns a HAR JSON blob

server.stop()

      

This code was taken from http://browsermob-proxy-py.readthedocs.org/en/latest/ . When I run it, Firefox tries to load google.com but never finishes loading the page. Eventually it times out without raising any error, and BETA_HAR.har cannot be found anywhere on my hard drive. I also noticed that this browser instance fails to load any other site as well (I suspect this is due to a wrong proxy setting).

3 answers


I use PhantomJS. Here is an example of how to use it with Python:



import browsermobproxy as mob
import json
from selenium import webdriver

BROWSERMOB_PROXY_PATH = '/usr/share/browsermob/bin/browsermob-proxy'
url = 'http://google.com'

s = mob.Server(BROWSERMOB_PROXY_PATH)
s.start()
proxy = s.create_proxy()

# Route PhantomJS through the proxy; ignore SSL errors so HTTPS pages load.
proxy_address = "--proxy=127.0.0.1:%s" % proxy.port
service_args = [proxy_address, '--ignore-ssl-errors=yes']
driver = webdriver.PhantomJS(service_args=service_args)
driver.set_window_size(1400, 1050)

proxy.new_har(url)  # Start recording before the page is loaded.
driver.get(url)
har_data = json.dumps(proxy.har, indent=4)
screenshot = driver.get_screenshot_as_png()

# Overwrite instead of appending, so reruns do not corrupt the output files.
with open("google.png", 'wb') as save_img:  # PNG data is binary.
    save_img.write(screenshot)
with open("google.har", 'w') as save_har:
    save_har.write(har_data)

driver.quit()
s.stop()

      



Try the following:



from browsermobproxy import Server
from selenium import webdriver
import json

server = Server("path/to/browsermob-proxy")
server.start()
proxy = server.create_proxy()

# Point Firefox at the proxy so that its traffic is recorded.
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)

proxy.new_har("http://stackoverflow.com", options={'captureHeaders': True})
driver.get("http://stackoverflow.com")
result = json.dumps(proxy.har, ensure_ascii=False)
print(result)

driver.quit()
server.stop()  # Stop the BrowserMob-Proxy server process.
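
Once you have the HAR as a Python dict, a quick way to check that the browser really went through the proxy is to list the recorded requests. Below is a minimal sketch, assuming the standard HAR layout (log -> entries -> request/response). The summarize_har helper and the sample_har dict are made up for illustration; sample_har only stands in for whatever proxy.har returns. An empty list would suggest that the browser was never routed through the proxy.

```python
def summarize_har(har):
    """Return a (method, url, status) tuple for every entry in a HAR dict."""
    entries = har.get("log", {}).get("entries", [])
    return [
        (e["request"]["method"], e["request"]["url"], e["response"]["status"])
        for e in entries
    ]


# Hypothetical stand-in for proxy.har, for illustration only.
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {
                "request": {"method": "GET", "url": "https://www.google.com/"},
                "response": {"status": 200},
            }
        ],
    }
}

for method, url, status in summarize_har(sample_har):
    print("%s %s -> %s" % (method, url, status))
```

In a real run you would call summarize_har(proxy.har) after driver.get(...) and before stopping the server.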

      



When you do:

proxy.har

      

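proxy.har only returns the HAR as a Python object (parsed JSON); it does not write anything to disk. The name passed to new_har is not a file name either. If you need a file, you have to serialize the object and write it yourself:

import json

with open('BETA_HAR.har', 'w') as my_file:
    json.dump(proxy.har, my_file)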

      

Then you will find your .har
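
One thing to watch when writing the file: calling str() on a dict produces its Python repr (single quotes), which HAR viewers generally cannot parse, while the json module produces valid JSON. Here is a small sketch of writing the file and reading it back to confirm it round-trips. The har dict and the save_har helper are hypothetical stand-ins (har plays the role of proxy.har), and a temporary directory is used so nothing is written next to your script.

```python
import json
import os
import tempfile

# Hypothetical stand-in for proxy.har (a plain Python dict).
har = {"log": {"version": "1.2", "entries": []}}


def save_har(har, path):
    """Serialize a HAR dict to `path` as JSON."""
    with open(path, "w") as fh:
        json.dump(har, fh, indent=4)


path = os.path.join(tempfile.mkdtemp(), "BETA_HAR.har")
save_har(har, path)

# Reading the file back confirms that it contains valid JSON.
with open(path) as fh:
    reloaded = json.load(fh)
print(reloaded == har)

# By contrast, str(har) is a Python repr, not JSON.
try:
    json.loads(str(har))
    repr_is_json = True
except ValueError:
    repr_is_json = False
print(repr_is_json)
```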
