Accessing Chrome DOM tree with python

Question


Chrome DevTools shows the DOM tree of a page. Is there a way to access this tree from Python?

+1

root 21 Sep 12 at 13:44

2 answers

Have you used the BeautifulSoup library? This section of the tutorial may answer your question. http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#The Processing Tree

You will also need to import the requests library to download the page.

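from bs4 import BeautifulSoup  # BeautifulSoup 4; BeautifulSoup 3 used "from BeautifulSoup import BeautifulSoup"
import requests

url = 'http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html'
page = requests.get(url)  # download the raw HTML
soup = BeautifulSoup(page.content, 'html.parser')  # parse it into a navigable tree
print(soup)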
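Once the page is parsed, the tree can be walked by tag name or searched. The following is only a minimal sketch: it assumes BeautifulSoup 4 (the bs4 package), and the sample markup and variable names are made up for illustration, not taken from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page
html = (
    "<html><head><title>Demo</title></head>"
    "<body><div id='main'><p>First</p><p>Second</p></div></body></html>"
)

soup = BeautifulSoup(html, "html.parser")

# Attribute-style access returns the first matching tag
title_text = soup.title.string

# find() / find_all() search the subtree below a node
main_div = soup.find("div", id="main")
paragraphs = [p.get_text() for p in main_div.find_all("p")]

# .children iterates over the direct children of a node
child_tags = [child.name for child in soup.body.children]

print(title_text, paragraphs, child_tags)
```

Note that this tree is built from the HTML the server returned, so it may differ from what DevTools shows if the page modifies its DOM with JavaScript.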

+1

msunbot 21 Sep 12 at 15:25


root · Accepted Answer · 2012-09-21T15:35:05+0000

The best way I've found is to use selenium.webdriver:

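from selenium import webdriver
import lxml.html as lh
import lxml.html.clean as clean  # on recent lxml releases this needs the separate lxml_html_clean package

browser = webdriver.Chrome()  # start a local Chrome session
browser.get("http://www.webpage.com")  # load the page

content = browser.page_source  # HTML of the page as currently loaded in Chrome
browser.quit()  # close the browser once the source is captured

cleaner = clean.Cleaner()
content = cleaner.clean_html(content)  # strip scripts and other potentially unsafe markup
doc = lh.fromstring(content)  # parse into an lxml element tree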

doc now holds the page's DOM as an lxml.html.HtmlElement.
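From there the element can be queried like any lxml tree. A small sketch of what that might look like, using a made-up HTML string in place of browser.page_source so it runs without a browser (the markup and variable names are illustrative assumptions):

```python
import lxml.html as lh

# Hypothetical markup standing in for browser.page_source
html = (
    "<html><body><div id='main'>"
    "<a href='/a'>A</a><a href='/b'>B</a>"
    "</div></body></html>"
)

doc = lh.fromstring(html)

# XPath queries run against the whole tree
links = doc.xpath("//a/@href")

# Look up a single element by its id attribute
main = doc.get_element_by_id("main")

# Iterating an element yields its direct children
child_tags = [el.tag for el in main]

# text_content() concatenates all text below the element
text = main.text_content()

print(links, child_tags, text)
```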
