Selenium PhantomJS webdriver fails to grab AJAX content

I am trying to scrape a page that loads most of its content via AJAX.

I am trying to grab all li nodes with a data-section attribute from this web page. The initial HTML response contains six of the nodes I need, but most of the rest are loaded via an AJAX request that returns HTML containing the remaining li nodes.
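The XPath itself can be sanity-checked against a static snippet before involving a browser at all. The markup below is a hypothetical fragment mirroring the structure described above, not the actual page:

```python
from lxml import html

# Hypothetical snippet: li nodes carrying a data-section attribute,
# each wrapping a link, alongside one li without the attribute.
snippet = """
<ul>
  <li data-section="s1"><a>First</a></li>
  <li><a>No attribute</a></li>
  <li data-section="s2"><a>Second</a></li>
</ul>
"""

tree = html.fromstring(snippet)
# Select only the li nodes that have a data-section attribute.
titles = tree.xpath('//li[@data-section]/a/text()')
print(titles)  # ['First', 'Second']
```

If the expression works here but returns only six items against the live page, the problem is the page source, not the XPath.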

So I switched from using requests to using Selenium with the PhantomJS driver, which should handle XHR, but I am still not getting the extra AJAX content.

Runnable:

from selenium import webdriver
from lxml import html

br = webdriver.PhantomJS()
br.get(url)
tree = html.fromstring(br.page_source)
print tree.xpath('//li[@data-section]/a/text()')

      

In short, the above code cannot get the HTML injected into the webpage via XHR. How can I do this? If not, what are my other headless options?


1 answer


The linked page prominently displays a loader ( .archive_loading_bar ), which disappears as soon as the data is loaded. You can use an explicit wait with the expected condition invisibility_of_element_located:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from lxml import html

driver = webdriver.PhantomJS()
driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, '.archive_loading_bar')))
tree = html.fromstring(driver.page_source)

This is adapted from this answer; it waits up to 10 seconds for the loader to disappear, returning as soon as it does.
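Under the hood, WebDriverWait simply polls its condition until the condition returns a truthy value or the timeout elapses. A minimal sketch of that mechanism (the helper name and parameters here are illustrative, not part of Selenium's API):

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout`
    (in seconds) expires. A simplified sketch of WebDriverWait.until."""
    end = time.time() + timeout
    while time.time() < end:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within %s seconds" % timeout)

# Toy condition that becomes true on the third poll, standing in for
# "the loader element is no longer visible".
state = {"calls": 0}
def loader_gone():
    state["calls"] += 1
    return state["calls"] >= 3

result = wait_until(loader_gone, timeout=5, poll=0.01)
print(result)  # True
```

This is why the explicit wait succeeds where reading page_source immediately after get() fails: the polling loop gives the XHR time to complete before the HTML is inspected.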
