Selenium PhantomJS webdriver fails to grab AJAX content

I am trying to scrape a page that loads most of its content via AJAX.

I am trying to grab all li nodes with a data-section attribute from this web page. The initial HTML response contains six of the nodes I need, but most of the rest are loaded via an AJAX request that returns HTML containing the remaining li nodes.
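The XPath itself can be sanity-checked against a static snippet before involving a browser at all. The markup below is a hypothetical fragment mirroring the structure described above, not the actual page:

```python
from lxml import html

# Hypothetical snippet: li nodes carrying a data-section attribute,
# each wrapping a link, alongside one li without the attribute.
snippet = """
<ul>
  <li data-section="s1"><a>First</a></li>
  <li><a>No attribute</a></li>
  <li data-section="s2"><a>Second</a></li>
</ul>
"""

tree = html.fromstring(snippet)
# Select only the li nodes that have a data-section attribute.
titles = tree.xpath('//li[@data-section]/a/text()')
print(titles)  # ['First', 'Second']
```

If the expression works here but returns only six items against the live page, the problem is the page source, not the XPath.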

So I switched from using requests to using Selenium with the PhantomJS driver, which should handle XHR, but I am still not getting the extra AJAX content.

Runnable:

from selenium import webdriver
from lxml import html

br = webdriver.PhantomJS()
br.get(url)
tree = html.fromstring(br.page_source)
print tree.xpath('//li[@data-section]/a/text()')

      

In short, the above code cannot get the HTML injected into the webpage via XHR. How can I do this? If not, what are my other headless options?


1 answer


The linked page prominently displays a loader ( .archive_loading_bar ), which disappears as soon as the data is loaded. You can use an explicit wait with the expected condition invisibility_of_element_located:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from lxml import html

driver = webdriver.PhantomJS()
driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, '.archive_loading_bar')))
tree = html.fromstring(driver.page_source)

This is adapted from this answer; it waits up to 10 seconds for the loader to disappear, returning as soon as it does.
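Under the hood, WebDriverWait simply polls its condition until the condition returns a truthy value or the timeout elapses. A minimal sketch of that mechanism (the helper name and parameters here are illustrative, not part of Selenium's API):

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout`
    (in seconds) expires. A simplified sketch of WebDriverWait.until."""
    end = time.time() + timeout
    while time.time() < end:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within %s seconds" % timeout)

# Toy condition that becomes true on the third poll, standing in for
# "the loader element is no longer visible".
state = {"calls": 0}
def loader_gone():
    state["calls"] += 1
    return state["calls"] >= 3

result = wait_until(loader_gone, timeout=5, poll=0.01)
print(result)  # True
```

This is why the explicit wait succeeds where reading page_source immediately after get() fails: the polling loop gives the XHR time to complete before the HTML is inspected.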
