Python, Scrapy, Selenium: how to connect webdriver to the "response" passed to the function to use it for further actions
I'm trying to use Selenium to get the value of the selected option from a dropdown in a Scrapy spider, but I'm not sure how. This is my first time working with Selenium.

As you can see in the code below, I create a request in the parse function that uses parse_page as its callback. In parse_page I want to extract the value of the selected option. I can't figure out how to connect the webdriver to the response passed to parse_page so that I can use it with Select. I wrote the clearly wrong code below :(
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request, FormRequest
from scrapy.exceptions import CloseSpider
from scrapy.utils.response import open_in_browser
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from activityadvisor.items import TruYog
import logging
import scrapy

logging.basicConfig()
logger = logging.getLogger()


class TrueYoga(Spider):
    name = "trueyoga"
    allowed_domains = ["trueyoga.com.sg", "trueclassbooking.com.sg"]
    start_urls = [
        "http://trueclassbooking.com.sg/frames/class-schedules.aspx",
    ]

    def parse(self, response):
        clubs = Selector(response).xpath('//div[@class="club-selections"]/div/div/div/a/@rel').extract()
        clubs.sort()
        print 'length of clubs = ', len(clubs), '1st content of clubs = ', clubs
        req = []
        for club in clubs:
            payload = {'ctl00$cphContents$ddlClub': club}
            req.append(FormRequest.from_response(response, formdata=payload, dont_click=True, callback=self.parse_page))
        for request in req:
            yield request

    def parse_page(self, response):
        driver = webdriver.Firefox()
        driver.get(response)
        clubSelect = Select(driver.find_element_by_id("ctl00_cphContents_ddlClub"))
        option = clubSelect.first_selected_option
        print option.text
Is there a way to get this option's value in Scrapy without using Selenium? My Google and Stack Overflow searches haven't turned up any helpful answers yet.

Thanks for the help!
If you look at the response you already get, the select element is there with all of its options, and one of those options carries the attribute selected="selected". I think you can parse that attribute directly and avoid Selenium altogether:
def parse_page(self, response):
    response.xpath("//select[@id='ctl00_cphContents_ddlClub']//option[@selected = 'selected']").extract()
I would recommend using a Downloader Middleware to pass the Selenium-rendered response to your spider's parse method. Take a look at the example I wrote as an answer to another question.