Python, Scrapy, Selenium: how to connect webdriver to the "response" passed to the function to use it for further actions
I'm trying to use Selenium to get the value of the selected option from a dropdown in a Scrapy spider, but I'm not sure how. This is my first time working with Selenium.
As you can see in the code below, I create a request in the parse function that calls parse_page as a callback. In parse_page I want to extract the value of the selected option. I can't figure out how to connect the webdriver to the response passed to parse_page so that I can use it in Select. The code I wrote below is clearly wrong :(
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request, FormRequest
from scrapy.exceptions import CloseSpider
from scrapy.utils.response import open_in_browser
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from activityadvisor.items import TruYog
import logging
import scrapy

logging.basicConfig()
logger = logging.getLogger()

class TrueYoga(Spider):
    name = "trueyoga"
    allowed_domains = ["trueyoga.com.sg", "trueclassbooking.com.sg"]
    start_urls = [
        "http://trueclassbooking.com.sg/frames/class-schedules.aspx",
    ]

    def parse(self, response):
        clubs = Selector(response).xpath('//div[@class="club-selections"]/div/div/div/a/@rel').extract()
        clubs.sort()
        print 'length of clubs = ', len(clubs), '1st content of clubs = ', clubs
        req = []
        for club in clubs:
            payload = {'ctl00$cphContents$ddlClub': club}
            req.append(FormRequest.from_response(response, formdata=payload, dont_click=True, callback=self.parse_page))
        for request in req:
            yield request

    def parse_page(self, response):
        driver = webdriver.Firefox()
        driver.get(response)
        clubSelect = Select(driver.find_element_by_id("ctl00_cphContents_ddlClub"))
        option = clubSelect.first_selected_option
        print option.text
Is there a way to get this option value in Scrapy without using Selenium? My Google and Stack Overflow searches haven't turned up any helpful answers yet.
Thanks for the help!
When you get the response, it already contains the select field with its options, and the currently selected option carries the attribute selected="selected". I think you can parse that attribute and avoid using Selenium entirely:
def parse_page(self, response):
    selected = response.xpath("//select[@id='ctl00_cphContents_ddlClub']//option[@selected='selected']").extract()
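You can try the same predicate outside a spider to see what it matches. Below is a stdlib-only sketch (the club names and option values are made up for illustration); in the spider you would run the equivalent expression through response.xpath(...) instead:

```python
# Demonstrates what option[@selected='selected'] matches in a rendered
# <select>. Sample markup is invented; IDs mirror the question's page.
import xml.etree.ElementTree as ET

html = """<select id="ctl00_cphContents_ddlClub">
  <option value="01">True Yoga Pacific</option>
  <option value="03" selected="selected">True Yoga Suntec</option>
</select>"""

root = ET.fromstring(html)
# ElementTree's limited XPath supports the [@attr='value'] predicate
chosen = root.find(".//option[@selected='selected']")
print(chosen.get("value"), chosen.text)  # 03 True Yoga Suntec
```

This works because ASP.NET renders the server-side selection into the HTML itself, so no browser is needed to read it.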
I would recommend using a Downloader Middleware to pass the Selenium response to your spider's parse method. Take a look at the example I wrote as an answer to another question.