Python, Scrapy, Selenium: how to connect the webdriver to the response passed to a callback so it can be used for further actions

I'm trying to use Selenium to get the value of the selected option from a dropdown in a Scrapy spider, but I'm not sure how. This is my first time working with Selenium.

As you can see in the code below, I create requests in the parse function that use parse_page as the callback. In parse_page I want to extract the value of the selected option. I can't figure out how to connect the webdriver to the response passed to parse_page so that I can use it with Select. I wrote the clearly wrong code below :(

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from scrapy.exceptions import CloseSpider
import logging
import scrapy
from scrapy.utils.response import open_in_browser
from scrapy.http import FormRequest
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from activityadvisor.items import TruYog

logging.basicConfig()
logger = logging.getLogger()

class TrueYoga(Spider):
    name = "trueyoga"
    allowed_domains = ["trueyoga.com.sg","trueclassbooking.com.sg"]
    start_urls = [
        "http://trueclassbooking.com.sg/frames/class-schedules.aspx",
    ]

    def parse(self, response):
        clubs = Selector(response).xpath('//div[@class="club-selections"]/div/div/div/a/@rel').extract()
        clubs.sort()
        print('number of clubs =', len(clubs), 'clubs =', clubs)
        for club in clubs:
            payload = {'ctl00$cphContents$ddlClub': club}
            yield FormRequest.from_response(response, formdata=payload,
                                            dont_click=True, callback=self.parse_page)

    def parse_page(self, response):
        driver = webdriver.Firefox()
        # driver.get() expects a URL string, not a Response object
        driver.get(response.url)
        club_select = Select(driver.find_element_by_id("ctl00_cphContents_ddlClub"))
        option = club_select.first_selected_option
        print(option.text)
        driver.quit()

      

Is there a way to get this option's value in Scrapy without using Selenium? My Google and Stack Overflow searches haven't turned up a helpful answer yet.

Thanks for the help!


2 answers


If you look at the response, the select field is already there with its options, and one of them carries the attribute selected="selected". You can extract that attribute directly and avoid Selenium:



def parse_page(self, response):
    # The selected <option> is already present in the HTML, so no browser is needed;
    # /text() returns the option's label rather than the whole <option> element.
    selected = response.xpath(
        "//select[@id='ctl00_cphContents_ddlClub']"
        "/option[@selected='selected']/text()").extract()
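If it helps to see the idea outside Scrapy, here is a minimal, self-contained sketch using only the standard library's html.parser to pull the selected option out of a snippet like the one on that page (the markup and option values here are made up for illustration, not copied from the real site):

```python
from html.parser import HTMLParser

class SelectedOptionParser(HTMLParser):
    """Records the text of the <option> carrying selected="selected",
    mimicking what the XPath above does inside Scrapy."""

    def __init__(self):
        super().__init__()
        self.in_selected = False
        self.selected_text = None

    def handle_starttag(self, tag, attrs):
        if tag == "option" and ("selected", "selected") in attrs:
            self.in_selected = True

    def handle_data(self, data):
        if self.in_selected and self.selected_text is None and data.strip():
            self.selected_text = data.strip()

    def handle_endtag(self, tag):
        if tag == "option":
            self.in_selected = False

html = """
<select id="ctl00_cphContents_ddlClub">
  <option value="1">Club A</option>
  <option value="2" selected="selected">Club B</option>
</select>
"""
parser = SelectedOptionParser()
parser.feed(html)
print(parser.selected_text)  # prints: Club B
```

The point is simply that the server already marks the chosen option in the HTML it sends back, so no JavaScript (and therefore no browser) is involved in producing it.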

      



I would recommend using a Downloader Middleware to pass a Selenium-rendered response to your spider's parse method. Take a look at the example I wrote as an answer to another question .
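For reference, a minimal sketch of such a middleware might look like the following (the class name and details are my own illustration, not taken from the linked answer; it assumes Scrapy and Selenium are installed and Firefox is available):

```python
from scrapy.http import HtmlResponse
from selenium import webdriver

class SeleniumDownloaderMiddleware:
    """Fetch pages with a real browser and hand the rendered
    HTML back to the spider as an ordinary HtmlResponse."""

    def __init__(self):
        self.driver = webdriver.Firefox()

    def process_request(self, request, spider):
        # Let the browser load the page (running any JavaScript),
        # then wrap the rendered source so spider callbacks can
        # call response.xpath() on it as usual. Returning a
        # Response here short-circuits Scrapy's normal download.
        self.driver.get(request.url)
        return HtmlResponse(
            url=self.driver.current_url,
            body=self.driver.page_source,
            encoding="utf-8",
            request=request,
        )

    def __del__(self):
        self.driver.quit()
```

You would enable it via DOWNLOADER_MIDDLEWARES in settings.py. As written, every request goes through the browser, which is slow, so in practice you would gate it (for example, with a flag in request.meta) so that only the pages that need JavaScript are rendered by Selenium.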


