Selenium and Scrapy integration to click past page and then save cookies

Question

I've been browsing Stack Overflow for a couple of hours and still haven't found a suitable answer for what I'm trying to do. I want to use Selenium to open the start page and click past it, then transfer the cookies to Scrapy so it can crawl the database. So far, I keep getting redirected to the first login page.

I based the approach of grabbing the cookies and putting them in the request on this answer: authorization with cookie

import time

import scrapy
from scrapy import Request
from selenium import webdriver
from selenium.webdriver.common.by import By


class HooversTest(scrapy.Spider):
    name = "hooversTest"
    # allowed_domains takes bare domain names, not URLs
    allowed_domains = ["subscriber.hoovers.com"]
    # a single URL string, because it is passed to Request(url=...)
    login_page = "http://subscriber.hoovers.com/H/home/index.html"
    start_urls = ["http://subscriber.hoovers.com/H/company360/overview.html?companyId=99566395",
                  "http://subscriber.hoovers.com/H/company360/overview.html?companyId=10723000000000"]

    def login(self, response):
        return Request(url=self.login_page,
            cookies=self.get_cookies(), callback=self.after_login)

    def get_cookies(self):
        # open a browser, click past the continue page, and collect the cookies
        self.driver = webdriver.Firefox()
        self.driver.get("http://www.mergentonline.com/Hoovers/continue.php?status=sucess")
        elem = self.driver.find_element(By.NAME, "Continue")
        elem.click()
        time.sleep(15)
        cookies = self.driver.get_cookies()
        #reduce(lambda r, d: r.update(d) or r, cookies, {})
        self.driver.quit()
        return cookies

    def parse(self, response):
        return Request(url="http://subscriber.hoovers.com/H/company360/overview.html?companyId=99566395",
            cookies=self.get_cookies(), callback=self.after_login)

    def after_login(self, response):
        # print the page title to see whether we landed on the login page again
        print(response.xpath('//title').extract())
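
One possible direction, as a minimal sketch rather than a confirmed fix: Selenium's get_cookies() returns a list of dicts that carry extra keys and may belong to a different domain than the one Scrapy is crawling. The helper below (selenium_cookies_for_scrapy is a hypothetical name, and the sample cookies are made up) keeps only the name, value, domain and path fields and can filter by domain, producing the list-of-dicts form that the cookies argument of scrapy.Request accepts.

```python
def selenium_cookies_for_scrapy(selenium_cookies, domain_suffix=None):
    """Convert cookies from Selenium's driver.get_cookies() into the
    list-of-dicts form accepted by the cookies argument of scrapy.Request.

    If domain_suffix is given, keep only cookies set for that domain
    or one of its subdomains.
    """
    converted = []
    for cookie in selenium_cookies:
        domain = cookie.get("domain", "")
        host = domain.lstrip(".")
        if domain_suffix and not (
            host == domain_suffix or host.endswith("." + domain_suffix)
        ):
            continue  # cookie belongs to another site, skip it
        entry = {"name": cookie["name"], "value": cookie["value"]}
        if domain:
            entry["domain"] = domain
        if cookie.get("path"):
            entry["path"] = cookie["path"]
        converted.append(entry)
    return converted


# Hypothetical sample shaped like Selenium's output; names and values are made up.
sample = [
    {"name": "sessionid", "value": "abc123", "domain": ".hoovers.com",
     "path": "/", "secure": False, "httpOnly": True},
    {"name": "tracking", "value": "xyz", "domain": "www.mergentonline.com",
     "path": "/"},
]

print(selenium_cookies_for_scrapy(sample, domain_suffix="hoovers.com"))
```

In the spider, the result could be computed once (for example in start_requests) and passed as cookies= on every request, instead of launching a new browser in each callback. Selenium only returns cookies visible to the page the browser is currently on, so the driver would need to be on a subscriber.hoovers.com page when get_cookies() is called. Whether that alone stops the redirect to the login page is an assumption that has not been verified against this site.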

python cookies selenium web-scraping scrapy

Jay feng, Oct 28 '14 at 2:06

No one has answered this question yet
