Proxies with Scrapy-Splash

I am trying to get proxies to work with my local Splash instance. I have read several docs but have not found a working example. I was told that the cause was https://github.com/scrapy-plugins/scrapy-splash/issues/107 . I no longer get that traceback, but I still cannot use Splash with a proxy; the new error message is below the code. None of my requests even reach Splash. Thanks in advance to anyone who can help me solve this.

  import json

  from scrapy_splash import SplashRequest

  # Spider method: decode the JSON index, then schedule one Splash request per blog link.
  def parse_json(self, response):
    load = json.loads(response.body.decode('utf-8'))
    LUA_SOURCE = """
    function main(splash)
        local host = "proxy.crawlera.com"
        local port = 8010
        local user = "APIKEY"
        local password = ""
        local session_header = "X-Crawlera-Session"
        local session_id = "create"

        splash:on_request(function (request)
            request:set_header("X-Crawlera-UA", "desktop")
            request:set_header(session_header, session_id)
            request:set_proxy{host, port, username=user, password=password}
        end)

        splash:on_response_headers(function (response)
            if response.headers[session_header] ~= nil then
                session_id = response.headers[session_header]
            end
        end)

        splash:go(splash.args.url)
        return splash:html()
    end
    """
    # Render each blog URL through Splash's 'execute' endpoint using the Lua script above.
    for topic in load['d']['blogtopics']:
        yield SplashRequest(topic['Uri'], self.parse_blog, endpoint='execute', args={'wait': 3, 'lua_source': LUA_SOURCE})


2017-03-29 09:26:37 [scrapy.core.engine] DEBUG: Crawled (503) <GET http://community.martindale.com/legal-blogs/Practice_Areas/b/corporate__securities_law/archive/2011/08/11/sec-adopts-new-rules-replacing-credit-ratings-as-a-criterion-for-the-use-of-short-form-shelf-registration.aspx via http://localhost:8050/execute> (referer: None)

      



1 answer


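The problem comes from the Crawlera middleware. It has no special handling for SplashRequest, so it treats the request to your local Splash instance (http://localhost:8050/execute) like any other request and tries to send it through the Crawlera proxy. The proxy cannot reach your localhost, which would explain the 503 and why nothing ever arrives at Splash.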
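One possible workaround, sketched here under assumptions rather than confirmed against your setup: since the Lua script already routes traffic through Crawlera inside Splash, the Scrapy-to-Splash hop can skip the Crawlera middleware. The sketch assumes your version of scrapy-crawlera honours a `dont_proxy` key in `request.meta`; check this against the version you have installed. `splash_request_kwargs` is a hypothetical helper name, not part of either library.

```python
def splash_request_kwargs(url, lua_source, wait=3):
    """Build keyword arguments for scrapy_splash.SplashRequest.

    Hypothetical helper: it only assembles a dict, so it can be
    inspected without Scrapy or Splash running.
    """
    return {
        "url": url,
        "endpoint": "execute",
        "args": {"wait": wait, "lua_source": lua_source},
        # Assumption: the Crawlera middleware skips requests whose meta
        # carries 'dont_proxy', so the call to the local Splash instance
        # goes out directly instead of through the proxy.
        "meta": {"dont_proxy": True},
    }
```

In the spider this would be used as `yield SplashRequest(callback=self.parse_blog, **splash_request_kwargs(link, LUA_SOURCE))`. If `dont_proxy` is not supported in your version, disabling the Crawlera middleware for this spider should have the same effect, because the proxying is done by `request:set_proxy` in the Lua script.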


