Open_uri / Nokogiri redirection problems

I'm using Nokogiri to clean up a webpage, which works great unless the page redirects to another URL.

When I clean this page: https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/

I get this error:

/home/balint/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:224:in `open_loop': redirection forbidden: https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/ -> http://www.facebook.com/cardcomplete (RuntimeError)

      

When I try to clean the redirect target, http://www.facebook.com/cardcomplete, directly, I get the same error, but this time for the redirect to the https version of the Facebook page:

/home/balint/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:224:in `open_loop': redirection forbidden: http://www.facebook.com/cardcomplete -> https://www.facebook.com/cardcomplete (RuntimeError)

      

Of course, scraping the https version of the Facebook page directly works.

I have added the open_uri_redirections gem, which handles the Facebook http -> https redirect, but not the redirect from the first link:

doc = Nokogiri::HTML(open('https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/', :allow_redirections => :safe))

      

How can I solve this?
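For reference, here is a minimal sketch of how I understand the two modes of open_uri_redirections: as far as I can tell from its README, `:safe` only lets http -> https through, while `:all` also lets https -> http through. The helper `redirect_allowed?` is hypothetical, written only to illustrate this; it is not part of open-uri or the gem, and it makes no network requests.

```ruby
require 'uri'

# Hypothetical helper (an assumption for illustration, not the gem's code):
# models which scheme changes each :allow_redirections mode lets through.
def redirect_allowed?(from, to, mode)
  from_scheme = URI.parse(from).scheme
  to_scheme   = URI.parse(to).scheme
  return true if from_scheme == to_scheme # same scheme is always fine

  case mode
  when :safe then from_scheme == 'http' && to_scheme == 'https'
  when :all  then [from_scheme, to_scheme].sort == %w[http https]
  else false
  end
end

# The two redirects from the error messages above.
first_hop  = ['https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/',
              'http://www.facebook.com/cardcomplete']
second_hop = ['http://www.facebook.com/cardcomplete',
              'https://www.facebook.com/cardcomplete']

puts redirect_allowed?(*first_hop, :safe)   # false: https -> http is a downgrade
puts redirect_allowed?(*second_hop, :safe)  # true:  http -> https is an upgrade
puts redirect_allowed?(*first_hop, :all)    # true:  :all permits both directions

# With the gem loaded, the corresponding call would presumably be:
# doc = Nokogiri::HTML(open(first_hop[0], :allow_redirections => :all))
```

If that reading is right, the first link fails under `:safe` because its first hop goes from https to http, and `:all` would be needed to follow it, at the cost of accepting a downgrade to an unencrypted connection.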
