Why does OpenURI return 404 when the normalized URL works fine in the browser?

I am trying to escape a URL that contains special characters, such as the Danish character 'ø'.

URL:

url = "http://www.zara.com/dk/da/dame/tilbehør/tilbehør/stribet-hue-c271008p2195502.html"

      

To make OpenURI recognize it as a valid URL, I normalize it with Addressable:

url = Addressable::URI.parse(url).normalize.to_s

      

and then fetch and parse it with:

doc = Nokogiri::HTML(open(url))

      

which returns:

OpenURI::HTTPError: 404 Not Found

      

I don't know why OpenURI is returning 404, because the normalized URL works fine in the browser.

Why does this happen, and what do I need to do to fix it?



1 answer


I found out that the problem was with the server I was requesting the page from: it rejected the default User-Agent that OpenURI sends.

The OpenURI documentation says that additional header fields can be specified with an optional hash argument:



open("http://www.ruby-lang.org/en/",
  "User-Agent" => "Ruby/#{RUBY_VERSION}",
  "From" => "foo@bar.invalid",
  "Referer" => "http://www.ruby-lang.org/") {|f|
  # ...
}

      

I just used a different User-Agent and everything worked fine.
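
For reference, here is a minimal sketch of what that can look like. The User-Agent string, the constant name, and the two helper methods are my own placeholders, not anything the site requires; any value the server accepts should do. It uses URI.open, since calling plain open with a URL no longer works on Ruby 3.0 and later.

```ruby
require "open-uri"

# Hypothetical browser-like User-Agent; substitute whatever the server accepts.
BROWSER_USER_AGENT = "Mozilla/5.0 (compatible; ExampleFetcher/1.0)"

# Build the hash that open-uri sends as extra request headers.
def request_headers(user_agent = BROWSER_USER_AGENT)
  { "User-Agent" => user_agent }
end

# Fetch the response body for an already-normalized URL,
# sending the custom User-Agent instead of open-uri's default.
def fetch(url, headers = request_headers)
  URI.open(url, headers) { |f| f.read }
end
```

With the normalized url from the question, this would be called as Nokogiri::HTML(fetch(url)).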
