Why does OpenURI return 404 when the parsed URL works fine in the browser?
I am trying to escape a URL that contains special characters, such as the Danish character 'ø'.
URL:
url = "http://www.zara.com/dk/da/dame/tilbehør/tilbehør/stribet-hue-c271008p2195502.html"
To make OpenURI accept it as a valid URL, I normalize it with Addressable:
url = Addressable::URI.parse(url).normalize.to_s
and then fetch and parse the page with Nokogiri:
doc = Nokogiri::HTML(URI.open(url))
which raises:
OpenURI::HTTPError: 404 Not Found
I don't understand why OpenURI gets a 404, because the normalized URL loads fine in the browser.
Why does this happen, and what do I need to do to fix it?
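For reference, "escaping" the ø here means percent-encoding its UTF-8 bytes (C3 B8). The sketch below uses only the standard library to show that part of the transformation; `encode_non_ascii` is a hypothetical helper, and Addressable's `normalize` does more than this (host handling, case and percent-encoding normalization), so treat it as an illustration rather than a replacement.

```ruby
# Hypothetical helper, illustration only: percent-encode every character
# outside printable ASCII (0x21-0x7E) as its UTF-8 bytes and leave the rest
# of the URL untouched. Addressable's normalize does considerably more.
def encode_non_ascii(url)
  url.gsub(/[^\x21-\x7E]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

url = "http://www.zara.com/dk/da/dame/tilbehør/tilbehør/stribet-hue-c271008p2195502.html"
puts encode_non_ascii(url)
# => http://www.zara.com/dk/da/dame/tilbeh%C3%B8r/tilbeh%C3%B8r/stribet-hue-c271008p2195502.html
```

The encoded URL is perfectly valid, which is consistent with the browser loading it.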
Answer:
I found out that the problem was with the server I was fetching the page from: it rejected the default User-Agent header that OpenURI sends.
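OpenURI sends its requests through Net::HTTP, and when no "User-Agent" option is given, Net::HTTP fills in a generic default. This small sketch builds (but does not send) a bare GET request, which is what OpenURI creates when you pass no User-Agent, and prints the header Net::HTTP filled in.

```ruby
require "net/http"

# Build a GET request without supplying a User-Agent and inspect the
# default value Net::HTTP assigns. Nothing is sent over the network.
request = Net::HTTP::Get.new("/")
puts request["User-Agent"] # => Ruby
```

A generic value like this is easy for a server to single out, which is one plausible reason a site would treat such requests differently from a browser's.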
The OpenURI documentation says that additional header fields can be passed in an optional hash argument:
require "open-uri"

URI.open("http://www.ruby-lang.org/en/",
         "User-Agent" => "Ruby/#{RUBY_VERSION}",
         "From" => "foo@bar.invalid",
         "Referer" => "http://www.ruby-lang.org/") { |f|
  # ...
}
I passed a different User-Agent in that hash, and the request succeeded.
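To see the failure and the fix side by side without contacting the real site, here is a self-contained sketch. It assumes a hypothetical local server, `start_picky_server` (built on the standard `socket` library), that like the site above refuses requests carrying Net::HTTP's default User-Agent; the real site's rules are unknown, and the "Mozilla/5.0 (example)" string is an arbitrary placeholder. It uses `URI.open`, since on Ruby 3.0 and later plain `open` no longer fetches URLs. In the real code you would hand the fetched HTML to `Nokogiri::HTML` as in the question.

```ruby
require "open-uri"
require "socket"

# Hypothetical stand-in for the real site: a tiny local HTTP server that
# answers 404 when the User-Agent is Net::HTTP's default ("Ruby") and 200
# otherwise. Real servers apply their own rules.
def start_picky_server
  server = TCPServer.new("127.0.0.1", 0) # port 0: the OS picks a free port
  thread = Thread.new do
    loop do
      client = server.accept
      client.gets # request line, ignored here
      headers = {}
      while (line = client.gets) && line != "\r\n"
        name, value = line.split(":", 2)
        headers[name.downcase] = value.to_s.strip
      end
      ok = headers["user-agent"] != "Ruby"
      status = ok ? "200 OK" : "404 Not Found"
      body = ok ? "<html><body>hello</body></html>" : "not found"
      client.write("HTTP/1.1 #{status}\r\n" \
                   "Content-Type: text/html\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n#{body}")
      client.close
    end
  end
  [server, thread]
end

server, thread = start_picky_server
url = "http://127.0.0.1:#{server.addr[1]}/tilbeh%C3%B8r.html"

# 1. Default User-Agent: the server rejects the request.
default_status =
  begin
    URI.open(url, &:read)
    "200 OK"
  rescue OpenURI::HTTPError => e
    e.message
  end
puts default_status # => 404 Not Found

# 2. Explicit User-Agent header, as in the OpenURI documentation example.
html = URI.open(url, "User-Agent" => "Mozilla/5.0 (example)", &:read)
puts html # => <html><body>hello</body></html>

thread.kill
server.close
```

The server's check is deliberately crude; the point is only that the same, correctly escaped URL yields a 404 or a page depending on the User-Agent header, which matches the behaviour described in this answer.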