NodeJS + Request - Access Denied when requesting a website
I am trying to fetch the HTML of a website using request, but I keep getting "Access Denied". How do I get past this? Here is the code for the function:
const request = require('request');

function firstShoe() {
  request('https://www.jdsports.co.uk/product/green-nike-vapormax/281735/', function (error, response, body) {
    console.log('body:', body);
  });
}
Error:
body: <HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access "http://www.jdsports.co.uk/product/green-nike-vapormax/281735/" on this server.<P>
Reference #18.609d3e17.1500116386.15f0cb85
</BODY>
</HTML>
I found a solution: passing a User-Agent in the request headers.
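function firstShoe() {
  var options = {
    // Send an explicit User-Agent; the request without one was denied
    headers: {'user-agent': 'node.js'}
  };
  request('https://www.jdsports.co.uk/product/green-nike-vapormax/281735/', options, function (error, response, body) {
    if (error) {
      console.error(error);
      return;
    }
    console.log(body);
    // `message` is defined elsewhere in the asker's code (not shown here)
    message.channel.send(body);
  });
}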
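For anyone who would rather not depend on the request package, here is a minimal sketch of the same idea using Node's built-in https module instead. The names buildOptions and fetchHtml are hypothetical helpers, not part of any library, and the status check is my own addition: it rejects on anything other than a 2xx response so that an "Access Denied" page is not mistaken for real content.

```javascript
const https = require('https');

// Hypothetical helper: request options carrying an explicit User-Agent header.
function buildOptions(userAgent) {
  return { headers: { 'User-Agent': userAgent } };
}

// Hypothetical helper: resolves with the body on a 2xx response and
// rejects otherwise, so a 403 page is not treated as a successful fetch.
function fetchHtml(url, userAgent) {
  return new Promise((resolve, reject) => {
    https.get(url, buildOptions(userAgent), (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        if (res.statusCode >= 200 && res.statusCode < 300) {
          resolve(body);
        } else {
          reject(new Error('Request failed with status ' + res.statusCode));
        }
      });
    }).on('error', reject); // network-level failures (DNS, TLS, etc.)
  });
}
```

It could be called as fetchHtml(url, 'node.js').then(console.log).catch(console.error). This sketch does not follow redirects or decompress responses, so it may need more work for a real page.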
1 answer
You get a 403 Forbidden because this site blocks requests that come from uncommon user agents (it mainly checks the User-Agent header). This is a very simple protection against scrapers.
For example, if you send the following cURL request with its default User-Agent, the request succeeds:
curl -v 'https://www.jdsports.co.uk/product/green-nike-vapormax/281735/'
However, if you repeat the request with a made-up User-Agent, it is blocked:
curl -v 'https://www.jdsports.co.uk/product/green-nike-vapormax/281735/' -H 'User-Agent: StackOverflow'
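The same comparison can be sketched in Node, assuming you want to see which User-Agent values a site accepts. The helpers statusFor and compareAgents are hypothetical names of my own, and treating a 403 as "blocked" is an assumption based on the response described above; other sites may signal a block differently.

```javascript
const https = require('https');

// Hypothetical helper: resolves with the HTTP status code returned
// for a request sent with the given User-Agent.
function statusFor(url, userAgent) {
  return new Promise((resolve, reject) => {
    https.get(url, { headers: { 'User-Agent': userAgent } }, (res) => {
      res.resume(); // discard the body; only the status code matters here
      resolve(res.statusCode);
    }).on('error', reject);
  });
}

// Hypothetical helper: probes each User-Agent in turn and records
// whether the response was a 403. The probe function is injectable
// so the logic can be exercised without touching the network.
async function compareAgents(url, agents, probe = statusFor) {
  const results = [];
  for (const agent of agents) {
    const status = await probe(url, agent);
    results.push({ agent, status, blocked: status === 403 });
  }
  return results;
}
```

For example, compareAgents(url, ['curl/7.54.0', 'StackOverflow']).then(console.table) would print one row per User-Agent; the curl version string here is only a placeholder.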