How do you determine whether a user accessing your site is a bot or not?

I know the user agent is one signal, but it is easy to fake. What other reliable indicators are there that a visitor is really a bot? Inconsistent headers? Whether images / JavaScript are requested? Thanks!

+2




6 answers


CVSTrac uses a honeypot for this. It is a page linked from somewhere that crawlers will reach but humans will usually ignore. CVSTrac goes one step further by letting the visitor prove on that page that they are human.
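
A minimal sketch of the honeypot idea in Python with Flask (the route names, the in-memory flagged-IP set, and the choice of Flask are all illustrative assumptions, not part of CVSTrac):

```python
# Honeypot sketch: a page that is disallowed in robots.txt and only reachable
# through a link humans will not see; any client that fetches it gets flagged.
from flask import Flask, request

app = Flask(__name__)
flagged_ips = set()  # in practice you would persist this somewhere

@app.route("/robots.txt")
def robots():
    # Well-behaved crawlers are told to stay away; misbehaving ones ignore this.
    return "User-agent: *\nDisallow: /trap\n", 200, {"Content-Type": "text/plain"}

@app.route("/trap")
def trap():
    # Reaching this page strongly suggests an automated client.
    # (CVSTrac instead offers the visitor a way to prove they are human here.)
    flagged_ips.add(request.remote_addr)
    return "Nothing to see here.", 200

@app.before_request
def block_flagged():
    # Returning a response here short-circuits the request for flagged clients.
    if request.remote_addr in flagged_ips and request.path != "/trap":
        return "Access denied.", 403
```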



+4




"Are images / javascript required?" I would go for this, however Google and others are requesting images and javascript files currently.



How about the speed of requesting time? Bots read your content much faster than humans.
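
A rough sketch of the timing idea (the window size, the threshold, and the in-memory store are assumptions chosen for illustration, not values from the answer):

```python
# Flag clients whose page requests arrive faster than a human could read them.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10        # look at the last 10 seconds of activity
MAX_REQUESTS = 5           # more than ~1 page every 2 seconds looks automated

recent_requests = defaultdict(deque)  # ip -> timestamps of recent page loads

def looks_like_bot(ip: str) -> bool:
    now = time.time()
    timestamps = recent_requests[ip]
    timestamps.append(now)
    # Drop entries that fell out of the sliding window.
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()
    return len(timestamps) > MAX_REQUESTS
```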

+3




There are four things we look for; a rough sketch combining these signals follows the list:

  • The user agent string. It is very easy to spoof, but scanners often use their own distinctive user agent string.

  • The speed of page requests: if a client is pulling more than about one page every half second to a second and a half, that is usually a good indicator.

  • Whether they request only the HTML or the whole page. Some crawlers fetch only the HTML document and skip images, CSS and scripts, which is usually a good hint.

  • The incoming (referrer) URL.
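
A rough scoring sketch that combines the four signals above. The thresholds, the keyword list, and the shape of the `visit` dictionary are illustrative guesses, not part of the answer:

```python
# Combine several weak signals into a simple bot score.
BOT_UA_KEYWORDS = ("bot", "crawler", "spider", "scan", "curl", "wget")

def bot_score(visit: dict) -> int:
    """Expected keys (hypothetical):
        "user_agent"       - raw User-Agent header
        "pages_per_minute" - how fast pages are being requested
        "fetched_assets"   - did the client also request images/CSS/JS?
        "referrer"         - Referer header, empty string if absent
    """
    score = 0
    ua = visit["user_agent"].lower()
    if any(keyword in ua for keyword in BOT_UA_KEYWORDS):
        score += 2   # self-identified scanners are easy wins
    if visit["pages_per_minute"] > 30:   # faster than ~1 page every 2 seconds
        score += 2
    if not visit["fetched_assets"]:      # HTML only, no images/JS/CSS
        score += 1
    if not visit["referrer"]:            # deep requests with no referrer
        score += 1
    return score  # e.g. treat score >= 3 as "probably a bot"
```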

+2




A reverse CAPTCHA of sorts can also help: add a text input field to your form with display: none; in its style attribute (or via your stylesheet). If that field comes back filled in, you are most likely dealing with a bot.

Edit: this was actually something that came through my RSS reader; if I can find the source I'll link a good example.
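
A minimal sketch of the hidden-field trick, assuming a Flask form handler and a field named `website` (both are illustrative choices, not from the answer):

```python
# Reverse-CAPTCHA sketch: the form contains a field humans never see, e.g.
# <input type="text" name="website" style="display: none">. Browsers submit
# it empty; naive form-filling bots tend to populate every field they find.
from flask import Flask, request

app = Flask(__name__)

@app.route("/contact", methods=["POST"])
def contact():
    if request.form.get("website"):   # hidden field came back non-empty
        return "Submission rejected.", 400
    # ... handle the legitimate submission here ...
    return "Thanks for your message!", 200
```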

+2




Take a look at Bad Behavior, a library that uses a wide variety of bot detection methods.

+1




Isn't that what CAPTCHAs were invented for?

0








