How to count genuine page views

I am trying to count only the genuine hits on a page. Here is what my site does: it is an article directory where people can post articles. When an article is published, the author is billed based on the number of unique visitors to their pages, so the page-view counts have to be accurate. Here is the problem I am facing.

What I need:

  • I don't want small search engines or robots counted in my page views.
  • I do want the four main search engines to crawl my site, because I can identify them by IP address and exclude their visits from the page-view counts (see the DNS-verification sketch after this list). That cannot be done for spambots, because they disguise themselves as real humans or as the large search engines.
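
One way to do the search-engine IP check without maintaining the engines' address ranges by hand is a reverse-DNS lookup followed by a forward-confirming lookup, which is the method Google documents for verifying Googlebot. Below is a minimal Python sketch; the hostname suffixes in the list are assumptions for illustration and should be checked against each engine's documentation.

```python
import socket

# Hostname suffixes the big crawlers resolve to. Assumed list for
# illustration; verify against each engine's own documentation.
CRAWLER_SUFFIXES = (
    ".googlebot.com",
    ".google.com",
    ".search.msn.com",     # Bingbot
    ".crawl.yahoo.net",    # Yahoo Slurp
)

def is_major_search_engine(ip: str) -> bool:
    """Reverse-DNS the IP, check the hostname suffix, then forward-resolve
    the hostname and confirm it maps back to the same IP. A spambot that
    only fakes its User-Agent header fails this check."""
    try:
        host = socket.gethostbyaddr(ip)[0]      # reverse lookup
    except OSError:
        return False
    if not host.endswith(CRAWLER_SUFFIXES):
        return False
    try:
        return socket.gethostbyname(host) == ip  # forward-confirm
    except OSError:
        return False
```

The forward confirmation matters: anyone can point reverse DNS for their own IP at a googlebot.com-looking name, but they cannot make the real googlebot.com zone resolve that name back to their IP.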

Problems:

  • There are spam bots on the internet that do not respect the robots.txt file.
  • There are bots that try to pass themselves off as real people by spoofing the user agent and other request headers.
  • Performance can suffer if every request has to check the database for known-good IP addresses (a cached lookup is sketched after this list).
  • A human can solve the CAPTCHA once just to let a robot view my pages.
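
On the performance point: the database does not have to be consulted on every request. A small in-process cache with a time-to-live keeps recently checked IPs in memory. A minimal Python sketch, where is_good_ip_in_db() is a hypothetical stand-in for whatever database lookup the site already has:

```python
import time

_TTL_SECONDS = 3600                             # re-check each IP at most hourly
_cache: dict[str, tuple[bool, float]] = {}      # ip -> (verdict, checked_at)

def is_good_ip_in_db(ip: str) -> bool:
    # Hypothetical stand-in for the site's existing database lookup.
    return False

def is_good_ip(ip: str) -> bool:
    """Answer from the in-process cache when possible, so the database
    is not queried on every single page view."""
    now = time.time()
    entry = _cache.get(ip)
    if entry is not None and now - entry[1] < _TTL_SECONDS:
        return entry[0]
    verdict = is_good_ip_in_db(ip)
    _cache[ip] = (verdict, now)
    return verdict
```

If the site runs on several servers, the same idea works with a shared cache such as memcached or Redis instead of a per-process dictionary.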

Possible solutions:

  • Require a CAPTCHA on every page. If it is passed, register the IP as good or send the user a cookie indicating they passed (a signed-cookie version of this, combined with the 7-day re-check below, is sketched after this list).
  • Whitelist the major search engines' IP ranges so they are never shown the CAPTCHA.
  • Purchase bot-detection software.
  • Require the viewer to repeat the verification (e.g. re-solve the CAPTCHA) every 7 days.
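
The CAPTCHA-pass cookie and the 7-day re-check combine naturally into one signed, expiring token. A minimal Python sketch under those assumptions; the secret key and the choice of binding the token to the visitor's IP are illustrative, not something from the question:

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-long-random-server-side-secret"  # assumed key
VALID_FOR = 7 * 24 * 3600                                   # 7 days

def make_verified_cookie(ip: str) -> str:
    """Cookie value to set once the CAPTCHA is solved: an expiry timestamp
    plus an HMAC over ip:expiry, so the client cannot forge or extend it."""
    expires = str(int(time.time()) + VALID_FOR)
    sig = hmac.new(SECRET, f"{ip}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{expires}:{sig}"

def verified_cookie_is_valid(ip: str, cookie: str) -> bool:
    """True while the cookie is unexpired and its signature checks out;
    after 7 days it fails and the visitor is asked to verify again."""
    try:
        expires, sig = cookie.split(":", 1)
        if int(expires) < time.time():
            return False
        expected = hmac.new(SECRET, f"{ip}:{expires}".encode(),
                            hashlib.sha256).hexdigest()
        return hmac.compare_digest(sig, expected)
    except ValueError:
        return False
```

Binding the token to the IP is a trade-off: it stops people from sharing a solved-CAPTCHA cookie with a bot on another machine, but it breaks for legitimate visitors whose IP address changes during the week.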

Getting accurate human page-view counts is critical for this site to function properly. Do you have other ideas?

2 answers


You could just leave this to Google Analytics. It does a very good job of solving exactly the problem you are describing, and it's free.

Do you have a reason not to use an existing service or solution?



If you just want to track page views, set up Google Analytics or a similar service on your site; it will do better noise filtering than any manual solution could.
