What is the best way to do basic view tracking on a webpage?

I have an anonymously accessible website with blogs and blog posts, and I would like to track the number of views each blog post receives.

I want this to be as simple as possible; the accuracy only needs to be approximate. This is not for analytics (we have Google for that), and I don't want to do any kind of log analysis to pull the statistics, since running background jobs in this environment is tricky and I want the numbers to be as fresh as possible.

My current solution looks like this (roughly sketched below):

  • A web control that simply writes a view row to a table on every GET.
  • Filters out known web crawlers by matching a regex against the UserAgent string.
  • Excludes certain IP addresses (known spammers).
  • Allows blocking view tracking on specific posts (when spammers target them).
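Roughly, the control does something like this on every GET (an illustrative sketch only; the real code is a web control, and the table and column names here are made up):

```python
import re
import sqlite3

# Illustrative patterns and addresses only; the real lists are longer and maintained by hand.
CRAWLER_UA = re.compile(r"googlebot|bingbot|slurp|crawler|spider", re.IGNORECASE)
EXCLUDED_IPS = {"203.0.113.7", "198.51.100.23"}   # known spammers (example addresses)
BLOCKED_POSTS = {42}                              # posts where view tracking is switched off

def record_view(conn: sqlite3.Connection, post_id: int, user_agent: str, ip: str) -> None:
    """Write one view row per qualifying GET; skip crawlers, bad IPs, and blocked posts."""
    if post_id in BLOCKED_POSTS:
        return
    if CRAWLER_UA.search(user_agent or ""):
        return
    if ip in EXCLUDED_IPS:
        return
    conn.execute(
        "INSERT INTO post_views (post_id, ip, viewed_at) VALUES (?, ?, datetime('now'))",
        (post_id, ip),
    )
    conn.commit()
```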

It actually does a pretty good job, but a few things annoy me. Spammers still manage to hit some posts, skewing the view counts, and I still have to manually monitor the views and update the list of "bad" IP addresses.

Does anyone have any better suggestions for me? Does anyone know how views are tracked on StackOverflow questions?



2 answers


It looks like your current solution is actually quite good.

We implemented one where the server code that delivered the page content also updated a database table that stored the URL (actually a stable identification code for the URL, since the URL can change over time) and the view count.
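In outline it was nothing more than a counter bump keyed on that stable identifier (a hedged sketch; the table and column names are illustrative, not our real schema):

```python
import sqlite3

def increment_views(conn: sqlite3.Connection, post_uid: str) -> None:
    """Bump the stored view count for the post's stable identifier."""
    conn.execute(
        "UPDATE posts SET view_count = view_count + 1 WHERE post_uid = ?",
        (post_uid,),
    )
    conn.commit()
```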

This was actually for a system with user-created posts that others could comment on, but it applies equally to a situation where you are the only user creating posts (if I understand your description correctly).

We had to do the following to minimize (unfortunately not eliminate) the skew.



  • For registered users, each user can add only one view to a post. EVER. No exceptions.
  • For anonymous users, each IP address can add only one view to a post per month (sketched after this list). This was a little less reliable, since from our point of view IP addresses can be shared (NAT and so on); that sharing is the reason we relaxed the "EVER" requirement above.
  • The posts themselves were limited to gaining one view per time period. The period started low (say, 10 seconds) and gradually increased (up to, say, 5 minutes), so new posts were allowed to gain views faster, owing to their novelty. This took care of most of the spambots, since we found they tended to attack soon after a post went up.
  • Having a spam comment deleted from a post, or failing a CAPTCHA (see below), automatically blacklisted that IP and deducted its views from the post.
  • If a blacklisted IP did not try to leave a comment for N days (configurable), it was removed from the blacklist. This rule and the previous one minimized manual maintenance of the blacklist; all we had to do was watch for and respond to spam content.
  • CAPTCHA. This solved a lot of the spam problems, especially since we didn't rely only on OCR-style challenges (like "what is this word → 'optional'"); we also asked questions (like "what is 2 multiplied by half of 8?") that defeat dumb character-recognition bots. They won't beat hordes of cheap human CAPTCHA solvers (unless their math is too bad :-), but the improvement over having no CAPTCHA was impressive.
  • Logged-in users were not shown a CAPTCHA, but spam from them got the account deleted immediately, the IP blacklisted, and their views deducted from the post.
  • I'm ashamed to admit that we didn't actually filter out web crawlers (hopefully the client isn't reading this :-). To be honest, they probably only added a minimal number of views each month thanks to our IP-address rule (unless they swarmed us from multiple IPs).
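As a rough illustration of the one-view-per-IP-per-month rule and the widening rate window (a hypothetical sketch, not our production code; the thresholds, table names, and column names are invented):

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional

# Invented thresholds: brand-new posts may gain a view every 10 seconds,
# old posts at most one every 5 minutes.
MIN_WINDOW = timedelta(seconds=10)
MAX_WINDOW = timedelta(minutes=5)

def allowed_window(post_age: timedelta) -> timedelta:
    """Widen the per-post window linearly over the post's first week."""
    week = timedelta(days=7)
    fraction = min(post_age / week, 1.0)
    return MIN_WINDOW + (MAX_WINDOW - MIN_WINDOW) * fraction

def should_count_view(conn: sqlite3.Connection, post_id: int, ip: str,
                      post_created: datetime,
                      last_counted: Optional[datetime]) -> bool:
    """Apply the anonymous one-view-per-IP-per-month rule and the per-post rate window."""
    now = datetime.utcnow()
    # One view per IP per post per month (anonymous users).
    month_ago = now - timedelta(days=30)
    already_viewed = conn.execute(
        "SELECT 1 FROM post_views WHERE post_id = ? AND ip = ? AND viewed_at > ?",
        (post_id, ip, month_ago),
    ).fetchone()
    if already_viewed is not None:
        return False
    # Per-post rate window that widens as the post ages.
    if last_counted is not None and now - last_counted < allowed_window(now - post_created):
        return False
    return True
```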

Basically, I suggest the following possible improvements. You should, of course, keep an eye on them to see whether they actually work:

  • CAPTCHA.
  • Automatic blacklist updates based on user behavior.
  • Limiting view-count increases from the same IP address.
  • Limiting how fast a post's view count can increase.

No scheme you choose will be perfect (witness our one-month rule), but as long as all posts follow the same set of rules you still get good comparative values. As you said, the accuracy only needs to be approximate.



Suggestions:



  • Move the hit-counting logic from the custom control to the base class of the page.
  • Redesign the exclusion list so it can be updated dynamically (i.e., store it in the database or even in an XML file).
  • Record all hits. Then, at regular intervals, run a cron job over the new hits and decide whether each one is included or excluded (see the sketch after this list). If you apply the exclusion logic on every hit, every user has to wait for that matching logic to run.
  • Come up with some kind of automatic spam/bot detection algorithm and add offenders to your blacklist automatically, and/or subscribe to a third-party blacklist.
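A rough sketch of that record-everything-then-filter idea (illustrative names only; it assumes the raw hits land in a table with a status column):

```python
import re
import sqlite3

# Illustrative crawler pattern; the real exclusion rules would come from the dynamic list.
CRAWLER_UA = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

def classify_pending_hits(conn: sqlite3.Connection, ip_blacklist: set) -> None:
    """Run periodically (e.g. from cron): mark each raw hit as counted or excluded."""
    rows = conn.execute(
        "SELECT id, ip, user_agent FROM raw_hits WHERE status = 'pending'"
    ).fetchall()
    for hit_id, ip, user_agent in rows:
        excluded = ip in ip_blacklist or bool(CRAWLER_UA.search(user_agent or ""))
        conn.execute(
            "UPDATE raw_hits SET status = ? WHERE id = ?",
            ("excluded" if excluded else "counted", hit_id),
        )
    conn.commit()
```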






