User participation / activity calculation

I have a site that calculates user participation / activity using multiple MySQL queries.

For a typical user, I'll ask:

How many updates have they made? How many photos did they upload? etc.

These are just basic COUNT queries against the corresponding tables (updates, photos, and so on), and I sum the COUNT values to get the overall score. Each query needs one JOIN and takes about 0.0006 seconds, so at 10 queries per user that's just 0.006 seconds.
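Each one looks roughly like this (simplified; the real queries vary slightly per table):

SELECT COUNT(*)
FROM   updates
    INNER JOIN users ON users.user_id = updates.user_id  -- the one JOIN per query
WHERE  users.user_id = 123;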

While that is not so bad for a single user, I have to calculate it for 100,000 users, which works out to about 1,000,000 database queries and a theoretical processing time of 10 minutes. I seem to be approaching the problem the wrong way, and I was wondering if anyone has any ideas?

I thought about keeping a running score in each user's account record and incrementing it every time they perform one of these actions, but that is not very flexible (for instance, I couldn't go back and see how many points were awarded on a specific day).
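In other words, something like this on every action (activity_count being a hypothetical column on the users table):

UPDATE users
SET    activity_count = activity_count + 1
WHERE  user_id = 123;
-- Downside: a single running total can't show how many
-- points were earned on any particular day.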

Any help is greatly appreciated!

+2




5 answers


Assuming your tables are structured so that each one has a user_id field, you can do something like this to get the overall "actions" of your users:

SELECT users.user_id,
       (COALESCE(update_counts.update_count, 0)
        + COALESCE(photo_counts.photo_count, 0)) AS activity_count
FROM   users
    -- LEFT JOINs so users with no updates or no photos still get a count
    LEFT JOIN
        (
        SELECT user_id,
               COUNT(*) AS update_count
        FROM   updates
        GROUP BY user_id
        ) AS update_counts ON users.user_id = update_counts.user_id
    LEFT JOIN
        (
        SELECT user_id,
               COUNT(*) AS photo_count
        FROM   photos
        GROUP BY user_id
        ) AS photo_counts ON users.user_id = photo_counts.user_id;


Obviously, you can add more tables in the same way, and you can weight the different actions as you see fit. It should work well enough if you have an index on the user_id field in each table, although it depends on how big your tables get.



Once your tables get huge, you will need to start caching the activity_count in a summary table. You can of course cache the values by date if you need to.

If you only want a rough estimate, you can run this query at some regular interval (say, once a day) and cache the results; that is far less intrusive than writing triggers on every table to keep a cache table up to date.
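For example, a daily job (cron or a MySQL event) could rebuild a summary table; the table and column names here are just illustrative:

CREATE TABLE user_activity_cache (
    user_id        INT  NOT NULL,
    activity_date  DATE NOT NULL,
    activity_count INT  NOT NULL,
    PRIMARY KEY (user_id, activity_date)
);

-- Refresh today's snapshot from the live tables:
INSERT INTO user_activity_cache (user_id, activity_date, activity_count)
SELECT users.user_id,
       CURRENT_DATE,
       COALESCE(uc.cnt, 0) + COALESCE(pc.cnt, 0)
FROM   users
    LEFT JOIN (SELECT user_id, COUNT(*) AS cnt FROM updates GROUP BY user_id) AS uc
        ON uc.user_id = users.user_id
    LEFT JOIN (SELECT user_id, COUNT(*) AS cnt FROM photos GROUP BY user_id) AS pc
        ON pc.user_id = users.user_id
ON DUPLICATE KEY UPDATE activity_count = VALUES(activity_count);

Keeping one row per user per day also gives you the historical, per-day view you mentioned wanting.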

+2




Have a link table, user_activity, with an activity_id, a user_id, and a timestamp. For example, when a user uploads a photo, a record is created with activity_id = 2 (pointing at the "upload photo" row in an activities lookup table), the user_id, and the current timestamp. This is easy to query, and it avoids the long-running queries once you have a lot of users.
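A sketch of that schema (the column types are assumptions):

CREATE TABLE activities (
    activity_id INT PRIMARY KEY,
    name        VARCHAR(50) NOT NULL  -- e.g. 1 = 'post update', 2 = 'upload photo'
);

CREATE TABLE user_activity (
    user_id     INT NOT NULL,
    activity_id INT NOT NULL,
    created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    KEY idx_user_activity_user (user_id),
    FOREIGN KEY (activity_id) REFERENCES activities (activity_id)
);

-- Log a photo upload:
INSERT INTO user_activity (user_id, activity_id) VALUES (123, 2);

-- Total activity per user (add a WHERE on created_at for per-day counts):
SELECT user_id, COUNT(*) AS activity_count
FROM   user_activity
GROUP BY user_id;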



+2




If you don't want the 10-minute batch of joins, I would create a separate table for this purpose and insert into it after every user action.

This table should only contain the user id, a timestamp, the section (i.e. which source table the action came from), and the unique id of the row in that table, so you have a back-reference for deletes and the like.
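Something along these lines (the names are just for illustration):

CREATE TABLE user_actions (
    user_id    INT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    section    VARCHAR(30) NOT NULL,  -- which source table, e.g. 'updates' or 'photos'
    source_id  INT NOT NULL,          -- primary key of the row in that source table
    KEY idx_user_actions_user (user_id),
    KEY idx_user_actions_source (section, source_id)  -- the back-reference used for deletes
);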

0




It seems to me that you are trying to optimize before you really need to. If you don't actually have 100,000 users yet, you don't need to worry about these issues until you do.

That said, there is no reason not to optimize; just don't try to bend the problem to fit a solution you don't need yet.

While you may run into minor inconsistencies, you could cache each user's results at login (using memcached) and only update the cache when one of the counters changes. If a user is very active, it might be more efficient to simply refresh the cache every hour or so instead.

0




This might be overkill for your application, but you could always go the OLAP route. That gives you pre-aggregated measures across multiple dimensions, such as users and time intervals, and a flexible structure for different reporting needs. SQL Server Analysis Services has worked well for our company.
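You can get a taste of that kind of pre-aggregation in plain MySQL with WITH ROLLUP, which adds the subtotal and grand-total rows a cube would give you (this sketch assumes an event log like the user_activity table suggested above):

SELECT user_id,
       DATE(created_at) AS activity_date,
       COUNT(*)         AS actions
FROM   user_activity
GROUP BY user_id, DATE(created_at) WITH ROLLUP;
-- Rows with activity_date = NULL are per-user subtotals;
-- the row with user_id = NULL is the grand total.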

0








