Modeling web analytics segmentation data with Cassandra?

I would like to use Cassandra for analytics on my site, with a particular focus on customer segmentation. The analytics would collect page view data for each client/visitor, including:

username, country, city, gender, timeStamp, source, campaign, pageUrl, timeOnPage

This data should be sliced across all dimensions for easy customer segmentation. For example:

SELECT * FROM Users WHERE Country = "US" AND PageUrl = "http://mystore.com/bestsettingproduct" AND timeStamp < DateTime.Now AND timeStamp > DateTime.Now - Days.30

OR

SELECT * FROM Users WHERE Campaign = "Latest Email Campaign" AND timeStamp < DateTime.Now AND timeStamp > DateTime.Now - Days.30

As I understand it, Cassandra only lets you query by key. Given such dynamic queries, where one or more dimensions can appear in the WHERE clause, what would be a good data model?

I am thinking about having multiple tables with the following keys:

table 1 (username, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,

primary key ((timeStampWeek, country), city, gender, timeStamp, source, campaign, pageUrl, timeOnPage));


table 2 (username, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,

primary key ((timeStampWeek, city), country, gender, timeStamp, source, campaign, pageUrl, timeOnPage));


table 3 (username, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,

primary key ((timeStampWeek, campaign), country, city, gender, timeStamp, source, pageUrl, timeOnPage));
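
To make the week bucketing concrete: because timeStampWeek is part of every partition key, a "last 30 days" query would have to be issued once per week bucket that the range touches. Here is a minimal Python sketch of that fan-out. The bucket label format (ISO year and week, e.g. "2024-W11") and the helper names are my own assumptions, not anything Cassandra prescribes.

```python
from datetime import datetime, timedelta, timezone


def week_bucket(ts):
    """Return an ISO year-week label (assumed format) for a timestamp."""
    iso = ts.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def week_buckets_between(start, end):
    """List every week bucket touched by the interval [start, end], in order."""
    buckets = []
    day = start
    while day.date() <= end.date():
        label = week_bucket(day)
        # Consecutive days in the same week map to the same label; keep one.
        if not buckets or buckets[-1] != label:
            buckets.append(label)
        day += timedelta(days=1)
    return buckets


# Fixed "now" so the example is reproducible.
now = datetime(2024, 3, 15, tzinfo=timezone.utc)
buckets = week_buckets_between(now - timedelta(days=30), now)
print(buckets)  # one query per bucket would be needed
```

Each label would then be bound to the timeStampWeek part of the partition key, so a 30-day window means roughly five queries per table, with the first and last week still needing to be trimmed to the exact day boundaries.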


And so on for all dimension combinations? That sounds unwieldy, though. Is there a smarter way to model around these queries?
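
To put a rough number on why this worries me, here is a small Python sketch that counts the tables needed if every non-empty combination of dimensions got its own table. It assumes the six non-time dimensions from the field list above (treating username and timeOnPage as payload rather than filter dimensions) and ignores the ordering of clustering columns, so it is a lower bound at best.

```python
from itertools import combinations

# Assumed filter dimensions, taken from the page view fields listed earlier.
dimensions = ["country", "city", "gender", "source", "campaign", "pageUrl"]


def dimension_subsets(dims):
    """Yield every non-empty combination of the given dimensions."""
    for size in range(1, len(dims) + 1):
        yield from combinations(dims, size)


# One table per combination, each still bucketed by timeStampWeek.
table_count = sum(1 for _ in dimension_subsets(dimensions))
print(table_count)  # 63, i.e. 2**6 - 1
```

Every page view would have to be written to each of those tables, which is what makes the approach feel impractical to me.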
