Modeling web analytics segmentation data with Cassandra?
I would like to use Cassandra for analytics on my site, with a particular focus on customer segmentation. The analytics will collect page view data for each client / visitor, which includes:
username, country, city, gender, timeStamp, source, campaign, pageUrl, timeOnPage
This data should be sliced across all dimensions for easy customer segmentation. For example:
SELECT * FROM Users WHERE Country = "US" AND PageUrl = "http://mystore.com/bestsettingproduct" AND timeStamp < DateTime.Now AND timeStamp > DateTime.Now - Days.30
OR
SELECT * FROM Users WHERE Campaign = "Latest Email Campaign" AND timeStamp < DateTime.Now AND timeStamp > DateTime.Now - Days.30
As I understand it, in Cassandra you can only query by key. Given such dynamic queries, where any one or more dimensions can appear in the WHERE clause, what would be a good data model?
I am thinking about having multiple tables with the following keys:
table 1 (username, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,
primary key ( (timeStampWeek, country), city, gender, timeStamp, source, campaign, pageUrl, timeOnPage));
table 2 (username, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,
primary key ( (timeStampWeek, city), country, gender, timeStamp, source, campaign, pageUrl, timeOnPage));
table 3 (username, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,
primary key ( (timeStampWeek, campaign), country, city, gender, timeStamp, source, pageUrl, timeOnPage));
And so on for all combinations of dimensions? That sounds unwieldy, though. Is there a smarter way to model the data around these queries?
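
To make the timeStampWeek idea concrete, here is a minimal Python sketch of how a "last 30 days" query might be mapped onto the week-bucketed partitions of table 3. The bucket format (ISO year plus week number) and the helper names `week_bucket`, `week_buckets`, and `partition_keys` are my own assumptions for illustration; they are not part of Cassandra or any driver.

```python
from datetime import date, timedelta


def week_bucket(day: date) -> str:
    """Label a day with its ISO year and week number, e.g. '2024-W05'.

    Assumed format for the timeStampWeek column; any stable label would do.
    """
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def week_buckets(start: date, end: date) -> list[str]:
    """Return each distinct week bucket touched by [start, end], in order."""
    buckets: list[str] = []
    day = start
    while day <= end:
        label = week_bucket(day)
        # Days are visited in order, so a new label only ever differs
        # from the most recently appended one.
        if not buckets or buckets[-1] != label:
            buckets.append(label)
        day += timedelta(days=1)
    return buckets


def partition_keys(campaign: str, start: date, end: date) -> list[tuple[str, str]]:
    """Build the (timeStampWeek, campaign) partition keys a range query must read."""
    return [(bucket, campaign) for bucket in week_buckets(start, end)]


if __name__ == "__main__":
    today = date.today()
    for key in partition_keys("Latest Email Campaign", today - timedelta(days=30), today):
        print(key)
```

Under this assumption, a 30-day window touches about five or six partitions per campaign, and rows in the first and last week would still need to be trimmed by timeStamp to match the exact range.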