Modeling web analytics segmentation data with Cassandra?

Question

I would like to use Cassandra for analytics on my site, with a particular focus on customer segmentation. The analytics will collect page view data for each client/visitor, including:

username, country, city, gender, timeStamp, source, campaign, pageUrl, timeOnPage

It should be possible to slice this data by any dimension for easy customer segmentation. For example:

SELECT * FROM Users WHERE Country = "US" AND PageUrl = "http://mystore.com/bestsettingproduct" AND timeStamp < DateTime.Now AND timeStamp > DateTime.Now - Days.30

OR

SELECT * FROM Users WHERE Campaign = "Latest Email Campaign" AND timeStamp < DateTime.Now AND timeStamp > DateTime.Now - Days.30
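Both example queries have the same shape: equality filters on an arbitrary subset of dimensions plus a 30-day window on timeStamp. The following minimal in-memory Python sketch pins down those semantics. The `PageView` record, its field types (including timeOnPage in seconds) and the `segment` helper are hypothetical illustrations, not part of Cassandra or any driver.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class PageView:
    """Hypothetical record holding the page view fields listed above."""
    username: str
    country: str
    city: str
    gender: str
    timeStamp: datetime
    source: str
    campaign: str
    pageUrl: str
    timeOnPage: int  # assumed to be seconds


def segment(views: Iterable[PageView], now: datetime, days: int = 30,
            **dims: str) -> List[PageView]:
    """Keep views with now - days < timeStamp < now whose named fields all match."""
    start = now - timedelta(days=days)
    return [
        v for v in views
        if start < v.timeStamp < now
        and all(getattr(v, name) == value for name, value in dims.items())
    ]
```

Evaluating this by scanning every stored row is exactly the access pattern Cassandra is not built for: it expects each query to name its partition key up front.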

As I understand it, in Cassandra you can only query by key. Given dynamic queries like these, where any one or more dimensions can appear in the WHERE clause, what would be a good data model?
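To make "query only by key" concrete, here is a toy Python model of a single Cassandra table, assuming one partition-key value per partition and a single clustering column (timeStamp). `ToyTable` and its methods are hypothetical and only mimic the access pattern: an exact partition lookup followed by a range over the sorted clustering column.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime
from typing import Dict, Hashable, List, Optional


class ToyTable:
    """In-memory stand-in for one Cassandra table (illustration only).

    Rows are grouped by the full partition key; inside a partition they are
    kept sorted by a single clustering column, timeStamp.
    """

    def __init__(self) -> None:
        self._keys: Dict[Hashable, List[datetime]] = defaultdict(list)
        self._rows: Dict[Hashable, List[dict]] = defaultdict(list)

    def insert(self, partition_key: Hashable, ts: datetime, row: dict) -> None:
        keys = self._keys[partition_key]
        i = bisect_right(keys, ts)  # keep the partition sorted by timeStamp
        keys.insert(i, ts)
        self._rows[partition_key].insert(i, row)

    def select(self, partition_key: Hashable,
               after: Optional[datetime] = None,
               before: Optional[datetime] = None) -> List[dict]:
        """Exact partition lookup plus an optional after < timeStamp < before range."""
        if partition_key not in self._keys:  # membership test does not create a partition
            return []
        keys = self._keys[partition_key]
        lo = bisect_right(keys, after) if after is not None else 0
        hi = bisect_left(keys, before) if before is not None else len(keys)
        return self._rows[partition_key][lo:hi]
```

In a table partitioned by country, a question such as "all views of pageUrl X" has no partition to look up, so it could only be answered by visiting every partition; that is the motivation for one table per query pattern.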

I am thinking about having multiple tables with the following keys:

table 1 (username, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,

primary key ( (timeStampWeek, country) , city, gender, timeStamp, source, campaign, pageUrl, timeOnPage));

table 2 (username, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,

primary key ( (timeStampWeek, city) , country, gender, timeStamp, source, campaign, pageUrl, timeOnPage));

table 3 (username, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,

primary key ( (timeStampWeek, campaign) , country, city, gender, timeStamp, source, pageUrl, timeOnPage));
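These tables put timeStampWeek in the partition key, so the application has to compute the bucket when writing and work out which buckets a 30-day query touches when reading. A sketch in Python, assuming timeStampWeek is encoded as an ISO year and week string such as `2015-W26` (the question does not specify the encoding); `PARTITION_DIMENSION`, `week_bucket`, `buckets_for_window` and `fan_out` are hypothetical names.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical: the dimension that joins timeStampWeek in each table's partition key.
PARTITION_DIMENSION = {"table 1": "country", "table 2": "city", "table 3": "campaign"}


def week_bucket(day: date) -> str:
    """Assumed timeStampWeek encoding: ISO year and week, e.g. '2015-W26'."""
    year, week, _ = day.isocalendar()
    return f"{year}-W{week:02d}"


def buckets_for_window(now: datetime, days: int = 30) -> List[str]:
    """Every weekly bucket a (now - days, now) query must read, oldest first."""
    buckets: List[str] = []
    current = (now - timedelta(days=days)).date()
    while current <= now.date():
        bucket = week_bucket(current)
        if bucket not in buckets:
            buckets.append(bucket)
        current += timedelta(days=1)
    return buckets


def fan_out(view: Dict[str, object]) -> Dict[str, Tuple[Tuple[str, object], Dict[str, object]]]:
    """Denormalize one page view into a (partition key, row) pair per table."""
    week = week_bucket(view["timeStamp"])
    return {
        table: ((week, view[dimension]), view)
        for table, dimension in PARTITION_DIMENSION.items()
    }
```

Because a 30-day window covers 31 calendar dates, it always touches 5 or 6 weekly buckets, so one segment query on table 1 becomes 5 or 6 partition reads (one per bucket, each with a fixed country) whose results the client merges. Each page view is also written once per table.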

And so on, one table for every combination of dimensions? Doesn't that sound crazy? Is there a smarter way to model the data for these queries?
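To put a number on "every combination": if each non-empty subset of the equality-filtered dimensions got its own table, k dimensions would need 2^k - 1 tables. A quick Python check, assuming the six filterable dimensions are country, city, gender, source, campaign and pageUrl (timeStamp being the range column); `DIMENSIONS` and `query_tables` are hypothetical names.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Assumed equality-filterable dimensions taken from the page view fields.
DIMENSIONS = ["country", "city", "gender", "source", "campaign", "pageUrl"]


def query_tables(dims: Sequence[str]) -> List[Tuple[str, ...]]:
    """One hypothetical query table per non-empty subset of dimensions."""
    return [
        subset
        for size in range(1, len(dims) + 1)
        for subset in combinations(dims, size)
    ]


tables = query_tables(DIMENSIONS)
print(len(tables))  # 63 == 2**6 - 1
```

With six dimensions that is 63 tables, or 127 if username is also filterable, and each page view would then be written 63 (or 127) times.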

cassandra data-modeling cassandra-2.0 datastax

Milen Kovachev, June 22 '15 at 16:44