How do I get the last 6 months of data on a timestamp column using a Cassandra query?
How do I get the last 6 months of data, compared against a timestamp column, using a Cassandra query? I need to get the entire account statement from the last 3/6 months by comparing updatedTime (a timestamp column) with the current time. For example, in SQL we use the DateAdd() function to do this, but I don't know how to do it in Cassandra. If anyone knows, please answer. Thanks in advance.
Cassandra 2.2 and later allows users to define functions (UDFs) that can be applied to data stored in a table as part of a query result.
If you are on Cassandra 2.2 or later, you can write your own month-arithmetic function as a UDF:
CREATE FUNCTION monthadd(date timestamp, month int)
RETURNS NULL ON NULL INPUT
RETURNS timestamp
LANGUAGE java
AS $$
java.util.Calendar c = java.util.Calendar.getInstance();
c.setTime(date);
c.add(java.util.Calendar.MONTH, month);
return c.getTime();
$$;
This function takes two parameters:
- date timestamp: the timestamp to which you want to add, or from which you want to subtract, a number of months
- month int: the number of months to add (positive value) or subtract (negative value)
It returns the resulting timestamp.
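To check what the monthadd body above actually returns, you can run the same Calendar logic outside Cassandra. This is a minimal sketch: the class name MonthAddCheck is hypothetical, and the time zone is passed in explicitly here, whereas the UDF as written uses Calendar.getInstance(), i.e. the default time zone of the Cassandra node's JVM. It also shows that Calendar.add clamps the day of month when the target month is shorter.

```java
import java.time.Instant;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical local copy of the monthadd UDF body, for testing outside Cassandra.
// Unlike the UDF, the time zone is an explicit argument instead of the JVM default.
class MonthAddCheck {

    static Date monthAdd(Date date, int month, TimeZone zone) {
        if (date == null) {
            return null; // mirrors RETURNS NULL ON NULL INPUT
        }
        Calendar c = Calendar.getInstance(zone);
        c.setTime(date);
        c.add(Calendar.MONTH, month); // negative values move back in time
        return c.getTime();
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Date aug31 = Date.from(Instant.parse("2017-08-31T00:00:00Z"));
        // February 2017 has no 31st, so Calendar.add clamps to Feb 28.
        System.out.println(monthAdd(aug31, -6, utc).toInstant()); // 2017-02-28T00:00:00Z
    }
}
```

Because the UDF depends on the node's default time zone, nodes configured with different zones could in principle compute slightly different cutoffs around midnight.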
Here's how you can use it:
SELECT * FROM ttest WHERE id = 1 AND updated_time >= monthAdd(toTimestamp(now()), -6);
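Here monthAdd subtracts 6 months from the current timestamp, so this query returns the last 6 months of data. For the range restriction to work, id must be the partition key of ttest and updated_time a clustering column.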
Note: user-defined functions are disabled by default. To enable them, set enable_user_defined_functions: true in cassandra.yaml, and only do so if you are aware of the security risks.
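If you would rather not enable UDFs, a common alternative is to compute the "6 months ago" cutoff in the application and pass it as a bind parameter. The sketch below assumes UTC for the month arithmetic and uses a hypothetical class name CutoffQuery; a fixed Clock makes it testable. With the DataStax Java driver 4.x, a java.time.Instant can be bound to a CQL timestamp parameter, but the driver calls themselves are left out here.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper: compute the cutoff client-side instead of in a UDF.
class CutoffQuery {

    // Same query as the UDF example, with the cutoff as a bind marker.
    static final String CQL =
        "SELECT * FROM ttest WHERE id = ? AND updated_time >= ?";

    static Instant monthsAgo(Clock clock, int months) {
        // Assumption: month boundaries are evaluated in UTC.
        // minusMonths clamps the day of month (e.g. Aug 31 -> Feb 28).
        return ZonedDateTime.ofInstant(clock.instant(), ZoneOffset.UTC)
                .minusMonths(months)
                .toInstant();
    }

    public static void main(String[] args) {
        Clock fixed = Clock.fixed(Instant.parse("2017-08-31T10:00:00Z"), ZoneOffset.UTC);
        System.out.println(CQL + "  -- bind cutoff " + monthsAgo(fixed, 6));
    }
}
```

Computing the cutoff once in the client also means every query in a request uses the same cutoff, regardless of which coordinator handles it.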
In Cassandra, you need to design your tables around your queries ahead of time.
Also keep in mind that you may have to bucket the data, depending on how many accounts you accumulate over a period of time.
If your entire database will never hold more than about 100,000 records, you are fine defining a single common partition; let's call it "all". Usually, though, people have a lot of data, so it goes into buckets named after the month, week, or hour. Which granularity to choose depends on how many inserts you make per period.
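Bucketing by month, day, or hour boils down to deriving the partition key from the insert time. Here is a minimal sketch of one possible naming scheme; the class name Buckets and the yyyyMM / yyyyMMdd / yyyyMMddHH formats are assumptions for illustration, not a Cassandra convention, and UTC is assumed so that all clients agree on bucket boundaries.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical bucket-naming scheme: the insert time truncated to a
// month, day or hour and rendered as text in UTC.
class Buckets {

    enum Granularity {
        MONTH("yyyyMM"), DAY("yyyyMMdd"), HOUR("yyyyMMddHH");

        final DateTimeFormatter format;

        Granularity(String pattern) {
            this.format = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        }
    }

    // Partition key to use when inserting a row created at createdTime.
    static String bucketFor(Instant createdTime, Granularity g) {
        return g.format.format(createdTime);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2017-01-15T13:45:00Z");
        for (Granularity g : Granularity.values()) {
            System.out.println(g + " -> " + bucketFor(t, g)); // 201701, 20170115, 2017011513
        }
    }
}
```

Finer buckets keep partitions small under heavy insert rates, at the cost of more queries when reading a long time range.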
The reason for creating buckets is that each node can locate a partition using the partition key, which is the first part of the primary key definition. Then, within each partition, the data is sorted by the second part of the primary key (the clustering column). Because the data is sorted, you can "scan" it, i.e. fetch a range of rows by specifying a bound on the timestamp.
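The access pattern just described (find the partition by key, then read a sorted range by timestamp) can be modelled in memory with a hash map of sorted maps. This is only a toy illustration with a hypothetical class name ToyTable, not how Cassandra actually stores data; it assumes the default ascending clustering order.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a table with PRIMARY KEY (bucket, created_time):
// lookup by partition key, then rows kept sorted by the clustering column.
class ToyTable {
    private final Map<String, TreeMap<Instant, String>> partitions = new HashMap<>();

    void insert(String bucket, Instant createdTime, String account) {
        // Same primary key means the row is overwritten (upsert), as in Cassandra.
        partitions.computeIfAbsent(bucket, k -> new TreeMap<>()).put(createdTime, account);
    }

    // Roughly: SELECT account FROM t WHERE bucket = ? AND created_time >= ?
    List<String> since(String bucket, Instant from) {
        TreeMap<Instant, String> rows = partitions.get(bucket);
        if (rows == null) {
            return new ArrayList<>(); // no such partition
        }
        NavigableMap<Instant, String> slice = rows.tailMap(from, true);
        return new ArrayList<>(slice.values());
    }

    public static void main(String[] args) {
        ToyTable accounts = new ToyTable();
        accounts.insert("201701", Instant.parse("2017-01-03T00:00:00Z"), "acct-1");
        accounts.insert("201701", Instant.parse("2017-01-20T00:00:00Z"), "acct-2");
        accounts.insert("201701", Instant.parse("2017-01-10T00:00:00Z"), "acct-3");
        // Rows come back in created_time order: [acct-3, acct-2]
        System.out.println(accounts.since("201701", Instant.parse("2017-01-10T00:00:00Z")));
    }
}
```

The model also shows why a query without the partition key is expensive: there is no sorted structure spanning all partitions to scan.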
Let's say you want to get the accounts from the last 6 months, and that you keep all accounts from one month in the same bucket.
The schema can be something like this:
create table accounts (
month text,
created_time timestamp,
account text,
PRIMARY KEY (month, created_time)
);
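Typically you do this at the application level: query each month bucket separately and merge the results yourself. Merging the results of many queries is an anti-pattern, but it is acceptable for a small number of queries like this: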
select account
from accounts
where month = '201701';
Then run the same query for '201702', '201703', and so on, up to the current month.
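The per-month queries above need the list of month buckets to visit. A minimal sketch, assuming the 'yyyyMM' bucket names from the schema and a hypothetical class name MonthBuckets: since "the last 6 months" usually starts partway through a month, the oldest bucket is only partly inside the window, so each query also bounds created_time by the cutoff (today minus 6 months).

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: every 'yyyyMM' bucket that can hold rows from the last N months.
class MonthBuckets {
    static final DateTimeFormatter YYYYMM = DateTimeFormatter.ofPattern("yyyyMM");

    // One query per bucket; the created_time bound trims the oldest bucket.
    static final String PER_BUCKET_CQL =
        "SELECT account FROM accounts WHERE month = ? AND created_time >= ?";

    static List<String> bucketsForLastMonths(LocalDate today, int months) {
        YearMonth first = YearMonth.from(today.minusMonths(months));
        YearMonth last = YearMonth.from(today);
        List<String> buckets = new ArrayList<>();
        for (YearMonth m = first; !m.isAfter(last); m = m.plusMonths(1)) {
            buckets.add(m.format(YYYYMM));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // From 2017-06-15, six months back is 2016-12-15: buckets 201612 .. 201706.
        for (String b : bucketsForLastMonths(LocalDate.of(2017, 6, 15), 6)) {
            System.out.println(PER_BUCKET_CQL + "   -- month = '" + b + "'");
        }
    }
}
```

Note that covering six full months from a mid-month date touches seven buckets; the application then concatenates the per-bucket results.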
If you have something really simple, say you expect around 100,000 records in total, you can use a similar schema with a single predefined bucket and just do something like:
create table accounts (
bucket text,
created_time timestamp,
account text,
PRIMARY KEY (bucket, created_time)
);
select account
from accounts
where bucket = 'some_predefined_name'
and created_time > '2016-10-04 00:00:00';
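A timestamp literal such as '2016-10-04 00:00:00' has no time zone, so Cassandra interprets it in the coordinator node's time zone. If you build the literal in code, one option is to append an explicit "+0000" offset. This sketch uses a hypothetical class name CqlTimestampLiteral and assumes UTC; the string concatenation is only for illustration, and bind markers are the safer choice in real code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: render an Instant as a CQL timestamp literal with an explicit UTC offset.
class CqlTimestampLiteral {
    // Pattern letter Z prints the offset as +HHMM, i.e. "+0000" for UTC.
    private static final DateTimeFormatter CQL_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssZ").withZone(ZoneOffset.UTC);

    static String literal(Instant t) {
        return "'" + CQL_TS.format(t) + "'";
    }

    // Illustration only: prefer bind markers over string concatenation.
    static String querySince(String bucket, Instant from) {
        return "SELECT account FROM accounts WHERE bucket = '" + bucket
            + "' AND created_time > " + literal(from) + ";";
    }

    public static void main(String[] args) {
        System.out.println(querySince("some_predefined_name", Instant.parse("2016-10-04T00:00:00Z")));
    }
}
```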
Once again, to wrap up: with Cassandra, you always need to design your tables for the access patterns you intend to use.