Cassandra: Paste with Older Timestamp
(Cassandra 2.0.9 using CQL)
I accidentally updated a row in a table that was running its own timestamp (100 * specific sequence number). Now, since my timestamp is the current time, none of the updates are working. I understand why this is so, but I am trying to recover from it. I'm lucky that I can remove these lines.
I set gc_grace_seconds to 0 and ran delete from table where key = primarykey to delete rows. After that I used nodetool flush and nodetool compact on each node to trigger delete and get the resulting tombstones compressed and erasable. Then I ran into gc_grace_seconds before 10 days and tried to insert into a string with the same key but using timestamp 1.
This does not work. Just wondering if anyone made a similar mistake and worked around it?
source to share
I thought I would try this exercise.
aploetz@cqlsh:presentation> SELECT * FROm bladerunners WHERE id='B26354';
id | data | name | ts | type
--------+---------------------+--------------+--------------------------+--------------
B26354 | Filed and monitored | Rick Deckard | 2015-02-16 12:00:03-0600 | Blade Runner
(1 rows)
Below is an example of saving data using cassandra-cli
:
[default@presentation] get bladerunners[B26354];
=> (name=, value=, timestamp=1427744637894310)
=> (name=data, value=46696c656420616e64206d6f6e69746f7265642e, timestamp=1427744637894310)
=> (name=name, value=5269636b204465636b617264, timestamp=1427744637894310)
=> (name=ts, value=0000014b938c09a2, timestamp=1427744637894310)
=> (name=type, value=426c6164652052756e6e6572, timestamp=1427744637894310)
Returned 5 results.
Elapsed time: 7.67 msec(s).
Now I will remove the column data
for this row by creating a tombstone:
DELETE data FROM bladerunners WHERE id='B26354';
When I SELECT with tracing on
, I see the column displays "null" and I have a tombstone.
aploetz@cqlsh:presentation> SELECT * FROM bladerunners WHERe id='B26354';
id | data | name | ts | type
--------+------+--------------+--------------------------+--------------
B26354 | null | Rick Deckard | 2015-02-16 12:00:03-0600 | Blade Runner
...
Read 1 live and 1 tombstoned cells [SharedPool-Worker-2] | 2015-06-10 08:42:25.858000 | 192.168.23.129 | 2173
So I'll set the table bladerunners
gc_grace_seconds
to be zero:
ALTER TABLE bladerunners WITH gc_grace_seconds=0;
From command line (Linux) I will hide and compress my space presentation
:
aploetz@dockingBay94:/local/dsc-cassandra-2.1.4$ bin/nodetool flush
aploetz@dockingBay94:/local/dsc-cassandra-2.1.4$ bin/nodetool compact presentation
When I SELECT with tracing on
, I see that the column data
is still "null", but now the headstone is gone.
Now I will insert the column again data
with time stamp 1:
INSERT INTO bladerunners (id, data) VALUES ('B26354','Filed and monitored') USING TIMESTAMP 1;
When requested with, cassandra-cli
this is now what I see:
[default@presentation] get bladerunners[B26354];
=> (name=, value=, timestamp=1427744637894310)
=> (name=data, value=46696c656420616e64206d6f6e69746f726564, timestamp=1)
=> (name=name, value=5269636b204465636b617264, timestamp=1427744637894310)
=> (name=ts, value=0000014b938c09a2, timestamp=1427744637894310)
=> (name=type, value=426c6164652052756e6e6572, timestamp=1427744637894310)
Returned 5 results.
Elapsed time: 4.7 msec(s).
Note that the column is data
now timestamped.
Try the query with help tracing on
and see if your tombstones are really gone. Also, check the table with cassandra-cli
to see how the timestamps go. Let me know if you need clarification on any of these steps.
NOTE: I was just showing flush / compact as part of an example or exercise. I have to say that DataStax advises users to avoid manually nodetool compact
if at all possible.
source to share
The answer from BryceAtNetwork23 is most likely the "most correct" one, it comes with a warning that you must run nodetool flush and nodetool compact on each node (or it looks like from my testing of Cassandra's 3 node cluster). This can take quite a long time.
As another solution (for those who visit here in the future) you can get the lines that you eventually want to delete.
cqlsh> select id from example_table where some_field = -1 allow filtering;
Then dump the file into a file used to remove the specified lines.
cat cassandra-output | sort | uniq | grep '^ ' | grep -v id | gawk '{ print $1 }' >just-ids.txt
cat just-ids.txt | gawk '{ print "delete from example_table where id='\''"$1"'\'' and some_field = -1;" } >remove.cql
cqlsh ... -f remove.cql
I found Cassandra Dump , which I then use to dump the remaining data. After that, I can dump and recreate the table (and indexes if needed) and then reload the data.
Finally, I can insert the lines that I had, but with new timestamps:
cat just-ids.txt | gawk '{ print "insert into example_table (id,some_field) values('\''"$1"'\'', -1) using timestamp 0;" }' >repair.cql
cqlsh ... -f repair.cql
If you do this multiple times, you can of course just drop the table and reload it with data after repair. This is a faster solution when you reboot again than doing flush / compact.
source to share