Cassandra: Paste with Older Timestamp

(Cassandra 2.0.9 using CQL)

I accidentally updated a row in a table that was running its own timestamp (100 * specific sequence number). Now, since my timestamp is the current time, none of the updates are working. I understand why this is so, but I am trying to recover from it. I'm lucky that I can remove these lines.

I set gc_grace_seconds to 0 and ran delete from table where key = primarykey to delete rows. After that I used nodetool flush and nodetool compact on each node to trigger delete and get the resulting tombstones compressed and erasable. Then I ran into gc_grace_seconds before 10 days and tried to insert into a string with the same key but using timestamp 1.

This does not work. Just wondering if anyone made a similar mistake and worked around it?

+3


source to share


2 answers


I thought I would try this exercise.

aploetz@cqlsh:presentation> SELECT * FROm bladerunners WHERE id='B26354';
 id     | data                | name         | ts                       | type
--------+---------------------+--------------+--------------------------+--------------
 B26354 | Filed and monitored | Rick Deckard | 2015-02-16 12:00:03-0600 | Blade Runner

(1 rows)

      

Below is an example of saving data using cassandra-cli

:

[default@presentation] get bladerunners[B26354];
=> (name=, value=, timestamp=1427744637894310)
=> (name=data, value=46696c656420616e64206d6f6e69746f7265642e, timestamp=1427744637894310)
=> (name=name, value=5269636b204465636b617264, timestamp=1427744637894310)
=> (name=ts, value=0000014b938c09a2, timestamp=1427744637894310)
=> (name=type, value=426c6164652052756e6e6572, timestamp=1427744637894310)
Returned 5 results.
Elapsed time: 7.67 msec(s).

      

Now I will remove the column data

for this row by creating a tombstone:

DELETE data FROM bladerunners WHERE id='B26354';

      

When I SELECT with tracing on

, I see the column displays "null" and I have a tombstone.

aploetz@cqlsh:presentation> SELECT * FROM bladerunners WHERe id='B26354';

 id     | data | name         | ts                       | type
--------+------+--------------+--------------------------+--------------
 B26354 | null | Rick Deckard | 2015-02-16 12:00:03-0600 | Blade Runner

...

Read 1 live and 1 tombstoned cells [SharedPool-Worker-2] | 2015-06-10 08:42:25.858000 | 192.168.23.129 |           2173

      

So I'll set the table bladerunners

gc_grace_seconds

to be zero:

ALTER TABLE bladerunners WITH gc_grace_seconds=0;

      



From command line (Linux) I will hide and compress my space presentation

:

aploetz@dockingBay94:/local/dsc-cassandra-2.1.4$ bin/nodetool flush
aploetz@dockingBay94:/local/dsc-cassandra-2.1.4$ bin/nodetool compact presentation

      

When I SELECT with tracing on

, I see that the column data

is still "null", but now the headstone is gone.

Now I will insert the column again data

with time stamp 1:

INSERT INTO bladerunners (id, data) VALUES ('B26354','Filed and monitored') USING TIMESTAMP 1;

      

When requested with, cassandra-cli

this is now what I see:

[default@presentation] get bladerunners[B26354];
=> (name=, value=, timestamp=1427744637894310)
=> (name=data, value=46696c656420616e64206d6f6e69746f726564, timestamp=1)
=> (name=name, value=5269636b204465636b617264, timestamp=1427744637894310)
=> (name=ts, value=0000014b938c09a2, timestamp=1427744637894310)
=> (name=type, value=426c6164652052756e6e6572, timestamp=1427744637894310)
Returned 5 results.
Elapsed time: 4.7 msec(s).

      

Note that the column is data

now timestamped.

Try the query with help tracing on

and see if your tombstones are really gone. Also, check the table with cassandra-cli

to see how the timestamps go. Let me know if you need clarification on any of these steps.

NOTE: I was just showing flush / compact as part of an example or exercise. I have to say that DataStax advises users to avoid manually nodetool compact

if at all possible.

+3


source


The answer from BryceAtNetwork23 is most likely the "most correct" one, it comes with a warning that you must run nodetool flush and nodetool compact on each node (or it looks like from my testing of Cassandra's 3 node cluster). This can take quite a long time.

As another solution (for those who visit here in the future) you can get the lines that you eventually want to delete.

cqlsh> select id from example_table where some_field = -1 allow filtering;

      

Then dump the file into a file used to remove the specified lines.

cat cassandra-output | sort | uniq | grep '^ ' | grep -v id | gawk '{ print $1 }' >just-ids.txt
cat just-ids.txt | gawk '{ print "delete from example_table where id='\''"$1"'\'' and some_field = -1;" } >remove.cql
cqlsh ... -f remove.cql

      



I found Cassandra Dump , which I then use to dump the remaining data. After that, I can dump and recreate the table (and indexes if needed) and then reload the data.

Finally, I can insert the lines that I had, but with new timestamps:

cat just-ids.txt | gawk '{ print "insert into example_table (id,some_field) values('\''"$1"'\'', -1) using timestamp 0;" }' >repair.cql
cqlsh ... -f repair.cql

      

If you do this multiple times, you can of course just drop the table and reload it with data after repair. This is a faster solution when you reboot again than doing flush / compact.

+1


source







All Articles