Cleaning up Cassandra on multiple servers at the same time

We have a large Cassandra cluster of 18 servers (about 5 TB of data per server).

We added new nodes following this documentation: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html

After adding the new servers, we started the cleanup process (nodetool cleanup).

The documentation recommends: after all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove keys that no longer belong to those nodes, and wait for the cleanup to complete on one node before starting it on the next.

But in our case, cleanup takes about 2-3 days per server. My question is: can I run cleanup on several servers at once, say 2 or 3 at a time?

Or could it lead to data loss?
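To put the time cost in numbers, here is a rough wall-clock estimate. It assumes all 18 servers are previously existing nodes that need cleanup, that every node takes the same 2-3 days, and that concurrent cleanups do not slow each other down (optimistic, since they compete for disk I/O). The function name is hypothetical.

```python
import math


def cleanup_wall_clock_days(nodes: int, days_per_node: float, max_parallel: int) -> float:
    """Rough wall-clock estimate for running nodetool cleanup on `nodes` servers.

    Assumes every node takes the same time and that running several cleanups
    at once does not slow any of them down (optimistic: they share disk I/O).
    """
    rounds = math.ceil(nodes / max_parallel)  # number of "waves" of cleanups
    return rounds * days_per_node


if __name__ == "__main__":
    for parallel in (1, 2, 3):
        low = cleanup_wall_clock_days(18, 2, parallel)
        high = cleanup_wall_clock_days(18, 3, parallel)
        print(f"{parallel} at a time: {low:g}-{high:g} days")
```

Under these assumptions, one node at a time takes 36-54 days, two at a time 18-27 days, and three at a time 12-18 days.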

Additional information:

We are using Cassandra 2.0.13 with vnodes. We also store files in Cassandra, split into blocks.

Replication factor = 3

Answer


Cleanup does not depend on other nodes: each node only removes data for token ranges it no longer owns, so it is safe to run it on several nodes in parallel. However, you may want to run it one node at a time to limit the performance impact, since cleanup can use a lot of disk I/O.
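A minimal Python sketch of the middle ground: run cleanup on a bounded number of old nodes at once, starting the next node as soon as one finishes. It assumes nodetool is on the PATH and that each node's JMX port accepts remote connections through nodetool -h (depending on cassandra-env.sh, JMX may only accept local connections, in which case you would wrap the command in ssh instead). The function names and host addresses are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def cleanup_command(host: str, keyspace: str = "") -> List[str]:
    """Build `nodetool -h <host> cleanup [keyspace]` as an argument list."""
    # Assumption: JMX on `host` accepts remote connections from this machine.
    cmd = ["nodetool", "-h", host, "cleanup"]
    if keyspace:
        cmd.append(keyspace)  # restrict cleanup to a single keyspace
    return cmd


def run_command(cmd: List[str]) -> int:
    """Run one command and block until it exits (a single cleanup may take days)."""
    return subprocess.run(cmd).returncode


def dry_run(cmd: List[str]) -> int:
    """Print the command instead of running it, and report success."""
    print(" ".join(cmd))
    return 0


def cleanup_in_parallel(
    hosts: Sequence[str],
    max_parallel: int = 2,
    runner: Callable[[List[str]], int] = run_command,
) -> Dict[str, int]:
    """Run cleanup on every host, with at most `max_parallel` running at once.

    When one cleanup finishes, the pool starts the next host, so the number of
    concurrent cleanups (and the extra disk I/O) stays bounded.
    Returns each host's exit code; non-zero means that node needs attention.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {host: pool.submit(runner, cleanup_command(host)) for host in hosts}
        return {host: future.result() for host, future in futures.items()}


if __name__ == "__main__":
    # Hypothetical addresses of the previously existing nodes.
    old_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    # Dry run: only prints the commands; pass runner=run_command to execute them.
    print(cleanup_in_parallel(old_nodes, max_parallel=2, runner=dry_run))
```

Setting max_parallel=1 reproduces the one-node-at-a-time procedure from the documentation; raising it to 2 or 3 trades some cluster performance for a shorter total cleanup time.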

