Failover and replication to a 2-node Cassandra cluster

I am running KairosDB on a 2-node Cassandra cluster with RF = 2, write CL = 1, and read CL = 1. When both nodes are alive, the client sends half of the data to node 1 (for example, metrics METRIC_1 through METRIC_5000) and the other half to node 2 (for example, metrics METRIC_5001 through METRIC_10000). Ideally, each node always holds a copy of all data. If one node is dead, the client sends all data to the live node.

The client started sending data to the cluster. After 30 minutes, I shut down node 2 for 10 minutes. During those 10 minutes, the client sent all data to node 1, as expected. I then restarted node 2 and the client resumed sending data to both nodes correctly. An hour later, I stopped the client.

I wanted to check whether the data sent to node 1 while node 2 was down had been replicated to node 2 automatically. To do this, I shut down node 1 and queried node 2 for the data written during its downtime, but nothing came back. This made me think the missed writes were not being replayed from node 1 to node 2. I posted a separate question about whether Cassandra does "late" replication when a node comes back. It seems the data was replicated automatically in the end, but very slowly.

I expect the data to be identical on both servers (for backup purposes). This means that data sent to the cluster while node 2 is dead should be replicated from node 1 to node 2 automatically once node 2 becomes available again (since RF = 2).

I have several questions:

1) Is replication really that slow? Or have I configured something wrong?

2) If the client sends half of the data to each node, as described above, I think it is possible to lose data (for example, node 1 receives data from the client and crashes before it finishes replicating that data to node 2). Am I right?

3) If I am right about 2), I plan to have the client send all data to both nodes. This would address 2) and still take advantage of replication when one node dies and comes back later. But I wonder whether this leads to duplicate data, since both nodes receive the same writes. Are there any problems with this approach?
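To convince myself about the duplicates in 3), here is a toy model of how I understand Cassandra storage to work (cells keyed by row key and column name, last write wins); it suggests that identical writes overwrite the same cell rather than duplicating data. This is just a sketch of my understanding, not Cassandra itself:

```python
# Toy model of a Cassandra row: cells are keyed by (row_key, column).
# Last write wins, so sending the same data point to both nodes should
# overwrite the same cell rather than store it twice.
def write(store, row_key, column, value):
    store[(row_key, column)] = value

store = {}
write(store, "METRIC_1:0", 1000, 42.0)
write(store, "METRIC_1:0", 1000, 42.0)  # duplicate send of the same point
assert len(store) == 1  # still a single cell
```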

Thanks!



1 answer


Can you check the value of hinted_handoff_enabled in the cassandra.yaml config file?
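For reference, these are the relevant hinted-handoff settings in cassandra.yaml; the values shown are the usual defaults, so verify them against your own file:

```yaml
# cassandra.yaml -- settings that control hinted handoff.
# Values shown are the common defaults; check your own config.
hinted_handoff_enabled: true
# Hints are only kept for nodes that have been down less than this
# window (3 hours by default), so a 10-minute outage is well within it.
max_hint_window_in_ms: 10800000  # 3 hours
```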

For your question: yes, you can lose data in some cases. Until replication has actually happened, the copies can diverge; Cassandra does not perform immediate final replication, but instead relies on three anti-entropy mechanisms: hinted handoff, read repair, and manual repair (nodetool repair).

AFAIK, if you are using a version newer than 0.8, hinted handoff should replay the missed writes after the node restarts, without requiring a repair, unless the hints are too old (they shouldn't be after only 10 minutes). Why those hints were not delivered to your replica when it restarted deserves investigation.

Otherwise, after restarting a node you can force Cassandra to make the data consistent by running a repair (e.g. nodetool repair).



From your description, I get the feeling that you are confusing the coordinator node with the node that actually stores the data (even though both nodes store data here, the distinction is important).

By the way, what is producing the client behavior you describe, with metrics split between node 1 and node 2? Neither KairosDB nor Cassandra works like this; is it your own client that sends metrics to different KairosDB instances?

Cassandra does not partition by metric name, but by row key (strictly speaking by partition key, but with KairosDB they are the same). Each unique series gets a new row every three weeks; the row key is hashed to a token, and that token determines data placement and replication in the cluster. KairosDB can talk to multiple Cassandra nodes and will round-robin between them as coordinator nodes.
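As a rough illustration (a toy model, not Cassandra's actual Murmur3 partitioner), placement is driven by a hash of the whole row key, which in KairosDB combines the metric name with the start of its three-week row:

```python
import hashlib

def primary_replica(metric, row_start, nodes):
    """Toy model: hash a KairosDB-style row key to pick a primary
    replica. Real Cassandra uses the Murmur3 partitioner and a token
    ring, but the principle -- placement by row-key hash, not by
    metric name -- is the same."""
    row_key = f"{metric}:{row_start}"
    token = int(hashlib.md5(row_key.encode()).hexdigest(), 16)
    return nodes[token % len(nodes)]

nodes = ["node1", "node2"]
# The same series in the same three-week row always maps to the same
# primary; with RF=2 on a 2-node cluster the other node is the replica.
assert primary_replica("METRIC_1", 0, nodes) == primary_replica("METRIC_1", 0, nodes)
```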

Hope this helps.
