Cassandra error: "The request failed: one or more nodes were unavailable."
I'm a complete newbie to Cassandra and just setting it up and playing with it and testing various scenarios with cqlsh.
I currently have 4 nodes in 2 datacenters (with corresponding IPs for example):
a.b.c.d=DC1:RACK1
a.b.c.d=DC1:RACK1
a.b.c.d=DC2:RACK1
a.b.c.d=DC2:RACK1
default=DCX:RACKX
Everything seems to make sense so far, except that I deliberately took down a node just to see the resulting behavior, and I noticed that I can no longer query or insert data on the remaining nodes. Every request results in "Unable to complete request: one or more nodes were unavailable."
I realize that the node is not available (I did it on purpose), but isn't one of the main points of a distributed DB to keep functioning even when some nodes are down? Why does taking one node down stop everything completely?
What am I missing?
Any help would be greatly appreciated!
Is it possible that you did not set the replication factor of your keyspace to a value greater than 1? For example:
CREATE KEYSPACE excalibur WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};
With this, your keyspace is configured so that data is replicated to 2 nodes in each of the dc1 and dc2 datacenters. The datacenter names in the statement are case-sensitive and must match the names your snitch reports.
If your replication factor is 1 and the node that owns the data you are requesting goes down, you will not be able to get the data, and C* will fail fast with an unavailable error. In general, if C* finds that the consistency level cannot be met to service your request, it will fail fast.
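The fail-fast rule can be sketched in a few lines of Python. This is a simplified model, not Cassandra's actual code: the function names `required_replicas` and `can_serve` and the node names are hypothetical, and only the ONE, QUORUM and ALL consistency levels are covered.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of live replicas needed to satisfy a consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def can_serve(replicas: list[str], down_nodes: set[str], consistency_level: str) -> bool:
    """True if enough replicas of the requested data are still up."""
    live = [node for node in replicas if node not in down_nodes]
    return len(live) >= required_replicas(consistency_level, len(replicas))


# Replication factor 1: the only replica is down, so the request fails.
print(can_serve(["node1"], {"node1"}, "ONE"))           # False
# Replication factor 2: one replica is still up, which is enough for ONE.
print(can_serve(["node1", "node2"], {"node1"}, "ONE"))  # True
```

Under this model, raising the replication factor only helps if the consistency level leaves room for a missing replica: with 2 replicas, QUORUM still needs both of them.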
You are correct in assuming that one node down should still allow you to query the cluster, but there are a few things to consider.
I am assuming "nodetool status" returns the expected results for this DC (i.e. "UN" for a node that is up, "DN" for a node that is down).
Check the following:
- Connection consistency level (default ONE)
- Keyspace replication strategy and factor (default SimpleStrategy, rack/DC unaware)
- In cqlsh, "describe keyspace"
Note that if you have been playing around with replication factors, you will need to run "nodetool repair" on the nodes.
More reading here