How do I properly reboot my CoreOS cluster?
I would like to reboot my CoreOS cluster nodes one by one, because I've read that rebooting all nodes at once causes problems (etcd and Ceph were unable to keep quorum, etc.). What is the correct way to do this, other than logging into each machine manually and issuing the reboot command?
Is there a general way to reboot n nodes in a cluster, wait for them to complete, and then another set of n nodes until all nodes are rebooted?
Thanks.
In the cloud-config.yaml file you can add:
coreos:
  update:
    reboot-strategy: etcd-lock
With this strategy, each machine must acquire a lock in etcd before rebooting, which ensures that no more than one machine reboots at a time. See the documentation for more information: https://coreos.com/docs/cluster-management/setup/update-strategies/
Locksmith is the daemon that handles reboots on CoreOS nodes. I recommend choosing the etcd-lock reboot strategy:
coreos:
  update:
    reboot-strategy: etcd-lock
By default this reboots the cluster one machine at a time. I use fleetctl to manage my CoreOS cluster remotely. This script sends a reboot request to every machine in the cluster:
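#!/bin/bash -x
# Ask locksmith on every machine in the cluster to reboot.
for machine in $(fleetctl list-machines --no-legend --full | awk '{ print $1 }'); do
  # The reboot itself is coordinated by locksmith according to the reboot strategy.
  fleetctl ssh "$machine" "sudo locksmithctl reboot"
done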
If your reboot strategy is etcd-lock, the nodes will not all reboot immediately. They reboot one at a time until the entire cluster has been rebooted.
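For the "n nodes at a time" part of the question, the size of the reboot lock can be raised with locksmithctl set-max, and locksmithctl status shows how many slots are free and which machines hold the lock. Below is a minimal sketch, assuming locksmithctl is run on a cluster node that can reach etcd. The function name set_reboot_batch_size and the DRY_RUN variable are hypothetical helpers for illustration, not part of locksmith. By default the sketch only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical helper: let up to N machines hold the reboot lock at once.
# DRY_RUN defaults to 1, so the commands are printed instead of executed.
set_reboot_batch_size() {
  local n=$1
  # Reject anything that is not a positive integer.
  case "$n" in
    ''|*[!0-9]*) echo "usage: set_reboot_batch_size <positive integer>" >&2; return 1 ;;
  esac
  if [ "$n" -le 0 ]; then
    echo "usage: set_reboot_batch_size <positive integer>" >&2
    return 1
  fi
  local run=""
  [ "${DRY_RUN:-1}" = "1" ] && run="echo"
  $run locksmithctl set-max "$n"   # resize the reboot lock semaphore
  $run locksmithctl status         # show free slots and current lock holders
}
```

Calling set_reboot_batch_size 2 prints the two locksmithctl commands. Setting DRY_RUN=0 on a cluster node would run them, after which up to two machines could reboot concurrently. Keep the batch size small enough that etcd and other quorum-based services keep a majority of their members.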