How do I properly reboot my CoreOS cluster?

Question

I would like to reboot my CoreOS cluster nodes one by one, as I've read that rebooting all nodes at once causes problems (etcd and Ceph being unable to keep quorum, etc.). What is the correct way to do this, other than logging into each machine manually and issuing a reboot command?

Is there a general way to reboot n nodes in a cluster, wait for them to complete, and then another set of n nodes until all nodes are rebooted?

Thanks.

cluster-computing coreos

Jimmy chu 25 nov. 14 at 2:34 am

2 answers

emassa · Answer 1 · 2014-11-25T11:25:40+0000

In the cloud-config.yaml file you can add:

coreos:
  update:
    reboot-strategy: etcd-lock

With this setting, each machine must acquire a lock in etcd before it reboots, which ensures that no more than one machine reboots at a time. See the documentation for more information: https://coreos.com/docs/cluster-management/setup/update-strategies/
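The question also asks about rebooting n nodes at a time. A minimal sketch of one way to approach that, assuming locksmithctl is available on the node and can reach etcd: the number of reboot locks can be changed with locksmithctl set-max and inspected with locksmithctl status. The function name set_reboot_batch_size is hypothetical, made up for this example.

```shell
#!/bin/bash
# Sketch only: set_reboot_batch_size is a hypothetical helper name.
# Assumes locksmithctl is on the PATH and can talk to etcd.
set_reboot_batch_size() {
    local n=$1
    # Reject anything that is not a positive integer.
    case "$n" in
        ''|*[!0-9]*|0)
            echo "usage: set_reboot_batch_size N (N >= 1)" >&2
            return 1
            ;;
    esac
    # Change how many machines may hold the reboot lock at once.
    locksmithctl set-max "$n" || return 1
    # Print the lock state so the new limit can be checked.
    locksmithctl status
}
```

For example, running set_reboot_batch_size 2 on any one node should let up to two machines hold the reboot lock at the same time. Keep the value low enough that etcd and Ceph still have quorum while that many nodes are down.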

Robert reiz · Answer 2 · 2015-11-23T08:26:10+0000

Locksmith is the daemon that handles reboots on CoreOS nodes. I recommend choosing the etcd-lock reboot strategy:

coreos:
  update:
    reboot-strategy: etcd-lock

By default, this reboots the cluster one machine at a time. I use fleetctl to manage my CoreOS cluster remotely. This script sends a reboot request to every machine in the cluster:

#!/bin/bash -x

# Ask every machine in the cluster to reboot through locksmith.
for machine in $(fleetctl list-machines --no-legend --full | awk '{ print $1 }'); do
    fleetctl ssh "$machine" "sudo locksmithctl reboot"
done

If your reboot strategy is etcd-lock, the nodes will not all reboot immediately. They will reboot one at a time until the entire cluster has been rebooted.
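The question also asks how to wait for each node to finish rebooting before moving on. A rough sketch of one approach, not part of the answer above: record each machine's kernel boot ID (/proc/sys/kernel/random/boot_id, which changes on every boot), request the reboot, and poll until the ID changes. The function names boot_id, wait_for_reboot and rolling_reboot and the POLL_INTERVAL variable are hypothetical, and the sketch assumes fleetctl ssh works for every machine.

```shell
#!/bin/bash
# Sketch only: boot_id, wait_for_reboot, rolling_reboot and POLL_INTERVAL
# are hypothetical names, not part of fleetctl or locksmith.

# Print the kernel boot ID of a machine; the ID changes on every boot.
boot_id() {
    fleetctl ssh "$1" "cat /proc/sys/kernel/random/boot_id" 2>/dev/null
}

# Poll until the machine answers with a boot ID different from the old one.
wait_for_reboot() {
    local machine=$1 old_id=$2 new_id
    until new_id=$(boot_id "$machine") && [ -n "$new_id" ] && [ "$new_id" != "$old_id" ]; do
        sleep "${POLL_INTERVAL:-10}"
    done
}

# Reboot the given machines one at a time, waiting for each to come back.
rolling_reboot() {
    local machine old_id
    for machine in "$@"; do
        old_id=$(boot_id "$machine")
        # The ssh session may drop as the machine goes down, so ignore its exit code.
        fleetctl ssh "$machine" "sudo locksmithctl reboot" || true
        wait_for_reboot "$machine" "$old_id"
    done
}
```

It could be called as rolling_reboot $(fleetctl list-machines --no-legend --full | awk '{ print $1 }'). The sketch has no timeout, so a machine that never comes back will block the loop; a real script would want one.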
