Abort playbook if any host is unreachable

I'm wondering whether there is a decent way to require that a set of tasks runs on all hosts, or on none of them if any host is unavailable.

I'm currently trying to use it to handle an update, which can be painful if not all of the relevant nodes are updated in sync.


3 answers


I was about to ask a question when I saw this one. The answer suggested by Duncan doesn't work, at least in my case, where the host is unreachable. All my playbooks set max_fail_percentage to 0.

But Ansible will happily run whatever tasks it can on the hosts it can reach and carry on. What I really wanted was that if any of the hosts is unreachable, no tasks run at all.

What I found is simple, though it could be considered a hacky solution, and I'm open to better answers.

As the first step of running a playbook, Ansible gathers facts for all hosts, and this step fails for any host that is unreachable. I put a simple task at the very beginning of my play that uses those facts: it loops over every host in the group and reads each host's ansible_hostname fact. Because every reachable host also looks up the missing fact of the unreachable host, the task fails on all hosts with an "undefined variable" error and the play aborts. The task itself is just a dummy and will always pass if all hosts are reachable.

See below for my example:



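- name: Check Ansible connectivity to all hosts
  hosts: host_all
  user: "{{ remote_user }}"
  become: "{{ sudo_required }}"   # was 'sudo:' in older Ansible releases
  become_user: root               # was 'sudo_user:' in older Ansible releases
  connection: ssh # or paramiko
  max_fail_percentage: 0
  tasks:
    # Dummy task: each host loops over the whole group and reads every
    # member's gathered ansible_hostname fact. An unreachable host has no
    # facts, so the lookup fails on every host and the play aborts.
    - name: check connectivity to hosts (Dummy task)
      shell: echo "{{ hostvars[item]['ansible_hostname'] }}"
      with_items: "{{ groups['host_all'] }}"   # bare variables are no longer templated
      register: cmd_output

    - name: debug ...
      debug: var=cmd_output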

      

If a host is unreachable, you will get an error like the following:

TASK: [c.. ***************************************************** 
fatal: [172.22.191.160] => One or more undefined variables: 'dict object' has no attribute 'ansible_hostname'
fatal: [172.22.191.162] => One or more undefined variables: 'dict object' has no attribute 'ansible_hostname'

FATAL: all hosts have already failed -- aborting

      

Note: if your host group is not named host_all, you must modify the dummy task to use your group's name.
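The same trick can be written with an explicit check and a readable error message. This is a sketch, not the answer's exact code: it assumes the group is still called host_all, that fact gathering stays enabled, that no fact cache supplies stale facts, and Ansible 2.7 or later (for loop and assert's fail_msg):

```yaml
# Sketch of the same fact-based check, assuming a group named host_all.
- name: Check Ansible connectivity to all hosts
  hosts: host_all
  max_fail_percentage: 0
  tasks:
    # Without a fact cache, only hosts that answered fact gathering have an
    # ansible_hostname fact. Each host checks the whole group, so one
    # unreachable host makes the assert fail everywhere and the play aborts.
    - name: Require gathered facts for every host in host_all
      assert:
        that:
          - hostvars[item]['ansible_hostname'] is defined
        fail_msg: "No facts for {{ item }}; it was probably unreachable"
      loop: "{{ groups['host_all'] }}"
```

Compared with the echo task, the failure message names the host that is missing instead of reporting a generic undefined variable.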



You can combine any_errors_fatal: true or max_fail_percentage: 0 with gather_facts: false, and then run a task that will fail if a host is offline. Something like this at the top of the playbook should do what you need:

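- hosts: all
  gather_facts: false      # skip fact gathering; the ping below is the check
  max_fail_percentage: 0   # or: any_errors_fatal: true
  tasks:
    - name: Fail if any host is unreachable
      ping: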

      



A bonus is that this also works with the -l SUBSET option for limiting which hosts are matched.
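The answer names any_errors_fatal: true as an alternative to max_fail_percentage: 0. Here is a sketch of that variant with the real work in a second play; the second play and its debug task are placeholders for your own update tasks. How unreachable hosts are counted has changed between Ansible versions, so it is worth confirming the abort on the version you run:

```yaml
# Play 1: connectivity gate. ping checks that Ansible can log in and find a
# usable Python on each host; no facts are gathered.
- hosts: all
  gather_facts: false
  any_errors_fatal: true   # intended: one failed or unreachable host ends the run
  tasks:
    - name: Fail if any host is unreachable
      ping:

# Play 2: placeholder for the actual update, reached only if play 1 passed.
- hosts: all
  tasks:
    - name: Placeholder for the real update tasks
      debug:
        msg: "All hosts reachable, running the update"
```

Because -l SUBSET applies to every play in the run, limiting the hosts also limits the connectivity gate to the same subset.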



You can add max_fail_percentage to your playbook, something like this:

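- hosts: all_boxes
  max_fail_percentage: 0   # abort the play once any host fails
  roles:
    - common
  pre_tasks:
    # 'include:' is deprecated (removed in recent ansible-core); import_tasks is the static replacement
    - import_tasks: roles/common/tasks/start-time.yml
    - import_tasks: roles/common/tasks/debug.yml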

      

This way, you can decide how many failures you are willing to tolerate. Here is the relevant section of the Ansible documentation:

By default, Ansible will continue executing actions as long as there are hosts in the group that have not yet failed. In some situations, such as with the rolling updates described above, it may be desirable to abort the play when a certain threshold of failures has been reached. To achieve this, as of version 1.3, you can set a maximum failure percentage on a play as follows:

- hosts: webservers
  max_fail_percentage: 30
  serial: 10

In the above example, if more than 3 of the 10 servers in the group were to fail, the rest of the play would be aborted.

Note: the percentage set must be exceeded, not equaled. For example, if serial were set to 4 and you wanted the task to abort when 2 of the systems failed, the percentage should be set to 49 rather than 50.
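The "exceeded, not equaled" rule can be written out as below. The symbols are introduced here only for illustration: f is the number of hosts that failed in the current batch, n is the batch size (the serial value), and p is max_fail_percentage.

```latex
\[
\text{abort} \iff 100 \cdot \frac{f}{n} > p
\]
% Documentation example, n = 10, p = 30:
\[
f = 3:\ 100 \cdot \tfrac{3}{10} = 30 \not> 30 \ (\text{continue}), \qquad
f = 4:\ 100 \cdot \tfrac{4}{10} = 40 > 30 \ (\text{abort})
\]
% Note example, n = 4, f = 2:
\[
100 \cdot \tfrac{2}{4} = 50, \qquad 50 > 50 \text{ is false}, \qquad 50 > 49 \text{ is true}
\]
% With p = 0, a single failure (f = 1) already gives 100/n > 0, so the play aborts.
```

The last line is why the answers above use max_fail_percentage: 0: any single failure is enough to exceed it.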
