Consul - alert if disk is full
The consul demo has disk usage and memory usage checks.
http://demo.consul.io/ui/#/ams2/nodes/ams2-server-1
How could you write a configuration to run the demo? Warning at 10% and critical eras at 5%?
This is what I am trying
{
"check": {
"name": "Disk Util",
"script": "disk_util=$(df -k | grep '/dev/sda1' | awk '{print $5}' | sed 's/[^0-9]*//g' ) | if [ $disk_util > 90 ] ; then echo 'Disk /dev/sda above 90% full' && exit 1; elif [ $disk_util > 80 ] ; then echo 'Disk /dev/sda above 80%' && exit 3; else exit 0; fi",
"interval": "2m"
}
}
Here is the same script but more human readable
disk_util=$(df -k | grep '/dev/sda1' | awk '{print $5}' | sed 's/[^0-9]*//g' ) |
if [ $disk_util > 90 ]
then echo 'Disk /dev/sda above 90% full' && exit 1
elif [ $disk_util > 80 ]
then echo 'Disk /dev/sda above 80%' && exit 3
else exit 0; fi
The validation seems to work, but does not print any text. How can I check if this works and print the output?
source to share
- The result you see is generated by the Nagios check_disk plugin ( https://www.monitoring-plugins.org/doc/man/check_disk.html )
- The "Output" field is filled in by the check stdout. Your check runs cleanly and produces no output. So you don't see anything.
- To add some notes, just add the "notes" field to the check definition as stated in the documentation: https://www.consul.io/docs/agent/checks.html
Your check json file will look something like this:
{
"check": {
"name": "disks",
"notes": "Critical 5%, warning 10% free",
"script": "/path/to/check_disk -w 10% -c 5%",
"interval": "2m"
}
}
source to share
The exit code for your warning condition must be 1 for critical, 2 or higher. (See "Script Testing" at https://www.consul.io/docs/agent/checks.html ), so you probably want to change your output lines.
Your status is "OK" (disk usage <80%) is giving no output, which most likely means you are seeing empty output.
Second, the concept of using nagios plugins rather than rolling your own. Many operating systems will have the nagios-plugins package (s), which are installed by yum / apt.
source to share
The health checks depend on the check exit code. To test if the health checks are being checked by the Consul server, you can write a script that always exits with a 1, and then you will see that the health check failed. Then replace it with a script that always returns 0 and you should see the health check passed in.
If you want to return text to ui add output field to json.
source to share
It seems the consul is only analyzing stdout
, not stderr
. I tested with redirect ( 2>&1
) in the service check file config. It looks like work!
JSON configuration
{
"check": {
"name": "disks",
"notes": "Critical 5%, warning 10% free",
"script": "/path/to/check_disk -w 10% -c 5% 2>&1",
"interval": "2m"
}
}
Output result
source to share