Monitor UPSs with Nagios using Custom Nagios Plugins

I have a couple of CyberPower UPSs, and to monitor them through Nagios I’ve been using some custom Nagios plugins that I wrote some time back.

CyberPower UPSs can be monitored with the pwrstat utility from CyberPower’s PowerPanel software, but it will not work with other UPS brands. I have one UPS connected to a host running Debian, so the pwrstat command can be used for that one. But I have another UPS which needs to be monitored through a Raspberry Pi that sits near it, and unfortunately CyberPower’s PowerPanel doesn’t support ARM-based systems yet (as of today). Therefore, I’m using Network UPS Tools (NUT) and monitoring that second UPS through the nut-server. If your UPS is not a CyberPower, you can check whether it is supported in the hardware compatibility list on the NUT website. NUT will run on any Linux-based system, regardless of whether it’s x86/64 or ARM.

Let’s first get to how we could set up a UPS to be monitored through Nagios using NUT

This is how the monitoring works :

  1. The Linux host to which the UPS is connected by USB (I have a Raspberry Pi; any Linux host should work) must have nut-server and nut-driver installed.
  2. The custom Nagios plugins will use NUT’s upsc command-line utility to fetch information about the UPS.
  3. We add services in the Nagios config in the Nagios server to call these custom plugins.

The first step here is to install nut-server and nut-driver on the host. There are some good links provided by the NUT website, so I will not go through the installation process here.

After getting NUT installed, you can get the status of the UPS by running this command with sudo.

sudo upsc <ups-name>@localhost

 

From the values fetched from the UPS, I’m interested in tracking four parameters.

  1. Battery charge (Percentage)
  2. Battery runtime (Seconds)
  3. Input voltage (Volts)
  4. Load (Percentage)
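
The upsc command prints one "name: value" line per variable, so each of these parameters can be picked out by name. Below is a minimal sketch of that extraction. It assumes the standard NUT variable names battery.charge, battery.runtime, input.voltage and ups.load (your driver may expose a different set); ups_field is a hypothetical helper name, and the sample readings are made up for illustration.

```shell
#!/bin/bash
# ups_field <variable>: read "name: value" lines (the format upsc prints)
# from stdin and print the value of the requested variable.
# ups_field is a hypothetical helper name, not part of NUT.
ups_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Made-up sample readings, for illustration only
sample='battery.charge: 100
battery.runtime: 1830
input.voltage: 121.0
ups.load: 16'

for var in battery.charge battery.runtime input.voltage ups.load; do
    printf '%s = %s\n' "$var" "$(printf '%s\n' "$sample" | ups_field "$var")"
done

# Against a live UPS: sudo upsc <ups-name>@localhost | ups_field battery.charge
```

upsc also accepts a variable name as a second argument and then prints only that value, which is the form the plugin shown later uses.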

To fetch these values I’ve written several custom Nagios plugins, each of which calls the above command and extracts the value associated with one parameter. They are linked below on my GitHub.

Fetch Battery charge percentage (Capacity) : check_nut_ups_capacity.sh

Fetch Battery runtime in minutes  : check_nut_ups_runtime.sh

Fetch Input Voltage in Volts : check_nut_ups_involtage.sh

Fetch percentage load on UPS : check_nut_ups_load.sh

Let us look into one plugin as an example : check_nut_ups_capacity.sh

#!/bin/bash
# Nagios plugin: check the UPS battery charge (percent) through NUT (Network UPS Tools).
# The NUT drivers and the NUT server must be installed on the host for this plug-in to work.
# Details: https://networkupstools.org/
# /home/user/ups/restart_nuts.sh must restart the NUT drivers and the server (in that order);
# it is triggered when the plug-in is unable to read the value on its initial try.

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
NUT_RESTART="/home/user/ups/restart_nuts.sh"
usage1="Usage: $0 -H <host> -u <ups> -w <warn> -c <crit>"

while test -n "$1"; do
    case "$1" in
        -c)
            crit=$2
            shift
            ;;
        -w)
            warn=$2
            shift
            ;;
        -u)
            ups=$2
            shift
            ;;
        -h)
            echo "$usage1"
            echo
            exit $STATE_UNKNOWN
            ;;
        -H)
            host=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            echo "$usage1"
            echo
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done

# All four options are required
if [[ -z $host || -z $ups || -z $warn || -z $crit ]]; then
    echo "$usage1"
    exit $STATE_UNKNOWN
fi

# Print battery.charge as an integer, or nothing if upsc fails
# (for example with "Data for UPS is stale")
read_charge() {
    local value
    value=$(upsc "$ups@$host" battery.charge 2>&1 | grep -v '^Init SSL')
    value=${value%%.*}    # conversion to integer
    [[ $value =~ ^[0-9]+$ ]] && echo "$value"
}

value=$(read_charge)

# If the first try fails, restart NUT and try once more
if [[ -z $value ]]; then
    sudo "$NUT_RESTART"
    value=$(read_charge)
fi

# Still nothing readable after the restart: warning
if [[ -z $value ]]; then
    echo "UPS WARNING - $ups@$host : Battery Capacity could not be read"
    exit $STATE_WARNING
fi

# value > warn
if [ "$value" -gt "$warn" ]; then
    echo "UPS OK - $ups@$host : Battery Capacity = $value%"
    exit $STATE_OK
fi

# crit < value <= warn
if [ "$value" -gt "$crit" ]; then
    echo "UPS WARNING - $ups@$host : Battery Capacity = $value%"
    exit $STATE_WARNING
fi

# value <= crit
echo "UPS CRITICAL - $ups@$host : Battery Capacity = $value%"
exit $STATE_CRITICAL

I’ve noticed that from time to time the ‘upsc’ command fails with an error message like “Data for UPS is stale”, and I’ve found no direct way to fix this other than restarting the nut-driver, the nut-server and the UPS driver controller (in that order) before retrying the command. To make sure this happens automatically, I stored a bash script that does these restart operations, and the Nagios plugin calls it (see the line NUT_RESTART="/home/user/ups/restart_nuts.sh" above) if it fails to fetch the value on the first try. After calling the restart script, the plugin tries to fetch the value again, and in my experience this second attempt works without any issue.

The restart script is here :

#!/bin/bash
sudo systemctl restart nut-driver
sudo systemctl restart nut-server
sudo upsdrvctl start
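
The retry pattern described above (try, run the restart script on failure, try once more) can also be sketched as a small reusable function. This is only an illustration: run_with_recovery is a hypothetical helper name, and it assumes the wrapped command signals failure through a non-zero exit status.

```shell
#!/bin/bash
# run_with_recovery <recover-command> <command> [args...]
# Run the command; if it fails, run the recovery command once and retry.
# Returns the exit status of the last attempt.
# run_with_recovery is a hypothetical helper name.
run_with_recovery() {
    recover=$1
    shift
    "$@" && return 0
    # Discard the recovery output so it cannot mix with the command's output
    "$recover" >/dev/null 2>&1
    "$@"
}

# Possible usage inside a plugin (path taken from the plugin above):
#   value=$(run_with_recovery /home/user/ups/restart_nuts.sh \
#           upsc "$ups@$host" battery.charge)
```

Retrying only once keeps the check from hanging when the UPS is really unreachable; Nagios will simply run the check again at the next interval.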

Copy the Nagios plugins to the Nagios plugin folder (“/usr/local/nagios/libexec/” in my case) and change the “NUT_RESTART” script path to the bash script you created (don’t forget to mark it as executable using “chmod +x <restart_script_path>.sh”). Then we can test it like below :

$ ./check_nut_ups_capacity.sh -H <host> -u <ups> -w <warning_threshold> -c <critical_threshold>

Then we must make an entry in the nrpe.cfg file to register this plugin. It will be something like this :

command[check_ups_battery]=/usr/bin/sudo /usr/local/nagios/libexec/check_nut_ups_capacity.sh -H localhost -u ups3-cp425 -w 50 -c 25

Here we set the Nagios warning threshold at 50% capacity and the critical threshold at 25% capacity: the plugin reports WARNING when the charge is at or below 50% and CRITICAL when it is at or below 25%.
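
To make the threshold behavior concrete, here is a minimal sketch of the comparison. It assumes the same order of checks as the capacity plugin (OK above the warning threshold, WARNING above the critical threshold, CRITICAL otherwise); classify_charge is a hypothetical helper name used only for this illustration.

```shell
#!/bin/bash
# classify_charge <value> <warn> <crit>
# Print the Nagios state for a battery charge percentage and return the
# matching plugin exit code (0 OK, 1 WARNING, 2 CRITICAL).
# classify_charge is a hypothetical helper name.
classify_charge() {
    if [ "$1" -gt "$2" ]; then
        echo OK; return 0
    elif [ "$1" -gt "$3" ]; then
        echo WARNING; return 1
    else
        echo CRITICAL; return 2
    fi
}

# Walk a few readings through the -w 50 -c 25 thresholds
for charge in 80 50 40 25 10; do
    printf '%s%% -> %s\n' "$charge" "$(classify_charge "$charge" 50 25)"
done
```

Note the boundaries: a reading of exactly 50 already counts as WARNING, and exactly 25 already counts as CRITICAL.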

NOTE: For the nrpe daemon to run this plugin with sudo, you might have to add it to the sudoers file (using visudo). Check this article for more info.
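
A sudoers entry for this could look like the sketch below. It assumes the nrpe daemon runs as the user nagios (check the nrpe_user setting in your nrpe.cfg), and the plugin file name is left as a placeholder; both need to be adjusted to your setup. Adding the line through visudo makes sure the syntax is checked before the file is saved.

```
# Assumed user: nagios (see nrpe_user in nrpe.cfg). <plugin-name> is a placeholder.
nagios ALL=(root) NOPASSWD: /usr/local/nagios/libexec/<plugin-name>
```

Listing the full plugin path, rather than allowing every command, keeps the passwordless sudo right limited to this one check.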

After restarting the nrpe daemon (systemctl restart nrpe), we can go to the Nagios server and register this UPS check as a monitored service.

Registering the UPS monitoring as service in the Nagios server

On the host that runs the Nagios server, I usually define the UPS as a host (with host notifications disabled, because I use a separate config file to monitor the Pi’s memory, disk, uptime, etc.) and define the UPS checks as services. Notifications are enabled only for services because I don’t want to get the Pi’s up/down notifications from this config file.

 

define host {
        use                             linux-server
        host_name                       UPS3-CP425
        alias                           UPS3-CP425
        address                         <IP-Address-of-the-Pi>
        max_check_attempts              5
        check_period                    24x7
        notification_interval           30
        notification_period             24x7
        notifications_enabled           0
}

define service {
        use                             generic-service
        host_name                       UPS3-CP425
        service_description             Battery Charge pct
        check_command                   check_nrpe!check_ups_battery
        contacts                        adminemail, admintext
}
define service {
        use                             generic-service
        host_name                       UPS3-CP425
        service_description             Battery Runtime m
        check_command                   check_nrpe!check_ups_runtime
        contacts                        adminemail, admintext
}
define service {
        use                             generic-service
        host_name                       UPS3-CP425
        service_description             Input Voltage v
        check_command                   check_nrpe!check_ups_voltage
        contacts                        adminemail, admintext
}
define service {
        use                             generic-service
        host_name                       UPS3-CP425
        service_description             Load pct
        check_command                   check_nrpe!check_ups_load
        contacts                        adminemail, admintext
}

After saving the above config file (make sure to put in the IP address of your host) and restarting the Nagios service ($ systemctl restart nagios), we can manually check whether the Nagios host can fetch the UPS info through the nrpe daemon running on the Pi. The “check_nrpe” plugin will be in the plugin location of the Nagios host. We call it, passing the IP address of the Pi and the command name defined in the nrpe.cfg file of the Pi (see above).
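
As a sketch, the manual check can be wrapped in a small function that also shows which state Nagios would record for the plugin's exit code. The check_nrpe path below is an assumption (adjust it to the plugin location on your Nagios host), and state_name and nrpe_test are hypothetical helper names.

```shell
#!/bin/bash
# Assumed location of check_nrpe on the Nagios server; adjust to your install.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# state_name <exit-code>: translate a plugin exit code into a Nagios state.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# nrpe_test <address> <command-name>: run one NRPE command by hand and
# print the plugin output together with the resulting state.
# state_name and nrpe_test are hypothetical helper names.
nrpe_test() {
    output=$("$CHECK_NRPE" -H "$1" -c "$2")
    code=$?
    printf '%s (state: %s)\n' "$output" "$(state_name "$code")"
    return "$code"
}

# Example: nrpe_test <IP-Address-of-the-Pi> check_ups_battery
```

The plain form is simply check_nrpe -H <IP-Address-of-the-Pi> -c check_ups_battery; the wrapper only adds the state name to the output.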

If everything is set up correctly, we get the response from the UPS. This service will now be visible in the Nagios web interface, and we can also check it through the aNag app. I’ve added the other three checks using the same steps. You can modify the above custom Nagios plugins to fetch any information that’s listed by the ‘upsc’ command.

Now let’s get to how we could set up a UPS to be monitored through the CyberPower PowerPanel utility

This is easier than the NUT method explained above because a restart script isn’t needed: I haven’t seen or heard of the “pwrstat” command-line utility used here giving any trouble. It just works!

This is how the monitoring works :

  1. The host to which the UPS is connected by USB (I have a Debian host for this; any Linux host should work) must have the CyberPower PowerPanel utility installed.
  2. The custom Nagios plugins will use the “pwrstat” command line utility to fetch information about the UPS.
  3. We add services in the Nagios config in the Nagios server to call these custom plugins.

The software and documentation needed to get the pwrstat utility running on the host system are provided on CyberPower’s web page. Once installed, the pwrstat utility prints the current UPS status on the command line.

Here too I have written four custom Nagios plugins to fetch four parameters from the pwrstat output. Links to GitHub are given below. These plugins can be changed if you want to fetch something else from that output.

Fetch Battery charge percentage (Capacity) : check_cyberpower_ups_capacity

Fetch Battery runtime in minutes  : check_cyberpower_ups_runtime

Fetch Input Voltage in Volts : check_cyberpower_ups_involtage

Fetch percentage load on UPS : check_cyberpower_ups_load
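
As an illustration of what such a plugin has to do, here is a minimal sketch of pulling one number out of the pwrstat status output. It assumes the status lines are shaped like "Label.......... value unit"; check this against what pwrstat prints on your system. pwrstat_field is a hypothetical helper name, and the sample lines are made up in that assumed format.

```shell
#!/bin/bash
# pwrstat_field <label>: read pwrstat status output from stdin and print
# the first number on the line that starts with the given label.
# Assumes lines shaped like "Label.......... value unit".
# pwrstat_field is a hypothetical helper name.
pwrstat_field() {
    sed -n "s/^[[:space:]]*$1\.\.*[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Made-up sample lines in the assumed format, for illustration only
sample='    Battery Capacity............. 100 %
    Remaining Runtime............ 55 min.'

printf '%s\n' "$sample" | pwrstat_field "Battery Capacity"

# Against a live UPS: sudo pwrstat -status | pwrstat_field "Battery Capacity"
```

The extracted number can then be compared against the -w and -c thresholds in the same way as in the NUT plugin.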

We’ll take the “check_cyberpower_ups_capacity” plugin as an example. Like we did with NUT, we must copy this plugin to the Nagios plugin location (for me it’s “/usr/local/nagios/libexec”) on the host that’s connected to the UPS. Again, don’t forget to mark it as executable using “chmod +x <plugin_path>”. Then we can test it like below :

$ ./check_cyberpower_ups_capacity -w 60 -c 35

Then we must make an entry in the nrpe.cfg file to register this plugin. It will be something like this :

command[check_ups_capacity]=/usr/bin/sudo /usr/local/nagios/libexec/check_cyberpower_ups_capacity -w 60 -c 35

Here we define the Nagios warning threshold to be below 60% capacity and the critical threshold to be below 35% capacity.

NOTE: For the nrpe daemon to run this plugin with sudo, you might have to add it to the sudoers file (using visudo). Check this article for an example.

After restarting the nrpe daemon ($ systemctl restart nrpe), we can go to the Nagios server and register this UPS check as a monitored service. The steps are the same as those I explained above for the NUT method; see the “Registering the UPS monitoring as service in the Nagios server” section above.

We can test the plugin from the Nagios server: run “check_nrpe” (which is in the plugin folder) and provide the parameters, which are the IP address of the host connected to the UPS by USB and the command that we defined in the nrpe.cfg file of that host (“check_ups_capacity” in this case).

We can have the UPS status monitored this way using Nagios.

Note: When you define the UPS services, instead of using “use generic-service” you could use a modified service template in Nagios which checks the UPS every minute instead of every five minutes. Check the Nagios documentation on templates for more info.
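
Such a template could look like the sketch below. The template name ups-service is hypothetical, and the one-minute values assume Nagios is running with its default interval_length of 60 seconds.

```
define service {
        name                            ups-service      ; hypothetical template name
        use                             generic-service  ; inherit everything else
        check_interval                  1                ; check every minute
        retry_interval                  1                ; re-check every minute after a problem
        register                        0                ; template only, not a real service
}
```

The UPS services would then say “use ups-service” instead of “use generic-service”.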

This is how the UPS monitoring looks on Nagios web and aNag app.

As always thanks for your interest and thanks for reading!