Error: Could not open command file ‘/var/log/nagios/rw/nagios.cmd’ for update! on RedHat

I’m currently setting up some new clusters under RedHat, and each cluster is getting its own Nagios instance. Trying to use the web-based management interface, however, threw an error.

I’ve seen this error before on Ubuntu and was getting it again under RedHat. Of course I revisited my Ubuntu solution and realised that it didn’t help at all: it relies on dpkg overrides, which don’t exist on RedHat, and the situation was very different anyway!


root@nagios:/var/log/nagios/rw# ls -al
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:37 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:40 ..

The file didn’t exist 🙁

A quick scan of a working system showed:


prw-rw---- 1 nagios nagcmd 0 2009-10-15 13:32 nagios.cmd

It’s a pipe! Woohoo.
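If you’d rather not squint at the first character of an ls listing, the shell can test for a named pipe directly with `[ -p ]`. Here’s a minimal sketch; the function name `check_cmd_file` is my own invention for illustration, and the default path is simply the one from the error message, so adjust it for your install.

```shell
# Report what kind of file sits at the Nagios command-file path.
# check_cmd_file is a hypothetical helper name, not part of Nagios.
# The default path is the one from this post; pass another path as $1.
check_cmd_file() {
    cmd_file="${1:-/var/log/nagios/rw/nagios.cmd}"
    if [ -p "$cmd_file" ]; then
        echo "pipe"          # exists and is a FIFO (named pipe)
    elif [ -e "$cmd_file" ]; then
        echo "not-a-pipe"    # exists, but is some other kind of file
    else
        echo "missing"       # nothing there at all
    fi
}

check_cmd_file
```

On the broken box in this post that would have reported the file as missing, and on the working one as a pipe.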

So we need to create it:


root@nagios:/var/log/nagios/rw# mknod nagios.cmd p
root@nagios:/var/log/nagios/rw# chown nagios:apache nagios.cmd
root@nagios:/var/log/nagios/rw# chmod 660 nagios.cmd
root@nagios:/var/log/nagios/rw# ls -la
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:37 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:43 ..
prw-rw---- 1 nagios apache 0 Oct 30 13:37 nagios.cmd
root@nagios:/var/log/nagios/rw# /etc/init.d/nagios restart

A quick check of the site and .. same error? Still broken? Noooooooooooo

Let’s have another look ..


root@mm2su0:/var/log/nagios/rw# ls -al
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:45 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:45 ..
prw-rw---- 1 nagios nagios 0 Oct 30 13:45 nagios.cmd

Nice, Nagios changed the group ownership for us (nagios:nagios instead of nagios:apache), so apache can’t write to it. I’m not setting apache to run as the nagios user 🙁

A look in the init file for Nagios shows that it manages the file itself, so we didn’t need to make it at all (strange that it wasn’t there when I looked, then ..). The init script handles both creation and removal of the file:


root@nagios:/var/log/nagios/rw# /etc/init.d/nagios stop
Stopping nagios: done.
root@nagios:/var/log/nagios/rw# ls
root@nagios:/var/log/nagios/rw# /etc/init.d/nagios start
Starting nagios: done.
root@nagios:/var/log/nagios/rw# ls
nagios.cmd
root@nagios:/var/log/nagios/rw# ls
nagios.cmd
root@nagios:/var/log/nagios/rw# ls -al
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:52 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:52 ..
prw-rw---- 1 nagios nagios 0 Oct 30 13:52 nagios.cmd

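To fix this I’m going to stick the apache user in the nagios group. Note the -a flag: without it, usermod -G replaces apache’s existing supplementary groups instead of adding to them.

root@nagios:/var/log/nagios/rw# usermod -a -G nagios apache
root@nagios:/var/log/nagios/rw# /etc/init.d/httpd restart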

No more error, problem solved!
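If the error sticks around, it’s worth confirming that the group change actually took. This is a rough sketch using `id -nG`, which lists a user’s group names; `user_in_group` is a made-up helper name, and apache/nagios are the user and group from this post (yours may differ, e.g. www-data on Ubuntu).

```shell
# Succeeds (exit 0) if user $1 is a member of group $2.
# user_in_group is a hypothetical helper, not a standard command.
user_in_group() {
    # id -nG prints the user's group names separated by spaces;
    # split them onto separate lines and look for an exact match.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

if user_in_group apache nagios; then
    echo "apache is in the nagios group"
else
    echo "apache is not in the nagios group (or the user does not exist)"
fi
```

Bear in mind that a running httpd only picks up the new group when it starts, which is why the restart above matters.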

If you want to become a more advanced Nagios administrator, I recommend Nagios by O’Reilly. It’s full of best practice implementation advice.

Error: Could not stat() command file ‘/var/lib/nagios3/rw/nagios.cmd’!

I’ve been doing a lot of Nagios deployments recently, and this error always bites me on every Ubuntu version, including Hardy and Intrepid (I haven’t quite bitten the bullet and tried the Jaunty beta yet 🙂 )


Error: Could not stat() command file '/var/lib/nagios3/rw/nagios.cmd'!

The external command file may be missing, Nagios may not be running, and/or Nagios may not be checking external commands.

An error occurred while attempting to commit your command for processing.

This can be quite easily fixed with the following command line fu:


sudo /etc/init.d/nagios3 stop
sudo dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
sudo dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
sudo /etc/init.d/nagios3 start

Now you should be able to send Nagios remote commands and commands via the web interface to your heart’s content!
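To try out a remote command by hand, you write a single line of the form `[epoch-time] COMMAND;args` into the pipe. The sketch below assumes the command-file path from the error message above; `nagios_cmd_line` is a helper name I made up, and `web01` and `HTTP` are placeholder host and service names, so substitute ones that exist in your config.

```shell
# Build one external-command line: "[epoch] COMMAND;arg1;arg2..."
# nagios_cmd_line is a hypothetical helper, not shipped with Nagios.
nagios_cmd_line() {
    printf '[%s] %s\n' "$(date +%s)" "$1"
}

# Path taken from the error message in this post; adjust for your install.
cmd_file=/var/lib/nagios3/rw/nagios.cmd

# "web01" and "HTTP" are placeholder host and service names.
# The last field is the time at which the forced check should run (now).
line=$(nagios_cmd_line "SCHEDULE_FORCED_SVC_CHECK;web01;HTTP;$(date +%s)")

# Only write if the pipe exists and we have permission to write to it.
if [ -p "$cmd_file" ] && [ -w "$cmd_file" ]; then
    printf '%s\n' "$line" > "$cmd_file"
else
    echo "cannot write to $cmd_file" >&2
fi
```

One caveat: writing to a named pipe blocks until something reads from it, so if the pipe exists but Nagios isn’t running, the write will hang rather than fail.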

For more Nagios advice, I recommend Nagios by O’Reilly. It’s full of best practice advice and covers solving more ‘gotchas’ that you might encounter whilst using it!