Thread: Nagios check_bioctl available

2006-07-29 00:07:14

I have written a perl script that parses the output from bioctl and
returns it in a format that Nagios can use.

check_bioctl is available here:
http://openbsd.somedomain.net/nagios/check_bioctl-1.3.tar.gz

It is useful to me, and so I thought it might be useful to someone else.

I wrote this on OpenBSD 3.9 and tested on Dell PERC 3/DC controllers
using the ami driver. It should work just fine on other versions of
OpenBSD as well as with other cards and drivers. If you do run into
trouble, send me the output from bioctl on the system you are having
trouble with and I can try to help. Patches to fix problems would be
even better.

One thing I ran into is that bioctl needs to run as root to get access
to /dev/bio, even for read only access. Is there a way to query bioctl
without needing root?

Also, in biovar.h, both a raid volume and a disk can be "Offline".
However, I am not sure what that means. Currently it is a WARNING, but
I don't know what status it should be set to.

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/biovar.h?rev=1.25&content-type=text/x-cvsweb-markup

If anyone knows what the "Offline" status means, I would sure like to
know.

An additional useful feature is that you can specify multiple devices
to check in a single run:

/usr/local/libexec/nagios/check_bioctl -d ami0 -d ami1
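Repeated -d options like these can be collected with Perl's core Getopt::Long module. This is a minimal sketch. The helper name parse_devices is hypothetical, and check_bioctl's own option handling may differ.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Collect every -d value. Because the destination is an array reference,
# Getopt::Long pushes each occurrence onto the array instead of
# overwriting a single value.
sub parse_devices {
    my @args = @_;
    my @devices;
    GetOptionsFromArray(\@args, 'd=s' => \@devices)
        or die "usage: check_bioctl -d device [-d device ...]\n";
    return @devices;
}

my @devices = parse_devices('-d', 'ami0', '-d', 'ami1');
print "checking: @devices\n";
```

In a real plugin you would pass @ARGV (or call GetOptions directly) rather than a literal list.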

The output is similar to the sample below, except that with NAGIOS_OUTPUT
set to 1 in the source (as it usually is), all output is on a single line,
separated with <br>, and any devices that are OK are hidden, because
Nagios limits the length of a plugin's response.

CRITICAL (1):
ami0 sd1 Degraded
WARNING (1):
ami0 0:8.0 Rebuild <QUANTUM ATLAS10K2-TY184JDA40>
OK (7):
ami0 sd0 Online
ami0 0:0.0 Online <IBM DMVS09M 0220>
ami0 0:1.0 Online <IBM DRVS09D 0140>
ami0 0:3.0 Online <QUANTUM ATLAS10K2-TY184JDA40>
ami0 0:4.0 Online <QUANTUM ATLAS10K2-TY184JDA40>
ami0 0:5.0 Online <QUANTUM ATLAS10K2-TY184JDA40>
ami0 0:2.0 Hot spare <IBM DRVS09D 0140>
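The NAGIOS_OUTPUT behaviour described above could look roughly like this. Results are grouped worst state first and counted. When the flag is set, OK entries are dropped and the rest are joined with <br>. The function name format_report, the [state, description] input shape and the state ordering are assumptions for illustration, not check_bioctl's actual internals.

```perl
use strict;
use warnings;

# States in reporting order, worst first (assumed ordering).
my @ORDER = qw(CRITICAL WARNING UNKNOWN OK);

# Build a report from [state, description] pairs. With $nagios_output
# true, OK entries are dropped and everything is joined with <br> on a
# single line, mimicking the NAGIOS_OUTPUT behaviour described above.
sub format_report {
    my ($nagios_output, @results) = @_;
    my %by_state;
    push @{ $by_state{ $_->[0] } }, $_->[1] for @results;

    my @lines;
    for my $state (@ORDER) {
        next unless $by_state{$state};
        next if $nagios_output && $state eq 'OK';
        my @items = @{ $by_state{$state} };
        push @lines, sprintf('%s (%d):', $state, scalar @items), @items;
    }
    return join($nagios_output ? '<br>' : "\n", @lines);
}

my @sample = (
    [ CRITICAL => 'ami0 sd1 Degraded' ],
    [ WARNING  => 'ami0 0:8.0 Rebuild <QUANTUM ATLAS10K2-TY184JDA40>' ],
    [ OK       => 'ami0 sd0 Online' ],
);
print format_report(1, @sample), "\n";
```

With the flag set, this prints `CRITICAL (1):<br>ami0 sd1 Degraded<br>WARNING (1):<br>ami0 0:8.0 Rebuild <QUANTUM ATLAS10K2-TY184JDA40>` on one line. When every device is OK, the sketch returns an empty string, so a real plugin would still need a summary line for that case.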

I currently configure it something like this:

$ grep check_bioctl /etc/sudoers /etc/nrpe.cfg
/etc/sudoers:_nrpe ALL = NOPASSWD:/usr/local/libexec/nagios/check_bioctl -d ami0
/etc/nrpe.cfg:command[check_bioctl]=/usr/bin/sudo /usr/local/libexec/nagios/check_bioctl -d ami0
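Because bioctl needs root to open /dev/bio, a plugin started without the sudo wrapper above can fail in confusing ways. One option is to check the effective UID ($>) first. This sketch assumes a permissions problem should be reported as UNKNOWN (Nagios exit code 3) rather than CRITICAL, which is a design choice. The helper check_privileges is hypothetical.

```perl
use strict;
use warnings;

# Returns (exit_code, message). bioctl can only open /dev/bio as root,
# so anything other than effective UID 0 is reported as UNKNOWN (3),
# an assumed policy: the RAID state is unknown, not known to be bad.
sub check_privileges {
    my ($euid) = @_;
    return (0, '') if $euid == 0;
    return (3, 'UNKNOWN: must run as root (e.g. via sudo) to read /dev/bio');
}

my ($code, $message) = check_privileges($>);
print "$message\n" if $code;
# A real plugin would stop here with: exit $code if $code;
```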

Also available is check_hw_sensors, for checking sysctl hw.sensors
from Nagios:
http://openbsd.somedomain.net/nagios/

l8rZ,
--
andrew - ICQ# 253198 - JID: afresh1 jabber.org
BOFH excuse of the day: YOU HAVE AN I/O ERROR -> Incompetent Operator error

2006-07-29 02:17:28
andrew fresh wrote:
> I have written a perl script that parses the output from bioctl and
> returns it in a format that Nagios can use.

Sweet

>
> check_bioctl is available here:
> http://openbsd.somedomain.net/nagios/check_bioctl-1.3.tar.gz
>
> It is useful to me, and so I thought it might be useful to someone else.
>
> I wrote this on OpenBSD 3.9 and tested on Dell PERC 3/DC controllers
> using the ami driver. It should work just fine on other versions of
> OpenBSD as well as with other cards and drivers. If you do run into
> trouble, send me the output from bioctl on the system you are having
> trouble with and I can try to help. Patches to fix problems would be
> even better.
>
>
> One thing I ran into is that bioctl needs to run as root to get access
> to /dev/bio, even for read only access. Is there a way to query bioctl
> without needing root?

No!

>
>
> Also, in biovar.h, both a raid volume and a disk can be "Offline".
> However, I am not sure what that means. Currently it is a WARNING, but
> I don't know what status it should be set to.

If 2 or more physical disks of a RAID 5 are offline a volume will be
marked offline as well. An offline RAID 5 is obviously a critical
event. Hope this makes sense since I am not exactly sure what you are
asking.

2006-07-30 01:03:26
2006/7/29, andrew fresh <andrew mad-techies.org>:
> One thing I ran into is that bioctl needs to run as root to get access
> to /dev/bio, even for read only access. Is there a way to query bioctl
> without needing root?

Well, I think you only need the status of the drives, and that is
available using sysctl hw.sensors in -current (you already mentioned
sysctl). A monitoring system should not use the capabilities of
bioctl; it just needs to know the status and report that.

Now that I think of it, I should add support to the upwatch monitoring
system too, but I am not lucky enough to have hardware to actually test
it.

Wijnand

2006-07-31 20:52:17
On Fri, Jul 28, 2006 at 09:17:28PM -0500, Marco Peereboom wrote:
> andrew fresh wrote:
> >I have written a perl script that parses the output from bioctl and
> >returns it in a format that Nagios can use.
>
> Sweet
Thanks!
> >One thing I ran into is that bioctl needs to run as root to get access
> >to /dev/bio, even for read only access. Is there a way to query bioctl
> >without needing root?
>
> No!
Dang! Oh well, sudo is a good enough solution then.
> >Also, in biovar.h, both a raid volume and a disk can be "Offline".
> >However, I am not sure what that means. Currently it is a WARNING, but
> >I don't know what status it should be set to.
>
> If 2 or more physical disks of a RAID 5 are offline a volume will be
> marked offline as well. An offline RAID 5 is obviously a critical
> event. Hope this makes sense since I am not exactly sure what you are
> asking.
I will change Offline to be a CRITICAL error, and here is the new version:
http://openbsd.somedomain.net/nagios/check_bioctl-1.4.tar.gz

However, I guess my real question is: what would cause a disk to be
Offline? There is a separate status for Failed, and I could see the
RAID being Offline if too many disks had Failed.
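Marco's explanation earlier in the thread can be illustrated with a simplified model of a RAID 5 volume. It survives the loss of one member, which leaves it Degraded, and goes Offline once two or more members are unavailable. This is an illustration only, not how the ami firmware actually decides. It ignores rebuilds, hot spares and the Failed vs. Offline distinction for individual disks. raid5_volume_state is a hypothetical name.

```perl
use strict;
use warnings;

# Simplified RAID 5 model: one disk's worth of parity means the volume
# survives one missing member (Degraded), but with two or more gone
# the data is unavailable (Offline). Illustration only.
sub raid5_volume_state {
    my ($missing_members) = @_;
    return 'Online'   if $missing_members == 0;
    return 'Degraded' if $missing_members == 1;
    return 'Offline';
}

printf "%d member(s) missing -> %s\n", $_, raid5_volume_state($_) for 0 .. 2;
```

Under this model, both Degraded and Offline map naturally to CRITICAL, which matches the change to the status map.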

Should any of the other statuses be mapped differently? They seemed
fairly straightforward, but there may be good arguments for changing
them.

my %Status_Map = (
Online => 'OK',
Offline => 'CRITICAL',
Degraded => 'CRITICAL',
Failed => 'CRITICAL',
Building => 'WARNING',
Rebuild => 'WARNING',
'Hot spare' => 'OK',
Unused => 'OK',
Scrubbing => 'WARNING',
Invalid => 'CRITICAL',
);
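Here is one way a map like %Status_Map could become a plugin result. Each bioctl status is mapped to a Nagios state, and strings not in the map fall back to UNKNOWN. The overall result is the worst state seen, reported with the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is a minimal sketch. The severity ranking (CRITICAL above UNKNOWN) and the helper names nagios_state and worst_state are assumptions, not necessarily what check_bioctl 1.4 does.

```perl
use strict;
use warnings;

my %Status_Map = (
    Online      => 'OK',
    Offline     => 'CRITICAL',
    Degraded    => 'CRITICAL',
    Failed      => 'CRITICAL',
    Building    => 'WARNING',
    Rebuild     => 'WARNING',
    'Hot spare' => 'OK',
    Unused      => 'OK',
    Scrubbing   => 'WARNING',
    Invalid     => 'CRITICAL',
);

# Standard Nagios plugin exit codes.
my %Exit_Code = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Assumed ranking for picking the overall result (CRITICAL outranks UNKNOWN).
my %Severity = (OK => 0, WARNING => 1, UNKNOWN => 2, CRITICAL => 3);

# Translate one bioctl status; unmapped strings become UNKNOWN.
sub nagios_state {
    my ($bioctl_status) = @_;
    return $Status_Map{$bioctl_status} // 'UNKNOWN';
}

# The plugin result is the most severe state among all devices.
sub worst_state {
    my $worst = 'OK';
    for my $state (@_) {
        $worst = $state if $Severity{$state} > $Severity{$worst};
    }
    return $worst;
}

my $overall = worst_state(map { nagios_state($_) } qw(Online Rebuild Degraded));
print "$overall (exit $Exit_Code{$overall})\n";    # CRITICAL (exit 2)
```

Treating unmapped statuses as UNKNOWN means that a new status string from a future bioctl is surfaced in Nagios rather than silently counted as OK.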
l8rZ,
--
andrew - ICQ# 253198 - JID: afresh1 jabber.org
BOFH excuse of the day: Windows 95 undocumented "feature"

2006-07-31 21:09:21
On Sun, Jul 30, 2006 at 03:03:26AM +0200, Wijnand Wiersma wrote:
> 2006/7/29, andrew fresh <andrew mad-techies.org>:
> >One thing I ran into is that bioctl needs to run as root to get access
> >to /dev/bio, even for read only access. Is there a way to query bioctl
> >without needing root?
>
> Well, I think you only need the status of the drives and that is
> available using sysctl hw.sensors in current (you already mentioned
> sysctl). A monitoring system should not use the capabilities of
> bioctl, it just needs to know the status and report that.

If that is the case, then this check will become obsolete. That would
be nice! I will have to go put -current on my test box and try it out.

As it is, on my 3.9-stable box, the output from sysctl, where it is
available at all, does not seem very reliable:

hw.sensors.29=esm0, Drive 0, drive, online
hw.sensors.30=esm0, Drive 1, drive, online
hw.sensors.31=esm0, Drive 2, drive, unknown
hw.sensors.32=esm0, Drive 3, drive, unknown
hw.sensors.33=esm0, Drive 4, drive, online
hw.sensors.34=esm0, Drive 5, drive, online
hw.sensors.35=esm0, Drive 6, drive, unknown
hw.sensors.36=esm0, Drive 7, drive, unknown

$ sudo bioctl ami0
Password:
Volume Status Size Device
ami0 0 Online 8984199168 sd0 RAID1
      0 Online 8984199168 0:0.0 safte0 <IBM DRVS09D 0140>
      1 Online 8984199168 0:1.0 safte0 <IBM DRVS09D 0140>
ami0 1 Online 36234592256 sd1 RAID10
      0 Online 18117296128 0:3.0 safte0 <QUANTUM ATLAS10K2-TY184JDA40>
      1 Online 18117296128 0:4.0 safte0 <QUANTUM ATLAS10K2-TY184JDA40>
      2 Online 18117296128 0:5.0 safte0 <QUANTUM ATLAS10K2-TY184JDA40>
      3 Online 18117296128 0:8.0 safte0 <QUANTUM ATLAS10K2-TY184JDA40>
ami0 2 Hot spare 8984199168 0:2.0 safte0 <IBM DMVS09M 0220>
ami0 3 Hot spare 18117296128 0:9.0 safte0 <QUANTUM ATLAS 10K 18SCA UCHD>
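The bioctl output in the listing above could be parsed into per-device records along these lines. The sketch makes several assumptions. Volume and hot-spare lines start with the controller name (ami0). Member-disk lines start with a disk index and belong to the most recent controller line. The status may contain a space ("Hot spare"). The archived listing has lost its original column alignment, so the regular expressions are guesses at the layout. parse_bioctl is a hypothetical name, not check_bioctl's own code.

```perl
use strict;
use warnings;

# Turn `bioctl <device>` output into [controller, device, status, vendor]
# records. Assumed layout (from the listing above):
#   volume / hot-spare line: ami0 1 Online 36234592256 sd1 RAID10
#   member disk line:        0 Online 18117296128 0:3.0 safte0 <QUANTUM ...>
# Status may contain a space ("Hot spare"), so it is matched lazily up to
# the numeric size column.
sub parse_bioctl {
    my ($controller, @records);
    for my $line (@_) {
        my ($status, $device, $rest);
        if ($line =~ /^([a-z]+\d+)\s+\d+\s+(.+?)\s+\d+\s+(\S+)\s*(.*)$/) {
            ($controller, $status, $device, $rest) = ($1, $2, $3, $4);
        }
        elsif (defined $controller
            && $line =~ /^\s*\d+\s+(.+?)\s+\d+\s+(\S+)\s*(.*)$/) {
            # Member disk: belongs to the controller seen most recently.
            ($status, $device, $rest) = ($1, $2, $3);
        }
        else {
            next;    # header, password prompt, or unrecognized line
        }
        my ($vendor) = $rest =~ /(<[^>]*>)/;
        push @records, [ $controller, $device, $status, $vendor // '' ];
    }
    return @records;
}

my @sample = (
    'Volume Status Size Device',
    'ami0 1 Online 36234592256 sd1 RAID10',
    '      0 Online 18117296128 0:3.0 safte0 <QUANTUM ATLAS10K2-TY184JDA40>',
    'ami0 2 Hot spare 8984199168 0:2.0 safte0 <IBM DMVS09M 0220>',
);
for my $record (parse_bioctl(@sample)) {
    # Drop the empty vendor field so volumes print as "ami0 sd1 Online".
    print join(' ', grep { length } @$record), "\n";
}
```

For the sample, this prints `ami0 sd1 Online`, `ami0 0:3.0 Online <QUANTUM ATLAS10K2-TY184JDA40>` and `ami0 0:2.0 Hot spare <IBM DMVS09M 0220>`, in the same shape as the plugin output shown at the start of the thread.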

The rest of the sensors seem mostly correct, though, and there are sure
enough of them!

$ sysctl hw.sensors | tail -1
hw.sensors.99=safte0, temp1, OK, temp, 27.78 degC / 82.00 degF

Also, on another box that has an external disk enclosure attached via
ses, I don't get any status for those disks in sysctl. The disks that
are actually in the server use safte, and those do show up in sysctl.
I don't know why, so for now I have this check.
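If sysctl becomes the data source, a check could scan the drive sensors directly. This sketch assumes the 3.9-stable format shown above (device, name, type, status, separated by commas). The -current and 4.0 format may differ. non_online_drives is a hypothetical helper. Since the "unknown" readings above look unreliable, the sketch only lists drives that are not online and leaves the severity decision open.

```perl
use strict;
use warnings;

# Pick the drive sensors out of 3.9-style `sysctl hw.sensors` lines such as
#   hw.sensors.29=esm0, Drive 0, drive, online
# and return "device: name status" for every drive not reporting online.
# The field layout is taken from the sample above; other releases may differ.
sub non_online_drives {
    my @problems;
    for my $line (@_) {
        my ($value) = $line =~ /^hw\.sensors\.\d+=(.*)$/ or next;
        my ($dev, $name, $type, $status) = split /,\s*/, $value;
        next unless defined $status && $type eq 'drive';
        push @problems, "$dev: $name $status" if $status ne 'online';
    }
    return @problems;
}

my @sample = (
    'hw.sensors.29=esm0, Drive 0, drive, online',
    'hw.sensors.31=esm0, Drive 2, drive, unknown',
    'hw.sensors.99=safte0, temp1, OK, temp, 27.78 degC / 82.00 degF',
);
print "$_\n" for non_online_drives(@sample);    # esm0: Drive 2 unknown
```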

> Now that I think of it, I should add support to the upwatch monitoring
> system too, but I am not that lucky to have hardware to actually test
> it

If the information is available in sysctl in 4.0, that would be the
check to integrate.

l8rZ,
--
andrew - ICQ# 253198 - JID: afresh1 jabber.org
BOFH excuse of the day: dynamic software linking table corrupted

2006-07-31 21:51:07
dmesg please