Our Solarwinds Network Performance Monitor occasionally has a problem rendering custom reports. There isn't an existing Nagios plugin for something like that, but writing one is easy: a plugin prints a one-line status message and exits with a status code that Nagios understands. After reading this, you should have an idea of how to write a Nagios plugin for a variety of web applications.
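The exit-status convention can be sketched in a few lines of Python before getting to a real check. This is only an illustration: the thresholds, the `evaluate`, `report`, and `main` names, and the usage string are all made up for the example. The status codes themselves (0, 1, 2, 3) come from the Nagios plugin guidelines.

```python
# Exit statuses from the Nagios plugin guidelines
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: 'OK', WARNING: 'WARNING', CRITICAL: 'CRITICAL', UNKNOWN: 'UNKNOWN'}


def evaluate(value, warn, crit):
    """Map a measured value to a status (hypothetical thresholds)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def report(status, message):
    """Build the single status line Nagios shows next to the service."""
    return '%s - %s' % (LABELS[status], message)


def main(argv):
    """Hypothetical check: the value to test is the first argument."""
    try:
        value = float(argv[0])
    except (IndexError, ValueError):
        # Bad invocation is neither OK nor CRITICAL, so report UNKNOWN.
        print(report(UNKNOWN, 'usage: check_example.py VALUE'))
        return UNKNOWN
    status = evaluate(value, warn=80, crit=90)
    print(report(status, 'value is %s' % value))
    return status
```

A real script would end with `sys.exit(main(sys.argv[1:]))` so that the returned status becomes the process exit code, which is the only thing Nagios looks at to decide the service state.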
#!/usr/bin/env python
from mechanize import Browser
from optparse import OptionParser

# Exit statuses recognized by Nagios
UNKNOWN = 3
OK = 0
WARNING = 1
CRITICAL = 2


def open_url(br, url):
    """Use a given mechanize.Browser to open url.

    If an exception is raised, then exit with CRITICAL status for Nagios.
    """
    try:
        response = br.open(url)
    except Exception as e:
        # Catching all exceptions is usually a bad idea. We want to catch
        # them all to report to Nagios here.
        print('CRITICAL - Could not reach page at %s: %s' % (url, e))
        raise SystemExit(CRITICAL)
    return response

# I'm going to be using optparse.OptionParser from now on. It makes
# command-line args a breeze.
parser = OptionParser()
parser.add_option('-H', '--hostname', dest='hostname')
parser.add_option('-u', '--username', dest='username')
parser.add_option('-p', '--password', dest='password')
parser.add_option('-r', '--report_url', dest='url',
    help="""Path to report relative to root, like
            /NetPerfMon/Report.asp?Report=Hostname+__+IPs""")
parser.add_option('-v', '--verbose', dest='verbose', action='store_true',
    default=False)
parser.add_option('-q', '--quiet', dest='verbose', action='store_false')
options, args = parser.parse_args()

# Check for required options
for option in ('hostname', 'username', 'password', 'url'):
    if not getattr(options, option):
        print('CRITICAL - %s not specified' % option.capitalize())
        raise SystemExit(CRITICAL)

# Go to the report and get a login page
br = Browser()
report_url = 'https://%s%s' % (options.hostname, options.url)
open_url(br, report_url)
br.select_form('aspnetForm')

# Solarwinds has interesting field names
# Maybe something with asp.net
br['ctl00$ContentPlaceHolder1$Username'] = options.username
br['ctl00$ContentPlaceHolder1$Password'] = options.password

# Attempt to login. If we can't, tell Nagios.
try:
    report = br.submit()
except Exception as e:
    print('CRITICAL - Error logging in: %s' % e)
    raise SystemExit(CRITICAL)

report_html = report.read()
# class=Property occurs in every cell in a Solarwinds report. If it's not
# there, something is wrong.
if 'class=Property' not in report_html:
    print('CRITICAL - Report at %s is down' % report_url)
    raise SystemExit(CRITICAL)

# If we got this far, let's tell Nagios the report is okay.
print('OK - Report at %s is up' % report_url)
raise SystemExit(OK)
To use our plugin, we need to do a bit of Nagios configuration. First, we need to define a command.
define command{
command_name check_npm_reports
command_line /usr/local/bin/reportmonitor.py -H $HOSTADDRESS$ $ARG1$
}
After that, we define a service.
define service{
use generic-service
host_name solarwinds-server
service_description Solarwinds reports
check_command check_npm_reports!-u nagios -p some_password -r '/NetPerfMon/Report.asp?Report=Hostname+__+IPs'
}
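To see how the command and service definitions fit together, here is a rough Python model of the substitution Nagios performs: the check_command is split on `!`, the first piece names the command, and the remaining pieces fill `$ARG1$`, `$ARG2$`, and so on. The function name and the host address `192.168.0.10` are made up for the example, and real Nagios handles more macros and escaping than this sketch does.

```python
import re


def expand_check_command(check_command, command_lines, host_address):
    """Simplified model of how Nagios builds the final command line.

    check_command: the service's check_command value, "name!arg1!arg2".
    command_lines: dict mapping command_name to command_line.
    host_address:  value substituted for $HOSTADDRESS$.
    """
    parts = check_command.split('!')
    name, args = parts[0], parts[1:]
    line = command_lines[name].replace('$HOSTADDRESS$', host_address)
    for i, arg in enumerate(args, start=1):
        line = line.replace('$ARG%d$' % i, arg)
    # Any $ARGn$ left without a value expands to nothing in this model.
    return re.sub(r'\$ARG\d+\$', '', line)


commands = {
    'check_npm_reports':
        '/usr/local/bin/reportmonitor.py -H $HOSTADDRESS$ $ARG1$',
}
service_check = ("check_npm_reports!-u nagios -p some_password "
                 "-r '/NetPerfMon/Report.asp?Report=Hostname+__+IPs'")
# 192.168.0.10 is a hypothetical address for solarwinds-server.
full_line = expand_check_command(service_check, commands, '192.168.0.10')
```

Because everything after the single `!` lands in `$ARG1$`, the username, password, and report path all reach the plugin as ordinary command-line options.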
Nagios is a systems monitor that uses a variety of clients, pings, and port scans to check up on systems. Configuring Nagios can seem like a daunting task at first glance because Nagios is so extensive. It's not hard, as I'll show you.
I'm assuming that you've already downloaded, compiled, and installed Nagios and Nagios Plugins. If you haven't, consult this guide.
After installation, the first file you'll want to edit is the main configuration file, /usr/local/nagios/etc/nagios.cfg. Add these three lines:
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
If you have a large number of hosts, you may want to look into the cfg_dir directive, which names a directory whose files are all parsed as configuration files.
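If you want to preview what a cfg_dir directive would pick up, a small Python helper can list the candidates. This is a sketch under an assumption: that Nagios loads every file ending in .cfg in the directory and in its subdirectories, which is how I understand the directive. The function name is made up.

```python
import os


def cfg_files_under(directory):
    """List the .cfg files found in directory and its subdirectories."""
    found = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            # Assumption: only files with a .cfg extension are parsed.
            if name.endswith('.cfg'):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Calling `cfg_files_under('/usr/local/nagios/etc/servers')` on a hypothetical servers directory would return the paths in sorted order, which makes it easy to spot a stray backup file that still ends in .cfg.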
We'll now want to create /usr/local/nagios/etc/hostgroups.cfg. Hostgroups are logical groupings of hosts. They affect how hosts are displayed on the monitor's views and can also be used to attach monitoring for services common to the group. Let's create a few.
define hostgroup{
hostgroup_name linux-servers
alias Linux Servers
}
define hostgroup{
hostgroup_name windows-servers
alias Windows Servers
}
define hostgroup{
hostgroup_name web-servers
alias Web Servers
}
The hostgroup_name is what will be used to reference the hostgroup in the configuration. The alias is what appears on the web interface.
We have some hostgroups defined, so we can attach templates to them. Edit /usr/local/nagios/etc/objects/templates.cfg. This file holds template definitions that we will use shortly to define hosts. Find linux-server in this file. It will look like this:
define host{
name linux-server ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each Linux host 10 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
Add this line within the curly braces:
hostgroups linux-servers
Every host we make using the linux-server template will be a member of the linux-servers hostgroup. Find the windows-server template and add a hostgroups windows-servers line to it as well.
We're ready to define hosts now.
Open /usr/local/nagios/etc/hosts.cfg and add these few lines:
define host{
use linux-server
host_name www.tylerlesmann.com
hostgroups +web-servers
}
define host{
use windows-server
host_name nonexistentwindowsbox.tylerlesmann.com
address 192.168.0.125
}
With use, we tell Nagios to apply a specific template when creating a host. The host_name is the name of the system. Nagios will do a DNS lookup on it if no address is defined. You can define additional hostgroups the machine should belong to here, or you can do it in hostgroups.cfg with the members directive. The leading + on hostgroups matters: without it, the host's value replaces the hostgroups inherited from the template instead of adding to them. You'll probably want to define your own hosts here instead of using my example hosts.
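Template inheritance is easier to reason about with a small model. The sketch below is a simplification written for this post, not Nagios code: it follows a single use chain, lets local values override inherited ones, and treats a leading + as "append to the inherited value", which is how Nagios 3 documents additive inheritance. The `resolve` function and the trimmed-down template are assumptions for illustration.

```python
def resolve(definition, templates):
    """Flatten a host definition by following its 'use' chain.

    Simplified: one parent per definition, every value is a string.
    """
    merged = {}
    parent = definition.get('use')
    if parent is not None:
        merged.update(resolve(templates[parent], templates))
    for key, value in definition.items():
        if key in ('use', 'name', 'register'):
            continue  # bookkeeping directives, not passed on to children
        if value.startswith('+'):
            # Additive inheritance: append to whatever the template set.
            inherited = merged.get(key)
            value = inherited + ',' + value[1:] if inherited else value[1:]
        merged[key] = value
    return merged


# A trimmed-down linux-server template with the hostgroups line added.
templates = {
    'linux-server': {
        'name': 'linux-server',
        'check_command': 'check-host-alive',
        'hostgroups': 'linux-servers',
        'register': '0',
    },
}
additive = resolve({'use': 'linux-server',
                    'host_name': 'www.tylerlesmann.com',
                    'hostgroups': '+web-servers'}, templates)
replaced = resolve({'use': 'linux-server',
                    'host_name': 'www.tylerlesmann.com',
                    'hostgroups': 'web-servers'}, templates)
# additive['hostgroups'] is 'linux-servers,web-servers'
# replaced['hostgroups'] is 'web-servers'
```

In this model, a host that sets hostgroups without the + ends up only in web-servers, so any service attached to linux-servers would not apply to it.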
We almost have something useful. The last step is defining services and attaching them to hosts and hostgroups.
Create the /usr/local/nagios/etc/services.cfg and add these lines:
define service{
use generic-service
hostgroup_name linux-servers
service_description SSH
check_command check_ssh
}
define service{
use generic-service
hostgroup_name web-servers
service_description HTTP
check_command check_http
}
define service{
use generic-service
hostgroup_name windows-servers
service_description RDP
check_command check_tcp!3389
}
The hostgroup linux-servers is attached to the service SSH, and Nagios will use check_ssh to monitor the service. This service uses the generic-service template, which defines items like timeouts. You may not be running ssh on the default port. You can tell check_ssh to use another port by giving it a -p argument, like so: check_command check_ssh!-p 12345. Commands are defined in /usr/local/nagios/etc/objects/commands.cfg, and the plugins used in these commands are documented in man pages and on the Nagios Plugins site. You should be able to understand the rest of the services, as they don't vary much from SSH.
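The RDP service above needs nothing more than a successful TCP connection to port 3389. As a rough idea of what such a check does, here is a Python sketch. It is not the check_tcp plugin itself, and the `check_tcp_port` name, the messages, and the 10-second default timeout are my own choices for the example.

```python
import socket

OK, CRITICAL = 0, 2


def check_tcp_port(host, port, timeout=10.0):
    """Return (status, message) after trying to open a TCP connection."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (OSError, socket.timeout) as e:
        # Refused, unreachable, and timed-out connections all end up here.
        return CRITICAL, 'CRITICAL - cannot connect to %s:%d: %s' % (host, port, e)
    conn.close()
    return OK, 'OK - %s:%d accepts connections' % (host, port)
```

A plugin built on this would print the message and exit with the status, for example after `status, message = check_tcp_port('192.168.0.125', 3389)`.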
You have something functional. Before you go reloading the Nagios service, use this command to check your configuration syntax:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
The hardest part of Nagios is that it can be time-consuming to define all the hosts and host-specific services to monitor. Using templates and hostgroups will save you hours.
