
Our Solarwinds Network Performance Monitor occasionally has a problem rendering custom reports. There isn't an existing Nagios plugin for something like that, but writing one is easy: a plugin prints a one-line status message and reports its result to Nagios through its exit status. After reading this, you should have an idea of how to write a Nagios plugin for a variety of web applications.
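Before the full plugin, here is a minimal sketch of the exit-status convention on its own. The helper names status_line and nagios_exit are my own for illustration and are not part of Nagios; the four status codes are the ones from the Nagios plugin conventions.

```python
import sys

# Exit statuses defined by the Nagios plugin conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: 'OK', WARNING: 'WARNING', CRITICAL: 'CRITICAL', UNKNOWN: 'UNKNOWN'}

def status_line(status, message):
    """Build the one-line summary Nagios shows for the service (hypothetical helper)."""
    return '%s - %s' % (LABELS[status], message)

def nagios_exit(status, message):
    """Print the summary, then exit with the matching status code (hypothetical helper)."""
    print(status_line(status, message))
    sys.exit(status)
```

A check would end with something like nagios_exit(OK, 'Report is up') or nagios_exit(CRITICAL, 'Report is down'). Nagios reads the first line of output as the status text and the exit code as the state.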

#!/usr/bin/env python

from mechanize import Browser
from optparse import OptionParser

# Exit statuses recognized by Nagios
UNKNOWN = 3
OK = 0
WARNING = 1
CRITICAL = 2

def open_url(br, url):
    """Use a given mechanize.Browser to open url.

    If an exception is raised, then exit with CRITICAL status for Nagios.
    """
    try:
        response = br.open(url)
    except Exception, e:
        # Catching all exceptions is usually a bad idea.  We want to catch
        # them all to report to Nagios here.
        print 'CRITICAL - Could not reach page at %s: %s' % (url, e)
        raise SystemExit, CRITICAL
    return response

# I'm going to be using optparse.OptionParser from now on.  It makes
# command-line args a breeze.
parser = OptionParser()
parser.add_option('-H', '--hostname', dest='hostname')
parser.add_option('-u', '--username', dest='username')
parser.add_option('-p', '--password', dest='password')
parser.add_option('-r', '--report_url', dest='url',
    help="""Path to report relative to root, like
    /NetPerfMon/Report.asp?Report=Hostname+__+IPs""")
parser.add_option('-v', '--verbose', dest='verbose', action='store_true',
    default=False)
parser.add_option('-q', '--quiet', dest='verbose', action='store_false')

options, args = parser.parse_args()

# Check for required options
for option in ('hostname', 'username', 'password', 'url'):
    if not getattr(options, option):
        print 'CRITICAL - %s not specified' % option.capitalize()
        raise SystemExit, CRITICAL

# Go to the report and get a login page
br = Browser()
report_url = 'https://%s%s' % (options.hostname, options.url)
open_url(br, report_url)
br.select_form('aspnetForm')

# Solarwinds has interesting field names
# Maybe something with asp.net
br['ctl00$ContentPlaceHolder1$Username'] = options.username
br['ctl00$ContentPlaceHolder1$Password'] = options.password

# Attempt to login.  If we can't, tell Nagios.
try:
    report = br.submit()
except Exception, e:
    print 'CRITICAL - Error logging in: %s' % e
    raise SystemExit, CRITICAL

report_html = report.read()
# class=Property occurs in every cell in a Solarwinds report.  If it's not
# there, something is wrong.
if 'class=Property' not in report_html:
    print 'CRITICAL - Report at %s is down' % report_url
    raise SystemExit, CRITICAL

# If we got this far, let's tell Nagios the report is okay.
print 'OK - Report at %s is up' % report_url
raise SystemExit, OK
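
The final check in the plugin can also be pulled into a function so it can be exercised without a Solarwinds server. This is only a sketch: check_report is a name I made up, and the marker is the same class=Property string the plugin looks for.

```python
OK, CRITICAL = 0, 2

def check_report(report_html, report_url, marker='class=Property'):
    """Return (status, message) for a fetched report page (hypothetical helper).

    The report counts as up only if the marker appears in the HTML.
    """
    if marker not in report_html:
        return CRITICAL, 'CRITICAL - Report at %s is down' % report_url
    return OK, 'OK - Report at %s is up' % report_url
```

The caller would print the message and exit with the status, exactly as the plugin does at the end.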

To use our plugin, we need to do a bit of Nagios configuration. First, we need to define a command.

define command{
    command_name    check_npm_reports
    command_line    /usr/local/bin/reportmonitor.py -H $HOSTADDRESS$ $ARG1$
}

After that, we define a service.

define service{
    use                 generic-service
    host_name           solarwinds-server
    service_description Solarwinds reports
    check_command       check_npm_reports!-u nagios -p some_password -r '/NetPerfMon/Report.asp?Report=Hostname+__+IPs'
}
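
To see how the two definitions fit together, here is a rough Python simulation of the macro substitution. It is a simplification of what Nagios does, not its actual code: the text after each ! in check_command becomes $ARG1$, $ARG2$, and so on, and $HOSTADDRESS$ comes from the host. The function name expand_command and the address 192.0.2.10 are assumptions for the example.

```python
def expand_command(command_line, check_command, host_address):
    """Approximate how Nagios fills in $HOSTADDRESS$ and $ARGn$ (simplified sketch)."""
    parts = check_command.split('!')
    name, args = parts[0], parts[1:]
    expanded = command_line.replace('$HOSTADDRESS$', host_address)
    # Arguments are numbered from 1 in the order they follow the command name.
    for number, arg in enumerate(args, 1):
        expanded = expanded.replace('$ARG%d$' % number, arg)
    return name, expanded

command_line = '/usr/local/bin/reportmonitor.py -H $HOSTADDRESS$ $ARG1$'
check_command = ("check_npm_reports!-u nagios -p some_password "
                 "-r '/NetPerfMon/Report.asp?Report=Hostname+__+IPs'")

# 192.0.2.10 is a made-up address standing in for solarwinds-server.
name, expanded = expand_command(command_line, check_command, '192.0.2.10')
print(expanded)
```

Since the service passes everything after the single ! as one argument, all of the plugin's options arrive through $ARG1$.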
Posted by Tyler Lesmann on September 3, 2009 at 13:37 and commented on 2 times
Tagged as: mechanize nagios optparse python screen_scraping

Nagios is a systems monitor that uses a variety of clients, pings, and port scans to check up on systems. Configuring Nagios can seem like a daunting task at first glance because Nagios is so extensive. It's not hard, as I'll show you.

I'm assuming that you've already downloaded, compiled, and installed Nagios and Nagios Plugins. If you haven't, consult this guide.

After installation, the first file you'll want to edit is the main configuration file, /usr/local/nagios/etc/nagios.cfg. Add these three lines:

cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/services.cfg

If you have a large number of hosts, you may want to look into the cfg_dir directive, which names a directory in which every file is parsed as a configuration file.

We'll now want to create /usr/local/nagios/etc/hostgroups.cfg. Hostgroups are logical groupings of hosts. They affect how hosts are displayed on the monitor's views and can also be used to attach monitoring for services common to the group. Let's create a few.

define hostgroup{
    hostgroup_name    linux-servers
    alias    Linux Servers
}

define hostgroup{
    hostgroup_name    windows-servers
    alias    Windows Servers
}

define hostgroup{
    hostgroup_name    web-servers
    alias    Web Servers
}

The hostgroup_name is what will be used to reference the hostgroup in the configuration. The alias is what appears on the web interface.

We have some hostgroups defined, so we can attach templates to them. Edit /usr/local/nagios/etc/objects/templates.cfg. This file holds template definitions that we will use shortly to define hosts. Find linux-server in this file. It will look like this:

define host{
    name    linux-server ; The name of this host template
    use    generic-host ; This template inherits other values from the generic-host template
    check_period    24x7 ; By default, Linux hosts are checked round the clock
    check_interval    5 ; Actively check the host every 5 minutes
    retry_interval    1 ; Schedule host check retries at 1 minute intervals
    max_check_attempts    10  ; Check each Linux host 10 times (max)
    check_command    check-host-alive ; Default command to check Linux hosts
    notification_interval    120 ; Resend notifications every 2 hours
    notification_options    d,u,r ; Only send notifications for specific host states
    contact_groups    admins ; Notifications get sent to the admins by default
    register    0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
    }

Add this line within the curly braces:

hostgroups linux-servers

Every host we make using the linux-server template will be a member of the linux-servers hostgroup. Find the windows-server template and add a hostgroups windows-servers line to it as well.

We're ready to define hosts now.

Open /usr/local/nagios/etc/hosts.cfg and add these few lines:

define host{
    use linux-server
    host_name www.tylerlesmann.com
    hostgroups web-servers
}

define host{
    use windows-server
    host_name nonexistentwindowsbox.tylerlesmann.com
    address 192.168.0.125
}

With use, we tell Nagios to apply a specific template when creating a host. The host_name is the name of the system. Nagios will do DNS lookups if no address is defined. You can define additional hostgroups the machine should belong to here, or you can do it in hostgroups.cfg with the members directive. You'll probably want to define your own hosts here instead of using my example hosts.
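
If you have many hosts, you could generate these blocks instead of typing them. The following is only a sketch under my own assumptions: host_definition is a hypothetical helper, and it emits just the four directives used above.

```python
def host_definition(host_name, template, address=None, hostgroups=None):
    """Render one define host block as text (hypothetical helper)."""
    lines = ['define host{',
             '    use %s' % template,
             '    host_name %s' % host_name]
    if address:
        lines.append('    address %s' % address)
    if hostgroups:
        # Multiple hostgroups are written as a comma-separated list.
        lines.append('    hostgroups %s' % ','.join(hostgroups))
    lines.append('}')
    return '\n'.join(lines)

# The two example hosts from this post.
hosts = [
    ('www.tylerlesmann.com', 'linux-server', None, ['web-servers']),
    ('nonexistentwindowsbox.tylerlesmann.com', 'windows-server', '192.168.0.125', None),
]
print('\n\n'.join(host_definition(*host) for host in hosts))
```

Redirect the output into hosts.cfg and check it with the verification command before reloading Nagios.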

We almost have something useful. The last step is defining services and attaching them to hosts and hostgroups.

Create the /usr/local/nagios/etc/services.cfg and add these lines:

define service{
    use generic-service
    hostgroup_name linux-servers
    service_description SSH
    check_command check_ssh
}

define service{
    use generic-service
    hostgroup_name web-servers
    service_description HTTP
    check_command check_http
}

define service{
    use generic-service
    hostgroup_name windows-servers
    service_description RDP
    check_command check_tcp!3389
}

The SSH service is attached to the linux-servers hostgroup, and Nagios will use check_ssh to monitor it. The service uses the generic-service template, which defines items like timeouts. You may not be running ssh on the default port. You can tell check_ssh to use another port by giving it a -p argument, like so: check_command check_ssh!-p 12345. Commands are defined in /usr/local/nagios/etc/objects/commands.cfg, and the plugins used in these commands are documented in man pages and on the Nagios Plugins site. You should be able to understand the rest of the services, as they don't vary much from SSH.

You have something functional. Before you go reloading the Nagios service, use this command to check your configuration syntax:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
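
If you script your reloads, the same check can gate them. This is a minimal sketch that assumes nagios -v exits with status 0 only when verification passes. config_is_valid is a hypothetical helper, and the run parameter exists only so the function can be tried without Nagios installed.

```python
import subprocess

NAGIOS_BIN = '/usr/local/nagios/bin/nagios'
NAGIOS_CFG = '/usr/local/nagios/etc/nagios.cfg'

def config_is_valid(run=subprocess.call, nagios_bin=NAGIOS_BIN, cfg=NAGIOS_CFG):
    """Return True when the verification command exits with status 0.

    Assumption: a non-zero exit status means the configuration has errors.
    """
    return run([nagios_bin, '-v', cfg]) == 0
```

Call config_is_valid() before reloading and skip the reload when it returns False.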

The hardest part of Nagios is that it can be time-consuming to define all the hosts and host-specific services to monitor. Using templates and hostgroups will save you hours.

Posted by Tyler Lesmann on November 24, 2008 at 7:57
Tagged as: linux nagios