Our Solarwinds Network Performance Monitor has a problem rendering custom reports on occasion, and there isn't an existing Nagios plugin for something like that. Fortunately, writing these plugins is easy: all Nagios cares about is a line of status text on standard output and an exit status. After reading this, you should have an idea of how to write a Nagios plugin for a variety of web applications.
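The entire contract fits in a few lines. Here is a minimal sketch, with check_something standing in for whatever you actually want to test:

#!/usr/bin/env python

# Exit statuses recognized by Nagios
OK = 0
WARNING = 1
CRITICAL = 2
UNKNOWN = 3

def check_something():
    # Stand-in for a real test, like fetching a page
    return True

# Print one line of status text and exit with the matching code
if check_something():
    print 'OK - everything looks fine'
    raise SystemExit, OK
print 'CRITICAL - something broke'
raise SystemExit, CRITICAL

The real plugin below follows the same pattern, with mechanize doing the actual checking: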

#!/usr/bin/env python

from mechanize import Browser
from optparse import OptionParser

# Exit statuses recognized by Nagios
UNKNOWN = 3
OK = 0
WARNING = 1
CRITICAL = 2

def open_url(br, url):
    """Use a given mechanize.Browser to open url.

    If an exception is raised, then exit with CRITICAL status for Nagios.
    """
    try:
        response = br.open(url)
    except Exception, e:
        # Catching all exceptions is usually a bad idea.  We want to catch
        # them all to report to Nagios here.
        print 'CRITICAL - Could not reach page at %s: %s' % (url, e)
        raise SystemExit, CRITICAL
    return response

# I'm going to be using optparse.OptionParser from now on.  It makes
# command-line args a breeze.
parser = OptionParser()
parser.add_option('-H', '--hostname', dest='hostname')
parser.add_option('-u', '--username', dest='username')
parser.add_option('-p', '--password', dest='password')
parser.add_option('-r', '--report_url', dest='url',
    help="""Path to report relative to root, like
    /NetPerfMon/Report.asp?Report=Hostname+__+IPs""")
parser.add_option('-v', '--verbose', dest='verbose', action='store_true',
    default=False)
parser.add_option('-q', '--quiet', dest='verbose', action='store_false')

options, args = parser.parse_args()

# Check for required options
for option in ('hostname', 'username', 'password', 'url'):
    if not getattr(options, option):
        print 'CRITICAL - %s not specified' % option.capitalize()
        raise SystemExit, CRITICAL

# Go to the report and get a login page
br = Browser()
report_url = 'https://%s%s' % (options.hostname, options.url)
open_url(br, report_url)
br.select_form('aspnetForm')

# Solarwinds has interesting field names
# Maybe something with asp.net
br['ctl00$ContentPlaceHolder1$Username'] = options.username
br['ctl00$ContentPlaceHolder1$Password'] = options.password

# Attempt to login.  If we can't, tell Nagios.
try:
    report = br.submit()
except Exception, e:
    print 'CRITICAL - Error logging in: %s' % e
    raise SystemExit, CRITICAL

report_html = report.read()
# class=Property occurs in every cell in a Solarwinds report.  If it's not
# there, something is wrong.
if 'class=Property' not in report_html:
    print 'CRITICAL - Report at %s is down' % report_url
    raise SystemExit, CRITICAL

# If we got this far, let's tell Nagios the report is okay.
print 'OK - Report at %s is up' % report_url
raise SystemExit, OK

To use our plugin, we need to do a bit of Nagios configuration. First, we need to define a command.

define command{
    command_name    check_npm_reports
    command_line    /usr/local/bin/reportmonitor.py -H $HOSTADDRESS$ $ARG1$
}

After that, we define a service.

define service{
    use         generic-service
    host_name           solarwinds-server
    service_description Solarwinds reports
    check_command       check_npm_reports!-u nagios -p some_password -r '/NetPerfMon/Report.asp?Report=Hostname+__+IPs'
}
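Everything after the ! in the check_command line is handed to the plugin as $ARG1$, so the same command definition can serve different reports and credentials per service.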
Posted by Tyler Lesmann on September 3, 2009 at 13:37 and commented on 2 times
Tagged as: mechanize nagios optparse python screen_scraping

Doing anything with SOAP is a pain without a WSDL, and that is the case with Zimbra. All of the howtos I found about SOAP and ruby either required a WSDL or required building several classes in a special, undocumented way to trick a SOAP::RPC::Driver instance into working. Both were unacceptable. After much hardship, I found an easier-to-read way to do SOAP without a WSDL in ruby: building the SOAP::SOAPElements myself. Here is the code, documented to be easy to read, use, and extend.

# Incomplete library for interacting with Zimbra
#
#  require 'zimbra'
#
#  host = 'zimbra.tylerlesmann.com'
#  user = 'root'
#  passwd = 'hard_password'
#  creds = Zimbra.authenticate(host, user, passwd)
#  usercreds = Zimbra.masquerade(host, creds.authToken, 'tlesmann')
#  Zimbra.createappointment(host, usercreds.authToken,
#    Time.local(2009, 6, 26), 'Make a blog post', 'Maybe some Java', [
#    '/home/tlesmann/Documents/java.png',
#    '/home/tlesmann/Documents/tutorial.pdf',
#  ])

require 'net/http'
require 'net/https'
require 'soap/element'
require 'soap/rpc/driver'
require 'soap/processor'
require 'soap/streamHandler'
require 'soap/property'
require 'zimbra/multipart'

module Zimbra
  # Builds and sends AuthRequest to a provided Zimbra host.
  #
  # Returns a SOAP::Mapping instance, with an authToken attribute
  def self.authenticate(host, name, password)
    header = SOAP::SOAPHeader.new
    body = SOAP::SOAPBody.new(element('AuthRequest', nil,
      {
        'xmlns' => 'urn:zimbraAdmin',
      },
      [
        element('name', name),
        element('password', password),
      ]
    ))
    envelope = SOAP::SOAPEnvelope.new(header, body)
    return send_soap(envelope, host)
  end

  # Builds and sends CreateAppointmentRequest to a provided Zimbra host.  The
  # attachments argument expects a list of filename strings.
  #
  # Returns a SOAP::Mapping instance
  def self.createappointment(host, authToken, start, subject, description='',
    attachments=[])
    header = SOAP::SOAPHeader.new
    context = element('context', nil, {'xmlns' => 'urn:zimbra'}, [
      element('authToken', authToken)
    ])
    header.add('context', context)
    aids = []
    for attachment in attachments
      aids << upload_attachment(host, authToken, attachment)
    end
    if aids.empty?
      attach = nil
    else
      attach = element('attach', nil, {
      'aid' => aids.join(",")
      })
    end
    body = SOAP::SOAPBody.new(element('CreateAppointmentRequest', nil,
      {
        'xmlns' => 'urn:zimbraMail'
      },
      [
        element('m', nil, {}, [
          element('inv', nil, {}, [
            element('comp', nil,
              {
                'status' => 'CONF',
                'allDay' => 1,
                'fb' => 'F',
                'name' => subject,
                'noBlob' => 1,
              },
              [
                datetime('s', start),
                datetime('e', start),
                element('descHtml', description),
                element('alarm', nil,
                  {
                    'action' => 'DISPLAY'
                  },
                  [
                    element('trigger', nil, {}, [
                      element('rel', nil, {
                        'm' => 1
                      })
                    ]),
                    element('desc', subject),
                  ]
                ),
              ]
            ),
          ]),
          attach
        ]),
      ]
    ))
    envelope = SOAP::SOAPEnvelope.new(header, body)
    send_soap(envelope, host)
  end

  # builds SOAP::SOAPElement with tag name with a *d* attribute of the
  # provided ruby Time
  def self.datetime(name, time)
    return element(name, nil, {'d' => time.strftime("%Y%m%d")})
  end

  # builds SOAP::SOAPElements the way SOAP::SOAPElement constructor _should_
  #
  #  element('AuthRequest', nil,
  #    {
  #      'xmlns' => 'urn:zimbraAdmin',
  #    },
  #    [
  #    element('name', 'whoa'),
  #    element('password', 'man'),
  #    ]
  #  )
  #
  # The returned SOAP::SOAPElement converted to XML would be:
  #
  #  <AuthRequest xmlns="urn:zimbraAdmin">
  #    <name>whoa</name>
  #    <password>man</password>
  #  </AuthRequest>
  def self.element(name, value=nil, attrs={}, children=[])
    element = SOAP::SOAPElement.new(name, value)
    element.extraattr.update(attrs)
    for child in children
      if child
        element.add(child)
      end
    end
    return element
  end

  # Builds and sends DelegateAuth Request to a provided Zimbra host.  The
  # authToken must be that of an admin!  The account arg is nothing fancy, just
  # the username of the user to spoof.
  #
  # Returns a SOAP::Mapping instance, with an authToken attribute
  def self.masquerade(host, authToken, account)
    header = SOAP::SOAPHeader.new
    context = element('context', nil, {'xmlns' => 'urn:zimbra'}, [
      element('authToken', authToken)
    ])
    header.add('context', context)
    body = SOAP::SOAPBody.new(element('DelegateAuthRequest', nil,
      {
          'xmlns' => 'urn:zimbraAdmin'
      },
      [
        element('account', account, {
          'by' => 'name',
        })
      ]
    ))
    envelope = SOAP::SOAPEnvelope.new(header, body)
    return send_soap(envelope, host)
  end

  # Marshals SOAP::Envelopes and sends them to a given Zimbra host
  #
  # Returns response as a SOAP::Mapping instance
  def self.send_soap(envelope, host)
    url = 'https://' + host + ':7071/service/admin/soap/'
    stream = SOAP::HTTPStreamHandler.new(SOAP::Property.new)
    request_string = SOAP::Processor.marshal(envelope)
    puts request_string if $DEBUG
    request = SOAP::StreamHandler::ConnectionData.new(request_string)
    response_string = stream.send(url, request).receive_string
    puts response_string if $DEBUG
    env = SOAP::Processor.unmarshal(response_string)
    return SOAP::Mapping.soap2obj(env.body.root_node)
  end

  # Uploads file to given Zimbra host
  #
  # Returns a string containing the Zimbra attachment id.  These attachments are
  # only accessible to the user that uploaded them.
  def self.upload_attachment(host, authToken, filename)
    params = Hash.new
    file = File.open(filename, "rb")
    params["attachment"] = file
    mp = Multipart::MultipartPost.new
    query, headers = mp.prepare_query(params)
    file.close
    headers['Cookie'] = 'ZM_AUTH_TOKEN=' + authToken
    url = URI.parse('https://' + host + '/service/upload')
    client = Net::HTTP.new(url.host, url.port)
    client.use_ssl = true
    response = client.post(url.path + '?fmt=raw', query, headers)
    return response.body.split(',')[2].strip.slice(1..-2)
  end
end

Note: I would have done this in python, if it were not needed for an existing rails application. ;)
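For the curious, here is a rough python equivalent of the authenticate/send_soap pair, using only the standard library. This is an untested sketch: it hand-builds the same AuthRequest envelope and posts it with urllib2, the SOAP 1.2 envelope namespace is an assumption, and you get raw XML back instead of a mapped object.

import urllib2

def authenticate(host, name, password):
    # Hand-built envelope, mirroring the ruby element() calls above
    envelope = (
        '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
        '<soap:Body>'
        '<AuthRequest xmlns="urn:zimbraAdmin">'
        '<name>%s</name>'
        '<password>%s</password>'
        '</AuthRequest>'
        '</soap:Body>'
        '</soap:Envelope>') % (name, password)
    request = urllib2.Request(
        'https://%s:7071/service/admin/soap/' % host, envelope,
        {'Content-Type': 'application/soap+xml'})
    return urllib2.urlopen(request).read()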

Posted by Tyler Lesmann on June 24, 2009 at 16:14 and commented on 9 times
Tagged as: mechanize ruby soap zimbra

I found a blog post today that gleans names and messages from Twitter search. As an exercise, I decided to rewrite it using mechanize and lxml. My code writes to standard output instead of a file; redirect the output for the same effect. Note: I am aware that Twitter offers JSON and several APIs, and using those would be easier than this. This is an exercise.

#!/usr/bin/env python
import getopt                     
import sys                        
from mechanize import Browser, _mechanize
from lxml.html import parse              

baseurl = "http://search.twitter.com/search?lang=en&q="

def search_twitter(terms, pages=1):                    
    """                                                
    terms = a list of search terms                     
    pages(optional) = number of pages to retrieve

    returns a list of dictionaries
    """
    br = Browser()
    br.set_handle_robots(False)
    results = []
    response = br.open("".join([baseurl, "+".join(terms)]))
    while pages > 0:
        doc = parse(response).getroot()
        for msg in doc.cssselect('div.msg'):
            name = msg.cssselect('a')[0].text_content()
            text = msg.cssselect('span')[0].text_content()
            text = text.replace(' (expand)', '')
            results.append({
                'name': name,
                'text': text,
            })
        pages -= 1
        if pages <= 0:
            break  # Don't fetch a page we will never parse
        try:
            response = br.follow_link(text='Older')
        except _mechanize.LinkNotFoundError:
            break  # No more pages :(
    return results

if __name__ == '__main__':
    optlist, args = getopt.getopt(sys.argv[1:], 'p:', ['pages='])
    optd = dict(optlist)
    pages = 1
    if '-p' in optd:
        pages = int(optd['-p'])
    if '--pages' in optd:
        pages = int(optd['--pages'])
    if len(args) < 1:
        print """
        Usage: %s [-p] [--pages] search terms
            -p, --pages = number of pages to retrieve
        """ % sys.argv[0]
        raise SystemExit, 1
    results = search_twitter(args, pages)
    for result in results:
        print "%(name)-20s%(text)s" % result
Posted by Tyler Lesmann on January 14, 2009 at 15:16
Tagged as: lxml mechanize python screen_scraping

In my post from a while back, I gave an example of using the standard HTMLParser module. HTMLParser is not the easiest way to glean information from HTML. There are two modules, not part of the standard python distribution, that can shorten development time. The first is BeautifulSoup. Here is the code from the previous episode using BeautifulSoup instead of HTMLParser.

#!/usr/bin/env python

from BeautifulSoup import BeautifulSoup
from mechanize import Browser

br = Browser()
response = br.open('http://tylerlesmann.com/')
soup = BeautifulSoup(response.read())
headers = soup.findAll('div', attrs={'class': 'header'})
headlines = []
for header in headers:
    links = header.findAll('a')
    for link in links:
        headlines.append(link.string)
for headline in headlines:
    print headline

This is a lot shorter: 16 lines instead of 38. It also took about 20 seconds to write. There is one gotcha here: both scripts do the same task, but the BeautifulSoup version takes over twice as long to run. CPU time is much cheaper than development time, though.
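If you want to check the speed difference yourself, a rough benchmark along these lines will do. Here, page.html is a hypothetical saved copy of the page being parsed, and the exact numbers will vary:

import timeit

# Each Timer reads the saved page in its setup so only parsing is timed
bs_timer = timeit.Timer(
    "BeautifulSoup(html)",
    "from BeautifulSoup import BeautifulSoup; html = open('page.html').read()")
lxml_timer = timeit.Timer(
    "document_fromstring(html)",
    "from lxml.html import document_fromstring; html = open('page.html').read()")
print 'BeautifulSoup:', bs_timer.timeit(number=10)
print 'lxml:', lxml_timer.timeit(number=10)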

The next module is lxml. Here's the lxml version of the code.

#!/usr/bin/env python

from lxml.html import parse
from mechanize import Browser

br = Browser()
response = br.open('http://tylerlesmann.com/')
doc = parse(response).getroot()
for link in doc.cssselect('div.header a'):
    print link.text_content()

As you can see, it is even shorter than BeautifulSoup at 10 lines. On top of that, lxml is faster than HTMLParser. So what is the catch? The lxml module uses C code, so you will not be able to use it on Google's AppEngine or on Jython.

Posted by Tyler Lesmann on January 14, 2009 at 6:13
Tagged as: beautifulsoup lxml mechanize python screen_scraping

In the last post, I illustrated how to most efficiently fetch html for data mining using the mechanize module. Now that we have our html, we can parse it for the information we want. To do this, we will use the HTMLParser module. This is a standard module in Python, so you don't have to install anything.

In this example, we will glean all of the headlines from the main page of this blog.

#!/usr/bin/env python

from HTMLParser import HTMLParser
from mechanize import Browser

class HeadLineParser(HTMLParser):
    def __init__(self):
        self.in_header = False
        self.in_headline = False
        self.headlines = []
        HTMLParser.__init__(self)

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            # attrs is a list of tuple pairs, a dictionary is more useful
            dattrs = dict(attrs)
            if 'class' in dattrs and dattrs['class'] == 'header':
                self.in_header = True
        if tag == 'a' and self.in_header:
            self.in_headline = True

    def handle_endtag(self, tag):
        if tag == 'div':
            self.in_header = False
        if tag == 'a':
            self.in_headline = False

    def handle_data(self, data):
        if self.in_headline:
            self.headlines.append(data)

br = Browser()
response = br.open('http://tylerlesmann.com/')
hlp = HeadLineParser()
hlp.feed(response.read())
for headline in hlp.headlines:
    print headline
hlp.close()

You use HTMLParser by extending it. The four methods you'll need are __init__, handle_starttag, handle_endtag, and handle_data. HTMLParser can be confusing at first because it works in a unique manner. Whenever an opening html tag is encountered, handle_starttag is called. Whenever a closing tag is found, handle_endtag is called. Whenever anything in between tags is encountered, handle_data is called.
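To see that flow for yourself, feed a snippet of html to a toy parser that just echoes each event. This is only an illustration, not part of the headline script:

from HTMLParser import HTMLParser

class EchoParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print 'start:', tag
    def handle_endtag(self, tag):
        print 'end:', tag
    def handle_data(self, data):
        print 'data:', repr(data)

EchoParser().feed('<div class="header"><a href="/">Hello</a></div>')

# Prints:
# start: div
# start: a
# data: 'Hello'
# end: a
# end: div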

The way to actually use HTMLParser is to set up a system of flags, like in_header and in_headline from the example. We toggle them on in handle_starttag and off in handle_endtag. If you look at the html of this blog, you'll see that headlines are enclosed in classless <a> tags. There are a lot of <a>s on this site, so we need something unique to flag the headline <a>s. If you look carefully, you'll see that all of the headlines are enclosed in <div>s with a header class. We can flag those and flag only the <a>s inside them, which is what the example does.

Now that the script has all the proper flags to detect headlines, we can simply have handle_data append any text to a list of headlines whenever our in_headline flag is True.

To use our new parser, we simply make an instance of it and run html through it with the instance's feed method. We can then access the headlines attribute directly, as we can with any python object.

Posted by Tyler Lesmann on October 4, 2008 at 7:09
Tagged as: htmlparser mechanize python screen_scraping