
We often want to dump a lot of data in programming. Tables are a fair way of displaying data and ReportLab supports them. Be warned! The syntax is ugly.

#!/usr/bin/env python

from reportlab.lib import colors
from reportlab.lib.units import inch
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Spacer, Table, TableStyle

doc = SimpleDocTemplate("table.pdf", pagesize=letter)

data = [
    ['Item', 'Cost', 'Quantity'],
    ['Widget', 3.99, 26],
    ['Whatsit', 2.25, 26],
    ['Hooplah', 10.00, 26],
]

parts = []
table = Table(data, [3 * inch, 1.5 * inch, inch])
table_with_style = Table(data, [3 * inch, 1.5 * inch, inch])

table_with_style.setStyle(TableStyle([
    ('FONT', (0, 0), (-1, -1), 'Helvetica'),
    ('FONT', (0, 0), (-1, 0), 'Helvetica-Bold'),
    ('FONTSIZE', (0, 0), (-1, -1), 8),
    ('INNERGRID', (0, 0), (-1, -1), 0.25, colors.black),
    ('BOX', (0, 0), (-1, 0), 0.25, colors.green),
    ('ALIGN', (0, 0), (-1, 0), 'CENTER'),
]))

parts.append(table)
parts.append(Spacer(1, 0.5 * inch))
parts.append(table_with_style)
doc.build(parts)

The first thing you need to make a Table is some data. Any list of rows will do, as long as every row has the same number of columns. Here, the Table gets two arguments: the data and a sequence of column widths. The widths are optional; if you leave them out, ReportLab sizes the columns for you. With that, you can build a Table like any other Flowable. It will look bland, but it will be there.

Tables can be styled to your liking with setStyle and an instance of TableStyle. TableStyles are ugly, syntax-wise. They take one argument, which is a sequence of command tuples. For the most part, these tuples are structured as (attribute, start_cell, end_cell, attribute_value). Cells are (column, row) pairs, and negative numbers count back from the end like Python list indexes, so (0, 0) to (-1, 0) is the whole first row. There are a few commands, like BOX and INNERGRID, that have five values instead of four. The last two are the line width and color. There are more TableStyle attributes than I have used in this example. I would point you to a reference, but I have yet to find one. I will probably make an entire post dedicated to this.
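A quick way to check which cells a style command covers is to expand its range yourself. The two functions below, resolve and cells_in_range, are hypothetical stdlib-only helpers, not part of ReportLab; they follow the (column, row) addressing TableStyle uses, where negative values count back from the last column or row.

```python
def resolve(index, size):
    """Turn a possibly negative index into a positive one, like list indexing."""
    return index + size if index < 0 else index


def cells_in_range(start, stop, ncols, nrows):
    """List the (col, row) cells between start and stop, inclusive."""
    c0, r0 = resolve(start[0], ncols), resolve(start[1], nrows)
    c1, r1 = resolve(stop[0], ncols), resolve(stop[1], nrows)
    return [(col, row)
            for row in range(r0, r1 + 1)
            for col in range(c0, c1 + 1)]


# The example table has 3 columns and 4 rows (a header plus three items).
header = cells_in_range((0, 0), (-1, 0), ncols=3, nrows=4)
everything = cells_in_range((0, 0), (-1, -1), ncols=3, nrows=4)
print(header)           # [(0, 0), (1, 0), (2, 0)]
print(len(everything))  # 12
```

So the 'Helvetica-Bold' and 'CENTER' commands in the example touch only the three header cells, while the INNERGRID command spans all twelve.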

Posted by Tyler Lesmann on January 29, 2009 at 12:53
Tagged as: python reportlab

Adding images to PDFs is not much different than adding text. Here is how to add an image as a flowable:

#!/usr/bin/env python

import os
import urllib2
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Image

filename = './python-logo.png'

def get_python_image():
    """ Get a python logo image for this example """
    if not os.path.exists(filename):
        response = urllib2.urlopen(
            'http://www.python.org/community/logos/python-logo.png')
        # PNG data is binary, so write it in binary mode
        f = open(filename, 'wb')
        f.write(response.read())
        f.close()

get_python_image()

doc = SimpleDocTemplate("image.pdf", pagesize=letter)
parts = []
parts.append(Image(filename))
doc.build(parts)

Ignore the get_python_image function. It is in there only to make this example easily runnable. As you see, I import the Image Flowable from platypus. Image takes a minimum of one argument, which is the path to the image to use. You can build it into a SimpleDocTemplate just like a Paragraph or a Spacer.
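Image also accepts optional width and height arguments, in points. If you set them yourself, you usually want to keep the aspect ratio. fit_within below is a hypothetical stdlib-only helper that does that arithmetic; the 1000 x 400 pixel size is made up, and the box assumes a letter page with one-inch margins (6.5 by 9 inches at 72 points per inch).

```python
def fit_within(img_width, img_height, max_width, max_height):
    """Scale an image size to fit inside a box, keeping the aspect ratio.

    Images that already fit are left at their original size.
    """
    scale = min(max_width / img_width, max_height / img_height, 1.0)
    return img_width * scale, img_height * scale


# A 1000 x 400 image on a letter page with one-inch margins.
w, h = fit_within(1000, 400, 6.5 * 72, 9 * 72)
print(w, h)
```

You would then pass the result along as Image(filename, width=w, height=h).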

Writing images to a canvas is easy, but has one gotcha.

#!/usr/bin/env python

import os
import urllib2
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.pdfgen import canvas

filename = './python-logo.png'

def get_python_image():
    """ Get a python logo image for this example """
    if not os.path.exists(filename):
        response = urllib2.urlopen(
            'http://www.python.org/community/logos/python-logo.png')
        # PNG data is binary, so write it in binary mode
        f = open(filename, 'wb')
        f.write(response.read())
        f.close()

get_python_image()

c = canvas.Canvas('imageabs.pdf', pagesize=letter)
width, height = letter
c.drawImage(filename, inch, height - 2 * inch) # Who needs consistency?
c.showPage()
c.save()

We use the drawImage method of our Canvas instance. Here is the gotcha. Unlike drawString and its siblings, drawImage takes its content, the path to the image, first and its coordinates second. This is poor design by the ReportLab developers. It is still easy to add the image, though. Just remember that if you get an obscure error like this:

AttributeError: 'float' object has no attribute 'jpeg_fh'

Or this:

AttributeError: 'int' object has no attribute 'jpeg_fh'

Then the arguments are out of order.
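One way to stop tripping over this is a small wrapper that uses drawString's order, coordinates first, and complains clearly when a number lands where the path should be. draw_image_at is a hypothetical helper, not part of ReportLab; it only assumes the canvas has drawImage(image, x, y, ...). RecordingCanvas is a stand-in so the sketch runs without ReportLab installed.

```python
def draw_image_at(canvas, x, y, path, **kwargs):
    """Draw an image with drawString's argument order: x, y, then content."""
    if isinstance(path, (int, float)):
        raise TypeError("expected an image path, got the number %r" % (path,))
    return canvas.drawImage(path, x, y, **kwargs)


class RecordingCanvas:
    """Stand-in for a ReportLab Canvas that just records drawImage calls."""

    def __init__(self):
        self.calls = []

    def drawImage(self, image, x, y, **kwargs):
        self.calls.append((image, x, y, kwargs))


c = RecordingCanvas()
draw_image_at(c, 72, 648, './python-logo.png')
print(c.calls)  # [('./python-logo.png', 72, 648, {})]
```

With a real Canvas, the call from the example becomes draw_image_at(c, inch, height - 2 * inch, filename).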

Posted by Tyler Lesmann on January 28, 2009 at 6:07 and commented on 3 times
Tagged as: python reportlab

In my last post, I presented how to place text in an absolute position on a page. You will not want to do this all the time. Imagine putting a book of text in PDF form that way. An exercise in masochism to be sure. ReportLab offers a spectacular framework, called platypus, for building real documents. There are tons of new terms to learn here, so I will try to be thorough. Here is some example code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
#!/usr/bin/env python
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib.units import inch
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer

doc = SimpleDocTemplate("paragraphs.pdf", pagesize=letter)
parts = []

style = ParagraphStyle(
    name='Normal',
    fontName='Helvetica-Bold',
    fontSize=9,
)

parts.append(Paragraph("Paragraphs are a kind of Flowable.  " * 20, style))
parts.append(Spacer(1, 0.2 * inch))
parts.append(Paragraph("Paragraphs are natural in their behavior.  " * 20,
    style))
parts.append(Spacer(1, 0.2 * inch))
parts.append(Paragraph(
    "Paragraphs make sense for flexible and dynamic documents.  " * 20, style))
doc.build(parts)

The first point of interest is line 7. SimpleDocTemplate is just what it sounds like: a simple document template. It holds the page settings, such as the page size and margins, that control where the flowables are placed, and it writes the PDF to disk. You can write your own templates to offer features like multiple columns. I will cover that in another post.

On lines 10 to 14, we define a ParagraphStyle. This lets you define how your text will be displayed, such as font and alignment. You might be wondering about the name argument. It is required. It is used when the ParagraphStyle is part of a StyleSheet, which I will cover in another post.

Now we are ready to build a Paragraph object. This is the Flowable I have been talking about. In ReportLab, Flowables are objects that know how to wrap, position, and split themselves. A Paragraph will stay within the margins and span across pages without trouble. It only requires some text and a ParagraphStyle.
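To make splitting concrete, here is a toy model, not ReportLab's actual code: a paragraph is treated as a number of lines of fixed height (the leading), and it is split across frames of fixed height. The numbers are assumptions for illustration: a letter page 792 points tall, one-inch (72-point) margins, and a 12-point leading.

```python
def split_lines(line_count, leading, frame_height):
    """Return how many lines of a paragraph land on each page.

    Toy model of a Flowable's split behaviour: fill each frame with as
    many whole lines as fit, then carry the rest over to the next page.
    """
    per_page = int(frame_height // leading)
    if per_page < 1:
        raise ValueError("frame is shorter than one line")
    pages = []
    while line_count > 0:
        pages.append(min(per_page, line_count))
        line_count -= pages[-1]
    return pages


# Letter page height minus a one-inch margin at top and bottom.
frame_height = 792 - 2 * 72
print(split_lines(100, 12, frame_height))  # [54, 46]
```

A real Paragraph also has to work out where its lines break, which depends on the font and the frame width, but the carry-the-rest-over idea is the same.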

A Spacer is another Flowable. It is just whitespace. It takes two arguments, width and height.

When we are ready to write the PDF, we call the build method of doc, our SimpleDocTemplate instance, with a list of Flowables.

Posted by Tyler Lesmann on January 27, 2009 at 12:50
Tagged as: python reportlab

When making PDFs in python, we use reportlab for the most part. It is an extensive module that can make just about anything you would want in a PDF. Today, I will cover how to do the most basic function, putting some text somewhere on the page. Here is the simplest example:

#!/usr/bin/env python
from reportlab.pdfgen import canvas

c = canvas.Canvas('rldemo1.pdf')
c.drawString(100, 100, 'Hello, world!')
c.showPage()
c.save()

You will notice that we never open a file ourselves: the Canvas takes the filename and ReportLab handles the file for us. The drawString method puts a piece of text in an interesting place. It starts 100 points (a point is 1/72 of an inch) from the left as expected, but also 100 points from the bottom. I expected the origin to be the top-left, but reportlab starts from the bottom-left. With showPage, we finish the current page, and save writes the file to disk. If you run this, you will also notice that the page size is A4 by default, which is because ReportLab, the company, is based in the UK. So how do we change that and start from the top instead of the bottom in reportlab?

#!/usr/bin/env python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas('rldemo2.pdf', pagesize=letter)
width, height = letter
c.drawString(100, height - 100, 'From the top!')
c.showPage()
c.save()

So here we supply Canvas with the letter pagesize. Just that simple. Now that we have the pagesize, which is but a tuple, we can extract the width and height. With the page height, we can subtract the offset from the top to have the text placed relative to the top of the page. Can I use something other than points? Real paper layouts are done in inches and centimeters.

#!/usr/bin/env python
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch, cm
from reportlab.pdfgen import canvas

c = canvas.Canvas('rldemo3.pdf', pagesize=letter)
width, height = letter
c.drawString(inch, height - inch, '1 inch')
c.drawString(inch, height - 2 * inch, '2 inches')
c.drawString(cm, cm, '1 cm')
c.drawString(cm, 2 * cm, '2 cm')
c.showPage()
c.save()

The reportlab.lib.units module provides constants such as inch and cm, each of which is just a number of points (inch is 72). There is one gotcha with all of the text placement: the coordinates given to drawString mark the start of the text's baseline, so the text is always drawn above and to the right of that point, which you can see if you run this example. You can write text from the left now, but what about centered text and right-aligned text?
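Since units are plain multiples of points, the placement arithmetic can be checked without ReportLab at all. This sketch defines its own inch and cm constants with the same values as reportlab.lib.units, a LETTER tuple matching reportlab.lib.pagesizes.letter, and a hypothetical from_top helper for measuring down from the top edge.

```python
# Same values as reportlab.lib.units: 1 inch = 72 points, 1 cm = 1/2.54 inch.
inch = 72.0
cm = inch / 2.54

# Same size as reportlab.lib.pagesizes.letter: 8.5 x 11 inches.
LETTER = (8.5 * inch, 11 * inch)


def from_top(offset, page_height):
    """Convert a distance down from the top edge into ReportLab's bottom-up y."""
    return page_height - offset


width, height = LETTER
print(width, height)           # 612.0 792.0
print(from_top(inch, height))  # 720.0
print(round(2 * cm, 2))        # 56.69
```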

#!/usr/bin/env python
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.pdfgen import canvas

c = canvas.Canvas('rldemo4.pdf', pagesize=letter)
width, height = letter
c.drawString(inch, height - inch, 'Left')
c.drawCentredString(width / 2.0, height - inch, 'Center') # Notice the UK spelling :)
c.drawRightString(width - inch, height - inch, 'Right')
c.showPage()
c.save()

There are two more methods for printing text, drawCentredString and drawRightString. The only difference from drawString is what the x coordinate means: drawCentredString centers the text on x, and drawRightString ends the text at x. All three still measure y from the bottom of the page.
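Where the text actually starts depends on its rendered width, which ReportLab can measure with reportlab.pdfbase.pdfmetrics.stringWidth(text, fontName, fontSize). The sketch below takes the width as a given number (40 points, made up for illustration) and shows where each method puts the left edge of the text; start_x is a hypothetical helper.

```python
def start_x(x, text_width, align='left'):
    """Left edge of the text for each drawing method.

    left   -> drawString:        text starts at x
    center -> drawCentredString: text is centred on x
    right  -> drawRightString:   text ends at x
    """
    if align == 'left':
        return x
    if align == 'center':
        return x - text_width / 2.0
    if align == 'right':
        return x - text_width
    raise ValueError("unknown alignment: %r" % align)


page_width, inch = 612.0, 72.0
print(start_x(page_width / 2.0, 40, 'center'))  # 286.0
print(start_x(page_width - inch, 40, 'right'))  # 500.0
```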

Posted by Tyler Lesmann on January 23, 2009 at 6:32 and commented on 1 time
Tagged as: python reportlab

Inspired by Amy Iris, I have made a little bit of automation for Twitter. On Twitter, it is not easy to find others by interest. This little piece of code runs a search on the terms you specify and then checks the bio of each poster for your search terms. Terms starting with '-' are exclusions: a bio containing one is skipped. Every user whose bio matches is followed from your account.
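Before the full script, here is the bio check on its own, so you can try a query against sample bios with no Twitter account or network access. bio_matches is a hypothetical standalone function (Python 3) following the rule the script is meant to apply: every plain term must appear in the bio, no '-' term may appear, and case is ignored. It escapes the terms so characters like '+' match literally.

```python
import re


def bio_matches(query, bio):
    """True if every plain term in query appears in bio and no '-' term does."""
    if not bio:
        return False  # the script only follows users who have a bio
    wanted, unwanted = [], []
    for word in query.split():
        if word.startswith('-') and len(word) > 1:
            unwanted.append(word[1:])  # drop the '-' before matching
        else:
            wanted.append(word)

    def found(term):
        # re.escape makes characters like '+' or '.' match literally
        return re.search(re.escape(term), bio, re.IGNORECASE) is not None

    return (all(found(term) for term in wanted)
            and not any(found(term) for term in unwanted))


print(bio_matches("python -monty -ball", "I write Python for a living"))  # True
print(bio_matches("python -monty -ball", "Monty Python fan"))              # False
print(bio_matches("python django", "Python developer"))                    # False
```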

#!/usr/bin/env python

import getopt
import re
import simplejson
import sys
import time
import twitter
import urllib2
from getpass import getpass
from urllib import urlencode

def compile_filter(query):
    good = []
    bad = []
    words = query.split()
    for word in words:
        if word[0] == '-':
            # Strip the leading '-' so we match the excluded word itself
            bad.append(re.compile(re.escape(word[1:]), re.IGNORECASE))
        else:
            good.append(re.compile(re.escape(word), re.IGNORECASE))
    return (good, bad)

def filter_user_by_bio(user, filter, api=None):
    if api is None:
        api = twitter.Api()
    bio = api.GetUser(user).GetDescription()
    if bio is None:
        return False # We only follow those with bios
    good, bad = filter
    goodmatches = []
    for word in bad:
        if not word.search(bio) is None:
            return False
    for word in good:
        if not word.search(bio) is None:
            goodmatches.append(word)
    if good == goodmatches:
        return True
    return False

def follow_by_query(username, password, q, rpp=None, lang=None):
    filter = compile_filter(q)
    api = twitter.Api(username=username, password=password)
    friends = []
    for user in api.GetFriends():
        friends.append(user.GetScreenName())
    goodusers = []
    for user in get_users_from_search(q, rpp, lang):
        if filter_user_by_bio(user, filter, api):
            goodusers.append(user)
    newusers = []
    for user in goodusers:
        if not user in friends:
            api.CreateFriendship(user)
            friends.append(user)
            newusers.append(user)
    return newusers

def get_users_from_search(query, resultnum=None, lang=None):
    q = []
    rpp = 10
    q.append(urlencode({'q': query}))
    if not lang is None:
        q.append(urlencode({'lang': lang}))
    if not resultnum is None:
        rpp = resultnum
    q.append(urlencode({'rpp': rpp}))
    # Send the parameters in the query string (a GET request)
    response = urllib2.urlopen(
        'http://search.twitter.com/search.json?' + '&'.join(q))
    data = simplejson.load(response)
    for result in data['results']:
        yield result['from_user'] 

def print_usage():
    sys.stderr.write("""
    Usage: %s -u username [-p password] [-r search_result_number] [-l language]
        terms ...

    -l language = Filter search by language.
    -p password = Optional.  If not supplied, you will be asked for it.
    -r search_result_number = Number of results to pull from twitter searches.
    -u username = twitter username.
""" % sys.argv[0])

if __name__ == '__main__':
    optlist, args = getopt.getopt(sys.argv[1:], 'l:p:r:u:')
    if not args:
        sys.stderr.write("You must specify search terms\n")
        print_usage()
        raise SystemExit, 1
    optd = dict(optlist)
    if not '-u' in optd:
        sys.stderr.write("You must specify a user\n")
        print_usage()
        raise SystemExit, 1
    username = optd['-u']
    query = " ".join(args)

    if not '-p' in optd:
        sys.stderr.write("Password:")
        password = getpass("")
    else:
        password = optd['-p']

    rpp = None
    if '-r' in optd:
        rpp = optd['-r']

    lang = None
    if '-l' in optd:
        lang = optd['-l']


    try:
        newusers = follow_by_query(username, password, query, rpp, lang)
    except urllib2.HTTPError, e:
        sys.stderr.write("Cannot connect to Twitter\n")
        sys.stderr.write(str(e))
        sys.stderr.write("\n")
    else:
        if newusers:
            print ", ".join(newusers), 'Added!'

The usage is as follows, assuming the script is named twitsheep.py:

Usage: ./twitsheep.py -u username [-p password] [-r search_result_number] [-l language]
    terms ...

-l language = Filter search by language.
-p password = Optional.  If not supplied, you will be asked for it.
-r search_result_number = Number of results to pull from twitter searches.
-u username = twitter username.

Running the program without arguments prints the usage as well. It is best to run this with cron or Scheduled Tasks no more often than every thirty minutes. By default, ten search results are checked, but you can turn that up to about 30. If you start getting 400 errors, a.k.a. Bad Request, you are being throttled by Twitter's DoS protection. In that case, consider checking fewer search results or waiting longer between searches.

You can see an active test of this script here. It is running with this command line:

./twitsheep.py -u twitsheep -r 20 -l en "python -monty -ball"

If you have any features you would like integrated into this, please leave a comment.

Posted by Tyler Lesmann on January 22, 2009 at 6:09 and commented on 10 times
Tagged as: json python twitter

I have read a lot of bad python code recently. The code gets the work done, but it fails in what I consider Python's killer feature, readability. The programmers are not to blame. I have read bad python in books about python programming! I do not want you, as a reader of this blog, to write python poorly. Finding the proper path to pythonic code is easy.

The first piece for study is PEP 8, the official style guide for python. Written by the father of python, Guido van Rossum, and Barry Warsaw, this document will tell you the pythonic way of placing whitespace, importing modules, documenting code, and more. Following its suggestions will make your code more readable to others and easier to write for you, but your programming will not be pythonic just yet.

The Zen of Python, by Tim Peters, is the next read, and it is a short one. The idea is to give you something easy to remember that will make you think in a pythonic fashion. This document is also built into python: type import this at the interactive prompt.

The final piece is What is Pythonic?, by Martijn Faassen. Here you will find real examples of what is pythonic and what is not.

After reading these, you have no excuse to write ugly python code. I do not want to see this:

from somemodule import * # Wrong, what exactly are we importing?
import somemodule # Good
from somemodule import something # Fine too

or this:

h='somevalue' # Bad, what is 'h'?  Needs spaces around the '='.
header = 'somevalue' # Good

or any of this anymore:

# Bad
for i in range(len(somelist)):
    do_something(somelist[i])
# Good
for element in somelist:
    do_something(element)
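The usual excuse for range(len(...)) is needing the index as well as the element. enumerate hands you both, so the loop stays pythonic. As in the examples above, do_something is a hypothetical stand-in; this version takes the index too.

```python
def do_something(index, element):
    """Hypothetical stand-in for whatever work the loop really does."""
    return '%d: %s' % (index, element)


somelist = ['spam', 'eggs', 'ham']

# Good, when you need the position as well as the element
results = []
for i, element in enumerate(somelist):
    results.append(do_something(i, element))

print(results)  # ['0: spam', '1: eggs', '2: ham']
```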

Pythonic is your new mantra, if you intend to write python. If you want more to read, take a look at the rest of the Python Enhancement Proposals, specifically the Finished PEPs. For examples of pythonic code, check out the standard python modules or any of the python code featured on this blog.

Posted by Tyler Lesmann on January 18, 2009 at 13:28
Tagged as: python tips_and_tricks

The most fundamental of abilities in shell scripts is moving about the file system.

import os
os.chdir('/')
contents = os.listdir(os.getcwd())

To change directories, we use os.chdir with our destination as the argument. To get a listing of the contents of a directory, os.listdir is the function to use. It takes one argument, the directory to list, and returns the names of its entries, without the directory part and not including the special entries '.' and '..'. If we want to know our current location on the filesystem, then os.getcwd will tell us the path.
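Because os.listdir returns bare names in arbitrary order, a name has to be joined back onto its directory before you test or open it, unless that directory happens to be the current one. This Python 3 sketch builds a throwaway directory with tempfile so it runs anywhere.

```python
import os
import tempfile

# A throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as top:
    for name in ('b.txt', 'a.txt'):
        with open(os.path.join(top, name), 'w') as f:
            f.write('example\n')
    os.mkdir(os.path.join(top, 'subdir'))

    # listdir gives bare names in arbitrary order, so sort them.
    names = sorted(os.listdir(top))
    # Join each name with its directory before asking questions about it.
    files = [name for name in names
             if os.path.isfile(os.path.join(top, name))]

print(names)  # ['a.txt', 'b.txt', 'subdir']
print(files)  # ['a.txt', 'b.txt']
```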

Instead of moving around, you may want to do the same thing on the files and folders of a tree of directories, a.k.a. a recursive operation. In this next scenario, I have a hypothetical module called mp32vorbis that has a convert function, which converts a specified mp3 file into an ogg vorbis file.

import os
import mp32vorbis

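for root, dirs, files in os.walk(os.getcwd()):
    for dirname in dirs:
        print "Processing", os.path.join(root, dirname)
    for filename in files:
        if filename.endswith('.mp3'):
            print "Converting", filename
            # os.walk gives bare names, so join them with root for a full path
            mp32vorbis.convert(os.path.join(root, filename))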

As you can see, recursive operations are quick and easy with os.walk. It is a generator that yields one tuple per directory in the tree: the directory's path (root), a list of the names of its subdirectories, and a list of the names of its files. You can process each directory and file by looping through those lists; since they hold bare names, join them with root to get full paths. You may have noticed os.path.join. This is a simple function that connects its arguments with the directory separator of the host OS, which is a backslash on Windows and a forward slash everywhere else.
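Here is the same walk packaged as a reusable generator. find_files is a hypothetical helper (Python 3) that yields the full paths you would hand to a converter like mp32vorbis.convert; the temporary directory tree exists only to make the sketch runnable.

```python
import os
import tempfile


def find_files(top, suffix):
    """Yield the full path of every file under top whose name ends with suffix."""
    for root, dirnames, filenames in os.walk(top):
        for filename in filenames:
            if filename.endswith(suffix):
                # os.walk gives bare names; join with root for a usable path.
                yield os.path.join(root, filename)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'album', 'disc2'))
    for parts in (('song.mp3',), ('album', 'track1.mp3'),
                  ('album', 'disc2', 'track9.mp3'), ('album', 'cover.jpg')):
        open(os.path.join(top, *parts), 'w').close()
    found = sorted(os.path.relpath(p, top) for p in find_files(top, '.mp3'))

print(found)
```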

Posted by Tyler Lesmann on January 15, 2009 at 5:41 and commented on 2 times
Tagged as: python shell_scripting

I found a blog post today that gleans the names and messages from the Twitter search. As an exercise, I decided to rewrite it using mechanize and lxml. My code writes to standard output instead of a file; you can redirect the output for the same effect. Note: I am aware that Twitter offers JSON, plus several APIs, and using those would be easier than this. This is an exercise.

#!/usr/bin/env python
import getopt
import sys
import urllib
from mechanize import Browser, LinkNotFoundError
from lxml.html import parse

baseurl = "http://search.twitter.com/search?lang=en&q="

def search_twitter(terms, pages=1):
    """
    terms = a list of search terms
    pages(optional) = number of pages to retrieve

    returns a list of dictionaries
    """
    br = Browser()
    br.set_handle_robots(False)
    results = []
    # Quote each term so characters like '#' or '&' survive in the URL
    query = "+".join(urllib.quote_plus(term) for term in terms)
    response = br.open(baseurl + query)
    while True:
        doc = parse(response).getroot()
        for msg in doc.cssselect('div.msg'):
            name = msg.cssselect('a')[0].text_content()
            text = msg.cssselect('span')[0].text_content()
            text = text.replace(' (expand)', '')
            results.append({
                'name': name,
                'text': text,
            })
        pages -= 1
        if pages <= 0:
            break # Got as many pages as requested
        try:
            response = br.follow_link(text='Older')
        except LinkNotFoundError:
            break # No more pages :(
    return results

if __name__ == '__main__':
    optlist, args = getopt.getopt(sys.argv[1:], 'p:', ['pages='])
    optd = dict(optlist)
    pages = 1
    if '-p' in optd:
        pages = int(optd['-p'])
    if '--pages' in optd:
        pages = int(optd['--pages'])
    if len(args) < 1:
        print """
        Usage: %s [-p pages | --pages=pages] search terms
            -p, --pages = number of pages to retrieve
        """ % sys.argv[0]
        raise SystemExit, 1
    results = search_twitter(args, pages)
    for result in results:
        print "%(name)-20s%(text)s" % result
Posted by Tyler Lesmann on January 14, 2009 at 15:16
Tagged as: lxml mechanize python screen_scraping

In my post from a while back, I gave an example of the standard HTMLParser's use. HTMLParser is not the easiest way to glean information from HTML. There are two modules that are not part of the standard python distribution that can shorten the development time. The first is BeautifulSoup. Here is the code from the previous episode using BeautifulSoup instead of HTMLParser.

#!/usr/bin/env python

from BeautifulSoup import BeautifulSoup
from mechanize import Browser

br = Browser()
response = br.open('http://tylerlesmann.com/')
soup = BeautifulSoup(response.read())
headers = soup.findAll('div', attrs={'class': 'header'})
headlines = []
for header in headers:
    links = header.findAll('a')
    for link in links:
        headlines.append(link.string)
for headline in headlines:
    print headline

This is a lot shorter, 16 lines instead of 38. It also took about 20 seconds to write. There is one gotcha: both scripts do the same task, but the BeautifulSoup version takes over twice as long to run. CPU time is much cheaper than development time, though.

The next module is lxml. Here's the lxml version of the code.

#!/usr/bin/env python

from lxml.html import parse
from mechanize import Browser

br = Browser()
response = br.open('http://tylerlesmann.com/')
doc = parse(response).getroot()
for link in doc.cssselect('div.header a'):
    print link.text_content()

As you can see, it is even shorter than BeautifulSoup at 10 lines. On top of that, lxml is faster than HTMLParser. So what is the catch? The lxml module uses C code, so you will not be able to use it on Google's AppEngine or on Jython.
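For comparison, here is roughly what the standard-library approach looks like in Python 3, where HTMLParser lives in html.parser. This is a sketch, not the code from the earlier post: HeaderLinkParser is a hypothetical class, and it parses a small inline sample instead of the live site. Even this minimal version has to track its own state, which is the bookkeeping BeautifulSoup and lxml take off your hands.

```python
from html.parser import HTMLParser


class HeaderLinkParser(HTMLParser):
    """Collect the text of <a> tags inside <div class="header"> blocks."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside a header div (counts nested divs)
        self.in_link = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            classes = (dict(attrs).get('class') or '').split()
            if self.div_depth or 'header' in classes:
                self.div_depth += 1
        elif tag == 'a' and self.div_depth:
            self.in_link = True
            self.headlines.append('')

    def handle_endtag(self, tag):
        if tag == 'div' and self.div_depth:
            self.div_depth -= 1
        elif tag == 'a':
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.headlines[-1] += data  # text can arrive in several chunks


sample = ('<div class="header"><h2><a href="/1">First post</a></h2></div>'
          '<div class="body"><a href="/x">not a headline</a></div>'
          '<div class="header"><a href="/2">Second <b>post</b></a></div>')
parser = HeaderLinkParser()
parser.feed(sample)
parser.close()
print(parser.headlines)  # ['First post', 'Second post']
```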

Posted by Tyler Lesmann on January 14, 2009 at 6:13
Tagged as: beautifulsoup lxml mechanize python screen_scraping

At times, you will have to get and set environment variables to get the desired effect in your scripts. The os module offers this through a dictionary-like object called environ.

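import os
homedir = os.environ['HOME']
suite_bin = '/opt/mysuite/bin'
# PATH entries are separated by os.pathsep (':' on Unix, ';' on Windows)
if suite_bin not in os.environ['PATH'].split(os.pathsep):
    os.environ['PATH'] += os.pathsep + suite_bin
os.system('mysuiteapp') # Executable in /opt/mysuite/bin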
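Changing os.environ affects this process and every program it starts afterwards. If only one child program needs a different environment, a sketch of the alternative is to pass a modified copy through the env argument of the subprocess module (available since Python 2.4; this example is written for Python 3). MYSUITE_GREETING is a made-up variable, and the child is just another Python interpreter so the sketch runs anywhere.

```python
import os
import subprocess
import sys

# Copy the current environment and change only the copy.
child_env = dict(os.environ)
child_env['MYSUITE_GREETING'] = 'hello from the parent'

# The child sees the modified copy; os.environ in this process is untouched.
output = subprocess.check_output(
    [sys.executable, '-c', 'import os; print(os.environ["MYSUITE_GREETING"])'],
    env=child_env,
    universal_newlines=True,
)
print(output.strip())  # hello from the parent
```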
Posted by Tyler Lesmann on January 14, 2009 at 5:41
Tagged as: python shell_scripting

So you are running programs from python now. Perhaps you want to check that a file is readable and executable before trusting it to run? The os module comes to the rescue once again!

import os
filename = 'bin/someprogram'
if not os.path.exists(filename):
    print filename, 'does not exist!'
    raise SystemExit, 2
if not os.access(filename, os.R_OK):
    print filename, 'is not readable!'
    raise SystemExit, 3
if not os.access(filename, os.X_OK):
    print filename, 'is not executable!'
    raise SystemExit, 4

We would like to know that the file exists. This is determined with os.path.exists. Is the file readable and executable? We can find out with os.access and a couple of constants from the os module, os.R_OK and os.X_OK. We could also test if a file is writable with os.access. We just use the os.W_OK constant. Simple, yes?

Would you like to know the times of creation, modification, and access? What about file size?

import datetime
import os
filename = 'somedir/somefile'
ctime = datetime.datetime.fromtimestamp(os.path.getctime(filename))
mtime = datetime.datetime.fromtimestamp(os.path.getmtime(filename))
atime = datetime.datetime.fromtimestamp(os.path.getatime(filename))
sizeinbytes = os.path.getsize(filename)

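It is pretty straightforward. You might be wondering why I am using datetime.datetime.fromtimestamp here. The os.path.getXtime functions all return POSIX timestamps, which are seconds since the epoch. In python, it is more useful to have these as datetime objects, and datetime.datetime.fromtimestamp produces one from a given timestamp. One caveat: getctime is the creation time only on Windows. On Unix, it is the time of the file's last metadata change (such as a permission change), not its creation time.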
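Each os.path.getXtime call performs its own stat of the file. If you want several of these values at once, os.stat returns them together in one call. A Python 3 sketch, using a temporary file with known contents:

```python
import datetime
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'hello, world\n')
    filename = f.name

# One stat call gives the size and all three timestamps together.
info = os.stat(filename)
sizeinbytes = info.st_size
mtime = datetime.datetime.fromtimestamp(info.st_mtime)
atime = datetime.datetime.fromtimestamp(info.st_atime)
ctime = datetime.datetime.fromtimestamp(info.st_ctime)  # metadata change on Unix

print(sizeinbytes)                               # 13
print(sizeinbytes == os.path.getsize(filename))  # True
os.remove(filename)
```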

Posted by Tyler Lesmann on January 13, 2009 at 5:47
Tagged as: python shell_scripting

Like shell scripts, python may not be able to do everything you want by itself. You may need to execute an external program to get the job done.

NOTE: This is the old way to call external programs, and it is what you need for python 2.3 and earlier. If you are using 2.4 or later, use the subprocess module instead. The documentation for subprocess is very fleshed out, so I will not be expanding on it here.

Python's os module offers loads of ways to accomplish this. I will cover only the ones you are to use when replacing shell scripts.

import os
return_status = os.system('ping -c 5 localhost')

The os.system function is the least useful. It will execute the command as specified and return its exit status. On Unix, that value is the encoded wait status, so the program's exit code is the value shifted right by eight bits (os.WEXITSTATUS extracts it). It is especially useless on operating systems with inconsistent return statuses, e.g. Windows. Any output from the program goes to the standard locations, stdout and stderr.

import os
output = os.popen('ping -c 5 localhost')
print output.read() # Streams are file-like

Perhaps you might want to capture the program's output for later. The os.popen function returns a stream of the output from the program. These streams are handled just like files in python. Use the read and readline methods to access the output. With popen, the function returns immediately, so you can run multiple external programs at once.

import os
stdin, stdout = os.popen2('sed s/p/q/g')
stdin.write('Replace the ps with qs please')
stdin.close()
print stdout.readline() # prints Reqlace the qs with qs qlease

What if you want to give the program a little input? The os.popen2 function lets you do just that. It returns a file stream to both the stdin and stdout of the program. Just write your input, close the stdin stream, and read the output. There are more popen functions. The os.popen3 returns file streams to stdin, stdout, and stderr. os.popen4 returns two file streams, the stdin and the merged stdout/stderr.
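Since the note at the top of this post recommends subprocess for 2.4 and later, here is a sketch of the three patterns above rewritten with it (Python 3; check_output needs 2.7 or later). To keep it runnable anywhere, the child programs are small Python one-liners standing in for ping and sed; with sed itself you would pass ['sed', 's/p/q/g'] as the command.

```python
import subprocess
import sys

# Like os.system, but without a shell and returning the plain exit code.
status = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])

# Like os.popen(...).read(): capture the program's standard output.
output = subprocess.check_output([sys.executable, '-c', 'print("pong")'],
                                 universal_newlines=True)

# Like os.popen2: communicate() writes the input, closes stdin and reads
# stdout for us, which avoids deadlocks when the pipes fill up.
proc = subprocess.Popen(
    [sys.executable, '-c',
     'import sys; sys.stdout.write(sys.stdin.read().replace("p", "q"))'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, universal_newlines=True)
replaced, _ = proc.communicate('Replace the ps with qs please')

print(status)          # 3
print(output.strip())  # pong
print(replaced)        # Reqlace the qs with qs qlease
```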

Posted by Tyler Lesmann on January 12, 2009 at 5:51
Tagged as: python shell_scripting

You might need to make folders and link files from time to time. The os module is used for both of these tasks.

import os
os.mkdir('newdir') # like mkdir
os.makedirs('nonexistent/bogus/newdir') # like mkdir -p

os.mkdir and os.makedirs offer little in the way of surprises. os.mkdir works like mkdir and os.makedirs works like mkdir -p, creating any intermediate directories it needs to create the specified directory.
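One difference from mkdir -p: both functions raise an error if the target directory already exists. In Python 3.2 and later, os.makedirs accepts exist_ok=True to succeed quietly instead, as mkdir -p does. A sketch in a throwaway directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'nonexistent', 'bogus', 'newdir')
    os.makedirs(target)                 # creates all three levels, like mkdir -p
    os.makedirs(target, exist_ok=True)  # already there: no error (Python 3.2+)

    mkdir_failed = False
    try:
        os.mkdir(target)                # plain mkdir still refuses
    except FileExistsError:
        mkdir_failed = True
    created = os.path.isdir(target)

print(created, mkdir_failed)  # True True
```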

import os
os.link('file', 'newhardlink') # think ln
os.symlink('fileordir', 'newsymlink') # think ln -s

os.link creates hard links to files just like ln, and takes its arguments in the same order: the existing file first, then the name of the new link. os.symlink creates soft links the way ln -s does. I love how python works in such a predictable fashion.

Posted by Tyler Lesmann on January 11, 2009 at 5:48
Tagged as: python shell_scripting

Removing files is not any harder in python than it is in bash.

import os
os.remove('unneeded/file')

os.remove will delete one file you specify. os.unlink is the exact same function.

import os
os.rmdir('some/empty/dir')
os.removedirs('other/empty/dir')

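os.rmdir will remove a directory, assuming it is empty. os.removedirs works in a similar fashion, but after removing the last directory in the path, it also removes each parent directory named in the path, stopping quietly at the first one that is not empty, much like rmdir -p.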
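To see where os.removedirs stops, this Python 3 sketch builds other/empty/dir inside a temporary directory and puts a file in other. Removing the leaf takes empty with it, and the walk up the path stops at other because it is not empty.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, 'other', 'empty', 'dir'))
    # A file keeps 'other' from being empty.
    open(os.path.join(base, 'other', 'keep.txt'), 'w').close()

    os.removedirs(os.path.join(base, 'other', 'empty', 'dir'))

    gone = not os.path.exists(os.path.join(base, 'other', 'empty'))
    kept = os.path.isdir(os.path.join(base, 'other'))

print(gone, kept)  # True True
```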

import shutil
shutil.rmtree('any/dir/empty/or/not')

shutil.rmtree works like rm -r in that it removes a whole directory tree, contents and all. The argument must be a directory.

Posted by Tyler Lesmann on January 10, 2009 at 6:20
Tagged as: python shell_scripting

There are a few different methods for moving and renaming files in python.

import os
os.rename('somefile', 'somewhere/else')

os.rename will move and rename a file just like mv.

import os
os.renames('somefile', 'even/some/dirs/that/dont/exist')

os.renames is really cool. It does the same thing as os.rename, but it can also move a file into a directory, or a whole tree of directories, that does not exist yet. It will create any directories it needs to get the job done. Afterwards, it also removes any directories on the source side that the move left empty.

import shutil
shutil.move('somefile', 'somewhere/else')

shutil.move differs from os.rename in that it only uses os.rename if the source and destination are located on the same filesystem. Otherwise, it copies the files and directories and removes the originals afterwards.

Posted by Tyler Lesmann on January 9, 2009 at 5:59
Tagged as: python shell_scripting

I was looking at my top search queries on Google the other day and noticed this blog was referenced at number 4 for copying folders in python. I do not have a post that covers this, though. I have decided to do a daily series on how to replace shell scripts with Python, starting with this post about copying files and folders.

Copying files is simple in Python. Actually, all shell script-like operations are.

import shutil # This is the module that will replace most shell operations
shutil.copy('/path/to/somefile', 'path/to/the/copy') # works just like cp
shutil.copy2('somefile', '/path/to/copy') # like cp -p: keeps timestamps and permissions, but not ownership

Not too hard. You say you want to copy a directory recursively? It could not be easier.

import shutil
shutil.copytree('src/directory', 'dst/directory') # This copies the tree and the targets of symlinks
shutil.copytree('src/directory', 'dst/directory', symlinks=True) # This copies the tree and the symlinks themselves

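If you want to do something more complex than shutil.copytree, then you will want to use os.walk, which I will cover in the next few days. Also note that, by default, copytree creates the destination directory itself and raises an error if it already exists.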
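As a taste of the os.walk approach, here is a sketch that copies only files with a given suffix while recreating the directory layout, which is awkward to express with a single copytree call. copy_matching is a hypothetical helper written for Python 3; the temporary source tree is only there to make it runnable.

```python
import os
import shutil
import tempfile


def copy_matching(src, dst, suffix):
    """Copy files ending in suffix from the tree at src into dst, keeping layout."""
    for root, dirnames, filenames in os.walk(src):
        # Mirror root's position under src inside dst ('.' for src itself).
        target_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        for filename in filenames:
            if filename.endswith(suffix):
                if not os.path.isdir(target_dir):
                    os.makedirs(target_dir)
                shutil.copy2(os.path.join(root, filename), target_dir)


with tempfile.TemporaryDirectory() as base:
    src = os.path.join(base, 'src')
    os.makedirs(os.path.join(src, 'docs'))
    for parts in (('a.txt',), ('docs', 'b.txt'), ('docs', 'c.jpg')):
        open(os.path.join(src, *parts), 'w').close()

    dst = os.path.join(base, 'dst')
    copy_matching(src, dst, '.txt')
    copied = sorted(os.path.relpath(os.path.join(r, f), dst)
                    for r, _, files in os.walk(dst) for f in files)

print(copied)
```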

Posted by Tyler Lesmann on January 8, 2009 at 7:58
Tagged as: python shell_scripting