Python Rocks! and other rants
Weblog of Kent S Johnson


Why I love Python 5

Easy introspection and dynamic loading

This example shows off several useful features of Python including introspection, dynamic loading, first-class functions and flexible except clauses.

At work I have some Java code that uses XPath support from the Xalan class org.apache.xpath.XPathAPI. In Java 1.4 this class is provided with the JRE. In Java 1.5 they moved the class to a different package. I need to run with either version of Java, and I prefer not to bundle Xalan with my program, so I wrote a wrapper that dynamically locates the correct version and dispatches to it:

// The XPathAPI is in different packages in Java 1.4 and 1.5.
// Use introspection to find the right one
private static Method __selectSingleNode;

static {
    // Look for the XPathAPI class in two places
    Class XPathAPI = null;
    try {
        XPathAPI = Class.forName("org.apache.xpath.XPathAPI");
    } catch (ClassNotFoundException e) {
        try {
            XPathAPI = Class.forName("");
        } catch (ClassNotFoundException e1) {
        }
    }

    // Get the methods we support
    try {
        __selectSingleNode = XPathAPI.getMethod("selectSingleNode",
                             new Class[] { Node.class, String.class} );
    } catch (SecurityException e) {
    } catch (NoSuchMethodException e) {
    }
}

/** XPathAPI.selectSingleNode */
public static Node selectSingleNode(Node node, String xpath) {
    try {
        return (Node)__selectSingleNode.invoke(null, new Object[] { node, xpath });
    } catch (IllegalArgumentException e) {
    } catch (IllegalAccessException e) {
    } catch (InvocationTargetException e) {
    }
    return null;
}

Wow, what an ugly mess! What would it look like in Python?

The initial static block would become a conditional import:

try:
  import org.apache.xpath.XPathAPI as XPathAPI
except ImportError:
  import as XPathAPI

That was easy - and wait - we're done now! The client code can call XPathAPI.selectSingleNode() and it will work!

But suppose for the sake of example we want to get a reference to selectSingleNode using introspection. That is as simple as

__selectSingleNode = getattr(XPathAPI, 'selectSingleNode')

This __selectSingleNode is itself a callable function (not a wrapper around a function) so clients can call it directly; the selectSingleNode() wrapper is not needed at all.
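The same two ideas can be tried outside Jython. Here is a minimal sketch in Python 3 syntax, assuming nothing beyond the standard library: it tries a list of candidate module names and then fetches a function by name with getattr(). The helper load_first and the module name fast_json_that_is_not_installed are made up for illustration.

```python
import importlib

def load_first(candidates):
    """Return the first module in candidates that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError('none of %r could be imported' % (candidates,))

# The first name is deliberately bogus, so the loader falls back to json.
codec = load_first(['fast_json_that_is_not_installed', 'json'])

# Introspection: look the function up by name, then call it directly.
dumps = getattr(codec, 'dumps')
print(dumps({'city': 'Boston'}))
```

The value returned by getattr() is the function itself, so no wrapper is involved in the call.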

I have omitted the exception handling in the Python code because these exceptions are fatal and might as well terminate the program. If I wanted to catch them I could use an except clause with multiple exception types, instead of multiple except clauses, something like this:

try:
  __selectSingleNode = ...
except (SecurityException, NoSuchMethodException), e:
  ...

posted at 10:49:20    #    comment []    trackback []

Simple itertools.groupby() example

Suppose you have a (sorted) list of dicts containing the names of cities and states, and you want to print them out with headings by state:

>>> cities = [
...     { 'city' : 'Harford', 'state' : 'Connecticut' },
...     { 'city' : 'Boston', 'state' : 'Massachusetts' },
...     { 'city' : 'Worcester', 'state' : 'Massachusetts' },
...     { 'city' : 'Albany', 'state' : 'New York' },
...     { 'city' : 'New York City', 'state' : 'New York' },
...     { 'city' : 'Yonkers', 'state' : 'New York' },
... ]

First let me explain operator.itemgetter(). This function is a factory for new functions. It creates functions that access items using a key. In this case I will use it to create a function to access the 'state' item of each record:

>>> from operator import itemgetter
>>> getState = itemgetter('state')
>>> getState
<operator.itemgetter object at 0x00A31D90>
>>> getState(cities[0])
'Connecticut'
>>> [ getState(record) for record in cities ]
['Connecticut', 'Massachusetts', 'Massachusetts', 'New York', 'New York', 'New York']

So the value returned by itemgetter('state') is a function that accepts a dict as an argument and returns the 'state' item of the dict. Calling getState(d) is the same as writing d['state'].
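To make the equivalence concrete, here is a small sketch in Python 3 syntax. The hand-written function get_state_by_hand and the sample record are only there for comparison; they are not part of the original example.

```python
from operator import itemgetter

record = {'city': 'Boston', 'state': 'Massachusetts'}

get_state = itemgetter('state')

def get_state_by_hand(d):
    # What itemgetter('state') does for us
    return d['state']

print(get_state(record))
print(get_state(record) == get_state_by_hand(record) == record['state'])

# Given several keys, itemgetter returns a tuple, which is handy as a sort key.
get_state_and_city = itemgetter('state', 'city')
print(get_state_and_city(record))
```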

What does this have to do with itertools.groupby()?

>>> from itertools import groupby
>>> help(groupby)
Help on class groupby in module itertools:

class groupby(__builtin__.object)
|  groupby(iterable[, keyfunc]) -> create an iterator which returns
|  (key, sub-iterator) grouped by each value of key(value).

groupby() takes an optional second argument which is a function to extract keys from the data. getState() is just the function we need.

>>> groups = groupby(cities, getState)
>>> groups
<itertools.groupby object at 0x00A88300>

Hmm. That's a bit opaque. groupby() returns an iterator. Each item in the iterator is a pair of (key, group). Let's take a look:

>>> for key, group in groups:
...   print key, group
Connecticut <itertools._grouper object at 0x0089D0F0>
Massachusetts <itertools._grouper object at 0x0089D0C0>
New York <itertools._grouper object at 0x0089D0F0>

Hmm. Still a bit opaque :-) The key part is clear - that's the state, extracted with getState - but group is another iterator. One way to look at its contents is to use a nested loop. Note that I have to call groupby() again because the old iterator was consumed by the last loop:

>>> for key, group in groupby(cities, getState):
...   print key
...   for record in group:
...     print record
Connecticut
{'city': 'Harford', 'state': 'Connecticut'}
Massachusetts
{'city': 'Boston', 'state': 'Massachusetts'}
{'city': 'Worcester', 'state': 'Massachusetts'}
New York
{'city': 'Albany', 'state': 'New York'}
{'city': 'New York City', 'state': 'New York'}
{'city': 'Yonkers', 'state': 'New York'}

Well, that makes more sense! And it's not too far from the original requirement; we just need to pretty up the output a bit. How about this:

>>> for key, group in groupby(cities, getState):
...   print 'State:', key
...   for record in group:
...     print '   ', record['city']
State: Connecticut
    Harford
State: Massachusetts
    Boston
    Worcester
State: New York
    Albany
    New York City
    Yonkers

Other than misspelling Hartford (sheesh, and I grew up in Connecticut!) that's not too bad!
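The "(sorted)" in the problem statement matters, because groupby() only groups adjacent items that share a key. Here is a sketch in Python 3 syntax, with a deliberately shuffled copy of some of the data, that shows the difference and collects the groups into a dict:

```python
from itertools import groupby
from operator import itemgetter

cities = [
    {'city': 'Albany', 'state': 'New York'},
    {'city': 'Boston', 'state': 'Massachusetts'},
    {'city': 'Yonkers', 'state': 'New York'},
    {'city': 'Hartford', 'state': 'Connecticut'},
]
get_state = itemgetter('state')

# Unsorted input: 'New York' appears as two separate groups.
unsorted_keys = [key for key, group in groupby(cities, get_state)]
print(unsorted_keys)

# Sort by the same key first, then each state forms exactly one group.
by_state = {}
for key, group in groupby(sorted(cities, key=get_state), get_state):
    by_state[key] = [record['city'] for record in group]
print(by_state)
```

Each group has to be consumed (here with a list comprehension) before moving on to the next key, since the groups share the underlying iterator.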

posted at 22:37:36    #    comment []    trackback []

How I write code

I tend to design from the bottom up - not exclusively, but in general I make small parts and combine them to make larger parts until I have something that does what I want. I refactor constantly as my understanding of a problem and its solution increases. This way I always have complete working code for some section of the problem. I rarely use stubs of any kind.

To start I will take some small section of the problem and think about what kind of data and operations on the data I need to solve it. For a very simple problem I might just write some functions to operate on the data. As I expand into larger parts of the problem I might find that several functions are operating on the same data and decide that they belong in a class. Or it might be clear from the start that I want to create a class around the data.

When one chunk is done to my satisfaction, I take on another, and another. I am creating building blocks, then using the building blocks to create larger blocks. Some of the blocks are classes, others are functions.

I write unit tests as I go, sometimes test-first, sometimes test-after, but always alternating writing code with writing tests so I know the code works and I have a safety net when I need to refactor or make other major changes.

At any time I may discover that I made a bad decision earlier, or realize that there is a better way to structure the code or data. Then I stop and rework until I am happy with what I have. The unit tests give me confidence that I haven't broken anything in the process. It's a very organic process; I sometimes think of it as growing a program.

(from a post to the Python-tutor list)

posted at 07:01:04    #    comment []    trackback []

Recommended Reading

I have just added a page of recommended books to my main web site. It is very much a work in progress but there is enough there for an initial post.

posted at 20:12:32    #    comment []    trackback []

What's so great about Ruby?

I'm reading Bruce Tate's latest book, Beyond Java. In it, he argues that Java has become overgrown, unwieldy and vulnerable to replacement in many applications. Prime candidates to replace it are the dynamic languages, particularly Ruby.

As a staunch Python advocate I read his description of Ruby with interest. Most Ruby features that he thinks are cool are available in Python in some form. Some are considered wizard-level tricks in Python instead of the mainstream practices they seem to be in Ruby.

For example, in Ruby you can easily add to a class definition. You just declare the class again and extend the definition. This works even for built-in classes. The Ruby approach is conceptually very simple--it reuses the class definition syntax. In Python you can add methods to a class after it is defined by adding attributes to the class. Python's approach is fairly obscure - getting it right can take a few tries.
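A minimal sketch of the Python side, in Python 3 syntax; the Greeter class and the shout method are made up for illustration. It also shows one place where Python differs from Ruby: CPython refuses the same trick on built-in types.

```python
class Greeter(object):
    def __init__(self, name):
        self.name = name

g = Greeter('Kent')   # created before the class is extended

def shout(self):
    return 'HELLO, %s!' % self.name.upper()

# Adding a method is just setting an attribute on the class.
Greeter.shout = shout

# Existing instances see the new method too.
print(g.shout())

# Built-in classes cannot be extended this way in CPython.
try:
    str.shout = shout
    extended_builtin = True
except TypeError:
    extended_builtin = False
print(extended_builtin)
```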

Ruby allows mixins--class fragments that can be added to a class definition to extend it. Python can do the same with multiple inheritance, at the time a class is defined, or by appending to __bases__, which might be considered a hack.
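Here is a small sketch of the multiple-inheritance form, in Python 3 syntax. ReprMixin and City are hypothetical names; the mixin carries one reusable method and knows nothing about the classes that use it.

```python
class ReprMixin(object):
    """Class fragment: gives any class a readable repr()."""
    def __repr__(self):
        fields = ', '.join('%s=%r' % item for item in sorted(vars(self).items()))
        return '%s(%s)' % (type(self).__name__, fields)

class City(ReprMixin, object):
    def __init__(self, city, state):
        self.city = city
        self.state = state

print(repr(City('Boston', 'Massachusetts')))
```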

Ruby has support for creating aliases of methods and replacing them, and this is considered a good thing. In Python this is called monkeypatching and is generally frowned on.
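For comparison, this is roughly what alias-and-replace looks like in Python (Python 3 syntax). The Account class, the log list and the _plain_deposit alias are all invented for this sketch.

```python
class Account(object):
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

log = []

# Keep the original method under a new name (the "alias")...
Account._plain_deposit = Account.deposit

def logged_deposit(self, amount):
    log.append(amount)
    self._plain_deposit(amount)

# ...and replace the public name with the new behavior.
Account.deposit = logged_deposit

account = Account()
account.deposit(5)
print(account.balance, log)
```

Every caller of deposit() now gets the logging version, whether or not it expects it, which is the usual objection to monkeypatching.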

So there is not that much difference in capability. In Ruby some of these things are easier, and I don't discount that. But the main difference seems to be philosophical or cultural. In Python classes are thought of as fairly static--once you create one, it's done. Metaprogramming tricks are used during class creation to get some special effect, or to meet some unusual need.

In Ruby, though, classes are thought of as malleable. A class definition is just a starting point for the full functionality of the class. It's an interesting way of looking at it.

posted at 21:14:40    #    comment []    trackback []

Unit testing is an enabling technology

I'm reading Pragmatic Unit Testing by Andrew Hunt and David Thomas. I'm a big fan of unit testing and I think one of the biggest benefits is often overlooked.

The direct benefits of unit testing are clear and substantial: better tested, more reliable and often better designed code. The indirect benefit that is often neglected is unit tests as an enabling technology.

Have you ever looked at a pile of code and thought,

  • This is a mess, I should rewrite it
  • There's a lot of dead code here I should rip out
  • This would be easier to understand if I broke it up into smaller pieces
  • There's a lot of duplication here I should factor out
  • If I refactored this a little it would be a lot easier to add this new feature

How often is that thought followed by action, and how often by the thought, "I can't do that, there's too much risk of breaking something"?

This is where unit tests shine long after they are written. If the code in question has extensive tests, you can change it with confidence instead of fear, knowing that any breakage will be quickly discovered. Unit tests provide a safety net. There is a qualitative shift from coding in fear to coding with confidence.

Think of it - no longer do you have to live with messy code, dead code, monster methods, duplicated code, designs that don't reflect your needs! How sweet!

I recently finished making some modifications to some file-generation code. When I started there were three large modules that created three large files. The modules and the files were similar but they shared little code; the first module had been copied and modified to make the second two. There were no tests for any of it. My job was to add some options to all three generated files.

The first thing I did was set up a test framework. The generated files are XML so I created some reference files and used XmlUnit to compare the generated files with the references.
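The project itself used Java and XmlUnit. As a rough Python analogue only, the reference-file idea can be sketched with the standard library, assuming Python 3.8 or later for xml.etree.ElementTree.canonicalize(). The helper same_xml and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def same_xml(generated, reference):
    """Compare two XML strings after canonicalization.

    Canonical form ignores attribute order and the <x/> vs <x></x>
    spelling; strip_text=True also ignores indentation whitespace.
    """
    return (ET.canonicalize(generated, strip_text=True) ==
            ET.canonicalize(reference, strip_text=True))

reference = '<cities><city name="Boston" state="Massachusetts"/></cities>'
generated = '''<cities>
  <city state="Massachusetts" name="Boston"></city>
</cities>'''

print(same_xml(generated, reference))
print(same_xml(generated.replace('Boston', 'Worcester'), reference))
```

In a real test the reference string would be read from a checked-in file and the comparison wrapped in an assertion.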

Next I started factoring out duplicated code into a common base class. I focused on the parts I had to change, with a goal of only having to make a change in one place, in the base class, rather than duplicating the change three times.

The result? The three modules are dramatically smaller. Most of the option testing is centralized in the base class. The subclasses contain a top-level driver function and many small callbacks that customize the output to their particular requirements. There is probably less code in the four modules I ended up with than there was in the original three because of all the duplication I removed.
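The shape of the result might look something like the following sketch (Python 3 syntax, not the actual work code). All class and method names are hypothetical: the base class owns the option test, and each subclass supplies a driver plus small callbacks.

```python
class GeneratorBase(object):
    def __init__(self, include_titles=True):
        self.include_titles = include_titles
        self.lines = []

    def write_section(self, title, rows):
        # The option is tested here, once, for every kind of output.
        if self.include_titles:
            self.lines.append(self.format_title(title))
        for row in rows:
            self.lines.append(self.format_row(row))

    # Callbacks that subclasses customize
    def format_title(self, title):
        raise NotImplementedError

    def format_row(self, row):
        raise NotImplementedError

class PlainGenerator(GeneratorBase):
    def generate(self, sections):
        # Top-level driver
        for title, rows in sections:
            self.write_section(title, rows)
        return '\n'.join(self.lines)

    def format_title(self, title):
        return title.upper()

    def format_row(self, row):
        return '  ' + row

sections = [('New York', ['Albany', 'Yonkers'])]
print(PlainGenerator().generate(sections))
print(PlainGenerator(include_titles=False).generate(sections))
```

Adding a new option then means one change in write_section() rather than one change per module.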

Now that's a reason to write unit tests!

posted at 11:28:00    #    comment []    trackback []

Why I love Python 4

Sometimes Java almost seems to go out of its way to make coding difficult, while Python goes out of its way to make it easy. Here is a case in point.

I needed a Java method that accepts a duration expressed as a number of minutes in a string, and returns the same duration formatted as HH:MM:SS. In Python, this is trivial:

def formatDuration(durStr):
  hours, minutes = divmod(int(durStr), 60)
  return '%02d:%02d:00' % (hours, minutes)
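As a quick check of how this works (shown in Python 3 syntax, with input values chosen only for illustration): divmod() returns the quotient and the remainder in one call, and %02d pads each with a leading zero.

```python
def formatDuration(durStr):
    hours, minutes = divmod(int(durStr), 60)
    return '%02d:%02d:00' % (hours, minutes)

print(divmod(135, 60))        # (2, 15)
print(formatDuration('135'))  # 02:15:00
print(formatDuration('59'))   # 00:59:00
```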

Java, on the other hand, makes me jump through hoops to do the same thing:

static String formatDuration(String spcsfDur) {
    int duration = Integer.parseInt(spcsfDur);
    int minutes = duration % 60;
    int hours = (duration - minutes) / 60;
    Object[] args = new Object[] {
            new Integer(hours),
            new Integer(minutes)
    };

    String durStr = MessageFormat.format("{0,number,00}:{1,number,00}:00", args);
    return durStr;
}


posted at 09:24:16    #    comment []    trackback []

Will unit testing slow you down?

I am trying to encourage unit testing and test-driven development at work. As far as I know, only a few developers here are really test-infected.

I have asked several developers, "Do you write unit tests? If not, why not?". The universal response to the second question is, "I don't have time." This strikes me as strange because in my experience writing unit tests helps me to work faster, not slower. Why is that?

The immediate, short-term benefit of unit testing is that I can quickly and easily run the code I am working on. I can generally run a unit test for a single module in one or two mouse clicks. Most of my tests run in a few seconds. So when I make a change to a piece of code, I can find out almost instantly whether it works or not. As an extra benefit, work is more fun because I am writing code instead of running manual tests, and I stay in the flow because I am thinking about code instead of switching gears to run tests.

An intangible benefit is the confidence I have that the code is working because the tests pass. It's a great feeling to write a module with a test suite and know without a doubt that the module is doing what I want it to.

A long-term benefit that affects development speed is the impact on code quality. This has several facets. First, the code is likely to have few defects because it is thoroughly tested. This cuts down on the time I must spend later in debugging and rework. Second, because unit tests impose some constraints on modularity and coupling, the code tends to be well-structured. Finally, with the safety net of the unit tests I have freedom to refactor as needed, so the structure remains appropriate to the job at hand. Each of these facets improves the readability, maintainability and reusability of the code, and that directly impacts productivity.

I don't want to gloss over the down side of unit testing. There are occasional speed bumps. Typically they come when I have to test something new and must take the time to work out how to write the test and integrate it into my build. This doesn't happen too often; usually I can reuse a similar setup from another part of the project, or a framework I already have.

The initial hump that keeps people from unit testing at all is one of these speed bumps. To get started, you do have to figure out how to use a test framework such as JUnit. I recommend asking for help - JUnit is really not that hard to use, and a simple example can go a long way.
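To give an idea of how small a first test can be, here is a sketch using Python's unittest module, which follows the same model as JUnit. The function under test and the test names are made up for illustration, and the syntax is Python 3.

```python
import unittest

def format_duration(minutes_text):
    hours, minutes = divmod(int(minutes_text), 60)
    return '%02d:%02d:00' % (hours, minutes)

class FormatDurationTest(unittest.TestCase):
    def test_less_than_an_hour(self):
        self.assertEqual(format_duration('59'), '00:59:00')

    def test_hours_and_minutes(self):
        self.assertEqual(format_duration('135'), '02:15:00')

    def test_rejects_non_numeric_input(self):
        with self.assertRaises(ValueError):
            format_duration('soon')

# Run the tests explicitly so the example works when pasted anywhere.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FormatDurationTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the last two lines are usually replaced by unittest.main() in a separate test module, so the tests can run from the command line or the IDE.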

Then there are the times when I make a change that breaks a test and I have to go fix it. For example I might change a data format, a test data set, or the signature of a function. This is annoying but easy to deal with.

Even with these drawbacks, I see unit testing as a huge win for productivity.

posted at 09:22:08    #    comment []    trackback []

Never answer the same question twice

When a user comes to me with a question about a program I have written, I like to do two things. First, answer the question. Second, change the program so the question won't come up again.

This is particularly appropriate with error messages. If a user has to ask me what an error message means, the message isn't doing its job. I rewrite it so that next time someone sees it, they won't have to ask me about it. It helps to have just explained it to a live user.

posted at 07:50:08    #    comment []    trackback []

When should I use classes?

A question beginners sometimes ask is, "Should I use classes in my program?" This article gives some ideas of when it is appropriate to introduce classes into a design.

posted at 20:40:32    #    comment []    trackback []


It's such a relief to be coding in Jython again after working with Java. In Java I feel like I'm fighting the language and the APIs all the time. It's way too hard to get anything done. Python just does what I want with much less straining and fuss.


posted at 15:18:24    #    comment []    trackback []

Sticky Widgets

A technique I am using quite a bit is to make sticky widgets - GUI widgets that autonomously remember some part of their state. For example, a text field that remembers its last value; a window that remembers its last size and shape; a file dialog that remembers the directory it last showed.

These widgets make it very easy to create a user interface with some persistent state. This saved state makes the GUI much easier on the users.

In my case the widgets are written in Java using Swing. Each widget takes a preferences key as a constructor argument. The given key and the class name make the full key. This allows the widget to save and restore its state without reliance on any application globals, and without writing any application code specifically to handle the stickiness. The widget registers itself as a listener to itself so it is notified when its state changes and it persists its new state.
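The real widgets are Swing components, but the idea can be sketched without a GUI. The following Python 3 sketch is an illustration only: Preferences is a hypothetical JSON-backed store, and StickyTextField stands in for a text field whose setter plays the role of the change listener.

```python
import json
import os
import tempfile

class Preferences(object):
    """Hypothetical preferences store: a dict saved to a JSON file."""
    def __init__(self, path):
        self.path = path
        self.values = {}
        if os.path.exists(path):
            with open(path) as f:
                self.values = json.load(f)

    def get(self, key, default):
        return self.values.get(key, default)

    def put(self, key, value):
        self.values[key] = value
        with open(self.path, 'w') as f:
            json.dump(self.values, f)

class StickyTextField(object):
    def __init__(self, prefs, key):
        self._prefs = prefs
        # Full key = class name + the key given by the caller
        self._key = '%s.%s' % (type(self).__name__, key)
        self._text = prefs.get(self._key, '')

    @property
    def text(self):
        return self._text

    @text.setter
    def text(self, value):
        # Persist on every change; no application code is involved.
        self._text = value
        self._prefs.put(self._key, value)

path = os.path.join(tempfile.mkdtemp(), 'prefs.json')
field = StickyTextField(Preferences(path), 'searchBox')
field.text = 'groupby'

# Simulate restarting the application with a fresh store and widget.
restored = StickyTextField(Preferences(path), 'searchBox')
print(restored.text)
```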

This is work code so I won't post it but it is really pretty easy to do and very handy.

posted at 14:57:04    #    comment []    trackback []

Spring Free

For my current project, a distributed application with a database back-end, I tried Hibernate and Spring. I really wanted to like them! They have lots of cool features and make some things incredibly easy. But in the end I have gone back to tried-and-true simple tools - Jython, Velocity, Jetty and not much else.

Hibernate felt like it got in the way as much as it helped. I was constantly having to figure out the Hibernate way of things. My data is largely a tree structure flattened into a database table. Hibernate's lazy loading was sometimes great and sometimes exactly wrong.

Spring just felt too big. Every time I needed a new piece I would have to add another Spring jar to my lib directory and usually a few more jars that it depended on. It felt like using a sledgehammer to squash an ant.

Java also feels quite cumbersome now. I have been working primarily in Jython for about a year now and I hate the hoops that Java makes me jump through to get anything done.

In the end it all felt too confining. I was Mech Warrior, high in a robot programming vehicle, directing awesome power from my command post. But I longed to put my feet on the ground, pick up a light pack and run.

So I have chucked it all, salvaging what I can, starting over for the rest. What a relief it is!

posted at 19:53:36    #    comment []    trackback []

Python and Unicode

Python has extensive support for Unicode data. Two issues that are not well documented elsewhere are the handling of non-ASCII characters in the Python interpreter, and use of the default system encoding. I cover those here.

posted at 21:29:36    #    comment []    trackback []

concat vs join - followup

A couple of people have made good points about my last post comparing string concatenation and join.

Marilyn Davis pointed out that in my data, the crossover point where join beats concatenation is always around 500 total characters in the final string. Hans Nowak pointed out that for much longer strings such as the lines of a file or parts of a web page, the crossover point comes very quickly.

So here is ConcatTimer version 2 :-) This version dispenses with the fancy graphics and just looks for the crossover point. (It's not too smart about it, either.) It also looks at much larger text chunks - up to 80 characters. Here is the program:

import timeit

reps = 100 # How many reps to try?
unit = '    ' # Concat this string

# Naive concatenation using string +
def concatPlus(count):
    s = ''
    for i in range(count):
        s += unit
    return s

# Concatenation with string.join
def concatJoin(count):
    s = []
    for i in range(count):
        s.append(unit)
    return ''.join(s)

# Time one test case
def timeOne(fn, count):
    setup = "from __main__ import " + fn.__name__
    stmt = '%s(%d)' % (fn.__name__, count)
    t = timeit.Timer(stmt, setup)
    secs = min(t.repeat(3, reps))
    return secs

# For strings of length unitLen, find the crossover point where appending
# takes the same amount of time as joining
def findOne(unitLen):
    global unit
    unit = ' ' * unitLen
    t = 2
    while 1:
        tPlus = timeOne(concatPlus, t)
        tJoin = timeOne(concatJoin, t)
        if tPlus > tJoin:
            break
        t += 1
    return t, tPlus, tJoin

for unitLen in range(1,80):
    t, tPlus, tJoin = findOne(unitLen)
    print '%2d %3d %3d %1.5f %1.5f' % (unitLen, t, t*unitLen, tPlus, tJoin)

And here is an elided list of results. The columns are the length of the pieces, the number of pieces where concat becomes more expensive than join, the total number of characters in the string at the crossover point, and the actual times. (I cut the number of reps down to keep this from taking too long to run.)

 1 475 475 0.02733 0.02732
 2 263 526 0.01581 0.01581
 3 169 507 0.01024 0.01022
 4 129 516 0.00782 0.00778
 5 100 500 0.00622 0.00604
 6  85 510 0.00517 0.00515
 7  73 511 0.00447 0.00446
 8  63 504 0.00386 0.00385
 9  57 513 0.00354 0.00353
10  53 530 0.00333 0.00333
11  47 517 0.00294 0.00292
12  45 540 0.00287 0.00285
13  41 533 0.00262 0.00260
14  38 532 0.00246 0.00244
15  36 540 0.00232 0.00230
16  34 544 0.00222 0.00222
17  31 527 0.00200 0.00199
18  29 522 0.00189 0.00188
19  30 570 0.00199 0.00194
20  28 560 0.00188 0.00186
21  28 588 0.00190 0.00185
22  26 572 0.00177 0.00174
23  25 575 0.00170 0.00168
24  24 576 0.00165 0.00163
25  23 575 0.00158 0.00156
26  22 572 0.00153 0.00151
27  21 567 0.00146 0.00144
28  21 588 0.00146 0.00146
29  21 609 0.00147 0.00144
30  20 600 0.00142 0.00139
31  19 589 0.00134 0.00134
32  20 640 0.00143 0.00139
33  19 627 0.00137 0.00136
34  18 612 0.00130 0.00129
35  18 630 0.00131 0.00130
36  18 648 0.00133 0.00130
37  17 629 0.00126 0.00126
38  17 646 0.00126 0.00124
39  15 585 0.00112 0.00111
43  15 645 0.00113 0.00110
44  14 616 0.00106 0.00105
45  15 675 0.00114 0.00110
46  14 644 0.00106 0.00105
48  14 672 0.00109 0.00105
49  13 637 0.00100 0.00099
58  13 754 0.00104 0.00100
59  12 708 0.00098 0.00096
69  12 828 0.00102 0.00098
70  11 770 0.00093 0.00092
77  11 847 0.00094 0.00091
78  10 780 0.00086 0.00086
79  10 790 0.00087 0.00085

So, for anyone still reading, you can see that Hans is right and Marilyn is close:

  • For longer strings and more than a few appends, join is clearly a win
  • The total number of characters at the crossover isn't quite constant, but it grows slowly.

Based on this experiment I would say that if the total number of characters is less than 500-1000, concatenation is fine. For anything bigger, use join.
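For anyone who wants to repeat the measurement on their own machine, here is a small sketch in Python 3 syntax. It is not the program that produced the numbers in this post: the function names and the chosen sizes are arbitrary, and it passes callables to timeit.Timer instead of statement strings. Results depend heavily on the interpreter and its version, so the crossover may fall somewhere quite different.

```python
import timeit

def concat_plus(unit, count):
    s = ''
    for _ in range(count):
        s += unit
    return s

def concat_join(unit, count):
    parts = []
    for _ in range(count):
        parts.append(unit)
    return ''.join(parts)

def time_one(fn, unit, count, reps=100):
    # Best of three runs, each run calling fn reps times
    timer = timeit.Timer(lambda: fn(unit, count))
    return min(timer.repeat(3, reps))

for count in (10, 100, 1000):
    unit = ' ' * 10
    print('%5d pieces: plus %1.5f  join %1.5f' % (
        count,
        time_one(concat_plus, unit, count),
        time_one(concat_join, unit, count)))
```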

Of course the total amount of time involved in any case is pretty small. Unless you have a lot of characters or you are building a lot of strings, I don't think it really matters too much.

I started this experiment because I have been telling people on the Tutor mailing list to use join, and I wondered how much it really mattered. Does it make enough of a difference to bring it up to beginners? I'm not sure. It's good to teach best practices, but maybe it's a poor use of time to teach this to beginners. I won't be so quick to bring it up next time.

Update: Alan Gauld points out that this is an optimization, and the first rule of optimization is: don't, until you know you need it. That's a useful way to think about it. Thanks for the reminder!

posted at 23:39:44    #    comment []    trackback []
December 2005

Comments about life, the universe and Python, from the imagination of Kent S Johnson.



© 2005, Kent Johnson