Python Rocks! and other rants 2004/4
Weblog of Kent S Johnson

2004-04-30

Simplicity Rules

One of the qualities that distinguishes code-and-fix hacking from software craftsmanship is a different idea of what "done" means. Read more in this essay.

posted at 23:18:24
 

Preaching to the Choir

My two "Why I love Python" articles have been wildly popular. Much of their popularity is due to being mentioned in the Daily Python-URL. But I have to wonder, why do Pythonistas so enjoy reading about why Python is great? And how can I reach the Java programmers where I work and convince them to try Python?

posted at 09:54:08
 
2004-04-29

Why I love Python 2

Python makes it very easy to build complex data structures. One place this is handy is with data-driven programming.

For Meccano I wrote a simple walk-by-rule engine. It walks the tree of domain data and applies callbacks at indicated points. The walk is driven from a tree structure that can be quite large and deeply nested. (I have written about the rule engine before.)

Here is a simple example using some of the same techniques. As you read the example, imagine that the list of rules might be hundreds of lines long and deeply nested. Later on I will indicate some other ways the example might be extended.

Assume we are given a dictionary and must print its contents in nested form, where the nesting and order of the keys in the output are given by the position of those keys in a structure built from nested lists.

The essential idea of the problem is to use a statically defined nested list to drive the formatting.

Python version

The Python version is short and sweet (14 lines, 349 chars). The data structures are defined easily and the output generation is simple:

data = { 'a':1, 'b':2, 'c':3, 'd':4, 'e':5 }

formatData = [ 'a', [ 'b', [ 'd', 'e' ], 'c' ] ]


def output(format, indent):
  for item in format:
      if type(item) == type([]):
          output(item, indent+2)
      else:
          val = data[item]
          print '%*s%s: %s' % (indent, ' ', item, val )

output(formatData, 0)

The output is:

 a: 1
  b: 2
    d: 4
    e: 5
  c: 3

Java version

The Java version is long and ugly. It is 44 lines and 1127 chars - over three times the size of the Python version! The Map has to be populated with put() calls in a static initializer instead of being written as a literal. The nested list needs explicit new Object[] wrappers, and walking it needs an (Object[]) cast, both of which greatly reduce readability. The code is much more verbose; this is always the case with Java collection code vs Python:

import java.util.HashMap;
import java.util.Map;

public class Structure {

   static Map data = new HashMap();
   
   static {
       data.put("a", new Integer(1));
       data.put("b", new Integer(2));
       data.put("c", new Integer(3));
       data.put("d", new Integer(4));
       data.put("e", new Integer(5));
   }

   static Object[] struct = {
       "a", 
       new Object[]{
           "b", new Object[]{
               "d", "e"
           },
           "c"
       }
   };

   public static void output(Object[] format, int indent) {
       for (int i = 0; i < format.length; i++) {
           Object item = format[i];
           if (item instanceof Object[]) {
               output((Object[])item, indent+2);
           }
           else {
               Integer val = (Integer)data.get(item);
               for (int j=0; j<indent; j++)
                   System.out.print(' ');
               System.out.println(item + ": " + val);
           }
       }
   }
   
   public static void main(String[] argv) {
       output(struct, 0);
   }
}

Reading data from a file

Now suppose you want to put the configuration data in a file that can be changed at runtime and reloaded as needed.

In Python, all you have to do is move the data structure definitions into a separate Python module. Client code imports the data module and reloads it before each use to re-read the source if it has changed.

In Java, typically the configuration data will be put into a text (non-code) format. XML works well for storing hierarchical data so it would be an obvious choice. Now, you have to define an XML format to hold the data and write code to load and parse the data.

So a hidden benefit of Python is that it includes a parser with the runtime. The parser can read text files and create native collections. This is a huge plus for Python!
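
Here is a minimal sketch of the reload idea, not the Meccano code. The module name format_config, the scratch directory and the write_config helper are all made up for the example, and it uses importlib.reload, which is where current Python keeps the reload function.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical configuration module, written to a scratch directory for the demo.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "format_config.py")


def write_config(source):
    # Stand-in for someone editing the configuration file by hand.
    with open(config_path, "w") as f:
        f.write(source)


write_config("formatData = ['a', ['b', ['d', 'e'], 'c']]\n")

# Skip bytecode caching so a quick rewrite of the file is never masked by a stale .pyc.
sys.dont_write_bytecode = True
sys.path.insert(0, config_dir)

import format_config

first = format_config.formatData

# The file changes while the program is running...
write_config("formatData = ['a', ['b', 'c']]\n")

# ...and the client re-reads it before the next use.
importlib.reload(format_config)
second = format_config.formatData

print(first)
print(second)
```

The configuration file is ordinary Python, so there is no separate file format to design and no loader to write.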

More Python benefits

Imagine that part of the nested structure is a class or function name. In Python, the configuration module is code so it can define classes and functions that are referenced directly in the data. Or you can import the module that defines the class or function, then put a reference to it in the data.

In Java, you would typically use separate compiled modules to define the classes and reflection to reference them. The use of reflection further complicates the configuration parser or the client code.
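
The Python side of this might look like the sketch below. The shout and plain functions and the rules list are invented for illustration; the point is only that the data holds the function objects themselves rather than their names.

```python
# Hypothetical callbacks. A real configuration module could just as well import them.
def shout(key, value):
    return '%s: %s!' % (key.upper(), value)


def plain(key, value):
    return '%s: %s' % (key, value)


data = {'a': 1, 'b': 2, 'c': 3}

# Each rule pairs a dictionary key with the function object that formats it.
rules = [('a', shout), ('b', plain), ('c', shout)]


def render(rules, data):
    # Nothing is looked up by name; the walker calls whatever the data holds.
    return [formatter(key, data[key]) for key, formatter in rules]


lines = render(rules, data)
for line in lines:
    print(line)
```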

What if parts of the data are repeated? In Python, it's no problem! A repeated section of the configuration can be defined separately and included into the main structure by reference. With an XML representation, the data would likely be repeated in multiple locations in the file.
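
A small sketch of sharing by reference, reusing the dictionary from the first example. The name details and this render function are assumptions made for the sketch; render collects the lines in a list instead of printing them directly.

```python
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# Defined once...
details = ['d', 'e']

# ...and referenced twice. Both entries are the same list object.
formatData = ['a', ['b', details, 'c', details]]


def render(format, indent=0, out=None):
    # Walk the nested lists, indenting two more spaces per level.
    if out is None:
        out = []
    for item in format:
        if isinstance(item, list):
            render(item, indent + 2, out)
        else:
            out.append('%s%s: %s' % (' ' * indent, item, data[item]))
    return out


lines = render(formatData)
print('\n'.join(lines))
```

Editing details in one place changes both sections of the output.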

This all works

I'm not just making this up for the sake of argument - these are all techniques I have used in production code. Python data structures are wonderfully flexible and easy to use!

posted at 20:10:40
 
2004-04-26

Wired News: Diebold May Face Criminal Charges

If you think development of voting machines is somehow different from other software/hardware projects you should read this article.

Diebold, manufacturer of voting machines used in California, has been decertified by the state and referred to the state attorney general for possible civil and criminal charges under state election law.

The details sound very familiar to projects I have worked on that were struggling to meet a deadline. For example, a new peripheral was installed days before the election that was still being debugged. The peripheral failed in two counties causing the polls to open late. In addition, "Diebold...installed uncertified software on its voting machines in 17 counties without notifying state officials or, in some cases, even county officials who were affected by the changes."

This behavior is appropriate for a trade show, not a federal election. It's no way to run a democracy.

posted at 08:24:32
 
2004-04-24

Agile Prophecies of Dr Seuss

Everything I need to know I learned from The Cat in the Hat.

Just read it :-)

posted at 22:27:12
 

Don't Repeat Yourself

Don't Repeat Yourself and its special case Once and Only Once are two of the most important principles of good development. Read this essay for more.

posted at 14:18:40
 
2004-04-23

Why I love Python

The code I wrote last night to build a Map of Maps shows one reason why I like Python so much - it is so easy to work with collections!

I wrote a sample app that shows the same thing in Java and Python. The requirement is to take a list of triples of strings in the form

[ language code, item id, localized string ]

and build a two level Map from language code => item id => data triple. Both examples include a simple test driver which prints:

This is a test
C'est un essai
no data
Another test
no data

Java version

Here is the Java version abstracted from the code I wrote last night. It is 56 lines and 1797 characters. The functional part of the code (excluding main()) is 38 lines.

import java.util.*;

public class Test {

  private Map _map = new HashMap();
  
  public Test(String[][] data) {
      // Convert the input data to a two-level Map from language code => course ID => locale data
      for (int i = 0; i < data.length; i++) {
              String[] itemData = data[i];
              
          String lang = itemData[0];
          
          Map langMap = (Map)_map.get(lang);
          if (langMap == null) {
              langMap = new HashMap();
              _map.put(lang, langMap);
          }
          
          String id = itemData[1];
          langMap.put(id, itemData);
      }
  }
  
  public String lookup(String lang, String id, String defaultData) {
      Map langMap = (Map)_map.get(lang);
      if (langMap == null) return defaultData;
      
      String[] itemData = (String[])langMap.get(id);
      if (itemData == null) return defaultData;
      
      String title = itemData[2];
      if (title == null || title.length() == 0)
          return defaultData;

      return title;
  }
  
  
  public static void main(String[] args) {
      String[][] data = {
          { "en", "123", "This is a test" },
          { "fr", "123", "C'est un essai" },
          { "es", "123", "" },
          { "en", "345", "Another test" }
      };
      
      Test test = new Test(data);
      
      System.out.println(test.lookup("en", "123", "no data"));
      System.out.println(test.lookup("fr", "123", "no data"));
      System.out.println(test.lookup("es", "123", "no data"));
      System.out.println(test.lookup("en", "345", "no data"));
      System.out.println(test.lookup("fr", "345", "no data"));       
  }
}

Python version

And here is the Python version. It is 34 lines and 1036 characters. The functional part of the code (excluding main) is 17 lines. Overall that is roughly 40% shorter than the Java version, and the functional part is less than half the size.

class Test:
  
  def __init__(self, data):
      # Convert the input data to a two-level Map from language code => course ID => locale data
      self._map = {}
      
      for itemData in data:
          lang, id = itemData[:2]
          self._map.setdefault(lang, {})[id] = itemData


  def lookup(self, lang, id, defaultData):
      itemData = self._map.get(lang, {}).get(id)
      if  not itemData:
          return defaultData
      
      return itemData[2] or defaultData
  
  
if __name__ == '__main__':
  data = [
          [ "en", "123", "This is a test" ],
          [ "fr", "123", "C'est un essai" ],
          [ "es", "123", "" ],
          [ "en", "345", "Another test" ]
      ]
      
  test = Test(data);

  print test.lookup("en", "123", "no data")
  print test.lookup("fr", "123", "no data")
  print test.lookup("es", "123", "no data")
  print test.lookup("en", "345", "no data")
  print test.lookup("fr", "345", "no data")

I know which version I prefer!

So how come I'm using Java?

Sigh. I chickened out.

I work in a predominantly Java shop. The project I am working on could grow to a GUI app or a web app or both. I'm familiar with Java Swing, Jetty web server and servlets. I know that there is a great variety of mature tools available for writing Java web apps. I have support available at work if I need help.

On the Python side, I would have to learn wxPython and/or one of the Python web app frameworks like Webware or Quixote. I don't get such warm fuzzy feelings about the completeness of these frameworks, both in features and release quality. I would be working on my own and out on a limb if I had any problems.

In the end, I decided it was too great a risk so I went with the safer solution.

Sigh.

posted at 08:30:56
 
2004-04-22

What happened to the Python part?

Not much Python here recently, mostly Java. I am learning about J2EE and writing some code that will probably turn into a Java webapp at some point. I'm getting familiar with Eclipse again. This week I made some changes to my Jython app. So I haven't gone completely over to the dark side :-)

posted at 23:07:44
 

10x speedup ain't bad!

I am working on some of the worst code I have seen in years. About the best thing I can say for it is, it's mercifully short - 800 lines of Java with eight comments. (The second best thing is that it isn't in VB. The last truly horrible code I worked on was a VB monster with a 4000-line loop and case statement at its heart.)

The main loop is 250 lines. I don't think the author understood recursion; there are four separate stacks that maintain state in the loop! I'm not sure yet but I think they will all be replaced by recursive calls.

The loop builds a 5 MB XML structure in a StringBuffer, then writes it out to a file! Um, why not just write it to a file directly? Well, the StringBuffers are on one of the stacks so I have to sort that out first.

It has wonderfully readable conditional code like this:

if (isNoLocale() == false) ...

And the clincher - the program uses 16,285 localizations. They are keyed by language code and id. So how do you think the program was accessing this data? It put it all in a List and searched sequentially for a match!!! Yikes!

When I first ran the program it took 191 seconds to create the file.

So today's wins were to:

  • Refactor the CSV reader part of the code into a separate class and clean that up, including changing the line items from Maps to Lists. Time to generate the file: 119 sec.
  • Build a two-level Map so the localizations can be looked up directly instead of by exhaustive search. Time to generate the file: 19 seconds!
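
The difference between the two lookup strategies can be sketched in a few lines of Python. The real program is Java; the sample rows and function names below are invented for the example.

```python
# Hypothetical sample rows: [language code, item id, localized string]
rows = [
    ['en', '123', 'This is a test'],
    ['fr', '123', "C'est un essai"],
    ['en', '345', 'Another test'],
]


def find_sequential(rows, lang, item_id):
    # Scan every row until one matches: the cost grows with the number of rows.
    for row in rows:
        if row[0] == lang and row[1] == item_id:
            return row
    return None


def build_index(rows):
    # Two-level dictionary: language code -> item id -> row.
    index = {}
    for row in rows:
        index.setdefault(row[0], {})[row[1]] = row
    return index


def find_indexed(index, lang, item_id):
    # Two hash lookups, however many rows there are.
    return index.get(lang, {}).get(item_id)


index = build_index(rows)
print(find_sequential(rows, 'fr', '123'))
print(find_indexed(index, 'fr', '123'))
```

With thousands of rows and a lookup for every item written, the sequential version does work proportional to the whole list on each call, while the index is built once.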

And what does this have to do with anything, anyway? Well, I have to rant every once in a while or I will have to rename my blog :-)

And what is the way out of this mess? Refactoring and unit testing, of course! I have three test files containing all the live data used by the program. Every time I make a change I regenerate them and test with XmlUnit. Now I can refactor without fear to get the program to a point where I can understand it.

Poor code structure is a performance issue! Because if you can't understand it, you can't find the bottlenecks. You can't even profile code that is all in one method.

I saw the same thing with the 4000-line VB loop. I started factoring out common code and eventually I could see where the performance problems were and do something about them.

posted at 22:50:40
 

J2EE Design and Development, again

I really like this book. It is full of sensible, practical advice based on real experience.

For example, Chapter 3 is about testing J2EE applications. The chapter starts with an introduction to test-driven development, JUnit and best practices for testing in general. Then the author reviews several freely available testing tools including Cactus, ServletUnit, HttpUnit and Web Application Stress Tool.

Each of these tools is presented in the context of testing one aspect of a J2EE application. Short code snippets illustrate the use of the tool.

I have a hard time putting this book down. Really! OK, so I'm a geek. I admit it. And he's not Michael Crichton. But I enjoy reading the author's opinions and advice and I am learning a lot.

posted at 09:24:16
 
2004-04-21

Alternative Enterprise Architectures

Here are some thought-provoking articles about enterprise application architecture:

PetShop.NET: An Anti-Pattern Architecture

This one is interesting because it contrasts an (IMO) overweight J2EE architecture (Sun's Java PetStore) with an under-architected solution (Microsoft's PetShop.NET). The author has an extreme bias towards very heavy architecture - there are as many packages in Java PetStore as there are classes in PetShop.NET! I think the PetStore architecture is an example of architecture for the sake of architecture, and the author approves. In his conclusion he says, "DotNetGuru is completely rewriting the PetShop.NET, with the intent of implementing a true N-tier architecture based on an agile design. This means that we will provide an implementation that would let the user choosing his best architecture by using Abstract Factory pattern between all the layers. It will be possible to use Remoting/WebService/Local calls in the service layer and real O/R Mapping tool or DAO in the Data tier just by changing configuration file." Yikes! Agile development anyone?

James Turner asks, "Why Do Java Developers Like to Make Things So Hard?" The above article is a great example of what he is talking about.

I actually agree with many of the criticisms in the PetStore article, but I think his cure is as bad as the disease. There is a middle way, and agile methods will take you there!

The Way You Develop Web Applications May Be Wrong

In this article Anthony Eden argues for an extremely lightweight style of web application development. The comments debate various alternatives.

Simplifying Web Development: Jython, Spring and Velocity

Two of my favorites plus a new interest...what's not to like? :-) There is a link to a good presentation about Spring.

posted at 08:05:20
 
2004-04-20

GraphViz

I've just discovered the GraphViz package from AT&T Research. I had heard of the package before, but it was a release of pydot, a Python wrapper, that got me to look at it.

GraphViz makes it astonishingly simple to create nice-looking graphics. For example, to make a package dependency graph of the Java packages in Meccano, I created this graph definition:

digraph G {
  size = "3,3";
  node [shape=box,fontname="Helvetica"];
  blasterui -> blaster;
  blasterui -> swing;
  blaster -> converter;
  blaster -> writer;
  writer -> mockcourse;
  devscript -> coursedata;
  devscript -> word;
  converter -> devscript;
  converter -> mockcourse;
  mockcourse -> coursedata;
  coursedata -> util;
  server -> editor;
  server -> writer;
  editor -> util;
}

I saved the definition in a file called dep.data. Next I ran dot from the command line with

dot -Tpng dep.data > dep.png

Here is the resulting image:

There is much, much more you can do with this package but this shows how easy it is to get a useful result.

pydot puts a Python wrapper around GraphViz. This could be useful for creating graphs from a program. For hand-created graphs it might be easier just to write the data file by hand as I did for the example.
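
As a rough illustration of creating a graph from a program, here is a sketch that writes the same kind of definition from a list of dependency pairs. It does not use pydot; it only builds the text, and the to_dot function and the shortened dependency list are assumptions made for the example.

```python
# Hypothetical subset of the package dependencies from the graph definition above.
dependencies = [
    ('blasterui', 'blaster'),
    ('blaster', 'converter'),
    ('converter', 'devscript'),
]


def to_dot(edges, name='G'):
    # Emit the same header as the hand-written file, then one line per edge.
    lines = [
        'digraph %s {' % name,
        '  size = "3,3";',
        '  node [shape=box,fontname="Helvetica"];',
    ]
    for source, target in edges:
        lines.append('  %s -> %s;' % (source, target))
    lines.append('}')
    return '\n'.join(lines) + '\n'


dot_text = to_dot(dependencies)
print(dot_text)
```

Saving dot_text to dep.data and running the dot command shown earlier should produce the image in the same way as the hand-written file.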

I had to make a few changes to pydot.py to get it to work under Windows. There were problems with the PATH separator, the name of the .exe files, and text vs binary files. Here is a diff from version 0.9 that shows the changes I made:

Compare: (<)C:\Downloads\Python\pydot-0.9\pydot.py (26989 bytes)
   with: (>)C:\Python23\Lib\site-packages\pydot.py (27121 bytes)

117c117
<     for path in os.environ['PATH'].split(':'):
---
>     for path in os.environ['PATH'].split(os.pathsep):
121a121,122
>             elif os.path.exists(path+os.path.sep+prg + '.exe'):
>                 progs[prg]=path+os.path.sep+prg + '.exe'
727c729
<         dot_fd=open(path, "w+")
---
>         dot_fd=open(path, "w+b")
763c765
<         out=os.popen(self.progs[prog]+' -T'+format+' '+tmp_name, 'r')
---
>         out=os.popen(self.progs[prog]+' -T'+format+' '+tmp_name, 'rb')

posted at 10:28:16
 
2004-04-19

Continuous Design

I have written before about Growing a design. Key to the success of this technique is keeping your code clean using principles such as Don't Repeat Yourself and You Aren't Going to Need It.

In this article, Jim Shore chronicles his experience with this process. I particularly like the sidebar Design Goals in Continuous Design which summarizes much of what makes this technique work.

posted at 08:41:36
 

Inversion of Control Frameworks

Whenever you hide an implementation class behind an interface, you have the problem of instantiating the concrete instances of the interface and giving them to the client code.

There are several ways to do this:

  • The client code can instantiate the instance directly
  • The instance can be stored in a global resource such as a singleton or registry
  • The code that instantiates the client can also create the instance and pass it to the client
  • An instance can be created from a global property using reflection

Each of these techniques has disadvantages:

  • If the client creates the instance then you can't substitute a different implementation without changing the client, and the benefit of using the interface is reduced.
  • Using a global registry, singleton or property makes the client depend on the global facility which makes testing and reuse more difficult.
  • Reflection is complicated when the instance has dependencies of its own, for example when it needs configuration data or depends on other interfaces.

A solution to this problem that is gaining popularity is to use a framework with support for Inversion of Control (or, as Martin Fowler calls it, Dependency Injection). With this technique, client code can be written with no dependencies on global resources. The framework takes care of initializing the required instances and providing them on demand.

Martin Fowler has an article that explains the technique. Two frameworks that use this technique are Spring and PicoContainer.
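
The core of the idea fits in a few lines of Python. This is a hand-rolled sketch and not how Spring or PicoContainer work internally; every class and function name in it is hypothetical, and the assemble function stands in for the framework.

```python
class FileStore:
    """One implementation of a hypothetical 'store' interface."""

    def __init__(self, path):
        self.path = path

    def load(self, key):
        return 'value of %s from %s' % (key, self.path)


class MemoryStore:
    """Another implementation, handy for tests."""

    def __init__(self, values):
        self.values = values

    def load(self, key):
        return self.values[key]


class Report:
    # The client is handed its store. It never instantiates one or asks a global for it.
    def __init__(self, store):
        self.store = store

    def title(self):
        return self.store.load('title').upper()


def assemble(use_memory):
    # Stand-in for the framework: the only place that names concrete classes.
    if use_memory:
        store = MemoryStore({'title': 'draft'})
    else:
        store = FileStore('report.cfg')
    return Report(store)


report = assemble(use_memory=True)
print(report.title())
```

Report can be tested with a MemoryStore and deployed with a FileStore without changing a line of Report itself.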

posted at 08:28:48
 
2004-04-14

XPath and dom4j

One of my favorite features of dom4j is its integrated XPath support. This essay has details.

posted at 20:42:40
Comments about life, the universe and Python, from the imagination of Kent S Johnson.

© 2004-2005, Kent Johnson