Python Rocks! and other rants

2004-04-30

Preaching to the Choir

My two "Why I love Python" articles have been wildly popular. Much of their popularity is due to being mentioned in the Daily Python-URL. But I have to wonder, why do Pythonistas so enjoy reading about why Python is great? And how can I reach the Java programmers where I work and convince them to try Python?

posted at 09:54:08 # comment [] trackback []

2004-04-29

Why I love Python 2

Python makes it very easy to build complex data structures. One place this is handy is with data-driven programming.

For Meccano I wrote a simple walk-by-rule engine. It walks the tree of domain data and applies callbacks at indicated points. The walk is driven from a tree structure that can be quite large and deeply nested. (I have written about the rule engine before.)

Here is a simple example using some of the same techniques. As you read the example, imagine that the list of rules might be hundreds of lines long and deeply nested. Later on I will indicate some other ways the example might be extended.

Assume we are given a dictionary and we are to print its contents in a nested form where the nesting and order of keys in the output is given by the location of dictionary keys in a structure built from nested lists.

The essential idea of the problem is to use a staticly defined nested list to drive the formatting.

Python version

The Python version is short and sweet (14 lines, 349 chars). The data structures are defined easily and the output generation is simple:

data = { 'a':1, 'b':2, 'c':3, 'd':4, 'e':5 }

formatData = [ 'a', [ 'b', [ 'd', 'e' ], 'c' ] ]


def output(format, indent):
  for item in format:
      if type(item) == type([]):
          output(item, indent+2)
      else:
          val = data[item]
          print '%*s%s: %s' % (indent, ' ', item, val )

output(formatData, 0)

The output is:

a: 1
 b: 2
   d: 4
   e: 5
 c: 3

Java version

The Java version is long and ugly. It is 44 lines and 1127 chars - over three times the size of the Python version! The Map is defined in code. The nested list needs extra (Object[]) casts that greatly reduce readability. The code is much more verbose; this is always the case with Java collection code vs Python:

import java.util.HashMap;
import java.util.Map;

public class Structure {

   static Map data = new HashMap();
   
   static {
       data.put("a", new Integer(1));
       data.put("b", new Integer(2));
       data.put("c", new Integer(3));
       data.put("d", new Integer(4));
       data.put("e", new Integer(5));
   }

   static Object[] struct = {
       "a", 
       new Object[]{
           "b", new Object[]{
               "d", "e"
           },
           "c"
       }
   };

   public static void output(Object[] format, int indent) {
       for (int i = 0; i < format.length; i++) {
           Object item = format[i];
           if (item instanceof Object[]) {
               output((Object[])item, indent+2);
           }
           else {
               Integer val = (Integer)data.get(item);
               for (int j=0; j<indent; j++)
                   System.out.print(' ');
               System.out.println(item + ": " + val);
           }
       }
   }
   
   public static void main(String[] argv) {
       output(struct, 0);
   }
}

Reading data from a file

Now suppose you want to put the configuration data in a file that can be changed at runtime and reloaded as needed.

In Python, all you have to do is move the data structure definitions into a separate Python module. Client code imports the data module and reloads it before each use to re-read the source if it has changed.

In Java, typically the configuration data will be put into a text (non-code) format. XML works well for storing hierarchical data so it would be an obvious choice. Now, you have to define an XML format to hold the data and write code to load and parse the data.

So a hidden benefit of Python is that it includes a parser with the runtime. The parser can read text files and create native collections. This is a huge plus for Python!

More Python benefits

Imagine that part of the nested structure is a class or function name. In Python, the configuration module is code so it can define classes and functions that are referenced directly in the data. Or you can import the module that defines the class or function, then put a reference to it in the data.

In Java, you would typically use separate compiled modules to define the classes and reflection to reference them. The use of reflection further complicates the configuration parser or the client code.

What if parts of the data are repeated? In Python, it's no problem! A repeated section of the configuration can be defined separately and included into the main structure by reference. With an XML representation, the data would likely be repeated in multiple locations in the file.

This all works

I'm not just making this up for the sake of argument - these are all techniques I have used in production code. Python data structures are wonderfully flexible and easy to use!

posted at 20:10:40 # comment [] trackback []

2004-04-23

Why I love Python

The code I wrote last night to build a Map of Maps shows one reason why I like Python so much - it is so easy to work with collections!

I wrote a sample app that shows the same thing in Java and Python. The requirement is to take a list of triples of strings in the form

[ language code, item id, localized string ]

and build a two level Map from language code => item id => data triple. Both examples include a simple test driver which prints:

This is a test
C'est un essai
no data
Another test
no data

Java version

Here is the Java version abstracted from the code I wrote last night. It is 56 lines and 1797 characters. The functional part of the code (excluding main()) is 38 lines.

import java.util.*;

public class Test {

  private Map _map = new HashMap();
  
  public Test(String[][] data) {
      // Convert the input data to a two-level Map from language code => course ID => locale data
      for (int i = 0; i < data.length; i++) {
              String[] itemData = data[i];
              
          String lang = itemData[0];
          
          Map langMap = (Map)_map.get(lang);
          if (langMap == null) {
              langMap = new HashMap();
              _map.put(lang, langMap);
          }
          
          String id = itemData[1];
          langMap.put(id, itemData);
      }
  }
  
  public String lookup(String lang, String id, String defaultData) {
      Map langMap = (Map)_map.get(lang);
      if (langMap == null) return defaultData;
      
      String[] itemData = (String[])langMap.get(id);
      if (itemData == null) return defaultData;
      
      String title = itemData[2];
      if (title == null || title.length() == 0)
          return defaultData;

      return title;
  }
  
  
  public static void main(String[] args) {
      String[][] data = {
          { "en", "123", "This is a test" },
          { "fr", "123", "C'est un essai" },
          { "es", "123", "" },
          { "en", "345", "Another test" }
      };
      
      Test test = new Test(data);
      
      System.out.println(test.lookup("en", "123", "no data"));
      System.out.println(test.lookup("fr", "123", "no data"));
      System.out.println(test.lookup("es", "123", "no data"));
      System.out.println(test.lookup("en", "345", "no data"));
      System.out.println(test.lookup("fr", "345", "no data"));       
  }
}

Python version

And here is the Python version. It is 34 lines and 1036 characters. The functional part of the code (excluding main) is 17 lines. That is roughly 40% shorter than the Java version.

class Test:
  
  def __init__(self, data):
      # Convert the input data to a two-level Map from language code => course ID => locale data
      self._map = {}
      
      for itemData in data:
          lang, id = itemData[:2]
          self._map.setdefault(lang, {})[id] = itemData


  def lookup(self, lang, id, defaultData):
      itemData = self._map.get(lang, {}).get(id)
      if  not itemData:
          return defaultData
      
      return itemData[2] or defaultData
  
  
if __name__ == '__main__':
  data = [
          [ "en", "123", "This is a test" ],
          [ "fr", "123", "C'est un essai" ],
          [ "es", "123", "" ],
          [ "en", "345", "Another test" ]
      ]
      
  test = Test(data);

  print test.lookup("en", "123", "no data")
  print test.lookup("fr", "123", "no data")
  print test.lookup("es", "123", "no data")
  print test.lookup("en", "345", "no data")
  print test.lookup("fr", "345", "no data")

I know which version I prefer!

So how come I'm using Java?

Sigh. I chickened out.

I work in a predominantly Java shop. The project I am working on could grow to a GUI app or a web app or both. I'm familiar with Java Swing, Jetty web server and servlets. I know that there is a great variety of mature tools available for writing Java web apps. I have support available at work if I need help.

On the Python side, I would have to learn wxPython and/or one of the Python web app frameworks like WebWork or Quixote. I don't get such warm fuzzy feelings about the completeness of these frameworks, both in features and release quality. I would be working on my own and out on a limb if I had any problems.

In the end, I decided it was too great a risk so I went with the safer solution.

Sigh.

posted at 08:30:56 # comment [] trackback []

2004-04-22

10x speedup ain't bad!

I am working on some of the worst code I have seen in years. About the best thing I can say for it is, it's mercifully short - 800 lines of Java with eight comments. (The second best thing is that it isn't in VB. The last truly horrible code I worked on was a VB monster with a 4000-line loop and case statement at its heart.)

The main loop is 250 lines. I don't think the author understood recursion; there are four separate stacks that maintain state in the loop! I'm not sure yet but I think they will all be replaced by recursive calls.

The loop builds a 5 MB XML structure in a StringBuffer, then writes it out to a file! Um, why not just write it to a file directly? Well, the StringBuffers are on one of the stacks so I have to sort that out first.

It has wonderfully readable conditional code like this:

if (isNoLocale() == false) ...

And the clincher - the program uses 16,285 localizations. They are keyed by language code and id. So how do you think the program was accessing this data? It put it all in a List and searched sequentially for a match!!! Yikes!

When I first ran the program it took 191 seconds to create the file.

So today's wins were to - Refactor the CSV reader part of the code into a separate class and clean that up, including changing the line items from Maps to Lists. Time to generate the file: 119 sec. - Build a two-level Map so the localizations can be looked up directly instead of by exhaustive search. Time to generate the file: 19 seconds!

And what does this have to do with anything, anyway? Well, I have to rant every once in a while or I will have to rename my blog :-)

And what is the way out of this mess? Refactoring and unit testing, of course! I have three test files containing all the live data used by the program. Every time I make a change I regenerate them and test with XmlUnit. Now I can refactor without fear to get the program to a point where I can understand it.

Poor code structure is a performance issue! Because if you can't understand it, you can't find the bottlenecks. You can't even profile code that is all in one method.

I saw the same thing with the 4000-line VB loop. I started factoring out common code and eventually I could see where the performance problems were and do something about them.

posted at 22:50:40 # comment [] trackback []

J2EE Design and Development, again

I really like this book. It is full of sensible, practical advice based on real experience.

For example Chapter 3 is about testing J2EE applications. The chapter starts with an introduction to Test-driven development, JUnit and best practices for testing in general. Then the author reviews several freely available testing tools including Cactus, ServletUnit, HttpUnit and Web Application Stress Tool.

Each of these tools is presented in the context of testing one aspect of a J2EE application. Short code snippets illustrate the use of the tool.

I have a hard time putting this book down. Really! OK, so I'm a geek. I admit it. And he's not Michael Crichton. But I enjoy reading the author's opinions and advice and I am learning a lot.

posted at 09:24:16 # comment [] trackback []

2004-04-21

Alternative Enterprise Architectures

Here are some thought-provoking articles about enterprise application architecture:

PetShop.NET: An Anti-Pattern Architecture

This one is interesting because it contrasts an (IMO) overweight J2EE architecture (Sun's Java PetStore) with an under-architected solution (Microsoft's PetShop.NET). The author has an extreme bias towards very heavy architecture - there are as many packages in Java PetStore as there are classes in PetShop.NET! I think the PetStore architecture is an example of architecture for the sake of architecture, and the author approves. In his conclusion he says, "DotNetGuru is completely rewriting the PetShop.NET, with the intent of implementing a true N-tier architecture based on an agile design. This means that we will provide an implementation that would let the user choosing his best architecture by using Abstract Factory pattern between all the layers. It will be possible to use Remoting/WebService/Local calls in the service layer and real O/R Mapping tool or DAO in the Data tier just by changing configuration file." Yikes! Agile development anyone?

James Turner writes, Why Do Java Developers Like to Make Things So Hard?. The above article is a great example of what he is talking about.

I actually agree with many of the criticisms in the PetStore article, but I think his cure is as bad as the disease. There is a middle way, agile methods will take you there!

The Way You Develop Web Applications May Be Wrong

In this article Anthony Eden argues for an extremely lightweight style of web application development. The comments debate various alternatives.

Simplifying Web Development: Jython, Spring and Velocity

Two of my favorites plus a new interest...what's not to like? :-) There is a link to a good presentation about Spring.

posted at 08:05:20 # comment [] trackback []

2004-04-14

XPath and dom4j

One of my favorite features of dom4j is its integrated XPath support. This essay has details.

posted at 20:42:40 # comment [] trackback []

2004-04-01

Code faster with Python

Bruce Eckel weighs in with his estimate that he codes 5-10 times faster in Python than in Java. A few quotes:

"I have regular experience in both languages, and the result is always: if you want to get a lot done, use Python."

While learning Python, "the experience of being dramatically more productive in Python repeats itself over and over."

This matches my experience exactly. You just don't get it until you try it, then you don't want to go back.

posted at 16:52:00 # comment [] trackback []

Everything Java

Python Rocks! and other rants - Java 2004/4Weblog of Kent S Johnson

Preaching to the Choir

Why I love Python 2

Why I love Python

10x speedup ain't bad!

Alternative Enterprise Architectures

Python Rocks! and other rants - Java 2004/4
Weblog of Kent S Johnson