Python Rocks! and other rants 2004/8
Weblog of Kent S Johnson

2004-08-27

concat vs join - followup

A couple of people have made good points about my last post comparing string concatenation and join.

Marilyn Davis pointed out that in my data, the crossover point where join beats concatenation is always around 500 total characters in the final string. Hans Nowak pointed out that for much longer strings such the lines of a file or parts of a web page, the crossover point comes very quickly.

So here is ConcatTimer version 2 :-) This version dispenses with the fancy graphics and just looks for the crossover point. (It's not too smart about it, either.) It also looks at much larger text chunks - up to 80 characters. Here is the program:

import timeit

reps = 100 # How many reps to try?
unit = '    ' # Concat this string

# Naive concatenation using string +
def concatPlus(count):
   
s=''
   
for i in range(count):
       
s += unit
   
return s

# Concatention with string.join
def concatJoin(count):
   
s=[]
   
for i in range(count):
       
s.append(unit)
   
return ''.join(s)


# Time one test case
def timeOne(fn, count):
   
setup = "from __main__ import " + fn.__name__
   
stmt = '%s(%d)' % (fn.__name__, count)
   
   
t = timeit.Timer(stmt, setup)
   
secs = min(t.repeat(3, reps))
   
return secs


# For strings of length unitLen, find the crossover point where appending
# takes the same amount of time as joining
def findOne(unitLen):
   
global unit
   
unit = ' ' * unitLen
   
t = 2
   
   
while 1:
       
tPlus = timeOne(concatPlus, t)
       
tJoin = timeOne(concatJoin, t)
       
if tPlus > tJoin:
           
break
       
t += 1
       
   
return t, tPlus, tJoin
   

for unitLen in range(1,80):
   
t, tPlus, tJoin = findOne(unitLen)
   
print '%2d %3d %3d %1.5f %1.5f' % (unitLen, t, t*unitLen, tPlus, tJoin)

And here is an elided list of results. The columns are the length of the pieces, the number of pieces where concat becomes more expensive than join, the total number of characters in the string at the crossover point, and the actual times. (I cut the number of reps down to keep this from taking too long to run.)

 1 475 475 0.02733 0.02732
 2 263 526 0.01581 0.01581
 3 169 507 0.01024 0.01022
 4 129 516 0.00782 0.00778
 5 100 500 0.00622 0.00604
 6  85 510 0.00517 0.00515
 7  73 511 0.00447 0.00446
 8  63 504 0.00386 0.00385
 9  57 513 0.00354 0.00353
10  53 530 0.00333 0.00333
11  47 517 0.00294 0.00292
12  45 540 0.00287 0.00285
13  41 533 0.00262 0.00260
14  38 532 0.00246 0.00244
15  36 540 0.00232 0.00230
16  34 544 0.00222 0.00222
17  31 527 0.00200 0.00199
18  29 522 0.00189 0.00188
19  30 570 0.00199 0.00194
20  28 560 0.00188 0.00186
21  28 588 0.00190 0.00185
22  26 572 0.00177 0.00174
23  25 575 0.00170 0.00168
24  24 576 0.00165 0.00163
25  23 575 0.00158 0.00156
26  22 572 0.00153 0.00151
27  21 567 0.00146 0.00144
28  21 588 0.00146 0.00146
29  21 609 0.00147 0.00144
30  20 600 0.00142 0.00139
31  19 589 0.00134 0.00134
32  20 640 0.00143 0.00139
33  19 627 0.00137 0.00136
34  18 612 0.00130 0.00129
35  18 630 0.00131 0.00130
36  18 648 0.00133 0.00130
37  17 629 0.00126 0.00126
38  17 646 0.00126 0.00124
39  15 585 0.00112 0.00111
43  15 645 0.00113 0.00110
44  14 616 0.00106 0.00105
45  15 675 0.00114 0.00110
46  14 644 0.00106 0.00105
48  14 672 0.00109 0.00105
49  13 637 0.00100 0.00099
58  13 754 0.00104 0.00100
59  12 708 0.00098 0.00096
69  12 828 0.00102 0.00098
70  11 770 0.00093 0.00092
77  11 847 0.00094 0.00091
78  10 780 0.00086 0.00086
79  10 790 0.00087 0.00085

So, for anyone still reading, you can see that Hans is right and Marilyn is close:

  • For longer strings and more than a few appends, join is clearly a win
  • The total number of characters at the crossover isn't quite constant, but it grows slowly.

Based on this experiment I would say that if the total number of characters is less than 500-1000, concatenation is fine. For anything bigger, use join.

Of course the total amount of time involved in any case is pretty small. Unless you have a lot of characters or you are building a lot of strings, I don't think it really matters too much.


I started this experiment because I have been telling people on the Tutor mailing list to use join, and I wondered how much it really mattered. Does it make enough of a difference to bring it up to beginners? I'm not sure. It's good to teach best practices, but maybe it's a poor use of time to teach this to beginners. I won't be so quick to bring it up next time.

Update: Alan Gauld points out that this is an optimization, and the first rule of optimization is don't until you know you need it. That's a useful way to think about it. Thanks for the reminder!

posted at 23:39:44    #    comment []    trackback []
 

Which is really faster - concatenation or join?

A couple of times recently I have given out the conventional advice that for concatenating strings, it is better to build a list of the pieces and join it together than to use string concatenation to build the list. The reasoning is that string concatenation requires copying the entire string for each addition, while the list is designed to make concatenation efficient.

Because of all the copying, the time for string concatenation is proportional to the square of the number of additions - it is O(n^2). List append, however, happens in constant time, so building the list takes time proportional to the number of appends - it is O(n).

The trick with this, though, is that there is a proportionality constant here, and for small n, string concatenation may be faster. I decided to find out.

Here is a program that compares the time for string concatenation using the two methods. It varies both the number of append operations and the length of the appended strings. It prints the results in a series of tables, and it uses VPython to graph the results:

import timeit
from visual.graph import *

reps = 1000 # How many reps to try?
unit = '    ' # Concat this string

# Naive concatenation using string +
def concatPlus(count):
   
s=''
   
for i in range(count):
       
s += unit
   
return s

# Concatention the way the big boys do it, with string.join
def concatJoin(count):
   
s=[]
   
for i in range(count):
       
s.append(unit)
   
return ''.join(s)


# Time one test case
def timeOne(fn, count):
   
setup = "from __main__ import " + fn.__name__
   
stmt = '%s(%d)' % (fn.__name__, count)
   
   
t = timeit.Timer(stmt, setup)
   
secs = min(t.repeat(3, reps))
   
return secs


# Draw the curves for a single length of the appended string
def graphOne(unitLen):
   
global unit
   
unit = ' ' * unitLen
   
   
title = 'Unit length is %d' % len(unit)
   
funct1 = gcurve(color=color.cyan)
   
funct2 = gdots(color=color.yellow)

   
print
   
print title
   
print '       tPlus  tJoin'

   
for t in range(10, 100, 10) + range(100, 600, 50):
       
tPlus = timeOne(concatPlus, t)
       
tJoin = timeOne(concatJoin, t)
       
print '%5d  %2.3f  %2.3f' % (t, tPlus, tJoin)

       
funct1.plot( pos=(t, tPlus) )
       
funct2.plot( pos=(t, tJoin) )


graph = gdisplay(title='Append speed', xtitle='count', ytitle='time')
for unitLen in [1,2,3,4,5]:
   
graphOne(unitLen)

Here is the graph - the yellow dots are for concatJoin, the blue curves are concatPlus:

http://personalpages.tds.net/~kent37/Python/AppendTimes.png

A couple of things stand out from this:

  • For every string length, concatPlus is faster when the number of appends is relatively small - up to 80 appends in my tests
  • For larger numbers of appends, not only does concatPlus show O(n^2) behavior, it gets worse as the size of the appended strings grows. concatJoin is O(n) and it doesn't really matter how long the appended string is. In fact, I think concatPlus is O(m*n) where m is the total length of the final string and n is the number of appends.

Based on these results, I think I will stop spreading the conventional wisdom. I think most uses of string concatenation are for small strings with a small number of concatenations. String join only pays off when there are a lot of appends.

Here is the raw data from the program:

D:\Personal\Tutor>python concattimer.py
Visual-2003-10-05

Unit length is 1
      tPlus  tJoin
  10  0.005  0.008
  20  0.008  0.014
  30  0.012  0.020
  40  0.015  0.025
  50  0.018  0.032
  60  0.022  0.037
  70  0.026  0.044
  80  0.029  0.049
  90  0.033  0.057
 100  0.038  0.062
 150  0.059  0.092
 200  0.082  0.122
 250  0.109  0.149
 300  0.145  0.178
 350  0.184  0.208
 400  0.222  0.237
 450  0.262  0.264
 500  0.307  0.292
 550  0.349  0.325

Unit length is 2
      tPlus  tJoin
  10  0.005  0.008
  20  0.008  0.014
  30  0.012  0.019
  40  0.015  0.026
  50  0.019  0.033
  60  0.023  0.039
  70  0.027  0.044
  80  0.033  0.051
  90  0.038  0.057
 100  0.042  0.062
 150  0.075  0.095
 200  0.114  0.125
 250  0.155  0.154
 300  0.200  0.185
 350  0.247  0.215
 400  0.295  0.243
 450  0.349  0.272
 500  0.404  0.305
 550  0.463  0.332

Unit length is 3
      tPlus  tJoin
  10  0.005  0.008
  20  0.008  0.014
  30  0.012  0.019
  40  0.016  0.026
  50  0.020  0.033
  60  0.025  0.039
  70  0.031  0.045
  80  0.036  0.051
  90  0.043  0.057
 100  0.050  0.064
 150  0.090  0.095
 200  0.134  0.127
 250  0.184  0.159
 300  0.236  0.187
 350  0.293  0.217
 400  0.360  0.247
 450  0.427  0.278
 500  0.499  0.306
 550  0.576  0.335

Unit length is 4
      tPlus  tJoin
  10  0.005  0.008
  20  0.008  0.014
  30  0.012  0.019
  40  0.017  0.025
  50  0.021  0.033
  60  0.026  0.039
  70  0.035  0.045
  80  0.042  0.051
  90  0.050  0.058
 100  0.057  0.063
 150  0.101  0.096
 200  0.153  0.129
 250  0.207  0.161
 300  0.271  0.192
 350  0.341  0.222
 400  0.416  0.249
 450  0.501  0.280
 500  0.580  0.310
 550  0.682  0.338

Unit length is 5
      tPlus  tJoin
  10  0.005  0.008
  20  0.008  0.014
  30  0.013  0.019
  40  0.017  0.025
  50  0.022  0.034
  60  0.029  0.040
  70  0.040  0.046
  80  0.047  0.052
  90  0.055  0.058
 100  0.063  0.064
 150  0.114  0.097
 200  0.169  0.130
 250  0.232  0.163
 300  0.306  0.195
 350  0.385  0.223
 400  0.470  0.252
 450  0.567  0.283
 500  0.668  0.313
 550  0.779  0.344
posted at 19:49:20    #    comment []    trackback []
 
2004-08-26

Jython and Spring Framework

I am trying out Spring Framework in a current project and I like it. I especially like the support for Hibernate transactions. The only reservation I have is that Spring won't work with Jython classes.

I have made a start at implementing Jython support. I have classes that allow a Jython bean to be instantiated from the IoC framework. Setting properties on the beans is awkward, the bean has to implement an interface that defines the setter methods. But it's a start!

One of the limitations of Jython is that it doesn't play very well with Java introspection. If you want your Jython methods to be visible to Java introspection you have two choices:

  • compile your scripts with jythonc
  • implement a Java interface containing the methods of interest

The first option is problematic (I have had too much trouble with jythonc). To work with Spring setter injection, the second requires that every setter is declared in an interface; not really practical. Since the Spring Inversion of Control framework is built on top of an introspection engine, this is a problem for using it with Jython :-( Rod Johnson (Spring author) says this may be fixable in the framework.

You can learn more from this thread on the Spring support forum: http://forum.springframework.org/viewtopic.php?p=1643#1643

posted at 20:32:00    #    comment []    trackback []
 
2004-08-20

Introduction to Python course

I am pleased to announce that I will be teaching the course "Introduction to Programming with Python" in the fall. The course is offered through Merrimack, NH, US Adult Education. It is open to anyone with some computer experience and an interest in learning to program. I won't be assuming any prior programming experience.

The course will meet on Tuesdays from 7-9pm starting on September 28 at Merrimack High School. It will run for 10 weeks. The cost is $110.

The textbook will be Python Programming for the absolute beginner.

To register, browse to http://www.merrimack.k12.nh.us/ and click on Adult Education.

posted at 08:54:24    #    comment []    trackback []
 
2004-08-07

You know you're a geek when...

the book that is keeping you up at night has EJB in the title :-)

I am reading Rod Johnson's new book, J2EE Development without EJB and I have trouble putting it down. He talks about alternative solutions to many of the issues that arise in web application development such as persistence, transaction management and UI layer MVC architecture. He gives many examples of the use of tools such as Hibernate and JDO. Through it all he shows how Spring Framework integrates with other solutions and generally makes your life easier.

Somehow this all keeps me on the edge of my seat. I can't wait to see what alternatives he proposes for transaction management, or how Spring supports AOP. I'm definitely a geek.

This book is a great way to learn about Spring. It is much more practically focused than Johnson's previous book. And by the way, it is half price at bookpool at the moment (as are all Wiley and Wrox titles)!

posted at 14:46:24    #    comment []    trackback []
 

Hibernate

I have been trying out the Hibernate persistence framework for my current work project, Curriculum Builder. So far I am very impressed with it.

Hibernate is a transparent object/relational mapping framework. It makes it very easy to map your objects to a relational database.

"Transparent" means that your business objects need little or no change to work with Hibernate. Persistent objects are POJOs - they don't have to extend a Hibernate base class or implement a Hibernate interface.

The mapping between your objects and the database is defined in an xml configuration file. The mapping is very flexible. For example your business object can have a collection of associated objects that map through a foreign key in the database.

Several features of Hibernate made me think it was worth trying it out for Curriculum Builder.

Hibernate transparently supports the unit-of-work pattern. The way this works is, you access your persistent objects in the scope of a Hibernate Session. Any changes you make to the persistent objects will be detected automatically by Hibernate and persisted when the session is committed.

Hibernate supports optimistic locking with object versioning. Some form of optimistic locking will be essential in CB and I'm very happy not to write it myself!

Hibernate supports lazy loading, so you can control when dependent objects are loaded. For example a Learning Path object might contain a list of courses. With lazy loading, the actual course objects are not loaded unless the collection is accessed.

For example, to create a new Course object and persist it to the database, I use this code. The session has to be told to save the course:

Session s = openSession();
Transaction tx = s.beginTransaction();
Course course = new Course("ABC0101", "Test course", "enUS", "_ss_bs", false);
s.save(course);
tx.commit();
s.close();

To load a course and change it, I do this. Note that I don't have to tell Hibernate that the course has changed, it figures that out by itself:

s = openSession();
tx = s.beginTransaction();
course = findOneCourse(s, "ABC0101");
course.setTitle("Test course new title");
tx.commit();
s.close();

findOneCourse() uses Hibernate's query mechanism:

private Course findOneCourse(Session s, String code) throws HibernateException {
    
Query query = s.createQuery("from Course where code= :code");
    
query.setString("code", code);
    
Course course = (Course)query.uniqueResult();
    
return course;
}

Here is a test case that checks optimistic locking. It creates an object, then modifies it in two concurrent sessions. When the second session commits it gets a StaleObjectStateException:

public void testLocking() throws Exception {
    
Session s = openSession();
    
Transaction tx = s.beginTransaction();
    
Course course = new Course("ABC0103", "Test course 3", "enUS", "_ss_bs", false);
    
s.save(course);
    
tx.commit();
    
s.close();
    
    
Session s1=null, s2=null;
    
try {
        
// Change the course in two different sessions
        
s1 = openSession();
        
Transaction tx1 = s1.beginTransaction();
        
Course course1 = findOneCourse(s1, "ABC0103");
        
assertEquals("Test course 3", course1.getTitle());
        
        
s2 = openSession();
        
Transaction tx2 = s2.beginTransaction();
        
Course course2 = findOneCourse(s2, "ABC0103");
        
assertEquals("Test course 3", course2.getTitle());
        
        
course1.setTitle("Test course three");
        
tx1.commit();
        
s1.close();
        
        
try {
            
course2.setTitle("Conflict!");
            
tx2.commit();
            
s2.close();
            
fail("Should throw StaleObjectStateException");
        
} catch (StaleObjectStateException e) { }
        
    
} finally {
        
if (s1 != null) s1.close();
        
if (s2 != null) s2.close();
    
}
}

For more information, visit the Hibernate web site: http://www.hibernate.org/

Take a look at the road map: http://www.hibernate.org/152.html

Hibernate: A Developers Notebook is a good place to get started.

posted at 14:29:20    #    comment []    trackback []
August 2004
MoTuWeThFrSaSu
       1
2 3 4 5 6 7 8
9101112131415
16171819202122
23242526272829
3031     
Jun
2004
 Sep
2004

Comments about life, the universe and Python, from the imagination of Kent S Johnson.

XML-Image Letterimage

BlogRoll

© 2004, Kent Johnson