Wednesday, July 28, 2010

Response to blog post: "Threading: Perl vs Python"

This blog post is in response to kbenson's blog post "Threading: Perl vs Python". This started as a comment and predictably grew out of control.

The benchmarks above clearly show the superiority of perl's threading model over python's threading model. In kbenson's benchmarks perl uses a true concurrent threading model and python, the poor dear, does not. When we crunch through several million calculations perl emerges victorious! Huzzah! Sadly it is all an act, created to evince happy warm fuzzy tummy bunnies for JAPHs everywhere. Benchmarks created as a means to an end do very little to impress me. In general, benchmarks are not worth nearly as much emotion as people invest in them. But... they are just so fun to make, I can't help myself!

What follows is a critique of kbenson's benchmarks, with the tiniest pinch of explanation on perl's threads and what's wrong and right with them. I also compare perl threads to the Coro module a little. References to "you" are meant to directly address kbenson in regards to the above blog post.

Benchmarking is no easy task and I appreciate your attempt, but there are some problems with your benchmark. My github repo contains my attempt at fixing these problems, which I address below.

Data Bound vs CPU Bound

Your benchmark is purely CPU bound. It may very well be a matter of opinion to consider this a flaw. However, I think the benchmark is misleading and misses the point of threads as a general concept. Threads that don't share any data are really not very different from processes. There is nothing exciting about the ubiquitous process. Every program creates a process. Threads, whether they be kernel-based (pthreads, Win32 threads) or user-based (coroutines), have the appeal of shared data. Benchmarking threads without benchmarking shared data is downright paradoxical.

Unless explicitly forced to by using various IPC techniques like sockets, files, or shared memory, processes don't share any data. You have in a sense created processes out of threads. This is not terribly shocking because Perl interpreter threads are kernel threads which are made to act like processes. Data is shared explicitly using the threads::shared module and not implicitly as in kernel threads.

Because of this, and in the hope of avoiding the term thread altogether, I have lumped perl ithreads together as processes so that they will coincide with the python alternative. Comparing perl threads to python's multiprocessing module is much more interesting. This is the python equivalent of perl's threads. (I have used python's multiprocessing module in my own benchmarks.)
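To make the "processes don't share data" point concrete, here is a minimal sketch in python. It is not taken from either benchmark: it uses a bare os.fork() (Unix only) instead of the multiprocessing module, and the names counter and bump_in_child are made up for illustration. The child increments an ordinary variable, and the parent only learns the child's value because it is sent back explicitly over a pipe.

```python
import os

counter = 0  # ordinary module-level variable, not shared memory


def bump_in_child(times):
    """Fork a child that increments `counter`; return (child_value, parent_value).

    Hypothetical helper for illustration; assumes a Unix-like system with os.fork().
    """
    global counter
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on its own private copy of `counter`.
        try:
            os.close(read_fd)
            for _ in range(times):
                counter += 1
            os.write(write_fd, str(counter).encode())
        finally:
            os._exit(0)  # leave at once so the child never runs the parent's code
    # Parent: explicit IPC (the pipe) is the only way to see the child's value.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_value = int(pipe.read())
    os.waitpid(pid, 0)
    return child_value, counter
```

Calling bump_in_child(1000) gives the child's count back through the pipe while the parent's own counter stays at zero. Nothing crosses the process boundary unless you send it.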

Along the same lines, your limited data sharing in the perl example is confusing. The shared @thread_times variable on line 6 is completely unnecessary. You can retrieve the thread runtime as the return value when you join() the thread, making the @thread_times redundant. Your code does this on line 30 but never uses the result. (I have highlighted the line here)
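The same idea, returning the runtime from the worker instead of writing it to a shared array, can be sketched in python. One caveat: python's threading.Thread.join() does not return a value the way perl's join() does, so this sketch swaps in concurrent.futures as the nearest analogue. The function names are hypothetical and the loop is only a stand-in for the real workload.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_work(iterations):
    """Run a small CPU-bound loop and return its own runtime in seconds."""
    started = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.perf_counter() - started


def run_workers(worker_count, iterations):
    """Start the workers and collect each runtime as a return value."""
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        futures = [pool.submit(timed_work, iterations) for _ in range(worker_count)]
        # result() waits for the worker, much like join(), and hands back
        # whatever the worker returned. No shared list is needed.
        return [future.result() for future in futures]
```

The master can then sum the returned runtimes itself, which is all the shared array was being used for.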

More interesting and less predictable would have been to benchmark some data sharing between threads, even a trivial amount. I have done exactly that in my repository, using a simple counter. Real world threaded perl applications need more sophisticated data sharing, often in the form of semaphores and/or locking.
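For readers who have not written one, a shared counter guarded by a lock looks roughly like this. This is a sketch with python threads and made-up names, not the code from my repository, and it uses a much smaller iteration count than the benchmarks do.

```python
import threading


def count_with_lock(worker_count, increments):
    """Have several threads increment one shared counter under a lock."""
    counter = {"value": 0}   # the shared data
    lock = threading.Lock()  # serializes access to the counter

    def worker():
        for _ in range(increments):
            with lock:       # only one thread may update at a time
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]
```

With the lock in place the final value is always worker_count * increments. Taking and releasing that lock on every increment is exactly the kind of cost a purely CPU bound benchmark never measures.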

Your timing is slightly off because you use time() in the perl example and clock() in the python example. time() and clock() are not interchangeable. Unfortunately we cannot simply start using clock() in the perl example as well. Perl's clock() and python's clock() are not interchangeable, either.
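The underlying difference is wall-clock time versus CPU time. Perl's built-in time() reports wall-clock seconds, while python 2's time.clock() on Unix reports processor time. The sketch below shows how far apart the two can be. It assumes a modern python 3, where time.clock() no longer exists, so it uses time.perf_counter() and time.process_time() as stand-ins; the measure function is hypothetical.

```python
import time


def measure(seconds_to_sleep):
    """Return (wall_elapsed, cpu_elapsed) for a call that mostly sleeps."""
    wall_start = time.perf_counter()  # wall-clock style timer
    cpu_start = time.process_time()   # CPU time consumed by this process
    time.sleep(seconds_to_sleep)      # sleeping uses almost no CPU
    wall_elapsed = time.perf_counter() - wall_start
    cpu_elapsed = time.process_time() - cpu_start
    return wall_elapsed, cpu_elapsed
```

A sleeping program racks up wall-clock time but next to no CPU time, so numbers taken with one kind of timer cannot be compared against numbers taken with the other.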

More Benchmarks

In my github repo I have tidied up your original benchmarks and created new ones, modeled from the old. The files are named after the asynchronous model used: "processes" or "coroutines". These benchmarks also spawned variants that share data. A text file containing output of each benchmark is also provided. You can draw your own conclusions after reading the output.txt file, or running the benchmarks yourself with the run.pl script. Readers will find that these benchmarks were not created with a vested interest in any one contestant's outcome, but in the interest of fairness and curiosity.

The Coro module is still the most attractive option to me for real-world applications where I would like to share data between threads, especially when combined with Coro's various asynchronous I/O function calls. If I ever have the good fortune to meet a problem that requires no sharing of data I will be sure to use processes, aka perl "threads".
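Why coroutines make data sharing painless can be shown in a few lines. This is not the Coro API, and it is not the code from my benchmarks: it is a sketch using plain python generators and a hypothetical round-robin scheduler. Only one coroutine runs at a time and control changes hands only at an explicit yield, so the shared counter needs no lock.

```python
from collections import deque


def worker(state, increments):
    """A cooperative task: bump the shared counter, then give up control."""
    for _ in range(increments):
        state["counter"] += 1  # no lock: nothing else runs until we yield
        yield                  # hand control back to the scheduler


def run_round_robin(worker_count, increments):
    """Run the workers in turn until all have finished; return the counter."""
    state = {"counter": 0}
    ready = deque(worker(state, increments) for _ in range(worker_count))
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task is done, drop it from the queue
        ready.append(task)      # otherwise put it at the back of the line
    return state["counter"]
```

The tradeoff is that nothing here runs in parallel. The coroutines interleave on a single core, which matches the coroutine results below, where the master's runtime is roughly the combined runtime of the coroutines divided by two only because they take turns.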

I still think there is much more to post on the subject. I have barely scratched the surface but I believe I have dug a little deeper yet than kbenson.

PS: For the benefit of others, this blog conversation was started by my comments on the "Curious Programmer" blog. I jokingly said we should benchmark perl threads next. After a little back and forth I said that coroutines (using the Coro module) are a better thread than perl's built-in interpreter threads. Yes, it was a sweeping remark, but I did not feel the necessity to explain my point with a huge comment (this size) on another's blog and eventually gave up on the conversation. Then it started up again, whee!

My Results

Here is the sample output from my benchmarks. The files without the -share suffix perform the same CPU bound operation. Files ending in -share perform the operation while also incrementing a counter as shared data.

COMMAND: /Users/juster/perl/5.12.1/bin/perl -v
OUTPUT:

This is perl 5, version 12, subversion 1 (v5.12.1) built for darwin-thread-multi-2level

Copyright 1987-2010, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.


----

COMMAND: /Users/juster/python/2.7/bin/python -V
OUTPUT:
Python 2.7

----

COMMAND: process.pl
OUTPUT:
Running with 2 processes
Process 2 :: started at Thu Jul 22 15:35:07 2010
Process 2 :: stopped at Thu Jul 22 15:35:12 2010
Process 1 :: started at Thu Jul 22 15:35:07 2010
Process 1 :: stopped at Thu Jul 22 15:35:13 2010
Master logged 6.213 runtime
Processes reported 11.316 combined runtime

----

COMMAND: process.py
OUTPUT:
Running with 2 processes
Process 2 :: started at Thu Jul 22 15:35:13 2010
Process 2 :: stopped at Thu Jul 22 15:35:18 2010
Process 1 :: started at Thu Jul 22 15:35:13 2010
Process 1 :: stopped at Thu Jul 22 15:35:18 2010
Master logged 4.920 runtime
Threads reported 9.827 combined runtime

----

COMMAND: process-share.pl
OUTPUT:
Running with 2 processes
Process 1 :: started at Thu Jul 22 15:35:18 2010
Process 1 :: stopped at Thu Jul 22 15:37:26 2010
Process 2 :: started at Thu Jul 22 15:35:18 2010
Process 2 :: stopped at Thu Jul 22 15:37:28 2010
Master logged 129.500 runtime
Processes reported 256.975 combined runtime
Shared counter equals 20000000

----

COMMAND: process-share.py
OUTPUT:
Running with 2 processes
Process 1 :: started at Thu Jul 22 15:37:28 2010
Process 1 :: stopped at Thu Jul 22 15:39:30 2010
Process 2 :: started at Thu Jul 22 15:37:28 2010
Process 2 :: stopped at Thu Jul 22 15:39:30 2010
Master logged 121.869 runtime
Threads reported 243.698 combined runtime
Shared counter equals 20000000

----

COMMAND: coroutine.pl
OUTPUT:
Running with 2 coroutines
Coroutine 1 :: started at Thu Jul 22 15:39:30 2010
Coroutine 2 :: started at Thu Jul 22 15:39:30 2010
Coroutine 1 :: stopped at Thu Jul 22 15:39:51 2010
Coroutine 2 :: stopped at Thu Jul 22 15:39:51 2010
Master logged 20.573 runtime
Coroutines reported 41.145 combined runtime

----

COMMAND: coroutine.py
OUTPUT:
Running with 2 coroutines
Coroutine 1 :: started at Thu Jul 22 15:39:51 2010
Coroutine 2 :: started at Thu Jul 22 15:39:51 2010
Coroutine 1 :: stopped at Thu Jul 22 15:40:04 2010
Coroutine 2 :: stopped at Thu Jul 22 15:40:04 2010
Master logged 13.098 runtime
Coroutines reported 25.946 combined runtime

----

COMMAND: coroutine-share.pl
OUTPUT:
Running with 2 coroutines
Coroutine 1 :: started at Thu Jul 22 15:40:04 2010
Coroutine 2 :: started at Thu Jul 22 15:40:04 2010
Coroutine 1 :: stopped at Thu Jul 22 15:40:26 2010
Coroutine 2 :: stopped at Thu Jul 22 15:40:26 2010
Master logged 21.903 runtime
Coroutines reported 43.805 combined runtime
Shared counter equals 20000000

----

COMMAND: coroutine-share.py
OUTPUT:
Running with 2 coroutines
Coroutine 1 :: started at Thu Jul 22 15:40:26 2010
Coroutine 2 :: started at Thu Jul 22 15:40:26 2010
Coroutine 1 :: stopped at Thu Jul 22 15:42:51 2010
Coroutine 2 :: stopped at Thu Jul 22 15:42:52 2010
Master logged 146.335 runtime
Coroutines reported 291.801 combined runtime
Shared counter equals 20000000

----

EOF