Monday, March 31, 2008

ReviewBoard: open source code review tool

Via Marc Hedlund's post on O'Reilly Radar, here's an open source code review tool from VMware: ReviewBoard. For all of us non-Googlers out there, it's probably the next best thing to Guido's Mondrian (question: why has that tool not been released as open source?). Check out the sweet screenshots. The kicker though is that it uses Python and Django. Way to go, VMware!

Python code complexity metrics and tools

There's a buzz in the air around code complexity, metrics, code coverage, etc. It started with Matt Harrison's PyCon presentation, then Ned Batchelder jumped in with a nice McCabe cyclomatic complexity computation/visualization tool, and now David Stanek posted about his pygenie tool -- which also measures the McCabe cyclomatic complexity of Python code. Now it's time to unify all these ideas in one powerful tool that computes not only complexity but also path or at least branch coverage. This would make a nice Google Summer of Code project. Too bad the deadline for 2008 GSoC applications is in 7 hours... Maybe for next year.

Update: David Goodger left a comment pointing me to Martin Blais's snakefood package, which computes and shows dependencies for your Python code. It's a good complement to the tools I mentioned above.
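
To give a rough idea of what a McCabe-style number actually counts, here is a minimal sketch of my own (not taken from any of the tools above) that walks a function's AST with the stdlib ast module and adds one for each decision point:

# Rough illustration only -- the real tools mentioned above are more thorough.
import ast

# node types that introduce an extra path through the code
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                ast.ExceptHandler, ast.And, ast.Or)

def cyclomatic_complexity(source, funcname):
    """Return 1 + the number of branch points in the named function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == funcname:
            return 1 + sum(isinstance(child, BRANCH_NODES)
                           for child in ast.walk(node))
    raise ValueError("function %r not found" % funcname)

if __name__ == "__main__":
    code = """
def classify(n):
    if n < 0:
        return 'negative'
    elif n == 0:
        return 'zero'
    return 'positive'
"""
    print(cyclomatic_complexity(code, "classify"))   # -> 3 (two branches + 1)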

Friday, March 28, 2008

Recommended testing conference: CAST 2008

If you're a tester and are serious about learning and advancing in your trade, I warmly recommend the CAST 2008 conference which will be held in Toronto, July 14-16. The theme of the conference is "Beyond the Boundaries: Interdisciplinary Approaches to Software Testing" and the keynote speaker is none other than Jerry Weinberg. And it's REALLY hard to get Jerry Weinberg to speak at a conference, so you might as well take advantage of this opportunity. For more details on CAST 2008, download the PDF brochure.

It's a good time to be a Python programmer

We had the SoCal Piggies meeting at the Disney Animation Studios last night. It was a great meeting -- great presentations from Disney engineers on how they use Python at Disney (and they use it A LOT!), great food, great turnout, and great atmosphere. Let me tell you -- the Disney Animation Studios are *lush*. Thanks to Paul Hildebrandt for organizing the meeting.

I'll probably blog separately about the technical content of the presentations, but for now I just wanted to comment on the fact that everybody seems to be hiring Python programmers -- Gorilla Nation and Virgin Charter are just two companies in the L.A. area that are aggressively looking to hire Python talent. Another thing: we used to have difficulty finding venues for our meetings. We used to meet at either USC or Caltech, and at most 10-12 people would show up. Now companies are clamoring to host the meetings at their offices, and we have 20-30 people in the audience, with many new faces at every meeting. Even more: Ruby on Rails programmers are showing up at our meetings, looking for an opportunity to get more involved with Python!

I take that as a sign that Python has arrived. It's a good time to be a Python programmer (or tester, for that matter.)

Tuesday, March 25, 2008

Easy parsing with pyparsing

If you haven't used Paul McGuire's pyparsing module yet, you've been missing out on a great tool. Whenever you hit a wall trying to parse text with regular expressions or string operations, 'think pyparsing'.

I had the need to parse a load balancer configuration file and save certain values in a database. Most of the stuff I needed was fairly easily obtainable with regular expressions or Python string operations. However, I was stumped when I encountered a line such as:

bind http "Customer Server 1" http "Customer Server 2" http

This line 'binds' a 'virtual server' port to one or more 'real servers' and their ports (I'm using this particular load balancer's jargon here, but the concepts are the same for all load balancers.)

The syntax is 'bind' followed by a word denoting the virtual server port, followed by one or more pairs of real server names and ports. The kicker is that the real server names can be either a single word containing no whitespace, or multiple words enclosed in double quotes.

Splitting the line by spaces or double quotes is not the solution in this case. I started out by rolling my own little algorithm, keeping track of where I was inside the string, then realized I was actually writing my own parser. Time to reach for pyparsing.

I won't go into the details of how to use pyparsing, since there is great documentation available (see Paul's PyCon06 presentation, the examples on the pyparsing site, and also Paul's O'Reilly Shortcut book). Basically you need to define your grammar for the expression you need to parse, then translate it into pyparsing-specific constructs. Because pyparsing's API is so intuitive and powerful, the translation process is straightforward.

Here's how I ended up implementing my pyparsing grammar:

from pyparsing import *

def parse_bind_line(line):
    quoted_real_server = dblQuotedString.setParseAction(removeQuotes)
    real_server = Word(alphas, printables) | quoted_real_server
    port = Word(alphanums)
    real_server_port = Group(real_server + port)
    bind_expr = Suppress(Literal("bind")) + \
                port + \
                OneOrMore(real_server_port)
    return bind_expr.parseString(line)

That's all there is to it. You need to read it from the bottom up to see how the expression gets decomposed into elements, and elements get decomposed into sub-elements.

I'll explain each line, starting with the last one before the return:

bind_expr = Suppress(Literal("bind")) + \
            port + \
            OneOrMore(real_server_port)

A bind expression starts with the literal "bind", followed by a port, followed by one or more real server/port pairs. That's pretty much what the line above says, isn't it? The Suppress construct tells pyparsing that we're not interested in returning the literal "bind" in the final token list.


real_server_port = Group(real_server + port)

A real server/port pair is simply a real server name followed by a port. The Group construct tells pyparsing that we want to group these 2 tokens in a list inside the final token list.


port = Word(alphanums)

A port is a word composed of alphanumeric characters. In general, word means 'a sequence of characters containing no whitespace'. The 'alphanums' variable is a special pyparsing variable already containing the list of alphanumeric characters.


real_server = Word(alphas, printables) | quoted_real_server

A real server is either a single word or an expression in quotes. Note that we can declare a pyparsing Word with 2 arguments: the 1st argument specifies the allowed characters for the initial character of the word, whereas the 2nd argument specifies the allowed characters for the body of the word. In this case, we're saying that we want a real server name to start with an alphabetical character, but other than that it can contain any printable character.


quoted_real_server = dblQuotedString.setParseAction(removeQuotes)

Here is where you can glimpse the power of pyparsing. With this single statement we're parsing a sequence of words enclosed in double quotes, and we're saying that we're not interested in the quotes. There's also a sglQuotedString class for words enclosed in single quotes. Thanks to Paul for bringing this to my attention. My clumsy attempt at manually declaring a sequence of words enclosed in double quotes ran something like this:


no_quote_word = Word(alphanums + "-.")
quoted_real_server = Suppress(Literal("\"")) + \
                     OneOrMore(no_quote_word) + \
                     Suppress(Literal("\""))
quoted_real_server.setParseAction(lambda tokens: " ".join(tokens))

The only useful thing you can take away from this mumbo-jumbo is that you can associate an action with each token. When pyparsing encounters that token, it applies the action (function or class) you specified to that token. This is useful for validating your tokens -- a date, for example. Very powerful stuff.
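
Here is a quick illustration of such a parse action -- a hypothetical sketch, not part of the load balancer script, with names of my own choosing -- that validates a numeric port at parse time:

from pyparsing import Word, nums, ParseFatalException

def validate_port(s, loc, tokens):
    # reject out-of-range ports right at parse time
    port = int(tokens[0])
    if not 1 <= port <= 65535:
        raise ParseFatalException(s, loc, "port out of range: %d" % port)
    return tokens

numeric_port = Word(nums).setParseAction(validate_port)

print(numeric_port.parseString("8080"))    # -> ['8080']
print(numeric_port.parseString("99999"))   # raises ParseFatalException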

Now it's time to test my function on a few strings:

if __name__ == "__main__":
    tests = """\
bind http "Customer Server 1" http "Customer Server 2" http
bind http "Customer Server - 11" 81 "Customer Server 12" 82
bind http www.mywebsite.com-server1 http www.mywebsite.com-server2 http
bind ssl www.mywebsite.com-server1 ssl www.mywebsite.com-server2 ssl
bind http TEST-server http
bind http MY-cluster-web11 83 MY-cluster-web-12 83
bind http cust1-server1.site.com http cust1-server2.site.com http
""".splitlines()

    for t in tests:
        print parse_bind_line(t)


Running the code above produces this output:


$ ./parse_bind.py
['http', ['Customer Server 1', 'http'], ['Customer Server 2', 'http']]
['http', ['Customer Server - 11', '81'], ['Customer Server 12', '82']]
['http', ['www.mywebsite.com-server1', 'http'], ['www.mywebsite.com-server2', 'http']]
['ssl', ['www.mywebsite.com-server1', 'ssl'], ['www.mywebsite.com-server2', 'ssl']]
['http', ['TEST-server', 'http']]
['http', ['MY-cluster-web11', '83'], ['MY-cluster-web-12', '83']]
['http', ['cust1-server1.site.com', 'http'], ['cust1-server2.site.com', 'http']]

From here, I was able to quickly identify for a given virtual server everything I needed: a virtual server port, and all the real server/port pairs associated with it. Inserting all this into a database was just another step. The hard work had already been done by pyparsing.
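
To sketch that last step (purely hypothetical -- the table schema and file name are made up), here's what stashing one parsed token list into SQLite could look like:

# Hypothetical sketch only: persist one parsed 'bind' line into SQLite.
import sqlite3

def save_bindings(tokens, dbfile="loadbalancer.db"):
    # tokens looks like: ['http', ['Customer Server 1', 'http'], ...]
    vserver_port = tokens[0]
    conn = sqlite3.connect(dbfile)
    conn.execute("""CREATE TABLE IF NOT EXISTS bindings
                    (vserver_port TEXT, real_server TEXT, real_port TEXT)""")
    for real_server, real_port in tokens[1:]:
        conn.execute("INSERT INTO bindings VALUES (?, ?, ?)",
                     (vserver_port, real_server, real_port))
    conn.commit()
    conn.close()

save_bindings(['http', ['Customer Server 1', 'http'],
                       ['Customer Server 2', 'http']])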

Once more, kudos to Paul McGuire for creating such a useful and fun tool.

Sunday, March 23, 2008

PyCon08 gets great coverage

Reports on the death of the PyCon conference as a community experience have been greatly exaggerated. I personally have never seen any PyCon edition as well covered in the blogs aggregated in Planet Python as the 2008 PyCon. If you don't believe me, maybe you'll believe Google Blog Search. I think the Python community is alive and well, and ready to rock at PyCon conferences for the foreseeable future. I'm looking forward to PyCon09 in Chicago, and then probably in San Francisco for 2010/11.

Wednesday, March 19, 2008

PyCon presenters, unite!

If you gave a talk at PyCon and haven't uploaded your slides to the official PyCon website yet, but you have posted them online somewhere else, please leave a comment to this post with the location of your slides. I'm helping Doug Napoleone upload the slides, since some authors have experienced issues when trying to upload the slides using their PyCon account. Thanks!

Tuesday, March 18, 2008

Links to resources from PyCon talks

I took some notes at the PyCon talks I've been to, and I'm gathering links to resources referenced in these talks. Hopefully they'll be useful to somebody (I know they will be to me at least.)

"MPI Cluster Programming with Python and Amazon EC2" by Pete Skomoroch

* slides in PDF format
* Message Passing Interface (MPI) modules for Python: mpi4py, pympi
* ElasticWulf project (Beowulf-like setup on Amazon EC2)
* IPython1: parallel computing in Python
* EC2 gotchas

"Like Switching on the Light: Managing an Elastic Compute Cluster with Python" by George Belotsky

* S3FS: mount S3 as a local file system using Fuse (unstable)
* EC2UI: Firefox extension for managing EC2 clusters
* S3 Organizer: Firefox extension for managing S3 storage
* bundling an EC2 AMI and storing it to S3
* the boto library, which allows programmatic manipulation of Amazon Web Services such as EC2, S3, SimpleDB etc. (a python-boto package is available for most Linux distributions too; for example 'yum install python-boto')
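
As a small taste of boto, here's a sketch of listing and launching EC2 instances programmatically (the access keys and AMI ID below are placeholders, not real values):

# Sketch only -- access keys and AMI ID are placeholders.
import boto

conn = boto.connect_ec2(aws_access_key_id="YOUR_ACCESS_KEY",
                        aws_secret_access_key="YOUR_SECRET_KEY")

# list currently running instances
for reservation in conn.get_all_instances():
    for instance in reservation.instances:
        print("%s %s %s" % (instance.id, instance.state,
                            instance.public_dns_name))

# launch a new instance from a given AMI
reservation = conn.run_instances("ami-12345678", instance_type="m1.small")
print(reservation.instances[0].id)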

"PyTriton: building a petabyte storage system" by Jonathan Ellis

* All this was done at Mozy (online remote backup, now owned by EMC, just like Avamar, the company I used to work for)
* They maxed out Foundry load balancers, so they ended up using LVS + ipvsadm
* They used erasure coding for data integrity -- rolled their own algorithm but Jonathan recommended that people use zfec developed by AllMyData
* An alternative to erasure coding would be to use RAID6, which is used by Carbonite

"Use Google Spreadsheets API to create a database in the cloud" by Jeffrey Scudder

* slides online
* APIs and documentation on google code

"Supervisor as a platform" by Chris McDonough and Mike Naberezny

* slides online
* supervisord home page

"Managing complexity (and testing)" by Matt Harrison

* slides online
* PyMetrics module for measuring the McCabe complexity of your code
* coverage module and figleaf module for measuring your code coverage

Resources from lightning talks

* bug.gd -- online repository of solutions to bugs, backtraces, exceptions etc (you can easy_install bug.gd, then call error_help() after you get a traceback to try to get a solution)
* geopy -- geocode package
* pvote.org -- Ka-Ping Yee's electronic voting software in 460 lines of Python (see also Ping's PhD dissertation on the topic of Building Reliable Voting Machine Software)
* bitsyblog -- a minimalist approach to blog software

JP posted nose talk slides and code

If you attended Jason Pellerin's talk on the nose test framework at PyCon08, you'll be glad to know he just posted his slides and the sample app that shows how he writes and runs unit and functional tests under nose. I'm mentioning it here because he announced it only on the nose mailing list.

Monday, March 17, 2008

Slides and links from the Testing Tools tutorial

Here are my slides from the Testing Tools tutorial in PDF format. Not very informative I'm afraid -- I didn't actually show them to the attendees, I just talked about those topics while demo-ing them. If you want to find out more about the state of the Selenium project, watch this YouTube video of the Selenium Meetup at Google.

Here are some random thoughts on Selenium testing which I mentioned during the tutorial:

  • composing Selenium tests, especially for Ajax functionality, is HARD; the Selenium IDE helps a bit, but you still have to figure out how to wait for certain HTML elements to either appear or disappear from the page under test (see the sketch after this list for what such a wait looks like)
  • version 1.0 of the Sel. IDE, soon to be released, will record Ajax actions by default, so hopefully this will speed up Selenium test creation
  • if you already have a Selenium Core test in HTML format, an easy way to obtain a Selenium RC test in Python is to open the HTML file in the Selenium IDE, then export the test case as Python; however, to actually make the resulting code readable/reusable, you have to do some pretty major refactoring
  • identifying HTML elements (or locators, as Selenium calls them) by their XPath value is hard, but it's sometimes the only way to get to them in order to assert something about them; I found tools such as XPath Checker, XPather and Firebug invaluable (they all happen to be Mozilla add-ons, but you can use Firefox to compose your tests, then run them in any browser supported by Selenium; however, YMMV especially when it comes to evaluating XPath expressions in IE)
  • because XPath locators are brittle in the face of constant HTML changes, please use HTML id attributes to identify your elements; I know at least one company (hi, Terry) where testers do not even start writing Selenium tests until developers have identified all elements of interest with an HTML id
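
To make the point about waiting for Ajax elements concrete, here's a rough sketch of the kind of polling loop I mean, written against the Selenium RC Python client; the URL and locators are made up for illustration:

# Rough sketch of waiting for an Ajax-rendered element with Selenium RC.
# The URL and the element IDs are made-up placeholders.
import time
from selenium import selenium

sel = selenium("localhost", 4444, "*firefox", "http://www.example.com/")
sel.start()
sel.open("/search")
sel.type("id=query", "pyparsing")
sel.click("id=search_button")

# poll until the Ajax-rendered results div shows up (give up after 30 seconds)
for i in range(60):
    if sel.is_element_present("id=results"):
        break
    time.sleep(0.5)
else:
    raise AssertionError("results element never appeared")

print(sel.get_text("id=results"))
sel.stop()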

For people interested in FitNesse and on various acceptance testing topics, please see my blog posts in the Acceptance Testing and Web Application Testing sections of this page. If you are interested in how Titus and I tested the MailOnnaStick application, we have a whole wiki dedicated to this topic. Another resource of interest might be the Python Testing Tools Taxonomy wiki, with links to a myriad of Python testing tools.

I'll post soon on some topics that were discussed during the tutorial, especially on how to test an application against external interfaces or resources that are not under your control (think 'mocking').
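
To give a preview of what I mean, here's a minimal, hypothetical example of stubbing out an external HTTP call in a unit test (the weather_report function and the canned response are invented for illustration):

# Hypothetical sketch: unit-test code that calls an external service by
# monkeypatching the call with a canned fake. All names below are invented.
import unittest
import urllib

def weather_report(city):
    # production code: calls an external service we do not control
    return urllib.urlopen("http://weather.example.com/%s" % city).read()

class FakeResponse(object):
    """Canned stand-in for the object urllib.urlopen() returns."""
    def __init__(self, body):
        self.body = body
    def read(self):
        return self.body

class WeatherTest(unittest.TestCase):
    def test_weather_report_returns_body(self):
        # swap the real network call for a stub, just for this test
        real_urlopen = urllib.urlopen
        urllib.urlopen = lambda url: FakeResponse("sunny, 72F")
        try:
            self.assertEqual(weather_report("losangeles"), "sunny, 72F")
        finally:
            urllib.urlopen = real_urlopen

if __name__ == "__main__":
    unittest.main()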

Back from PyCon

PyCon08 is over. It's been an enjoyable experience, but a crowded one, with more than 1,000 people in attendance. The testing tutorial that Titus and I gave on Thursday went well. We tried to make it a bit more interactive than in the last 2 years, so we asked people to send us their Web apps so we could test them 'in real time'. As it turned out, we only got back a handful of replies and almost no apps, but Christian Long sent me an app that I used to show some nifty Ajax testing with Selenium. We took a lot of questions from the audience on real problems they were facing, and I think we came up with some satisfactory answers/solutions. We need to think about what format we'll choose for next year (if any). Steve Holden and Doug Napoleone were happy with what they got out of the tutorial; Mike Pirnat was not, because he had seen the same material last year. I think we did show some new techniques with the same tools we've been showing for the last 3 years, but the overall content of the tutorial hasn't changed much. If you have any ideas about what you'd like to see next year, drop us a note.

If you read Planet Python or comp.lang.python, you know there's been a whirlwind of discussions around Bruce Eckel's "Pycon disappointment" post. My take on the commercialization of PyCon is that it hasn't been as bad as Bruce makes it sound. The vendor area was very isolated, in a corner room, and you could have easily missed it if you didn't see the people pouring out of there with all kinds of swag. And everybody was hiring! This is a good thing, people! Getting paid to work with your favorite programming language is a privilege that not many people are enjoying.

The other controversial aspect this year was the vendor involvement in the lightning talks. That was indeed highly annoying, and hopefully it won't happen again. I think the solution the organizers found last year -- separating the vendor-sponsored lightning talks from the regular ones -- worked pretty well. I didn't go to all the lightning talk sessions this year, but the one on Sunday was very enjoyable (once the sponsored ones were over). I liked the one on bitsyblog (a minimalist approach to building blog software), and the one by Martin v. Loewis on using Roundup for various non-bugtrack-related projects such as homework assignments. I also found out that Larry Hastings from Facebook is leading an effort to switch Facebook from PHP to a more enlightened language (you know which one), and also that slide.com, which offers what is apparently one of the most (if not THE most) popular Facebook apps, is using Python throughout its development process.

The technical talks were a mixed bag, just like at every other PyCon I've participated in. No big surprise there; you can't make everybody happy no matter how hard you try (and the PyCon committee tried hard, believe me). The critics have a point though, in that we need more advanced topics. Maybe we need a beginner track, an intermediate track and an advanced track. Also, 4 tracks is too many; I think 3 is a better number. My top 3 favorite talks were, in chronological order: "Supervisor as a platform" by Chris McDonough and Mike Naberezny, "Managing complexity (and testing)" by Matt Harrison, and "Introducing agile testing techniques to the OLPC project" by Titus Brown (I hope Titus will make good on his promise of putting together a screencast of the demo he showed, since it's mighty cool).

But the best part of PyCon is always meeting people. It's great to put a face to a name, especially when you're familiar with that person's work or blog. It's great to meet with people you met in previous years too. If nothing else, the socializing alone makes it worth attending PyCon. Think about it: where else would you meet Zed Shaw, experience his colorful language and personality first-hand, and hear him rant a bit about the ways in which Python sucks (although I hasten to add that he likes it a lot and he's starting to hack seriously in it)? Can't wait to use one of Zed's first contributions to the Python world, a nifty utility called easy_f__ing_uninstall (feel free to fill in the blanks :-)

Whatever the critics say, I know I'll be back in Chicago next year for sure. I just want better network connectivity (why is it so hard to ensure decent wireless connectivity at PyCon year after year? it's a mystery) and better food.

Tuesday, March 04, 2008

Bright days for Python

Yes, the title is related to the news that was all over Planet Python yesterday: Sun hired two prominent pythonistas (Ted Leung and Frank Wierzbicki) to work on Jython. Great news indeed. Jython seemed to lose steam and momentum compared to other dynamic JVM languages (JRuby mainly), so it's very good to see that Sun is putting its weight behind it. Congrats to Ted and Frank.

Also, in other news, Greg Wilson just announced on his blog that he signed a contract with the Pragmatic Programmers to co-author a textbook for CS 101-type classes using Python as the programming language. Can't wait to see it in print!
