Wednesday, February 02, 2005

Web app testing with Jython and HttpUnit

There's been a lot of talk recently about "dynamic Java", by which people generally mean driving the JVM by means of a scripting language (see Tim Bray's post and Sean McGrath's post on this topic). One of the languages leading the pack in this area is Jython (the other one is Groovy). In fact, a Java Republic poll asking "What is your scripting language for Java for 2004?" has Jython as the winner with 59% of the votes.

Update: As a coincidence, while writing this post, I came across this blog entry: Gosling on JVM scripting

Jython is also steadily making inroads into the world of test frameworks. It is perhaps no coincidence that in a talk given at Stanford, Guido van Rossum lists "Testing (popular area for Jython)" on the slide that talks about Python Sample Use Areas. Because Jython combines the agility of Python with easy access to the Java libraries, it is the scripting language of choice for test tools such as The Grinder v3, TestMaker, Marathon and STAF/STAX.

I want to show here how to use Jython for interactively driving a Java test tool (HttpUnit) in order to verify the functionality of a Web application.

HttpUnit is a browser simulator written in Java by Russell Gold. It is used in the Java world for functional, black-box type testing of Web applications. Although its name contains "Unit", it is not a unit test tool, but it is often used in conjunction with the JUnit framework. The canonical way of using HttpUnit is to write JUnit tests that call various HttpUnit components in order to mimic the actions of a browser. These individual tests can then be aggregated into test suites that will be run by the JUnit framework. Building all this scaffolding takes some time, and recompiling the Java code after each change adds further delay.

In what follows, I want to contrast the Java-specific HttpUnit usage with the instantaneous feedback provided by working in the Jython shell and with the near-zero overhead that comes with writing Python doctest tests. The functionality I will test is a search for Python books on amazon.com.

Step 1: Install Jython

- The machine I ran my tests on is a Linux server running Red Hat 9 which already had the Java 1.4.2_04 SDK installed in /usr/java/j2sdk1.4.2_04
- I downloaded Jython 2.1 from its download site and I put the file jython_21.class in /usr/local
- I cd-ed into /usr/local and ran the command-line installer, specifying Jython-2.1 as the target directory:
[root@concord root]# cd /usr/local

[root@concord local]# which java
/usr/java/j2sdk1.4.2_04/bin/java
[root@concord local]# java jython_21 -o Jython-2.1 demo lib source
try path /usr/local/
Done
[root@concord local]# ls Jython-2.1/
ACKNOWLEDGMENTS Doc jython.jar org Uninstall.class
cachedir installer Lib README.txt
com jython LICENSE.txt registry
Demo jythonc NEWS Tools
- I added /usr/local/Jython-2.1 to the PATH environment variable in .bash_profile and I sourced that file:
[root@concord root]# . ~/.bash_profile

[root@concord root]# which jython
/usr/local/Jython-2.1/jython
- I verified that I can run the interactive Jython shell (the first time you run it, it will process its own jython.jar file, plus all jar files that it finds in $JAVA_HOME/jre/lib):
[root@concord local]# jython

*sys-package-mgr*: processing new jar, '/usr/local/Jython-2.1/jython.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/rt.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/sunrsasign.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/jsse.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/jce.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/charsets.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/ext/dnsns.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/ext/ldapsec.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/ext/localedata.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/ext/sunjce_provider.jar'
Jython 2.1 on java1.4.2_04 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>>
Step 2: Install HttpUnit

- I downloaded HttpUnit 1.6 from its download site and I unzipped the file httpunit-1.6.zip under /root
- The main HttpUnit functionality is contained in the httpunit.jar file in /root/httpunit-1.6/lib and other optional jar files are in /root/httpunit-1.6/jars, so I added all the jar files in these two directories to the CLASSPATH environment variable in .bash_profile. Here is the relevant portion from .bash_profile:
# Set up CLASSPATH for HttpUnit

CLASSPATH=$CLASSPATH:/root/httpunit-1.6/lib/httpunit.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/js.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/servlet.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/Tidy.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/xercesImpl.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/xmlParserAPIs.jar
export CLASSPATH
- I sourced .bash_profile, then I went to the jython shell and verified that the new jar files are seen by Jython:
[root@concord root]# . ~/.bash_profile

[root@concord root]# jython
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/lib/httpunit.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/js.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/servlet.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/Tidy.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/xercesImpl.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/xmlParserAPIs.jar'
Jython 2.1 on java1.4.2_04 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>>
- I verified that I can import the httpunit Java package from within a Jython shell session:
>>> from com.meterware.httpunit import *

>>>
- Nothing was printed to the console, which means that the import succeeded. If CLASSPATH had not been set right and Jython had not been able to process the httpunit.jar file, I would have seen an error similar to this:
Traceback (innermost last):
  File "", line 1, in ?
ImportError: No module named meterware
Step 3: Use HttpUnit inside a Jython shell session to test a Web application

This is not a full-fledged HttpUnit tutorial. For people who want to learn more about HttpUnit, I recommend the HttpUnit cookbook and this article by Giora Katz-Lichtenstein on O'Reilly's ONJava.com site.

I will however show you some basic HttpUnit usage patterns. The first thing you do in HttpUnit is open a WebConversation, then send an HTTP request to your Web application and get back the response. Let's do this for www.amazon.com inside a Jython shell:
[root@concord root]# jython

Jython 2.1 on java1.4.2_04 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from com.meterware.httpunit import *
>>> web_conversation = WebConversation()
>>> request = GetMethodWebRequest("http://www.amazon.com")
>>> response = web_conversation.getResponse(request)
>>> response != None
1
We're already seeing some advantages of using Jython over writing Java code: no type declarations necessary! We're also testing that we get a valid response back by expecting to see 1 when we type response != None.

If we were to print the response variable, we would see the HTTP headers:
>>> print response

HttpWebResponse [url=http://www.amazon.com/exec/obidos/subst/home/home.html/002-1556899-2409632; headers=
CONTENT-TYPE: text/html
CNEONCTION: close
TRANSFER-ENCODING: chunked
SERVER: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
DATE: Thu, 03 Feb 2005 21:34:29 GMT
SET-COOKIE: obidos_path_continue-shopping=continue-shopping-url=/subst/home/home.html/002-1556899-2409632&continue-shopping-post-data=&continue-shopping-description=generic.gateway.default; path=/; domain=.amazon.com
SET-COOKIE: ubid-main=077-3170816-5986942; path=/; domain=.amazon.com; expires=Tuesday, 01-Jan-2036 08:00:01 GMT ]
We could also look at the raw HTML output via response.getText() (I will omit the output, since it takes a lot of space).

At this point, I want to say that testing a Web application via its GUI is a very error-prone endeavor. Any time the name or the position of an HTML element under test changes, the test will break. Generally speaking, testing at the GUI level is notoriously brittle and should only be done when there is a strong chance that the GUI layout and element names will not change. It's almost always better to test the business logic underneath the GUI (assuming the application was designed to clearly separate the GUI logic from the business logic) via a tool such as FitNesse, which can simulate GUI actions without actually going through the GUI.

However, there certainly are cases when one simply cannot skip testing the GUI, and HttpUnit is a decent tool for achieving this goal in the case of a Web application. Let's continue our example and test the search functionality of the main amazon.com Web page. If we were part of a QA team at amazon.com, we would probably expect the HTML design team to hand us a document detailing the layout of the main HTML pages comprising the site and the names of their main elements (forms, frames, etc.). As it is, we need to hunt for this information ourselves by playing with the live site itself and carefully poring over the HTML source of the pages we want to test.

I said before that in HttpUnit we can also get the raw HTML output via response.getText(). The response variable is an instance of the HttpUnit WebResponse class, which offers many useful methods for dealing with HTML elements. We can obtain collections of forms, tables, links, images and other HTML elements, then iterate over them until we find the element we need to test. We can alternatively get a specific element directly from the response by calling methods such as getLinkWithID() or getTableWithID().

If we search for the word "form" inside the HTML page source on the main amazon.com Web page, we see that the search form is called "searchform". We can retrieve this form from the response variable via the getFormWithName() method:
>>> search_form = response.getFormWithName("searchform")

>>> search_form != None
1
We can also see from the HTML page source that the form has two input fields: a drop-down list of values called "url" and an entry field called "field-keywords". We will use the form's setParameter() method to fill both fields with our information: "Books" (which actually corresponds to the value "index=stripbooks:relevance-above") for the drop-down list and "Python" for the entry field:
>>> search_form.setParameter("url", "index=stripbooks:relevance-above")

>>> search_form.setParameter("field-keywords", "Python")
Now we can simulate submitting our information via the form's submit() method:
>>> search_response = search_form.submit()

>>> search_response != None
1
At this point, search_response represents the HTML page containing the 3 most popular search results for "Python", followed by the first 10 of the total number of relevant results (370 results when I tried it).

The HTML source for this page looks confusing to say the least. It's composed of a myriad of tables, which can be eyeballed by this code:
>>> tables = search_response.getTables()

>>> print tables
Let's pretend we're only interested in the 3 most popular search results. If we look carefully through the output returned by print tables, we see that the first cell in the table containing the 3 most popular results is "1.". We can use this piece of information in retrieving the whole table via the getTableStartingWith() method:
>>> most_popular_table = search_response.getTableStartingWith("1.")

>>> most_popular_table != None
1
We can quickly inspect the contents of the table by simply printing it:
>>> print most_popular_table

WebTable:
[0]: [0]=1. [1]=Learning Python, Second Edition -- by Mark Lutz, David Ascher; Paperback
Buy new: $23.07 -- Used & new from: $15.42
[1]: [0]=2. [1]=Python Cookbook -- by Alex Martelli, David Ascher; Paperback
Buy new: $26.37 -- Used & new from: $22.30
[2]: [0]=3. [1]=Python Programming for the Absolute Beginner (Absolute Beginner) -- by Michael Dawson; Paperback
Buy new: $19.79 -- Used & new from: $19.78
We see that the table has 3 rows and 2 columns. We can make this into a test by using the getRowCount() and getColumnCount() methods of the table object:
>>> rows = most_popular_table.getRowCount()

>>> rows == 3
1
>>> columns = most_popular_table.getColumnCount()
>>> columns == 2
1
From the output of print most_popular_table we also see that the second column in each row contains information about the book: title, authors, new price and used price. If we look at the live page on amazon.com, we notice that each title is actually a link. Let's say we want to test the link for each of the 3 top titles. We expect that by clicking on the link we will get back a page with details corresponding to the selected title.

For starters, let's test the first title, the one at row 0. We can retrieve its link by calling the getLinkWith() method of the search_response object, and passing to it the title of the book (which we need to retrieve from the contents of the cell in the second column, index 1, via a regular expression):
>>> book_info = most_popular_table.getCellAsText(0, 1)

>>> import re
>>> title = ""
>>> s = re.search("(.*) --", book_info)
>>> if s:
...     title = s.group(1)
...
>>> title.find("Python") > -1
1
>>> link = search_response.getLinkWith(title)
>>> link != None
1
Note that we also tested that the title contains "Python". Although this test may fail, it's nevertheless a pretty sure bet that each of the 3 top-selling books on Python will have the word "Python" somewhere in its title.
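One detail worth spelling out: the greedy pattern "(.*) --" only yields the bare title because the cell text has a line break before the second " --" (the price line). The following is a minimal sketch in plain Python, runnable in a current CPython without HttpUnit, that uses the sample cell text from the table printed earlier. The helper name extract_title and the non-greedy variant "(.*?) --" are my own illustration, not part of HttpUnit or of the session above.

```python
import re

def extract_title(book_info):
    """Return the text before the first ' --' in a cell, or '' if there is none."""
    # The non-greedy (.*?) stops at the first " --", so the result is the same
    # whether the cell text contains a line break or comes back as one line.
    match = re.search(r"(.*?) --", book_info)
    if match:
        return match.group(1)
    return ""

# Sample cell text copied from the WebTable output shown earlier.
cell = ("Learning Python, Second Edition -- by Mark Lutz, David Ascher; Paperback\n"
        "Buy new: $23.07 -- Used & new from: $15.42")

title = extract_title(cell)
print(title)

# For comparison: the greedy pattern on a single-line version of the same text
# runs on to the last " --" and swallows the author and price information.
one_line = cell.replace("\n", " ")
greedy = re.search("(.*) --", one_line).group(1)
print(greedy)
```

If the cell text ever came back without the line break, the greedy form would pass a much longer string to getLinkWith(), which would probably not match any link text.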

We can now simulate clicking on the link via the link object's click() method. We verify that we get back a non-empty page and also that the HTML title of the book detail page contains the title of the book:
>>> book_details = link.click()

>>> book_details != None
1
>>> page_title = book_details.getTitle()
>>> page_title.find(title) > -1
1
We can test the links for all of the top 3 titles by looping through the rows of most_popular_table:
>>> import re
>>> for i in range(rows):
...     book_info = most_popular_table.getCellAsText(i, 1)
...     title = ""
...     s = re.search("(.*) --", book_info)
...     if s:
...         title = s.group(1)
...     title.find("Python") > -1
...     link = search_response.getLinkWith(title)
...     link != None
...     book_details = link.click()
...     book_details != None
...     page_title = book_details.getTitle()
...     page_title.find(title) > -1
...
1
1
1
1
1
1
1
1
1
1
1
1
>>>

We have 4 test statements which expect 1 as a result in the body of the loop. Since there are 3 rows to inspect, we should expect 12 1's to be printed.
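A run of twelve 1's does not say which check belongs to which row, so a single 0 in the middle would be easy to misread. One possible refinement is to wrap the four per-row checks in a function that returns the names of the checks that failed. The sketch below is plain Python for a current CPython. The FakeTable, FakeResponse, FakeLink and FakePage classes are hypothetical stand-ins that imitate only the four HttpUnit methods used above (getCellAsText, getLinkWith, click, getTitle), and the page title in the sample data is invented, so the example runs without a network connection.

```python
import re

def check_row(table, response, row):
    """Run the four per-row checks; return the names of the checks that failed."""
    failures = []
    book_info = table.getCellAsText(row, 1)
    match = re.search("(.*) --", book_info)
    title = ""
    if match:
        title = match.group(1)
    if title.find("Python") == -1:
        failures.append("title contains Python")
    link = response.getLinkWith(title)
    if link is None:
        failures.append("link found")
        return failures  # nothing to click, so stop here
    book_details = link.click()
    if book_details is None:
        failures.append("detail page returned")
        return failures
    if book_details.getTitle().find(title) == -1:
        failures.append("page title contains book title")
    return failures

# Hypothetical stand-ins for the HttpUnit objects (illustration only).
class FakePage:
    def __init__(self, title):
        self._title = title
    def getTitle(self):
        return self._title

class FakeLink:
    def __init__(self, page):
        self._page = page
    def click(self):
        return self._page

class FakeTable:
    def __init__(self, cells):
        self._cells = cells
    def getCellAsText(self, row, col):
        return self._cells[row][col]

class FakeResponse:
    def __init__(self, pages):
        self._pages = pages
    def getLinkWith(self, text):
        page = self._pages.get(text)
        if page is None:
            return None
        return FakeLink(page)

table = FakeTable([
    ["1.", "Learning Python, Second Edition -- by Mark Lutz, David Ascher; Paperback"],
    ["2.", "Python Cookbook -- by Alex Martelli, David Ascher; Paperback"],
])
# Only the first book has a detail page here, so row 1 should report a failure.
response = FakeResponse({
    "Learning Python, Second Edition": FakePage("Details: Learning Python, Second Edition"),
})

results = [check_row(table, response, i) for i in range(2)]
print(results)
```

In a Jython session the same function could be called with the real most_popular_table and search_response objects, assuming Java null values show up as None the way they do in the session above.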

I'll stop here with my example. In a real-life situation, you would want to test much more functionality, but this example should be sufficient to get you going with both HttpUnit and Jython.

Step 4: Use the doctest module to write functional tests

Using the Python doctest module, we can save the Jython interactive session conducted so far into a docstring inside a function that we can call for example test_amazon_search. We can put this function (with an empty body) inside a module called test_amazon.py:
def test_amazon_search():
    """
    >>> from com.meterware.httpunit import *
    >>> web_conversation = WebConversation()
    >>> request = GetMethodWebRequest("http://www.amazon.com")
    >>> response = web_conversation.getResponse(request)
    >>> response != None
    1
    >>> search_form = response.getFormWithName("searchform")
    >>> search_form != None
    1
    >>> search_form.setParameter("url", "index=stripbooks:relevance-above")
    >>> search_form.setParameter("field-keywords", "Python")
    >>> search_response = search_form.submit()
    >>> search_response != None
    1
    >>> tables = search_response.getTables()
    >>> tables != None
    1
    >>> most_popular_table = search_response.getTableStartingWith("1.")
    >>> most_popular_table != None
    1
    >>> rows = most_popular_table.getRowCount()
    >>> rows == 3
    1
    >>> columns = most_popular_table.getColumnCount()
    >>> columns == 2
    1
    >>> for i in range(rows):
    ...     book_info = most_popular_table.getCellAsText(i, 1)
    ...     import re
    ...     title = ""
    ...     s = re.search("(.*) --", book_info)
    ...     if s:
    ...         title = s.group(1)
    ...     title.find("Python") > -1
    ...     link = search_response.getLinkWith(title)
    ...     link != None
    ...     book_details = link.click()
    ...     book_details != None
    ...     page_title = book_details.getTitle()
    ...     page_title.find(title) > -1
    ...
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    """

if __name__ == "__main__":
    import doctest, test_amazon
    doctest.testmod(test_amazon)
Note that we need to keep in the docstring only those portions of the Jython interactive session which do not change from one test run to another. We can't put there things like print statements that reveal book or title specifics, since these specifics are almost guaranteed to change in the future. We want our test to serve as a functional regression test for the bare-bones search functionality of amazon.com.
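The paste-the-session workflow can also be tried without HttpUnit or a network connection. The following is a minimal sketch, written for a current CPython 3 interpreter rather than Jython 2.1, so comparisons print True where the Jython session prints 1. The function name and the sample string are hypothetical. It drives doctest through DocTestParser and DocTestRunner so that it works wherever the snippet is pasted; inside a module file, the doctest.testmod() call shown above does the same job.

```python
import doctest

def test_title_search():
    """
    >>> book_info = "Python Cookbook -- by Alex Martelli, David Ascher; Paperback"
    >>> import re
    >>> s = re.search("(.*) --", book_info)
    >>> s is not None
    True
    >>> title = s.group(1)
    >>> title.find("Python") > -1
    True
    """

# Build a DocTest object from the docstring and run every example in it.
parser = doctest.DocTestParser()
test = parser.get_doctest(test_title_search.__doc__, {}, "test_title_search", None, 0)
runner = doctest.DocTestRunner(verbose=False)
result = runner.run(test)

# result.failed and result.attempted count the examples, one per >>> line.
print(result.failed, "of", result.attempted, "examples failed")
```

As in the amazon.com docstring, only output that stays the same from run to run belongs in the expected lines.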

An interesting note is that the doctest module is used here to conduct a black-box type of test, whereas traditionally it is used for unit testing.

To fully take advantage of the interactive Jython session in order to later include it in a doctest string, I used the "script" trick. On a Unix system, if you type script at a shell prompt, a file called typescript is generated which will contain everything you type afterwards. When you are done with your "script" session, type exit to go back to the normal shell operation. You can then copy and paste the lines saved in the file typescript. This is especially useful for large outputs which can sometimes make other lines scroll past the current window of the shell.

Running the test_amazon module through Jython produces this output:
[root@concord jython]# jython test_amazon.py -v

Running test_amazon.__doc__
0 of 0 examples failed in test_amazon.__doc__
Running test_amazon.test_amazon_search.__doc__
Trying: from com.meterware.httpunit import *
Expecting: nothing
ok
Trying: web_conversation = WebConversation()
Expecting: nothing
ok
Trying: request = GetMethodWebRequest("http://www.amazon.com")
Expecting: nothing
ok
Trying: response = web_conversation.getResponse(request)
Expecting: nothing
ok
Trying: response != None
Expecting: 1
ok
Trying: search_form = response.getFormWithName("searchform")
Expecting: nothing
ok
Trying: search_form != None
Expecting: 1
ok
Trying: search_form.setParameter("url", "index=stripbooks:relevance-above")
Expecting: nothing
ok
Trying: search_form.setParameter("field-keywords", "Python")
Expecting: nothing
ok
Trying: search_response = search_form.submit()
Expecting: nothing
ok
Trying: search_response != None
Expecting: 1
ok
Trying: tables = search_response.getTables()
Expecting: nothing
ok
Trying: tables != None
Expecting: 1
ok
Trying: most_popular_table = search_response.getTableStartingWith("1.")
Expecting: nothing
ok
Trying: most_popular_table != None
Expecting: 1
ok
Trying: rows = most_popular_table.getRowCount()
Expecting: nothing
ok
Trying: rows == 3
Expecting: 1
ok
Trying: columns = most_popular_table.getColumnCount()
Expecting: nothing
ok
Trying: columns == 2
Expecting: 1
ok
Trying:
for i in range(rows):
    book_info = most_popular_table.getCellAsText(i, 1)
    import re
    title = ""
    s = re.search("(.*) --", book_info)
    if s:
        title = s.group(1)
    title.find("Python") > -1
    link = search_response.getLinkWith(title)
    link != None
    book_details = link.click()
    book_details != None
    page_title = book_details.getTitle()
    page_title.find(title) > -1
Expecting:
1
1
1
1
1
1
1
1
1
1
1
1
ok
0 of 20 examples failed in test_amazon.test_amazon_search.__doc__
1 items had no tests:
test_amazon
1 items passed all tests:
20 tests in test_amazon.test_amazon_search
20 tests in 2 items.
20 passed and 0 failed.
Test passed.
Some parting thoughts:

1. Porting Java code to Jython is a remarkably smooth and painless process. I ported the ONJava.com example to Jython and in the process got a 40% reduction in line count (you can find the original Java code here and the Jython code here). While doing this, I gleefully got rid of ugly Java idioms such as:
for (int i = 0; i < resultLinks.length; i++)
{
    String url = resultLinks[i].getURLString();
}
and replaced them with the more elegant:
for link in result_links:
    url = link.getURLString()
2. My one-to-one port from Java to Jython used unittest, which naturally corresponds to the original JUnit code. However, when I started using Jython interactively in a shell session, I realized that doctest is the proper test framework to use in this case.

3. I wish Jython could keep up with CPython. For example, the doctest version shipped with Jython 2.1 does not have the testfile functionality which allows you to save the docstrings in separate text files and add free-flowing text.

4. HttpUnit offers limited JavaScript support. This can be a problem in practice, since a large number of sites are heavy on JavaScript. While trying to find a good example for this post, I tried a number of sites and had HttpUnit bomb when trying to either retrieve the main page or post via a search form (such sites include monster.com, hotjobs.com, freshmeat.net, sourceforge.net).
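Regarding point 3, readers on CPython can try the testfile functionality with something like the following minimal sketch. It assumes a current CPython 3 interpreter, not Jython 2.1. The file name search_notes.txt and its contents are invented for illustration, and the file is written to a temporary directory so the example is self-contained.

```python
import doctest
import os
import tempfile

# A text file that mixes free-flowing prose with interactive examples.
NOTES = """\
Free-flowing text can surround the examples.

    >>> title = "Python Cookbook"
    >>> title.find("Python") > -1
    True

More prose can follow the examples.
"""

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "search_notes.txt")
    with open(path, "w") as f:
        f.write(NOTES)
    # module_relative=False makes testfile() treat the name as an ordinary
    # filesystem path instead of a path relative to the calling module.
    result = doctest.testfile(path, module_relative=False)

print(result.failed, "of", result.attempted, "examples failed")
```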

In conclusion, I think there is a real advantage in using Jython over Java in order to quickly prototype tests that use third-party Java libraries. The combination of Jython and doctest proves to be extremely "agile", since it simplifies the test code, it enhances its clarity, and it provides instantaneous feedback -- all eminently agile qualities.

2 comments:

Anonymous said...

I absolutely agree with your observation that HttpUnit is not typically used for unit tests.

The Firefox browser has a tool called the DOM Inspector (do the custom installation) that's great for locating elements in complex web pages. It gives you a tree view of the document, and you can click on something on the page and the DOM Inspector will highlight the corresponding element in the tree. It beats searching through HTML source.

Another alternative for this type of testing that solves the JavaScript problem is to use a scripting language to drive IE through its COM interface. There's a tool based on Ruby called WATiR that does this.

Kevin Christen

Laurent Ploix said...

I also agree that a java related scripting language is the best way to write java tests... whatever it is.

I have the feeling that many 'real world' architectures are complex, with different languages in use. And one of the best glues is... Python in general. It has libraries under Unix, w$, and glue for win32, SOAP, ...

So using Python to test your devs (and Jython in particular when it comes to Java) saves you a lot of time: you just have to learn one scripting language for test purposes: Python.
