Archive for the ‘Software Engineering’ Category

Do not construct URLs with concatenation

Wednesday, April 18th, 2007

I’m working on an installation of the Joomla! CMS where none of the links are working correctly. Joomla! is very sloppy with URLs. The uploads directory appears to be called images/stories but a quick grep shows that that exact string is referenced 146 times in the Joomla! installation. That’s in the source code, not the database. Most of those times it is being concatenated into strings to make URLs.

I’ve just spent three hours working out that I have no idea what Joomla! or the XHTMLSuite editor the client has chosen is doing, and that I don’t give a damn, because whatever they are doing, they are wrong.

The correct way to construct a URL from a filename is not concatenation. Do not do this. It does not work properly. So to avoid any confusion let me state categorically how URLs are supposed to work.

Relative URLs are the only situation where a web browser tries to interpret the path of a URL. For this purpose, the URLs http://hostname/directory and http://hostname/directory/ are not the same. The latter form is correct. The former works because Apache works out that this is a directory and issues an HTTP redirect to “canonicalise” it. Never hard-code a URL for a “directory” that does not end in a trailing slash. If it isn’t hard-coded, make sure that the application appends a trailing slash if none exists.
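To see why the trailing slash matters, here is a minimal sketch using Python's standard urllib.parse module. The helper name ensure_trailing_slash and the file name page.html are my own, made up for illustration; this is one way an application might normalise a directory URL, not what Joomla! or Apache actually do.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def ensure_trailing_slash(url):
    """Return url with a trailing slash on its path (hypothetical helper).

    Only the path component is touched, so any query string or
    fragment stays where it belongs.
    """
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# Without the slash, "directory" is treated as a file name and is
# replaced when a relative URL is resolved against it.
without_slash = urljoin("http://hostname/directory", "page.html")

# With the slash, the relative URL is resolved inside the directory.
with_slash = urljoin("http://hostname/directory/", "page.html")

print(without_slash)  # http://hostname/page.html
print(with_slash)     # http://hostname/directory/page.html
print(ensure_trailing_slash("http://hostname/directory"))
```

The two results differ by a whole path segment, which is exactly the kind of breakage that shows up as "none of the links are working".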

There are two operations which you then need to define to be able to construct URLs:

  • Given an absolute base URL A, and an absolute or relative URL B, compute a new URL B′ which is an absolute representation of B in the context of A.
  • Given an absolute or relative URL, append a query-string parameter.

The first operation is not concatenation. Learn this.

In notation, let A ~ B = B′

So say you want a URL for a specific uploaded image. Start with a base URL for your site.

http://mysite/

We then have a relative URL for our image directory, relative to the base URL.

images/stories/

Then http://mysite/ ~ images/stories/ = http://mysite/images/stories/

We have a filename of our image, “Uploaded Image.jpg”. First, we need to make that a relative URL. This requires URL encoding:

Uploaded%20Image.jpg

Then http://mysite/images/stories/ ~ Uploaded%20Image.jpg = http://mysite/images/stories/Uploaded%20Image.jpg

At this point we have a working URL. I know, it looks like all we’ve done is concatenation, and that’s why people appear to make this mistake time and time and time again. But it isn’t concatenation. What if our base URL was http://mysite/CMS/ and our images URL was /uploads/ ? Or what if our images URL is http://uploads.mysite/?
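As a sketch of the ~ operation, Python's urllib.parse.urljoin does relative reference resolution, and urllib.parse.quote does the URL encoding of the filename. The function name resolve is my own wrapper for illustration; the hostnames are the ones used in the examples above.

```python
from urllib.parse import quote, urljoin


def resolve(base, ref):
    """A ~ B: the absolute form of ref in the context of base.

    Thin illustrative wrapper around urljoin, which performs
    relative reference resolution rather than string concatenation.
    """
    return urljoin(base, ref)


# The simple case, where the result happens to look like concatenation.
images = resolve("http://mysite/", "images/stories/")
image_url = resolve(images, quote("Uploaded Image.jpg"))
print(image_url)  # http://mysite/images/stories/Uploaded%20Image.jpg

# The cases where concatenation gives the wrong answer.
rooted = resolve("http://mysite/CMS/", "/uploads/")
other_host = resolve("http://mysite/CMS/", "http://uploads.mysite/")
print(rooted)      # http://mysite/uploads/
print(other_host)  # http://uploads.mysite/
```

Concatenating the last two pairs would have produced http://mysite/CMS//uploads/ and http://mysite/CMS/http://uploads.mysite/, neither of which is what anyone wants.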

More than that, using this operation doesn’t let people go wrong. It discourages them from just wedging a / in there in the hope that it will make their URLs work, and prevents ambiguity about whether a piece of code works in all situations or just the way they’ve got it configured.
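The second operation, appending a query-string parameter, is not concatenation either: you have to know whether a query string already exists, encode the new value, and keep any fragment at the end. A minimal sketch with Python's urllib.parse follows; append_query_param and the example parameter names are hypothetical, chosen only for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_query_param(url, name, value):
    """Append name=value to the query string of an absolute or relative URL.

    Hypothetical helper. Existing parameters are parsed and re-encoded,
    so their escaping may be normalised (for example %20 becomes +).
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append((name, value))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


print(append_query_param("http://mysite/index.php", "page", "2"))
# http://mysite/index.php?page=2

print(append_query_param("images/stories/", "a", "b&c"))
# images/stories/?a=b%26c

print(append_query_param("page.html?x=1#top", "y", "2"))
# page.html?x=1&y=2#top
```

Nobody calling this needs to decide between ? and &, or remember to escape the value, which is the whole point of having it as a named operation.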

Unit Testing

Wednesday, April 11th, 2007

I am missing a way to write unit tests for web applications. I found a few options online, but they aren’t really along the lines of what I’m looking for. I want to be able to describe unit and regression tests with respect to expected or unexpected DOM fragments, make requests, fill and post forms, check the results and run the whole test suite automatically as a cron job or before committing. I want the whole test suite to be described in XML so that writing web-level tests doesn’t require programming, and so that it can go into Subversion along with the project code.

I think I will have to write this myself. In fact it’s something I’ve been really wanting for years. But I’m way too busy at the moment to do it.

API Design

Wednesday, November 8th, 2006

My discussion about easy scripting of web apps is part of the subject of API design, which I do find very interesting. API choice is subjective, but is it that subjective? There should be examples of APIs everyone can point to and say “That is how/how not to do it”.

Obviously, I’m a fan of object-oriented APIs, but the ease of use of OO APIs differs greatly.

Python’s standard API is too flat. The modules are self-contained rather than having dependencies on one another. This means most functions will tend to return a primitive rather than a more suitable datatype. There is no consistency in naming, but that’s not too bad because of the interactive interpreter and help.

Java’s API is too nested. It is totally interdependent, but its power usually comes from composition of the objects rather than subclassing, so some of the packages get extremely complex. It is entirely based on design patterns and has a consistent naming policy which they only break occasionally (*cough* System.currentTimeMillis() *cough*). I think I prefer this to any other API I’ve used, despite areas of high complexity and still not having found a truly succinct way of using AWT and Swing (perhaps I will try a XUL implementation next time).

The worst API I’ve ever looked at is libxml2, minor faffing with which is required to get XSLT up and running in Python. It’s so bad it gives me a headache just looking at the API documentation! Attempting to do anything with XML without OOP is just a horrific thing to envisage and for this alone, libxml2 is contemptible.