Web Standards in the Next Generation

December 22nd, 2007

With the news this week that Microsoft have a build of Internet Explorer that can pass Acid2, I wonder if I will be forced to eat my words when I suggested recently that Internet Explorer may be falling further behind with web standards, not closing the gap.

Well, we have an interesting opportunity to measure an aspect of that gap. A quick glance at Bugzilla shows that Gecko was able to generate a correct screenshot by 2006-04-17. Internet Explorer claimed correct rendering on 2007-12-12. The gap is 604 days for Gecko, but obviously greater for other browsers which have been compliant for much longer.
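For anyone who wants to check the arithmetic, here is a minimal sketch using the two dates quoted above. The variable names are my own.

```python
from datetime import date

gecko_pass = date(2006, 4, 17)   # Gecko screenshot date quoted above
ie_claim = date(2007, 12, 12)    # Internet Explorer claim date quoted above

# Subtracting two dates gives a timedelta; .days is the whole-day gap.
gap_days = (ie_claim - gecko_pass).days
print(gap_days)  # 604
```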

If Internet Explorer 8 progresses anything like Gecko, there will be a large number of bugs still to fix. If Internet Explorer 8 progresses anything like Internet Explorer historically progresses, most of those bugs won't get fixed. In other words, I'll believe it when I see it. That might not be for 20 months and might not be available on Windows XP. In fact, in the same post on the IE blog they are keen to excuse themselves from commitment to specific web standards, offering only a general tone in favour of them but excusing themselves with respect to backward compatibility. Taken as a preamble to a compliant-looking Acid2 rendering, I take this to mean, "we may not deliver this in IE8". I think everyone hopes they will, but by comparison, some of the Acid2 patches could have been in Firefox 2 but weren't because Firefox 2 was built with a frozen earlier build of Gecko.

Meanwhile, Firefox 3 is drawing closer. My impression is that the gap between 2 and 3 is not huge, which (hear me out!) is because Firefox 2 was excellent and Firefox 3 struggles to improve upon it. The difference for users is relatively minor. Although the new approach to bookmarking is hugely refreshing I think many users, including my parents, just won't get it.

The difference for developers is significantly less marked - the difference between having functional support for a technology that isn't portable to IE and having good support for a technology that isn't portable to IE is not something that will revolutionise the web. In fact, looking at the Firefox 3 for Developers page, the changes are disappointing and even worrying. In some ways it's a return to the browser wars of the late 1990s when competition between browser vendors' extensions demolished the concept of web standards.

  • Support for aspects of HTML5 - there isn't even a first working draft of HTML5. Although it was the WHATWG spec before, complying with a specification this early will mean that the implementation may not conform to the final specification, by which time, developers will be relying on the non-standard behaviour.
  • APNG - APNG is a Mozilla-sponsored bastardisation of PNG to add animation. It doesn't subscribe to the contract of PNG (which expressly forbids animation) and it isn't negotiable properly because it hijacks PNG's MIME type, extension and magic. This spells very bad news for the PNG format. In future it will be impossible to tell if a PNG is animated or not, and of course all legacy software will believe not. Despite the best efforts of a number of people, myself included, but most particularly Glenn Randers-Pehrson, Mozilla refused to adopt amendments which would resolve the conflicting standards and the PNG group failed to ratify APNG as an official extension. Although APNG was an ad-hoc solution to offer animated UI elements in Mozilla, it is being released and promoted as the new web standard for animation and MNG support, although a superior and established format, has been canned.
  • Microformats - Firefox 3 builds in support for Microformats, which could just as easily be a standalone Javascript library. There's no reason why this should be built in, except to create a de-facto standard in an API which Mozilla controls. Moreover it promotes microformats as a de-facto standard, which I'm not comfortable with, because I think Microformats are an ugly hack in lieu of a proper solution.

Polymorphic Basket Pattern

December 5th, 2007

I have a design pattern I use when designing an e-Commerce system. I call it the polymorphic basket and as the name suggests, it is a design pattern covering the basket. However the basket is just a special case of an order (one that is stored, typically, in a session rather than in the database), and this pattern also covers orders stored in the database.

The problem the pattern seeks to address is maintaining and pricing a list of items. The naïve solution is to record a reference to the SKU, and a number representing the quantity. This solution does not generalise well. In many shops, there is a mixture of products conforming to different conceptual models. While some products can be fully represented by an SKU code, some need bespoke customisation. It is generally an easy task to create mixed catalogues and customisation pages for these products. Data storage for these products is perhaps simplest using a Concrete Table Inheritance pattern. Even if a given merchant is only selling within one model, they may one day want to supplement their product line with perhaps just a few products sold under a different model.

The pattern is to maintain an OrderItemList of polymorphic objects conforming to an OrderItem interface. On adding an item from the catalogue, the details are copied into an OrderItem of an appropriate type. The OrderItem must encapsulate a copy of catalogue data, not references (in case that data changes or is deleted). There are no situations I have come across where we need to query the database based on the contents of the OrderItemList, so persisting the OrderItems is an ideal case for using serialisation (the Serialized LOB pattern).

The nuances of the OrderItem interface come down to experience. The OrderItemList must also be able to correctly identify and handle duplication of OrderItems - if a compatible item is added, do they stack, remain as duplicates, or refuse to be added? If items are stackable, can shoppers change the quantity they wish to purchase? Can the maximum quantity purchasable vary? An OrderItem must be queried for a price, but how is this price affected by discounts and voucher codes? How does each item affect postage and packing options and costs?

In practice even this system is not sufficient because orders are not necessarily a flat list. In some cases, OrderItems must contain child OrderItems. These are things like add-on packs and upgrades which are conceptually self-contained, but can only be ordered alongside a parent item. Child items are priced separately but grouped with the parent item for the purposes of removing the item from the basket or changing the quantity.
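The pattern so far might be sketched in Python roughly as follows. This is an illustration under my own assumptions, not a definitive implementation: the method names (unit_price, stacks_with, total, add), the SkuItem class and the rule that a child's cost scales with the parent's quantity are all hypothetical choices.

```python
from decimal import Decimal


class OrderItem:
    """Hypothetical interface: each item holds a copy of its catalogue data."""

    def __init__(self, name, quantity=1):
        self.name = name
        self.quantity = quantity
        self.children = []  # add-ons that can only be ordered with this item

    def unit_price(self):
        raise NotImplementedError

    def stacks_with(self, other):
        """Default policy: never stack, so duplicates stay as separate lines."""
        return False

    def total(self):
        # Children are priced separately but follow the parent's quantity.
        own = self.unit_price() * self.quantity
        return own + sum(child.total() * self.quantity for child in self.children)


class SkuItem(OrderItem):
    """An item fully described by a SKU code and a copied price."""

    def __init__(self, sku, name, price, quantity=1):
        super().__init__(name, quantity)
        self.sku = sku
        self.price = Decimal(price)

    def unit_price(self):
        return self.price

    def stacks_with(self, other):
        # Only plain items with the same SKU and no add-ons are merged.
        return (isinstance(other, SkuItem) and other.sku == self.sku
                and not self.children and not other.children)


class OrderItemList:
    def __init__(self):
        self.items = []

    def add(self, item):
        for existing in self.items:
            if existing.stacks_with(item):
                existing.quantity += item.quantity
                return existing
        self.items.append(item)
        return item

    def total(self):
        return sum((item.total() for item in self.items), Decimal("0"))


basket = OrderItemList()
basket.add(SkuItem("DVD-1", "Film on DVD", "9.99"))
basket.add(SkuItem("DVD-1", "Film on DVD", "9.99"))  # stacks: quantity becomes 2
computer = SkuItem("PC-1", "Desktop computer", "400.00")
computer.children.append(SkuItem("RAM-1", "Memory upgrade", "50.00"))
basket.add(computer)

print(len(basket.items), basket.total())  # 2 469.98
```

Because every item carries plain copied values rather than catalogue references, a list like this should be straightforward to persist with whatever serialiser the platform offers.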

I include the following example list of items (derived from experience) which a flexible e-Commerce ordering system should be able to handle within a single basket:

  • DVDs - for each product, one SKU and one price (many similar examples).
  • Clothing - for each product, SKUs corresponding to both colour and size. Some sizes may have different prices (many similar examples).
  • Groceries - for each product, SKUs corresponding to different pack sizes at different prices. (many similar examples).
  • Computers - each SKU may be upgraded with a custom combination of add-ons, at extra cost. Some of these may be available as standalone products, other times not (likewise all configurable but mass-produced goods).
  • Rope - pricing is based on the length of rope to be cut from the drum at a different rate per SKU. Users might choose length instead of quantity (likewise textiles).
  • Kitchens/Worktops - each SKU corresponds to a finish, but pricing is complicated, based on how many boards need to be cut to satisfy a layout, given tolerances for carpentry and mitres, and the labour cost of performing that carpentry (likewise anything bespoke).
  • Antiques - each antique can only be purchased by a single buyer and must then be removed unless/until the sale falls through (likewise anything second-hand).
  • Samples - given away free, but limited to one of each SKU per customer. Because they are free the normal delivery charges may not apply, and the checkout might have to be cut short because payment information is not necessary. It may not even be worth combining these into the standard order process, although it might save the merchant some overhead if the customer simply wants a few samples to be chucked in when their real order is dispatched.
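To give a flavour of how differently these items behave behind one interface, here is a small sketch covering three of the cases above. The class names, the price() method and the add_item helper are hypothetical, and the prices are invented for illustration.

```python
from decimal import Decimal


class DvdItem:
    """One SKU, one price, bought by quantity."""

    def __init__(self, sku, price, quantity=1):
        self.sku, self.unit, self.quantity = sku, Decimal(price), quantity

    def price(self):
        return self.unit * self.quantity


class RopeItem:
    """Priced by the length cut from the drum rather than by quantity."""

    def __init__(self, sku, price_per_metre, metres):
        self.sku = sku
        self.rate = Decimal(price_per_metre)
        self.metres = Decimal(metres)

    def price(self):
        return self.rate * self.metres


class SampleItem:
    """Given away free, limited to one of each SKU."""

    def __init__(self, sku):
        self.sku = sku

    def price(self):
        return Decimal("0")


def add_item(basket, item):
    """Refuse a second sample of the same SKU; accept anything else."""
    if isinstance(item, SampleItem) and any(
        isinstance(other, SampleItem) and other.sku == item.sku for other in basket
    ):
        return False
    basket.append(item)
    return True


basket = []
add_item(basket, DvdItem("DVD-1", "9.99", quantity=2))
add_item(basket, RopeItem("ROPE-10MM", "1.20", "7.5"))
add_item(basket, SampleItem("SWATCH-RED"))
duplicate_accepted = add_item(basket, SampleItem("SWATCH-RED"))

total = sum((item.price() for item in basket), Decimal("0"))
print(duplicate_accepted, total)
```

The basket only ever asks each item for its price, so a new pricing model means adding a class, not changing the basket.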

Are Internet Explorer's days numbered?

October 22nd, 2007

Web designers and developers are often heard to pour vitriol in the direction of Internet Explorer. I personally find myself cursing its name perhaps once a week. It's always difficult to believe this is a product Microsoft is still trying to promote.

Contemporary GUI development toolkits require an HTML rendering component: Java has javax.swing.JEditorPane; KDE has KHTML; Gtk has GtkHTML. MFC and .NET have Internet Explorer. In this task it is arguably successful: despite being too heavyweight, buggy, and non-portable, it is at least fast and very simple to embed. The applications which use Internet Explorer tend to use it in restricted circumstances so that the largest class of failings - how it measures up to the wild wild web - are not encountered.

With Firefox 3 right around the corner and a total radio silence on a potential IE8, it's worth considering how much Microsoft has to do to maintain its web browser's competitiveness.


Misconfigure your browser

October 20th, 2007

Most mistakes web designers make stem from the assumption that the way they are seeing the site in their web browser is the way everyone else sees it. By using uncommon defaults in your web browser, you can ensure that you have left to the default only those aspects of a page which you had intended. I call this "misconfiguring" because it is intentionally configuring a web browser to display pages wrongly. If the page looks correct with such settings it is probable nothing has been left to chance.

It is common practice to test a web site in various web browsers, but this testing can give a false impression of the compatibility regarding different default settings. This is because almost all browsers have settled on a common out-of-the-box set of defaults. The user is allowed to override these defaults. Fail to anticipate this and your website risks visual problems for some users (perhaps 1% or so, to take a wild guess). Problems include:

  • Character set issues, such as Â£ appearing in place of £ or the opposite, a broken character symbol (the question mark diamond in Firefox) rather than £.
  • Specifying a font colour but not a background colour, causing clashes or even invisible text. Perhaps 10% of websites fail to specify a page background colour but assume that it will default to white.
  • Linking images which are intended to composite onto white. This looks dreadful on most other colours.
  • Copy looking illegible due to tiny serif fonts (sans-serif is more legible on low-resolution screens).
  • Body font clashing with image-based buttons and titles: serif and sans-serif fonts rarely mix.
  • Pale-coloured boxouts. What appears pale is in fact a function of the background colour. A light pink box appears pale on a white background, but very bright on a black background, for which the corresponding effect would be a dark red box.
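The character set problem in the first bullet is easy to reproduce outside a browser. This is a minimal sketch, assuming the two charsets involved are UTF-8 and ISO-8859-1; the variable names are mine.

```python
pound = "\u00a3"  # the pound sign

# UTF-8 bytes read by a browser defaulting to ISO-8859-1:
# the two-byte sequence shows up as two characters.
as_latin1 = pound.encode("utf-8").decode("iso-8859-1")

# ISO-8859-1 bytes read by a browser defaulting to UTF-8: the lone byte is
# invalid, so a replacement character (the question mark diamond) is shown.
as_utf8 = pound.encode("iso-8859-1").decode("utf-8", errors="replace")

# ascii() keeps the output printable on any terminal.
print(ascii(as_latin1), ascii(as_utf8))
```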

The browser defaults which you may want to change include

  • Character set
  • Background colour
  • Text colour
  • Font style
  • Font size, although in theory you should avoid specifying this for accessibility reasons.

Misconfiguring a web browser is an art. It is perfectly acceptable to use the defaults a user specifies, as long as all of them are respected. While you should feel free to wholly b0rk a browser which you use purely for testing, it is more beneficial to misconfigure the web browser that you use for primary development. For me, this is the same web browser that I use for everything else, Firefox. Therefore the misconfiguration has to be something that isn't wholly unusable to me. More importantly, the new defaults have to be ones that I would be unlikely to use for a website, otherwise I could still rely on my defaults.

Because I tend to use sans-serif fonts, white backgrounds and black, grey or blue text, and UTF-8 character set, my browser is set to default to serif, coppery-orange text on a mid-grey background. I'm experimenting with ISO-8859-11 (Thai) as my default character set, because this is compatible with neither UTF-8 nor ISO-8859-15 for the most common problem area: £ and € symbols. It is compatible for ASCII-range symbols, so try UTF-16 if you are aiming for perfect incompatibility. I don't specify font-size but I Ctrl-roll my mousewheel on occasion to watch how the site changes at different font sizes.
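The choice of character set can be sanity-checked with Python's built-in codecs. This is only a sketch of the byte-level behaviour, not of what any particular browser does with it; the variable names are mine.

```python
pound = "\u00a3"  # the pound sign

# Decode the bytes a UTF-8 or ISO-8859-15 page would send as if they were Thai.
from_utf8 = pound.encode("utf-8").decode("iso8859_11")
from_latin9 = pound.encode("iso8859_15").decode("iso8859_11")

# Neither comes out as a pound sign, so an undeclared charset is obvious.
print(ascii(from_utf8), ascii(from_latin9))

# ASCII-range text has identical bytes in all three charsets...
text = "Hello"
ascii_compatible = (
    text.encode("utf-8") == text.encode("iso8859_11") == text.encode("iso8859_15")
)

# ...but not in UTF-16, which is why it breaks everything.
utf16_differs = text.encode("utf-16") != text.encode("utf-8")
print(ascii_compatible, utf16_differs)
```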

So my default browser settings look like this. If I ever see these styles on a page, I know I've failed to specify something.

The Accessible Calendar

October 11th, 2007

For most of us, visualising date and time comes very naturally. I'm sure if we surveyed how people visualise date and time there would be some similarity among a plethora of different answers. However, one form stands alone for its ubiquity: a year-planner-style calendar. Twelve grids of numbers, each seven columns wide, four to six rows deep. Click on a number to do something with that date.

Calendars are so simple for me and - statistically - you that it's easy to forget that a server-generated calendar like this is actually not accessible at all. The issue here is linearisation: flattening out the days of the calendar to a script that can be read - aloud, in braille… or by a search engine. A calendar in this form linearises to a script that reads like this:

January. One. Two. Three. Four. Five. Six. Seven. Eight. Nine. Ten. Yawn. Zzz. Snore… February. One. Two…

And so on. It's not useful to read out 365 numbers (366 next year) and expect users to just wait to respond when the day of the year they are looking for comes up.

What is the accessible version? Well, the approach to take is to work out the linearisation we would like first. For example,

The following periods are currently available: the fifteenth of October to the twenty-ninth of November, then the eleventh of December to the seventeenth. Which date are you interested in?

Now, for a web page we might not expect to follow this pattern exactly, but the model is clear: list the calendar information, then query for a date. It looks something like this:

Periods Available

  • 15th October - 29th November
  • 11th - 17th December

This is not bad even in a graphical user agent; describing calendars as schedules is not hard to visualise. Note that it would be perfectly possible to use Javascript to convert this calendar to a visual representation. Whether using Javascript in this way is accessible is open to debate. There is no reason screen readers can't execute Javascript. Some, I believe, already do. But there is a trend for graphical browsers to provide Javascript and for non-graphical browsers not to.
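Generating the schedule-style listing from a set of available dates is straightforward on the server. Here is a rough sketch, assuming availability is held as a collection of individual dates. The function names are hypothetical, and ordinal suffixes ("15th") are left out for brevity.

```python
from datetime import date, timedelta


def periods(available):
    """Collapse a collection of dates into (start, end) runs of consecutive days."""
    runs = []
    for day in sorted(set(available)):
        if runs and day - runs[-1][1] == timedelta(days=1):
            runs[-1][1] = day        # extend the current run
        else:
            runs.append([day, day])  # start a new run
    return [(start, end) for start, end in runs]


def describe(start, end):
    """Format one run as a list entry."""
    if start == end:
        return f"{start.day} {start:%B}"
    if (start.year, start.month) == (end.year, end.month):
        return f"{start.day} - {end.day} {end:%B}"
    return f"{start.day} {start:%B} - {end.day} {end:%B}"


def days(first, last):
    """Every date from first to last inclusive (helper for the example data)."""
    return [first + timedelta(days=n) for n in range((last - first).days + 1)]


available = (days(date(2007, 10, 15), date(2007, 11, 29))
             + days(date(2007, 12, 11), date(2007, 12, 17)))

lines = [describe(start, end) for start, end in periods(available)]
for line in lines:
    print(line)
```

The month names come from strftime, so they follow the process locale. In the default locale they are in English.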

I'd suggest if you are willing to forgo an approach that caters ideally to non-visual users, it's possible to do better. We can cheat the system, almost, by using alt attributes in images and image maps to provide the schedule version as above, but for the images and maps to supply the visual layout. In my latest project, I'm also using line boxes full of images rather than a table. This allows a single <a> tag to span a whole month worth of dates. But what it saves on the semantics, it loses on file size. A cheaper approach might be client-side image maps, especially of the lesser-known form involving <a shape=""> tags rather than <area shape=""> elements, or similar features in SVG.

However, even with the approaches I've explored above, this is one area where I suspect there may not be any solution that does equally well for visual and non-visual users. 365 dates' worth of information is difficult to represent in a concise form. That's why we use calendars.

Paypal with Django

October 10th, 2007

In a previous post I discussed the method I used to integrate Paypal's Encrypted Web Payments in generic SSL terms I hoped would make it easy to implement from scratch in any language. I've had a request from Ross Poulton to share the Python code that makes it work using the M2Crypto wrapper. So, here it is:

from M2Crypto import BIO, SMIME, X509
from django.conf import settings

class PaypalOrder(dict):
        """Acts as a dictionary which can be encrypted to Paypal's EWP service"""
        def __init__(self):
                dict.__init__(self)
                self['cert_id']=settings.MY_CERT_ID

        def setNotifyURL(self, notify_url):
                self['notify_url']=notify_url

        # snip more wrapper functions

        def plaintext(self):
                """The plaintext for the cryptography operation."""
                s=''
                for k in self:
                        s+=u'%s=%s\n'%(k,self[k])
                return s.encode('utf-8')

        __str__=plaintext

        def encrypt(self):
                """Return the contents of this order, encrypted to Paypal's
                certificate and signed using the private key
                configured in the Django settings."""

                # Instantiate an SMIME object.
                s = SMIME.SMIME()

                # Load signer's key and cert. Sign the buffer.
                s.load_key_bio(BIO.openfile(settings.MY_KEYPAIR), BIO.openfile(settings.MY_CERT))

                p7 = s.sign(BIO.MemoryBuffer(self.plaintext()), flags=SMIME.PKCS7_BINARY)

                # Load target cert to encrypt the signed message to.
                x509 = X509.load_cert_bio(BIO.openfile(settings.PAYPAL_CERT))
                sk = X509.X509_Stack()
                sk.push(x509)
                s.set_x509_stack(sk)

                # Set cipher: 3-key triple-DES in CBC mode.
                s.set_cipher(SMIME.Cipher('des_ede3_cbc'))

                # Create a temporary buffer.
                tmp = BIO.MemoryBuffer()

                # Write the signed message into the temporary buffer.
                p7.write_der(tmp)

                # Encrypt the temporary buffer.
                p7 = s.encrypt(tmp, flags=SMIME.PKCS7_BINARY)

                # Output p7 in mail-friendly format.
                out = BIO.MemoryBuffer()
                p7.write(out)

                return out.read()

The settings required are as follows:

MY_KEYPAIR='keys/keypair.pem'    #path to keypair in PEM format
MY_CERT='keys/merchant.crt'    #path to merchant certificate
MY_CERT_ID='ASDF12345'    # code which Paypal assign to the certificate when you upload it
PAYPAL_CERT='keys/paypal.crt'    #path to Paypal's own certificate 

The Church of the Search Engines

September 28th, 2007

Do you expect web developers to hold qualifications in computer science? By the same account, you should expect search engine optimisation (SEO) specialists to hold a degree in statistics or game theory. Or computer science, in fact.

Ever since I set up Mauve Internet, it has been asserted on the website that SEO is a myth. In recent weeks I have brushed up on my understanding of the realm of SEO so as to defend Mauve Internet's practices. What I have encountered could reasonably be described as a religion. Scant evidence is mused over, formulated into doctrine, and memorized by rote. The priests of SEO wield power in the eyes of the faithful, they preach their beliefs to others and they have heated religious debates about which beliefs are important.

Building a site which is genuinely more popular than the competition is the crux of search engine ranks and the responsibility for that lies entirely with the site owner. There is also a wealth of accessibility techniques for removing barriers to spidering, and there are some common sense techniques, like canonicalising URLs so as not to divide the weight of the page. But these are within the remit of the developer, who, if they are any good, will have done them as standard. More importantly, these are done once and for all. These do not yield incremental improvements and they do not need to be continually revised.

I don't believe SEO specialists stick to this territory, although hopefully many now pay attention to it. SEO specialists I have corresponded with carve out a niche where they can remain unchallenged, a territory of keyword density, meta tags, link depth, link penalties and link juice shaping, the application of ill-defined theories which are unproven (in some cases, disproven) and which they can continue to charge for as they tweak in response to the latest webstats.

The assertion that SEO is a racket can be easily substantiated. If website owners could, by invoking SEO voodoo, position themselves arbitrarily highly in the natural listings of search engines, then the search results would be determined by website owners as a function of time and money. The usefulness of the search would quickly degenerate and users would migrate to other search engines that provide better quality results. Therefore, search engines would not make as much money from sponsored links. Search engines like making money from sponsored links, so they won't allow this to happen.

This isn't some abstract scenario I've imagined. It actually happened in the late 1990s to the search engine Altavista. Altavista's search results had become a free-for-all and it haemorrhaged users, primarily to Google, whose search results were vastly superior and clean of link farms. I watched it happen; in fact I was one of Altavista's users who switched to Google.

The one thing we know for certain about the ranking systems of search engines is that they are extremely complex and closely guarded secrets. They don't have to be scrutable or even produce optimal results: they merely need to produce good results - which implies being hardened against exploitation.

Scheduling Events

August 9th, 2007

There are several situations in web application programming where it is necessary to schedule events to happen in the future, outside of the request driven model. Some of the most common are these:

  • Expiring static files from the webserver. Some data can be cleaned up whenever a page is requested. On occasion, though, the application establishes the contract that a file will stay around for a fixed period of time. When access to these files is provided by the webserver (not through the application itself) then the files need to be deleted at a given future moment.
  • Time-based notifications. For example, if you deal with dates and times in your web application it's sometimes necessary to actually notify users (most often, via email) at a given time. It's clearly not acceptable to wait until someone hits a page (possibly hours or days later) to issue these notifications.
  • Syndication. Polling data on remote servers has to be done ready for when a user hits a page, because otherwise it can introduce an unacceptable delay while variously contactable remote hosts are queried.

In several of my web applications now I've come to a sticking point when it comes to scheduling events. As far as I know, the frameworks always leave this up to the developer to arrange; scheduling events is considered outside their remit.

There are a few solutions I know of.

  • The application can provide a script which the administrator must, at install time, schedule to be run periodically. Drupal, for example, recommends adding a crontab entry which periodically wgets a script on the web site. In redistributable apps, many users will obliviously skip this step and wonder why the application won't work.
  • Run scheduled tasks after serving each page. This approach doesn't solve the above problems. In mod_php/perl/python applications this hogs a webserver thread too, which could degrade performance.
  • There are websites like webcron.org that will fetch a script on your server at intervals. It would be madness to rely on this in your own applications or suggest this as a solution for a redistributable application, so it's only suitable as a fallback if all else fails.
  • The application may be able to use the system scheduler (cron/at on POSIX, Windows Scheduler Service on Windows). While it should be possible for a PHP application to enqueue things into the webserver's user's crontab (as long as PHP isn't restricted to "safe mode"), I'm not sure that this is advisable. Most offline applications I know that need to schedule something spawn their own daemon to handle scheduled events, even if it sits idle most of the time.

I can't see why the frameworks shouldn't provide an API for scheduling tasks. This would have the advantage of being simple, integrated and portable, and it could negotiate to use the platform scheduler or fall back to spawning a daemon to dispatch events.
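To make the idea concrete, here is a minimal sketch of what the daemon-side fallback of such an API might look like. No framework I know of provides this: the TaskQueue class and its schedule and run_due methods are hypothetical names. A real version would also need to persist the queue and negotiate with the platform scheduler.

```python
import heapq
import itertools


class TaskQueue:
    """Time-ordered queue of tasks that a daemon loop would drain."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so two tasks due at the same time never compare callables.
        self._counter = itertools.count()

    def schedule(self, run_at, func, *args):
        """Register func(*args) to run at the timestamp run_at."""
        heapq.heappush(self._heap, (run_at, next(self._counter), func, args))

    def run_due(self, now):
        """Run every task whose time has come; return how many ran."""
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, func, args = heapq.heappop(self._heap)
            func(*args)
            ran += 1
        return ran


sent = []
queue = TaskQueue()
queue.schedule(1000.0, sent.append, "reminder email")
queue.schedule(5000.0, sent.append, "delete expired file")

queue.run_due(now=999.0)    # nothing is due yet
queue.run_due(now=1000.5)   # runs the reminder only
print(sent)  # ['reminder email']
```

A daemon would simply call run_due(time.time()) in a loop, sleeping until the next task is due. The application code only ever sees schedule().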

Google Maps Routing

August 3rd, 2007

You can now dynamically reroute directions provided by Google maps by dragging the computed route!

This is unbelievably cool stuff - it works like a desktop app, flawlessly and fast, and it's powered by AJAX alone.

PHP4 is dead. Long live PHP4!

August 3rd, 2007

PHP4 is apparently going to be supported only until the end of the year. The idea is to push developers towards PHP5. Matt Mullenweg notes that PHP4 is adequate for a lot of developers, but also claims that PHP5 adoption is poor because PHP5 hasn't been marketed properly to developers. I don't believe this. PHP5 is patently a better language, resolving the single most dreadful language problem that PHP4 exhibits: objects are copied on assignment unless explicitly taken by reference. Expert developers know this; amateur developers are unaware of the problem and use PHP5 with a kind of religious zealotry.
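For readers who haven't been bitten by the copying problem, the difference can be imitated in Python. This is only an analogy: Python itself always behaves like PHP5 here, so the PHP4 behaviour is emulated with an explicit copy, and the Counter class is invented for the example.

```python
import copy


class Counter:
    def __init__(self):
        self.value = 0


original = Counter()

# PHP5-style handle semantics (and Python's): both names refer to one object.
alias = original
alias.value += 1

# PHP4-style semantics, emulated with an explicit copy: assignment yields a
# separate object, so changes made through it never reach the original.
php4_style = copy.copy(original)
php4_style.value += 10

print(original.value, php4_style.value)  # 1 11
```

Code that relies on either behaviour without realising it is exactly the code that breaks when moved between the two versions.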

This migration poses a particular challenge. Rarely does a language change so drastically without offering a simple migration strategy. Vast amounts of legacy code simply don't work on PHP5. mod_php4 and mod_php5 don't run in the same Apache instance so it's not trivial to configure a box to serve some sites with PHP4 and some with PHP5. There is no solution that does not require a lot of sysadmin work setting up proxies, or even virtual machines, and of course, this is not the kind of thing distros do out of the box.

PHP developers seem lacking in conscientiousness about the community they are supporting, as attested by this quote from the PHP4 to 5 migration documentation:

Many PHP programmers aren't even aware of the copying quirks of the old object model and, therefore, the majority of PHP applications will work out of the box, or with very few modifications.

In fact, that entire subsection of the documentation carries the tone that compatibility problems are inconsequential. I think it's telling that the PHP documentation can't produce a complete list of reserved keywords. As the user-submitted comments note, there are at least half a dozen missing from the list.

I've also discovered that PHP4 is simply not available for Ubuntu Feisty. While I can understand that there is a genuine desire to move the PHPosphere forward, it's incredibly dumb to gauge whether people are ready to ditch PHP4 by looking at supported, off-the-shelf web applications, rather than considering the volume of cheap legacy applications. Many people simply need both.

For myself, I'm happy to carry out the migration, but it's annoying that it's been handled so badly that my job in doing so is so very much harder. Hard enough that I've already put it off for years.