Archive for the ‘Web Apps’ Category

Scheduling Events

Thursday, August 9th, 2007

There are several situations in web application programming where it is necessary to schedule events to happen in the future, outside of the request driven model. Some of the most common are these:

  • Expiring static files from the webserver. Some data can be cleaned up whenever a page is requested. On occasion, though, the application establishes the contract that a file will stay around for a fixed period of time. When access to these files is provided by the webserver (not through the application itself) then the files need to be deleted at a given future moment.
  • Time-based notifications. For example, if you deal with dates and times in your web application it’s sometimes necessary to actually notify users (most often, via email) at a given time. It’s clearly not acceptable to wait until someone hits a page (possibly hours or days later) to issue these notifications.
  • Syndication. Data on remote servers has to be polled in advance, ready for when a user hits a page; otherwise, querying remote hosts of varying reachability can introduce an unacceptable delay.

In several of my web applications now I’ve come to a sticking point when it comes to scheduling events. As far as I know, this is always left up to the developer to arrange; the frameworks seem to consider scheduling events outside their remit.

There are a few solutions I know of.

  • The application can provide a script which the administrator must schedule to be run periodically at install time. Drupal, for example, recommends adding a crontab entry which periodically wgets a script on the web site. In redistributable apps, many users will obliviously skip this step and wonder why the application won’t work.
  • Run scheduled tasks after serving each page. This approach doesn’t solve the above problems. In mod_php/perl/python applications this hogs a webserver thread too, which could degrade performance.
  • There are websites like webcron.org that will fetch a script on your server at intervals. It would be madness to rely on this in your own applications or suggest this as a solution for a redistributable application, so it’s only suitable as a fallback if all else fails.
  • The application may be able to use the system scheduler (cron/at on Posix, Windows Scheduler Service on Windows). While it should be possible for a PHP application to enqueue things into the webserver’s user’s crontab (as long as PHP isn’t restricted to “safe mode”), I’m not sure that this is advisable. Most offline applications I know that need to schedule something spawn their own daemon to handle scheduled events, even if it sits idle most of the time.

I can’t see why the frameworks shouldn’t provide an API for scheduling tasks. This would have the advantage of being simple, integrated and portable, and it could negotiate to use the platform scheduler or fall back to spawning a daemon to dispatch events.
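To make the idea concrete, here is a minimal sketch of what the core of such an API might look like. Everything in it is an assumption for illustration: the names TaskQueue, schedule, next_due and run_due are hypothetical, the queue lives only in memory, and nothing here negotiates with cron or spawns a daemon. Whatever does the dispatching (a crontab entry, a daemon, or a fallback hook after a page is served) would simply call run_due() from time to time.

```python
import heapq
import itertools
import time


class TaskQueue:
    """Hypothetical framework-level scheduler: a time-ordered queue of callables."""

    def __init__(self):
        self._heap = []                     # entries: (run_at, sequence, func, args)
        self._counter = itertools.count()   # tie-breaker so callables are never compared

    def schedule(self, run_at, func, *args):
        """Ask for func(*args) to be called at or after the timestamp run_at."""
        heapq.heappush(self._heap, (run_at, next(self._counter), func, args))

    def next_due(self):
        """Timestamp of the earliest pending task, or None if the queue is empty."""
        return self._heap[0][0] if self._heap else None

    def run_due(self, now=None):
        """Run every task whose time has come and return how many were run."""
        now = time.time() if now is None else now
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, func, args = heapq.heappop(self._heap)
            func(*args)
            ran += 1
        return ran


if __name__ == "__main__":
    queue = TaskQueue()
    log = []
    queue.schedule(100, log.append, "expire the static report")
    queue.schedule(50, log.append, "send the reminder email")
    queue.run_due(now=60)   # only the task scheduled for t=50 is due
    print(log)
```

A real implementation would also need to persist the queue between requests, and next_due() is the value a dispatcher would use to decide how long it can sleep.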

File uploads

Tuesday, January 23rd, 2007

I have briefly mentioned the work I was doing to wrap file uploading in AJAX for a proper experience. Browser-based file uploads have been neglected over the past few years.

In client terms, file uploads work in almost exactly the same way as they have always done: the page blocks while the data is posted, and a very small progress bar shows up in the status bar. This is a user interface disaster for big files.

On the server side, the situation is more varied, but there is often little support for streaming of file uploads. In PHP, file uploads are read wholly into memory, parsed and saved out to a temporary folder before a script even gets called. The request must fit within both PHP’s file upload size limit and its memory limit. As far as I can tell, something similar happens in Zope although you can argue that Zope allows other standards for upload such as DAV and FTP natively. In plain CGI, of course, there is no handling of the uploads, so if you’re using a CGI wrapper, it can do whatever you want to handle this. Perl’s CGI.pm module allows a hook, at least. Python’s cgi module doesn’t, nor is it easy to subclass.

All in all, the situation of binding file uploads to form submissions, and processing of those in common server-side languages is wholly inadequate as file size gets large. File uploads are convenient because they are a commonly-supported fall-back, but the workarounds, although solving some of these problems, don’t have the simplicity of a browser-native solution.

In my recent project I looked at ways of working around these limitations. The best workaround for the client-side problems I have found so far is to perform the upload in an <iframe>, using AJAX queries to present a progress bar. This still has problems, notably that it’s one file at a time, both on the choosing and the uploading. In Firefox I can actually perform two concurrent uploads in different <iframes>, but the AJAX progress bar doesn’t then update.

Server-side, I wrote the whole thing as a webserver so that the AJAX queries could talk directly to the thread streaming the upload. Additionally I wrote my own parser to parse the uploaded data on the fly, so that the daemon knows what is uploading at any given stage. It works quite well, and the system is extensible in that it could be combined with a daemon that allows other forms of upload; feedback for these would also appear in the browser windows.
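The shape of that arrangement can be sketched roughly as follows. This is not the code from the project: it leaves out the multipart parser entirely, and the names ProgressRegistry and stream_upload are hypothetical. It only shows the central idea, which is that the thread copying the request body records how far it has got in a shared structure that a progress query can read.

```python
import io
import threading


class ProgressRegistry:
    """Thread-safe map of upload id -> (bytes received, expected total)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._uploads = {}

    def update(self, upload_id, received, total):
        with self._lock:
            self._uploads[upload_id] = (received, total)

    def percent(self, upload_id):
        """What a progress query would return; None for an unknown upload."""
        with self._lock:
            entry = self._uploads.get(upload_id)
        if entry is None:
            return None
        received, total = entry
        return 100.0 if total == 0 else 100.0 * received / total


def stream_upload(source, dest, upload_id, total, registry, chunk_size=64 * 1024):
    """Copy the request body to dest in chunks, recording progress as it goes."""
    received = 0
    registry.update(upload_id, received, total)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)
        received += len(chunk)
        registry.update(upload_id, received, total)
    return received


if __name__ == "__main__":
    registry = ProgressRegistry()
    body = io.BytesIO(b"\0" * 200000)   # stands in for the socket of an upload request
    saved = io.BytesIO()                # stands in for the destination file
    stream_upload(body, saved, "upload-1", 200000, registry)
    print(registry.percent("upload-1"))
```

Because the data is never held in memory as a whole, the size of the upload is bounded by disk space rather than by a memory limit, and the total can be taken from the request's Content-Length header.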

Even so, I wish that file uploading was something people were thinking about more. It’s central to so many web applications now.

There are numerous problems:

  • File uploads are synchronous. Downloads can happen in the background on their own, but uploads can’t.
  • File uploads don’t have a proper UI. Current browsers appear to show a tiny upload bar that isn’t really very accurate and doesn’t give data rates or estimated time remaining.
  • Uploads are chosen one at a time.
  • Javascript can’t be used for polish. The model that has empowered Web 2.0 improvements is that of taking an existing HTML/HTTP model and allowing it to be controlled by Javascript. However, there is no way into the uploading or the file selection processes with Javascript.

The most general solution I can see would provide a Javascript API for uploading. This would allow Javascript to show a (native) file chooser dialog, and instruct the browser on what to do with the files it returns. POST or PUT to the origin server seem useful, as does FTP upload. Clearly there are security concerns, but I fail to see how, as long as Javascript may instigate an operation, read upload statistics, but not read the filesystem, this presents a problem.

Perhaps an AJAX-style API could be along these lines:

//configure a native dialog to present to the user
var ufc = new UploadFileChooser();
ufc.setAcceptableFileTypes(['image/jpeg', 'image/png']);
var uploads = ufc.chooseFiles();

for (var i = 0; i < uploads.length; i++)
{
    var u = uploads[i];
    u.onreadystatechange = doSomething; //callback
    //this URL is constrained to the origin server to prevent XSS
    u.beginHttpPost('http://example.com/upload');
}

After this, the user could close the tab or leave the page, and the browser would upload the files in the background, perhaps with a progress bar appearing within the Downloads window. Note that it could queue the files rather than uploading them all at once, depending on user settings. The Javascript, and indeed the user, should be able to request that an upload is aborted. The Javascript should also be able to query the upload, using the object reference provided.

Scriptable Interfaces Revisited

Friday, November 3rd, 2006

I solved my problem mentioned in my last post by creating my own scriptable interface in Python.

Happily, I had decoded enough of the database format to be able to write a class - about 50 lines of Python - which encapsulates the configuration I needed to modify, and the ability to store it into the database (I didn’t fully implement reading the old configuration from the database though).

Then I was able to do the reconfiguration as simple object instantiation and member function calls. As expected, it was much easier than logging in as administrator for each CMS in turn, going through the interface to find the configuration page required, and updating and saving it.

There were a number of advantages:

  • All the configuration to be input was collected at the bottom of the script, together on one page, so it was easy to check that everything was present and correct.
  • The script can be re-run at any point, so it’s maintainable.
  • The script performs some checks to ensure the new configuration is valid.

Obviously, it would be easier if Joomla offered this kind of capability so it didn’t have to be maintained separately in Python. Joomla is written in PHP, but that doesn’t mean you can just write a PHP script that loads up Joomla’s classes and tweaks them as I’ve described. PHP doesn’t have import semantics, so you need knowledge about what has to be included and in what order. It also doesn’t have a very strong object model, so trying to make object manipulations correspond to data manipulations is inherently inconvenient. Finally, Joomla’s PHP objects consist, as far as I’ve examined, of monolithic functions that do a lot of procedural steps rather than small, reusable functions.

So I’m one step closer to defining exactly what I need here for administering web applications. I need to be able to write administration scripts quickly, without prior knowledge of the application’s internals (ideally using interactive introspection to obtain the knowledge I need). Specifically, I want:

  1. An API which allows me to succinctly retrieve and manipulate the application’s data. This should be object-oriented so that for any object I’m handed, I know what operations I can perform with it. The structure of the object model should reflect the way the data is presented in the web app’s front end, so that I can go in blindly, using what I see of the web app’s data model.
  2. A very low overhead in terms of lines of code for getting up and running. I don’t want to learn code by rote just so I can bootstrap this API.
  3. Implicit or succinct persistence for the objects I’ve retrieved. I can call store() on each object I modify, if I must, but implicit persistence (i.e. everything is automatically updated when the script ends) will better allow the API to handle ACID for me.
  4. No SQL queries. I don’t want to have to understand the database structure. I also don’t care about efficiency in this instance so just hand me a list of all the objects and I’ll filter it myself.
  5. No using built-in types to represent abstract data. For example, no using associative arrays to represent objects. A class for everything, with useful member functions, please. I don’t want to have to work out the semantics or write any code that might already exist in the application.
  6. A few CLI utilities that use this API. Not only can these perform useful tasks, but they prove the API works and is succinct, and they give examples of it in use that I can copy.

Note that the API doesn’t need to remain backwardly compatible. I’m happy to modify my management scripts. The aim is for the scripts to be as brief as possible, so it shouldn’t be hard to tweak them if the data model is changed (improved, we hope).
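As a rough sketch of what the list above asks for, the following shows an object-per-record API with implicit persistence. It is an illustration under stated assumptions, not Joomla’s data model or the script described earlier: the articles table, the Article and Site classes and the open_site() function are all hypothetical. The SQL is confined to the API itself, so the administration script never sees it.

```python
import sqlite3
from contextlib import contextmanager


class Article:
    """One article, exposed as an object rather than a database row."""

    def __init__(self, site, article_id, title, published):
        self._site = site
        self.id = article_id
        self.title = title
        self.published = published

    def rename(self, title):
        self.title = title
        self._site._dirty.add(self)    # remember that this object needs saving

    def publish(self):
        self.published = True
        self._site._dirty.add(self)


class Site:
    """Entry point handed to the script; hides the (hypothetical) schema."""

    def __init__(self, conn):
        self._conn = conn
        self._dirty = set()

    def articles(self):
        rows = self._conn.execute(
            "SELECT id, title, published FROM articles ORDER BY id")
        return [Article(self, i, t, bool(p)) for i, t, p in rows]

    def _flush(self):
        for a in self._dirty:
            self._conn.execute(
                "UPDATE articles SET title = ?, published = ? WHERE id = ?",
                (a.title, int(a.published), a.id))
        self._dirty.clear()


@contextmanager
def open_site(db_path):
    """Yield a Site; store every modified object when the block ends cleanly."""
    conn = sqlite3.connect(db_path)
    site = Site(conn)
    try:
        yield site
        site._flush()
        conn.commit()      # all changes land together...
    except Exception:
        conn.rollback()    # ...or none of them do
        raise
    finally:
        conn.close()


# Usage (hypothetical database path):
#     with open_site("site.db") as site:
#         for article in site.articles():
#             if not article.published:
#                 article.publish()
```

The whole management script is the four lines in the usage comment, which is the kind of brevity that makes it cheap to adjust when the data model changes.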

Web apps need scriptable interfaces

Wednesday, November 1st, 2006

I was just working on a set of separate Joomla installations for a client today when I realised that I really needed to be able to run scripts against the different installations.

I was trying to install three different Mambots (one of Joomla’s three different types of extensions) in about 8 installations of Joomla - each with different database configurations and paths, and having started out with a Bash script to merely copy the plugin files into place, I realised that because automating the whole operation would involve reading a configuration file in PHP syntax and performing some queries in MySQL with it, coding this would probably take longer than installing the plugins manually.

There are not very many web apps which have any kind of scriptable API. In fact, I only really know of Mailman, which is only partly a web application. But it’s a feature I’ve used frequently in Mailman - there is a script bin/withlist which acquires locks and opens the list, allows you to modify the list as a Python object, and saves it on exit. Mailman provides a few CLI tools too which can be used in scripting but which are really only trivial examples of the power of the scriptable API.

When I began writing Mailhammer, my own announcement-only mailing list software, I took this scriptability even further based on my positive experience with Mailman’s scriptable API. All of the working parts are implemented in Python, and the PHP is just an HTML wrapper which opens and talks to a CLI Python script over pipes. This means that the PHP is kept extremely simple, and the Python core is a very clean and simple API, and that the CLI can do everything reliably. It’s a cleanly divided implementation of an n-tier architecture. In fact in practice, I only use the web interface for viewing the data already in the database. Consequently, that interface isn’t very powerful - yet!
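The division of labour can be illustrated with a small sketch of the Python side of such a pipe. This is not Mailhammer’s actual protocol, which isn’t described here: the one-JSON-object-per-line format and the names serve, subscribe and list_subscribers are assumptions made for the example.

```python
import io
import json


def serve(commands, infile, outfile):
    """Read one JSON request per line, dispatch it, write one JSON reply per line."""
    for line in infile:
        line = line.strip()
        if not line:
            continue
        try:
            request = json.loads(line)
            handler = commands[request["command"]]
            reply = {"ok": True, "result": handler(*request.get("args", []))}
        except Exception as exc:
            # Report failures to the front end instead of dying mid-conversation.
            reply = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        outfile.write(json.dumps(reply) + "\n")
        outfile.flush()


# Hypothetical core API: in a real application these would be the working parts.
SUBSCRIBERS = set()


def subscribe(address):
    SUBSCRIBERS.add(address)
    return len(SUBSCRIBERS)


def list_subscribers():
    return sorted(SUBSCRIBERS)


COMMANDS = {"subscribe": subscribe, "list": list_subscribers}

# A real CLI script would finish with:
#     import sys
#     serve(COMMANDS, sys.stdin, sys.stdout)
```

The web front end then only has to open the process, write a line such as {"command": "list"} and read one line back, and the same functions remain callable directly from an interactive Python session.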

Python is well suited to scriptable APIs - its interactive interpreter and neat object model mean that it’s easy to perform arbitrary operations interactively on complex, persistent data structures. In PHP web applications it might be more feasible to build an XML-RPC interface of some kind and provide a command-line client.

I don’t think that scriptability is considered as even a potential feature for almost any web application I’ve tried; their operation is tied inextricably to their unique interfaces.

For anybody developing a new web application, please ask yourself this: will administrators using your software want to be locked in to your pretty and easy-to-use interface, or will they end up cursing you for failing to provide them with power beyond what HTML can provide?

Picasa Web Albums

Thursday, September 21st, 2006

Well what do you know… what with all my recent work with client-side scripting and then today’s look at Mauvesoft Gallery and scanning for photos, I was just thinking about an AJAX-powered web-based Picasa clone when I stumbled across Google’s Picasa Web Albums, which I hadn’t seen before.

I’ve uploaded a few photos to it, and it’s OK I suppose, but it’s not as developed as some of the other Google webapps: it’s just a basic web gallery. It’s better than Gallery (Capital G) because Gallery is a mess these days. Gallery 2.0 is hugely bloated, and for all that bloat it isn’t very much more powerful and certainly not as easy to use as Picasa Web Albums.

Still, I think Mauvesoft Gallery is a fairly good web gallery core, and it’s simple and hackable enough that I could probably rig up some AJAX over the top quite easily. It would certainly make for an easy-to-use administration interface, but I was thinking more of the on-the-fly searching, and Picasa’s wonderful ‘Timeline’ feature. I wonder whether the publishing to the world of photos is the main aim when people use web galleries, or is it that they just want access to all their photos wherever they are? (Or a third option: my brother, on holiday, uploads photos from Internet Cafes so that he can delete them from his camera to free up space.)

A fork of Mauvesoft Gallery is used to power the ImageChooser component in my e-Commerce software, and that is desperate for some attention. The UI is appalling at the moment. Perhaps I’ll download Dojo and see if that can speed up the development.

Mauvesoft Gallery

Wednesday, September 20th, 2006

One of my old online friends, Twisted, messaged me this morning to say that he was having problems with his installation of Mauvesoft Gallery. He had reinstalled it on a new Ubuntu box but it was not thumbnailing properly: first it was not generating thumbnails; then, having fixed that, he found that it was not caching them.

Anyway, I tarballed up my unstable development version, 1.5, which adds a few features and fixes a few bugs.

Changelog

  • Feature: PHP-based templates
  • Feature: Watermarking of thumbnails
  • Feature: Images now support EXIF captions and titles
  • Theme: New theme ‘corporate’
  • Theme: ‘slides’ rewritten in XHTML and CSS
  • Bug: Thumbnail transparent PNGs with GD
  • Bug: .JPG extensions not considered images
  • Bug: Directory names containing ‘+’ character
  • Bug: Imagemagick engine doesn’t work with CMYK JPEGs

He installed that and after a few permissions bugs, it’s up and running.

I suppose this makes it almost ready for an RC. The first alpha is already running on a site I did a couple of months ago for a client, Photography2you.

When I actually package this for release I’m going to use shar, which I am fairly confident I can use to configure the installation after it has unpacked. All webapps suffer from this installation problem, and there doesn’t seem to be a generic solution for installing them, even though there is a very limited range of things that need to be configured to get them off the ground on a single-vhost basis. I can’t imagine that it would be hard to write a package manager for them. Mauvesoft Gallery works on Windows and even IIS too I believe (although I’ve not tested it recently), but Windows is much more lax on the permissions (although I’ve not tried Server 2003), so a normal ZIP file may suffice.

It also occurs to me that as part of this shell-based installer I could offer the user the option to scan their $HOME and symlink any directory it finds containing photos (for some definition of photo… perhaps JPEG image over 1 Megapixel?) into the Albums root. Zero-configuration installs here we come. The goal is to make Mauvesoft Gallery simpler to install and use than any other gallery software.
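The scan itself could be sketched along these lines. This is only an illustration: the function names are hypothetical, and the definition of a photo is deliberately crude (any file with a .jpg or .jpeg extension), because checking for more than 1 Megapixel would mean reading image dimensions, which this sketch does not attempt. The is_photo parameter is where a stricter test would plug in.

```python
import os

PHOTO_EXTENSIONS = {".jpg", ".jpeg"}


def looks_like_photo(path):
    """Stand-in test: judge by file extension only (no megapixel check)."""
    return os.path.splitext(path)[1].lower() in PHOTO_EXTENSIONS


def find_photo_dirs(root, is_photo=looks_like_photo):
    """Yield each directory under root that directly contains a photo."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as caches and thumbnail stores.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        if any(is_photo(os.path.join(dirpath, name)) for name in filenames):
            yield dirpath


def link_albums(root, albums_root, is_photo=looks_like_photo):
    """Symlink every photo directory into albums_root; return the links created."""
    os.makedirs(albums_root, exist_ok=True)
    albums_real = os.path.realpath(albums_root)
    links = []
    # Collect the directories first so new symlinks cannot affect the walk.
    for directory in list(find_photo_dirs(root, is_photo)):
        if os.path.realpath(directory) == albums_real:
            continue                       # never link the Albums root into itself
        rel = os.path.relpath(directory, root)
        name = "root" if rel == "." else rel.replace(os.sep, "_")
        link = os.path.join(albums_root, name)
        if not os.path.lexists(link):      # leave existing albums untouched
            os.symlink(os.path.abspath(directory), link)
            links.append(link)
    return links


# Usage (hypothetical paths):
#     link_albums(os.path.expanduser("~"), "/var/www/gallery/albums")
```

Since os.walk does not follow symbolic links by default, running the scan a second time neither descends into the albums it has already linked nor duplicates them.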