Archive for January, 2009

Profanity

Wednesday, January 28th, 2009

The web has never responded very well to censorship. So much of the web is about freedom of expression that whenever someone tries to express himself, and is prevented from doing so, he feels disenfranchised. That applies even more so in the case of the Scunthorpe problem, where innocent text is blocked because it happens to contain a rude word: people who weren't trying to swear in the first place feel much more aggrieved.

On the other hand, website owners do not want their image damaged by users who can't keep their potty mouths shut.

When developing sites that allow users a voice, we need to find ways to protect the website owners, or the atmosphere of a community, without damaging the goodwill of the user base. Any website that depends on user input, and which doesn't have any users, is a failure.

Profanity filtering is not the answer: I, at least, have never seen it done well enough to be both comprehensive and unintrusive. All problems that relate to processing natural language are extremely complicated. We have barely started to scratch the surface in terms of parsing English text, let alone extracting the semantics we would need to determine whether a word is offensive. So any attempt at a naïve profanity filter is doomed to failure. For example, you can be profane without being offensive:

She turned round and screamed, "Fuck off, you stuck-up bitch". I was appalled!

You're a grumpy old bastard, but I love you.

and you can be offensive without being profane:

I did your mum last night. She's fatter than a blue whale, but she knows a trick or two. Your sister does too actually.

and let's not forget the cases where you can't tell:

Do you have a cock or do you just keep hens? Oh, we have a big gold cock. You know, the pussy is afraid of him!

OK, the last example is contrived and of course nobody would type it with a straight face. Still, in the right context, it's innuendo, not profanity.
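These failure modes are easy to reproduce. The following is only a minimal sketch, assuming a made-up two-word blocklist and two hypothetical filter functions; it is not taken from any real filtering library. It shows a substring filter tripping over an innocent surname, and a whole-word filter that still cannot tell poultry from profanity and misses abuse that contains no blocked word at all.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKLIST = ["cock", "bastard"]

def substring_filter(text):
    """Flag text if a blocked word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)

def whole_word_filter(text):
    """Flag text only if a blocked word appears as a whole word."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BLOCKLIST for word in words)

# Scunthorpe-style false positive: innocent words contain a blocked word.
print(substring_filter("Mr Hancock keeps a cockerel"))       # True
print(whole_word_filter("Mr Hancock keeps a cockerel"))      # False

# Whole-word matching still flags the farmyard example...
print(whole_word_filter("Oh, we have a big gold cock."))     # True
# ...and passes an insult that uses no blocked word.
print(whole_word_filter("She's fatter than a blue whale."))  # False
```

Neither function has any notion of context, which is the real problem.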

With those insurmountable problems, there's simply no substitute for a human keeping an eye on things. However, even with moderation, there are problems to face. Exactly what is acceptable? Moderators can easily pronounce on clear-cut cases of abusiveness or offensiveness, but people have different sensibilities as to what's acceptable. It's also fairly easy for moderators to miss the odd bit of abuse, especially if it's only offensive in some contexts.

One trick to help keep control of the situation is to carefully set the tone. If you can use the language and style of the website to convey a sense of what might be appropriate, you can influence the tone users are likely to take. Though moderators still have to check the same amount of content, this reduces the chance that something untoward will slip through. Phrases like "Interglobal Inc do not take any responsibility for the content of this service" – phrases which are of dubious merit anyway – may have the opposite effect, by giving users the impression that the site's owners don't care what the tone is. You also stand to lose control of the tone in the subconscious minds of users if you use some well-known software – phpBB for example – which users might have used elsewhere and come to associate with a certain mode of speech.

If you do censor people, a light touch is often better than a heavy hand.

Mauvesoft

Monday, January 26th, 2009

I've overhauled Mauvesoft, my programming projects website. Check it out.

How to program a calendar

Monday, January 26th, 2009

Programming a calendar sounds deceptively easy. And it is, until you come to realise that there's very little point in displaying a calendar that doesn't show information about events and periods. You have a potentially overlapping set of periods to display, each spanning days or months. It becomes much more complicated.

At the moment I'm programming a calendar for the booking of accommodation, which is particularly complicated because a) you book nights, not days, and month planners have cells for days, not nights, and b) the dates that are available are the dates not booked, not the dates booked.
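To make those two complications concrete, here is a small sketch of the date arithmetic only. The function names and the (check-in, check-out) tuple shape are hypothetical, chosen for illustration. A stay occupies the night that starts on each date from check-in up to, but not including, check-out, and availability is whatever is left over.

```python
from datetime import date, timedelta

def nights_of_stay(check_in, check_out):
    """Dates whose night is occupied: check_in up to, not including, check_out."""
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]

def available_nights(first, last, bookings):
    """Nights from first to last (inclusive) not covered by any booking."""
    booked = set()
    for check_in, check_out in bookings:
        booked.update(nights_of_stay(check_in, check_out))
    span = (last - first).days + 1
    days = (first + timedelta(days=i) for i in range(span))
    return [d for d in days if d not in booked]

# One booking: arrive on the 5th, leave on the 8th, so three nights.
bookings = [(date(2009, 1, 5), date(2009, 1, 8))]
print(nights_of_stay(*bookings[0]))
print(available_nights(date(2009, 1, 4), date(2009, 1, 9), bookings))
```

Note that the check-out date itself is free again, which is exactly what a grid of day cells makes awkward to draw.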

I'm using a simpler approach, converting all calendar periods into a stream of events in date order. The interface between producers and consumers of calendar events looks like this:

class CalendarListener(object):
  def start_month(self, month):
    """Called before the first day of the month, and before any periods in that month."""
   
  def end_month(self, month):
    """Called after the last day of the month, and after any periods in that month."""
   
  def start_day(self, date):
    """Called once for each day to display"""
   
  def start_period(self, date, period):
    """Called before the day in which the period begins"""
   
  def end_period(self, date, period):
    """Ends the previously started period"""

This interface makes it very easy to produce, filter, and consume calendar data. What was previously a complicated process of intersecting, splitting, joining, structuring and outputting date ranges suddenly becomes very simple. All of the events received via this interface are guaranteed to be in chronological order, so no date comparison is needed. Almost all calendar operations can be performed with a simple state machine.
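A producer for this interface might look something like the sketch below. It rests on several assumptions: Month and Period here are hypothetical stand-ins (the real classes are not shown), a period's end date is taken as inclusive, and a period that spans a month boundary is closed before end_month and reopened after start_month, which is my reading of the docstrings above. It scans every period for every day, so it favours clarity over speed.

```python
from collections import namedtuple
from datetime import date, timedelta

# Hypothetical stand-ins for the real Month and period classes.
Period = namedtuple("Period", "start end label")  # end is inclusive

class Month(object):
    def __init__(self, year, month):
        self.year, self.month = year, month

    def first_day(self):
        return date(self.year, self.month, 1)

    def next(self):
        return Month(self.year + self.month // 12, self.month % 12 + 1)

def emit_calendar(listener, first_month, month_count, periods):
    """Fire listener callbacks in chronological order for month_count months."""
    month = first_month
    for _ in range(month_count):
        first = month.first_day()
        last = month.next().first_day() - timedelta(days=1)
        listener.start_month(month)
        day = first
        while day <= last:
            for p in periods:
                # Open a period on its first day, or reopen it on the 1st of a later month.
                if p.start <= day <= p.end and day in (p.start, first):
                    listener.start_period(day, p)
            listener.start_day(day)
            for p in periods:
                # Close a period on its last day, or at month end if it carries on.
                if p.start <= day <= p.end and day in (p.end, last):
                    listener.end_period(day, p)
            day += timedelta(days=1)
        listener.end_month(month)
        month = month.next()

class Recorder(object):
    """Listener that just records (event name, date) pairs."""
    def __init__(self):
        self.events = []

    def start_month(self, month):
        self.events.append(("start_month", month.first_day()))

    def end_month(self, month):
        self.events.append(("end_month", month.first_day()))

    def start_day(self, day):
        self.events.append(("start_day", day))

    def start_period(self, day, period):
        self.events.append(("start_period", day))

    def end_period(self, day, period):
        self.events.append(("end_period", day))

recorder = Recorder()
booking = Period(date(2009, 1, 30), date(2009, 2, 2), "booking")
emit_calendar(recorder, Month(2009, 1), 2, [booking])
for name, day in recorder.events:
    if name != "start_day":
        print(name, day)
```

Because the producer does all the date arithmetic once, every listener downstream can stay free of date comparisons.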

A consumer that renders to HTML, for example, is as simple as this:

from io import StringIO

class MonthRenderer(CalendarListener):
  """Renders each month as a block of HTML, accumulating output in self.buf."""
  def __init__(self):
    self.buf = StringIO()

  def start_month(self, month):
    print("""<div class="month"><h4>%s</h4>
      <img class="week" src="/assets/cal/week.png" alt=""/>""" % month.name(), file=self.buf)

    # Pad the first row so the 1st lands under its weekday (21px per weekday column).
    w = month.first_day().weekday()
    if w:
      print('<div class="padding" style="width: %dpx"></div>' % (w * 21), file=self.buf)

  def end_month(self, month):
    print("</div>", file=self.buf)

  def start_day(self, date):
    print('<span class="day">%d</span>' % date.day, file=self.buf)

(Note: date and datetime are standard Python classes. Month, however, is my own class. Also, some people use a table rather than CSS for this; that's obviously a fairly simple alteration.)
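Other consumers follow the same shape. As an illustration of the state-machine idea, here is a sketch of a listener that works out which days are free, which is the booking problem described earlier. The class name is hypothetical, it duck-types the interface rather than subclassing it so that the example runs on its own, and the events are fed in by hand in place of a real producer.

```python
from datetime import date

class AvailabilityListener(object):
    """Collects the days on which no period is open."""
    def __init__(self):
        self.open_periods = 0  # the whole of the machine's state
        self.free_days = []

    def start_month(self, month):
        pass

    def end_month(self, month):
        pass

    def start_period(self, day, period):
        self.open_periods += 1

    def end_period(self, day, period):
        self.open_periods -= 1

    def start_day(self, day):
        if self.open_periods == 0:
            self.free_days.append(day)

# Feed events by hand, in chronological order, as a producer would.
listener = AvailabilityListener()
listener.start_month(None)  # the month object is unused by this consumer
listener.start_day(date(2009, 1, 4))
listener.start_period(date(2009, 1, 5), "booking")
listener.start_day(date(2009, 1, 5))
listener.start_day(date(2009, 1, 6))
listener.end_period(date(2009, 1, 6), "booking")
listener.start_day(date(2009, 1, 7))
listener.end_month(None)
print(listener.free_days)  # the 4th and the 7th
```

Counting open periods, rather than keeping a single flag, means overlapping bookings are handled without any extra code.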

It took me quite a few false starts before I realised the relative simplicity and convenience of this pattern, which is why I wanted to recommend this. It's very easy to fall into a trap of building complexity and tackling problems using ever-more complicated calendar classes and processors and never take the step back to find a better approach.

The naïve approach for programming a calendar is to write a function, say, print_month() which renders a month of a calendar. Then call this 12 times. Then wrap it up in a class so you can subclass it to retrieve a list of events and modify output. This quickly became excessively complicated, as I wrote methods to chop and join periods together, work out what the formatting of each day should be, and render it.

Alas, the calendar also requires JavaScript, and that part doesn't benefit quite as much from an event-driven approach because it needs to operate on the structured HTML DOM.

Tip: Don't use uppercase/lowercase in HTML

Wednesday, January 14th, 2009

It's sometimes tempting to use case for emphasis: uppercase and lowercase are well within the repertoire of useful graphic design tools. Graphic designers know that uppercase is slower to read than lowercase, but in isolated phrases that's unimportant. But on the web there's a penalty to using just uppercase or just lowercase: it's not as accessible. Writing in normal sentence case conveys information. Specifically, the semantics of the sentence – particularly abbreviations – depend on the use of case, as this photo shows:

NUT CONFERENCE

CSS provides a way around this: the text-transform property. This allows you to write your content in sentence case, and display it in full uppercase or lowercase as desired for stylistic reasons. For example, if your design calls for <h2> tags to be in uppercase, use

h2 {
  text-transform: uppercase;
}

Of course, this allows you to simply remove the property if you change your site design; no content needs to be rewritten.

Some offenders even publish an RSS feed using uppercase titles. Never do this. People who want to syndicate your feed normally want it in sentence case, and there's no way to force that to happen if you aren't publishing the RSS feed using proper sentence case.
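The reason it can't be forced is that uppercasing throws information away. A quick sketch in Python, using two made-up titles, shows the problem: once a title has been uppercased, the abbreviation and the ordinary word are indistinguishable, and none of the standard case conversions brings the original back.

```python
titles = ["NUT conference", "Nut conference"]

# Both titles collapse to the same uppercase string.
print(titles[0].upper() == titles[1].upper())  # True

# Converting back can only guess; neither result is the first title.
shouted = titles[0].upper()
print(shouted.capitalize())  # Nut conference
print(shouted.title())       # Nut Conference
```

Going the other way, from sentence case to uppercase, is always safe, which is why sentence case belongs in the content and uppercase in the stylesheet.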

Wordpress Audio Player

Friday, January 9th, 2009

Martin Laine's Wordpress Audio Player seems to have quite a broad penetration, but having seen it in a couple of places, I want to add that I think it's excellent. When not playing, it's a plain, unintrusive icon that clearly indicates an option to play a sound, and which smoothly expands to a straightforward, clutter-free player. By changing the colour scheme, you could make this fit with nearly any website style, and unlike many alternatives it will not draw attention away from your text or audio content.