Timber: A New Mail User Agent

This page describes Timber, a new mail user agent (MUA) which I'm currently writing.

History

(or, Why Yet Another MUA?)

I'm working on Timber for the usual reason people want to write yet another of something: none of the existing MUAs quite do what I want.

Three or so years ago, I finally got sick of the MUA I had been using up until then, Pine, for the simple reason that I'd become a very proficient user of the Emacs-like text editor Jed and I didn't like having to downgrade my editing reflexes when I was writing e-mail. I wanted an MUA embedded in Jed, so I'd have the same editing environment, and better still I'd be able to dash off a quick e-mail in the middle of a long editing session.

Unfortunately, there weren't any MUAs written in Jed, or at least none that were much good; so I wrote my own. It ran mostly inside Jed, with a few Perl scripts to help it do system-access things. Its usage model was largely derived from Pine, because that was what I was used to: you had an Inbox, and some Folders, and when you'd read each mail you could save it into a Folder.

I called it Timber, whimsically, because it heralded the fall of Pine. (From my own perspective, at least.)

Timber v0 never made it out into the world. It's not production-quality software so much as a sort of symbiosisware: it has a lot of bits missing, and when I run across a missing bit, I decide whether it's easier to implement it or to live without it.

Three years on, I'm getting heartily sick of Timber v0 for a variety of reasons. Partly, it's far too slow (being written in the interpreted language S-Lang), and partly I'm getting annoyed with the underlying folder model. It's time for a rewrite, in a proper fast compiled language. Timber v2.

(Yes, I'm going to go straight from v0 to v2. So what?)

Features

Timber v2 will feature two big design concepts whose absence keeps annoying me in Timber v0, in Pine, and in pretty much anything else I've seen. They are: threading and categories.

Recently I've been trying (with notable lack of success) to plough through a backlog of several hundred e-mails which I've built up. Many of those mails are sent to the PuTTY maintainer alias, which means they've been seen by the other members of the PuTTY core team. Quite a few of them have already been answered by other team members. So working through the backlog from the beginning is a confusing job, because I might easily reply to a message and then realise somebody else has replied to it already, a lot further down the folder, and I hadn't noticed. A threaded mailer would solve this, because each message would automatically be grouped together with any existing replies.
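
As a rough illustration of the grouping described above, here is a minimal Python sketch. It is not Timber's code: the function name build_threads and the sample messages are invented for the example, and it follows only the In-Reply-To: header, whereas a real threader would probably also consult References:.

```python
from email import message_from_string


def build_threads(raw_messages):
    """Group messages into threads, keyed by the Message-ID of the thread root."""
    parent = {}  # Message-ID -> Message-ID of the message it replies to
    order = []   # Message-IDs in the order they appear in the folder
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"].strip()
        order.append(msg_id)
        reply_to = msg.get("In-Reply-To")
        if reply_to:
            parent[msg_id] = reply_to.strip()

    def root_of(msg_id):
        # Walk up the reply chain; 'seen' guards against malformed reply loops.
        seen = set()
        while msg_id in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = {}
    for msg_id in order:
        threads.setdefault(root_of(msg_id), []).append(msg_id)
    return threads


# Hypothetical backlog: the reply to the first message arrives much later.
backlog = [
    "Message-ID: <q1@example.org>\nSubject: Bug in key exchange\n\nHelp!\n",
    "Message-ID: <q2@example.org>\nSubject: Feature request\n\nPlease.\n",
    "Message-ID: <a1@example.org>\nIn-Reply-To: <q1@example.org>\n"
    "Subject: Re: Bug in key exchange\n\nFixed.\n",
]
threads = build_threads(backlog)
```

Here the late reply <a1@example.org> ends up listed directly under <q1@example.org>, so it would be visible before anyone started composing a second answer.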

The other big thing, categories, is a concept designed to replace folders. Folders are a means of storing mail: you move your message out of the Inbox, into a folder, and it stays in that folder. Any mail not in a folder doesn't exist at all. In Timber, all mail will be stored in one large heap, and categories will be a means of indexing the mail. So a message can be in multiple categories, without actually storing multiple copies of the message; and conversely, a message can be in no categories at all and still exist. So when I get a message from John about PuTTY, I don't have to decide whether to file it under "John" or under "PuTTY", and when I come back to look at it in the future, I don't have to try and remember which one I chose. I can file it under both. No more problem.
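
One way to picture categories as an index rather than a storage location is a many-to-many table. The sketch below uses Python's sqlite3 module; the table names, column names and helper functions are all assumptions made for illustration and say nothing about Timber's real schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- One row per (message, category) pair; the message itself is stored once.
    CREATE TABLE message_categories (
        message_id  INTEGER REFERENCES messages(id),
        category_id INTEGER REFERENCES categories(id),
        PRIMARY KEY (message_id, category_id)
    );
""")


def file_under(message_id, name):
    """Add a message to a category, creating the category if necessary."""
    db.execute("INSERT OR IGNORE INTO categories (name) VALUES (?)", (name,))
    db.execute(
        "INSERT OR IGNORE INTO message_categories "
        "SELECT ?, id FROM categories WHERE name = ?",
        (message_id, name),
    )


def messages_in(name):
    """Return the subjects of all messages indexed under a category."""
    rows = db.execute(
        "SELECT m.subject FROM messages m "
        "JOIN message_categories mc ON mc.message_id = m.id "
        "JOIN categories c ON c.id = mc.category_id "
        "WHERE c.name = ? ORDER BY m.id",
        (name,),
    )
    return [subject for (subject,) in rows]


db.execute("INSERT INTO messages (id, subject) VALUES (1, 'PuTTY crash report')")
db.execute("INSERT INTO messages (id, subject) VALUES (2, 'Lunch?')")
file_under(1, "John")
file_under(1, "PuTTY")
total_messages = db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Message 1 appears under both "John" and "PuTTY" without being copied, and message 2 still exists even though it is in no category at all.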

Filing messages in multiple categories sounds like a lot of work for the user: if it's a pain trying to think of one good place to file each message you get, it must be much worse to try to think of several! To try to alleviate that, there will be some degree of automatic categorisation. For one thing, every person listed in your address book will automatically have an associated category - so you can immediately see every mail you have on file which was sent to or from that person. In addition, there will be a configurable mechanism whereby you can automatically assign categories to incoming messages by recognising things about them. For example, messages with a List-Id: header describing a particular mailing list might automatically be filed in the category for that mailing list. (And note that mailing lists can be read almost exactly like newsgroups, because Timber will be threaded.)
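
The two kinds of automatic categorisation mentioned above might look something like the following Python sketch. The function auto_categories, the shape of the address book and the substring match on List-Id: are all hypothetical; the actual configurable mechanism has not been designed in this much detail.

```python
from email import message_from_string
from email.utils import getaddresses


def auto_categories(raw, address_book, list_categories):
    """Suggest categories for an incoming message.

    address_book maps an e-mail address to a person's category name;
    list_categories maps a List-Id substring to a mailing-list category.
    """
    msg = message_from_string(raw)
    found = set()

    # Anyone in the address book who sent or received the message.
    headers = (msg.get_all("From", []) + msg.get_all("To", [])
               + msg.get_all("Cc", []))
    for _name, addr in getaddresses(headers):
        person = address_book.get(addr.lower())
        if person:
            found.add(person)

    # Mailing lists recognised by their List-Id: header.
    list_id = msg.get("List-Id", "")
    for needle, category in list_categories.items():
        if needle in list_id:
            found.add(category)

    return sorted(found)


# Invented sample data.
raw = (
    "From: John <john@example.org>\n"
    "To: me@example.org\n"
    "List-Id: PuTTY announcements <putty-announce.example.org>\n"
    "Subject: New release\n"
    "\n"
    "Body text.\n"
)
suggested = auto_categories(
    raw,
    address_book={"john@example.org": "John"},
    list_categories={"putty-announce.example.org": "putty-announce"},
)
# suggested is ['John', 'putty-announce']
```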

Other features I plan to add include:

Architecture

In order to embed this MUA in my editor but still have it run fast, I'm writing a C back end which speaks a simple text protocol. Then I'll write a front end in S-Lang, which will invoke the Timber back end every time it needs to perform an operation on the mail store. So I'll have all the speed of the C back end, with all the convenience of the Jed front end. And better still, it'll be possible to write alternative front ends just as easily: Timber embedded in Emacs, Timber as a stand-alone Unix console application, Timber as an X or GTK application, and so on. In particular, one person is already interested in writing a GUI front end for Mac OS X.

It will also be possible to invoke the back end directly from the command line; and in fact it's entirely possible that I won't bother to implement some of the less commonly used functions of Timber in the Jed front end at all, so they'll have to be invoked from the command line. Examples include retrieving a specific message, or a whole category, and exporting it in mbox format; importing mail from an external mbox; performing full-text searches over all or part of your mail; and other useful things. In particular, the command-line utility will also support hands-off sending of mail; you'll be able to use it in scripts, in place of mailx or /usr/lib/sendmail, and have a copy of the message automatically filed in the database and indexed in a given set of categories.

The mail storage architecture comes in two parts. Firstly, a "mail store", which is where actual messages are held; and secondly, an index database which stores all the information about categories, all the details of senders and recipients and message IDs, the subjects and dates of the messages, and anything else to which efficient access is required. The idea is that when generating an index of messages to be displayed on the screen, only the information in the index database should be needed, and the MUA should not have to retrieve the full text of any actual message until the user specifically says they want to read that particular message.
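
To make the split concrete, here is a small Python sketch of the idea, under several assumptions: the mail store is a single mbox file, the index is an SQLite table called message_index, and each index row records the byte range of its message. None of those details are taken from Timber itself, and the "From " line detection is deliberately naive.

```python
import os
import sqlite3
import tempfile
from email import message_from_bytes


def index_mbox(path, db):
    """Scan an mbox file once, recording headers and byte ranges in the index."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS message_index ("
        "start INTEGER PRIMARY KEY, length INTEGER, sender TEXT, subject TEXT)"
    )
    with open(path, "rb") as f:
        data = f.read()
    # An mbox message begins at a line starting with "From " (note the space).
    starts = [0] if data.startswith(b"From ") else []
    pos = data.find(b"\nFrom ")
    while pos != -1:
        starts.append(pos + 1)
        pos = data.find(b"\nFrom ", pos + 1)
    for start, end in zip(starts, starts[1:] + [len(data)]):
        # Skip the "From " separator line before parsing the headers.
        header_start = data.index(b"\n", start) + 1
        msg = message_from_bytes(data[header_start:end])
        db.execute(
            "INSERT OR REPLACE INTO message_index VALUES (?, ?, ?, ?)",
            (start, end - start, msg["From"], msg["Subject"]),
        )
    db.commit()


def list_subjects(db):
    """Build the on-screen listing from the index alone."""
    rows = db.execute("SELECT subject FROM message_index ORDER BY start")
    return [subject for (subject,) in rows]


def fetch_message(path, db, subject):
    """Read one message's full text from the mail store, only on demand."""
    start, length = db.execute(
        "SELECT start, length FROM message_index WHERE subject = ?", (subject,)
    ).fetchone()
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)


mbox_text = (
    b"From john@example.org Thu Jan  1 00:00:00 2004\n"
    b"From: john@example.org\nSubject: First\n\nHello.\n\n"
    b"From jane@example.org Fri Jan  2 00:00:00 2004\n"
    b"From: jane@example.org\nSubject: Second\n\nGoodbye.\n\n"
)
with tempfile.TemporaryDirectory() as tmp:
    store = os.path.join(tmp, "store.mbox")
    with open(store, "wb") as f:
        f.write(mbox_text)
    db = sqlite3.connect(":memory:")
    index_mbox(store, db)
    subjects = list_subjects(db)               # touches only the index
    second = fetch_message(store, db, "Second")  # touches the mail store
    db.close()
```

Because index_mbox derives everything from the mbox file, the same function could rebuild a lost or damaged index from the mail store.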

I had initially planned to have the index database store the full text of the messages as well as index data, but decided this was a silly idea for two major reasons. Firstly, if the mail store is in a simple text format (in the initial version of Timber it will be a set of text files in the ordinary Unix mbox format), this guarantees that if the index database becomes corrupt and unusable, you can still recover all your mail and shovel it into another MUA without too much difficulty. Secondly, separating the mail store from the index opens up the possibility of changing the mail store at a later date; in particular, one thing that might be useful is to support IMAP as the mail store while retaining the index database. (This would lose the major advantage of IMAP that you can access your mail from anywhere; but for some IMAP users this isn't a particular concern, and if a site policy forces them to store their mail on an IMAP server then they might appreciate Timber's use of a locally stored index to speed up their mail operations and keep IMAP requests to a minimum.)

Status

Timber v2 was nothing but vapourware for some time. My requirement for a database was the sticking point: most database software I'm aware of follows a client/server model, where the database server must be installed systemwide and the sysadmin needs to intervene to allow a given user to put any data in it. This seemed like a silly requirement for an MUA: if Timber were to see widespread uptake, then on a large multi-user Unix system the sysadmin might have to individually set up a database area for many of the users to each run Timber in.

What I really needed was a database which was implemented as a library: something I could link into a user-level process and have it give me efficient data storage, locking and transactions simply by accessing a disk file and using file-locking system calls, with no need to ask the sysadmin to do anything at all. I spent a lot of planning time on trying to figure out how to write one from scratch, but without any database coding experience it looked like being a long job and I never started it.

Late in 2003, someone brought SQLite to my attention. This appeared, both at first sight and on closer inspection, to be exactly what I'd been looking for. I played with it a bit and nothing made me doubt this; so the major obstacle to Timber development was removed and I got started in earnest.
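
For readers who haven't met SQLite, the following Python sketch shows the property that matters here: the whole database is an ordinary file opened by a user-level process, with transactions and no server to set up. The file name, table and message ID are made up for the example, and Python's sqlite3 module merely stands in for the C library.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # The database is just a file in a directory the user can write to.
    path = os.path.join(tmp, "index.db")
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE seen (message_id TEXT PRIMARY KEY)")
    db.commit()

    try:
        with db:  # one transaction: commit on success, roll back on error
            db.execute("INSERT INTO seen VALUES ('<a@example.org>')")
            # The duplicate key makes this statement fail...
            db.execute("INSERT INTO seen VALUES ('<a@example.org>')")
    except sqlite3.IntegrityError:
        pass

    # ...so the first insert is rolled back along with it.
    count = db.execute("SELECT COUNT(*) FROM seen").fetchone()[0]
    db.close()
```

After the failed transaction the table is empty again, which is the kind of all-or-nothing behaviour an index update needs.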

As I write this in January 2004, the bottom layers of Timber - mail storage and retrieval, parsing and indexing, character-set translation, and encoding of messages for sending - are well under way. The high-level layers, such as threading and categories, address book management, and non-command-line front ends, have yet to be started.


(comments to anakin@pobox.com)
(thanks to chiark for hosting this page)
(last modified on Sun May 7 14:33:22 2017)