For quite some time I've been struggling with hundreds of email messages a day, and hundreds-of-thousands of email messages in archives to search. I've used several different email programs to try to handle my mail, (pine, vm, mh-e, wanderlust, evolution, mutt), and experimented with several others, but not until now did I find one that really helps me process mail the way I want to.
Before I introduce sup, though, I want to at least mention in passing a mail indexer named mairix since I've been meaning to write a blog entry about it for several months. Anthony Towns first let me know about mairix earlier this year when I was complaining about email near him. He described his approach to mail as "mutt for reading mail and mairix for searching".
The way mairix works is that it creates an index of a mail archive then provides a command-line-based search that creates results by creating a new maildir, (with just symlinks to original messages if the original archive is also maildir-based).
So mairix can give you a simple way to add fast, full-archive searching to a program like evolution when its builtin search capability is hopelessly inadequate for your large mail archive. The mail client can be configured to look at a single search-results maildir folder, and mairix can be used to create the results there. Do note that mairix is somewhat limited as a mail search engine since it has no proximity or phrase-based searching, (the best you get is whether messages contain given words). But the maildir-in and maildir-out approach it has makes it easy to integrate, and is the kind of concept that can perhaps let us avoid some of the disasters out there in the form of monolithic email clients.
But I've got a much more interesting story in sup. I was just a few days into reinventing my email workflow around mutt when I got into an interesting conversation with Jamey Sharp about what the ideal mail-handling system would look like. Here are some of the ideas we came up with:
Program presents me with messages that need my attention, (a chronological sort of the archive with unread items in bold does not count).
When I decide a message does not need my attention I can make it "go away" into the archive, (and "mark as read" does not count---I want to file away messages but still track whether they are read or not in the archive).
Threads must be first-level objects, (in particular I want a kill-thread command that archives not only the current messages, but all future messages received in a given thread).
System should support arbitrary tags on messages. Tags can be applied automatically, (such as applying categories to mail received on lists), and applied ad-hoc by the user.
System should allow searching over the entire archive with most recent results appearing instantly. Search terms can include tags, message headers, or full-body search (including phrases).
In addition to full-archive search, incremental refinement should be possible, (a new search based on the results of a previous search).
There's no need for folders in this system. Tags and incrementally refined, tag-based searching provide all the benefits, (without the bug in folder-based systems where a user has to hunt among many folders to find one where a search returns non-empty results). In particular, the "inbox" view should just be a search for unread and non-archived messages.
That was basically our list of core features. Beyond that, we also discussed some further ideas for improving the prioritization of email threads to the user. One primary job of the system is to present the user with the most important thread to be read next, (based on tags, etc.). The user supplies feedback by either reading the thread or archiving it, so there's opportunity to apply machine learning to improve the prioritization.
During the course of this conversation, Jamey mentioned that I should go look at sup which he thought might implement some of these features, but he hadn't tried it out yet. I was delighted to find that each one of our core features already exists in sup, (and sup would make a fine platform for research into the learning-based approaches).
It also does a few things I hadn't specified, such as displaying email in a full-thread nested view, (rather than one message at a time), with quoted elements and signatures folded out of view until the user asks to see them. Another nice touch is that the single-line presentation of a thread includes the first names of participants in the thread, (with bold names to indicate unread messages). This provides some of the essential information needed for applying Joey Hess's thread patterns, but without the tree view at this point.
[Note: I have been told that several of the above features are also
implemented in gmail. I've never tried gmail myself, since it fails to
provide some even more fundamental features: 1. offline
usage, (thanks to both Mark and Greg for pointing out that gmail
has offline support in beta via gears) 2. personal ownership of email
storage, 3. free-software implementation for customization.]
In the few days I've been using sup, it's definitely transformed the way I process mail. Keeping my inbox empty is simple now, and I now easily tag messages for specific, future actions without fearing that I will forget, (and without leaving them cluttering up my inbox). Searching through my entire archive is fast---the first results come back instantly and they're generally what I want, (the index is optimized to return the most-recent messages first).
Sup is implemented in ruby, and the author and maintainers on the sup-talk mailing list have been friendly and receptive to me, (in spite of my lack of ruby-clue), so I've already got a patch or two in the upstream git repository for sup. The indexing in sup has been performed by ferret, but is currently switching to xapian.
Sup is not entirely mature yet, (I discovered fairly easy ways to make the 0.8.1 version in Debian crash), but I'm happily running the latest code from git now, and trusting my mail to it. I did find that a bug in a dependent package causes sup to crash when running sup from git while sup is also installed from the Debian package. So I recommend doing "apt-get install sup-mail" to get the dependencies installed, then "apt-get remove sup-mail" (but don't autoremove the dependencies), then run sup from:
git clone git://gitorious.org/sup/mainline.git
I've also posted a short list of some changes I'd like to see in sup, (mostly in the client stuff---the indexing and searching stuff seems just fine).
There's a nice New User's Guide included with sup, but no reference manual yet, so the sup wiki is an essential (and short) substitute for a reference manual for now.
One thing sup does not do yet is that it doesn't separate the mail-indexing/tagging/searching system from the mail client. So if you're not happy with a curses client written in ruby, (and perhaps more significantly, a different client than what you're currently using for composing messages, making attachments, viewing mime stuff, etc.) then you're currently out of luck. Fortunately, William Morgan has plans to reinvent the most interesting aspects of sup as a server process and he recently reported making progress on those plans over the last year. (This isn't too surprising, since why would William want to maintain all that code to do things like dealing with mbox, maildir, attachments, etc. when those are already solved problems for users everywhere.)
And hey, I would love to have a mailing list archiver based on sup's threading. So if anyone wants to cook something like that up, please feel free, (sup is licensed under the GNU GPLv2).
Posted Thu 20 Aug 2009 04:30:00 PM PDT