07 Oct 2009, 23:35

mutt 1.5.20 and bogofilter 1.2.1: Correcting the Classification Macros

Recent local changes included restructuring inbound mail delivery to add UCE (more commonly called spam) filtering, using a combination of the venerable procmail and the increasingly impressive bogofilter.

Integration of bogofilter via procmail is covered in detail across the web, but it is usually just a matter of adding a few lines to the top of your .procmailrc similar to:

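# Pass every message through bogofilter: -p adds the X-Bogosity header,
# -u updates the word lists with the result, -e exits 0 for spam and ham alike.
:0fw
| bogofilter -u -e -p
# If bogofilter failed, return the mail to the queue (75 is EX_TEMPFAIL)
# so that the MTA retries delivery later.
:0e
{ EXITCODE=75 HOST }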
# file the mail to Trash/ maildir if it's spam.
# Use locking to prevent huge simultaneous delivery from causing DOS.
:0:
* ^X-Bogosity: Spam, tests=bogofilter
Trash/

Train the initial bogofilter database with some personal spam and ham corpus files and everything springs to life.

mutt is the MUA of choice here, and it offers plenty of scope for integration with bogofilter to correct and reclassify any messages that produce unexpected results. In fact, the bogofilter man page even contains some macros that should allow this. These were tried first, but they do not seem to work in their current form. The local man page lists the following:

macro index d "<enter-command>unset wait_key0
          <pipe-entry>bogofilter -n0
          <enter-command>set wait_key0
          <delete-message>" "delete message as non-spam"
macro index \ed "<enter-command>unset wait_key0
          <pipe-entry>bogofilter -s0
          <enter-command>set wait_key0
          <delete-message>" "delete message as spam"

Even assuming that those terminating zeroes should be \n, these macros still have some fairly major problems:

  1. They do not remove incorrect classification before applying the opposite, leaving entries with both spam and ham associations.
  2. They do not work with multiple tagged messages.
  3. They produce, here at least, an empty result set. That is, with -v added to the bogofilter command-lines the result is always: # 0 words, 0 messages

Clearly there is some tuning to be done. A more promising set of macros can be found in the Linux Journal article "Busting Spam with Bogofilter, Procmail and Mutt, Revisited":

macro index s "<enter-command>unset wait_key\n
               <tag-prefix><pipe-entry>bogofilter -MSn\n
               <enter-command>set wait_key\n
               <tag-prefix><save-entry>"
macro pager s "<enter-command>unset wait_key\n
               <pipe-entry>bogofilter -MSn\n
               <enter-command>set wait_key\n
               <save-entry>"
macro index r "<enter-command>unset wait_key\n
               <tag-prefix><pipe-entry>bogofilter -Mn\n
               <enter-command>set wait_key\n
               <tag-prefix><reply>"
macro pager r "<enter-command>unset wait_key\n
               <pipe-entry>bogofilter -Mn\n
               <enter-command>set wait_key\n
               <reply>"
macro index g "<enter-command>unset wait_key\n
               <tag-prefix><pipe-entry>bogofilter -Mn\n
               <enter-command>set wait_key\n
               <tag-prefix><group-reply>"
macro pager g "<enter-command>unset wait_key\n
               <pipe-entry>bogofilter -Mn\n
               <enter-command>set wait_key\n
               <group-reply>"
macro index l "<enter-command>unset wait_key\n
               <tag-prefix><pipe-entry>bogofilter -Mn\n
               <enter-command>set wait_key\n
               <tag-prefix><list-reply>"
macro pager l "<enter-command>unset wait_key\n
               <pipe-entry>bogofilter -Mn\n
               <enter-command>set wait_key\n
               <list-reply>"
macro index X "<enter-command>unset wait_key\n
               <tag-prefix><pipe-entry>bogofilter -MNs\n
               <enter-command>set wait_key\n
               <tag-prefix><delete-message>"
macro pager X "<enter-command>unset wait_key\n
               <pipe-entry>bogofilter -MNs\n
               <enter-command>set wait_key\n
               <delete-message>"

These are a huge improvement: macros defined for the index are applied to all currently tagged messages, whilst those for the pager apply only to the current message. However, it soon became apparent that these were also not working as expected with bogofilter 1.2.1, so it was back to the debugging output. It transpired that there were several different problems with this particular setup.

Firstly, the Linux Journal macros above use <pipe-entry> to send the email text to bogofilter. In certain cases this seems to send only a subset of the message rather than all of it. The manual refers to the boolean configuration variable pipe_decode, stating that if it is unset then the whole message is sent; otherwise the message is processed first (headers are weeded, tidied and so on). Changing pipe_decode made no difference here, but changing <pipe-entry> to <pipe-message> did.

With the whole of the message, or messages, being sent to the pipe a new problem became apparent. Using the corrective macro X above (with <pipe-message>) bogofilter returned:

# 0 words, 0 messages

as diagnostic output. The fix for that particular issue was to split the bogofilter command-line arguments, changing -MNs to -M -Ns. Merging classification options with registration options does not seem to be permitted, although bogofilter does not explicitly complain about it.

Now the pager macros work correctly, but the index macros are still not quite right when applied to multiple tagged messages. bogofilter diagnostic output has now changed to something along the lines of:

# 872 words, 1 message

instead of the expected:

# 872 words, 3 messages

for example. Mutt should be piping a standard mbox-formatted file containing all of the tagged messages in full, but something is preventing bogofilter from recognising anything but the first; instead it treats the whole thing as one long message. This is almost certainly not what is wanted.
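By way of illustration, mbox readers conventionally find message boundaries by looking for lines that begin with "From ". The sketch below is a toy model of that convention in Python, not bogofilter's actual parser: count_mbox_messages and add_from_lines are hypothetical helpers, and the sender and date in the generated separator line are placeholders. It shows how several messages joined only by a newline can end up counted as one.

```python
def count_mbox_messages(stream):
    """Count messages as a simple mbox reader might: one per line starting with 'From '."""
    count = sum(1 for line in stream.splitlines() if line.startswith("From "))
    if not stream.strip():
        return 0
    # Non-empty input without any separator is still treated as one long message.
    return max(count, 1)


def add_from_lines(messages, sender="MAILER-DAEMON"):
    """Join messages into one mbox-style stream, giving each a leading 'From ' line.

    The sender and date used here are placeholders for illustration only.
    """
    parts = []
    for msg in messages:
        body = msg.strip("\n")
        if not body.startswith("From "):
            body = "From {} Thu Jan  1 00:00:00 1970\n{}".format(sender, body)
        parts.append(body + "\n")
    return "\n".join(parts)


messages = [
    "Subject: one\n\nfirst body",
    "Subject: two\n\nsecond body",
    "Subject: three\n\nthird body",
]

# Messages separated only by a newline, with no 'From ' separator lines.
piped_as_is = "\n".join(messages)
# The same messages once every one has been given a separator line.
piped_fixed = add_from_lines(messages)

print(count_mbox_messages(piped_as_is))  # 1
print(count_mbox_messages(piped_fixed))  # 3
```

Under these assumptions the unseparated stream is counted as a single message, which matches the symptom in the diagnostic output above.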

The mutt configuration option pipe_sep controls how the messages are separated in the mbox-file, but the default option of a newline should be sufficient. The problem lies in the failure of mutt to prepend a From line to each message as expected by bogofilter. Fortunately, with procmail comes another handy utility called formail that can modify and correct message headers as a filter. The index macro for corrective message processing becomes:

macro index X "<enter-command>unset wait_key\n
               <tag-prefix><pipe-message>formail -d -s bogofilter -M -Ns\n
               <enter-command>set wait_key\n
               <tag-prefix><delete-message>"
               "Delete messages and mark as spam"

Almost there. Passing the messages through formail first generates the correct leading From header, but as a side-effect also splits the messages into individual files and launches an instance of bogofilter for each! Diagnostic output becomes similar to:

# 132 words, 1 message
# 76 words, 1 message
# 342 words, 1 message
# 296 words, 1 message

It turns out that the command needs only a subtle change to behave as expected. The -s parameter to formail can take a command as an argument (bogofilter in the case above), but the command is optional. If it is omitted, formail concatenates the (now correctly formatted) messages into a single mbox file on stdout. Inserting a pipe in the appropriate place finally produces the correct behaviour:

# 846 words, 4 messages
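The difference between the two forms can be pictured with a small Python sketch. It is only a model: split_mbox and report are hypothetical stand-ins, the sample messages are invented, and the "words" here are whitespace-separated strings rather than bogofilter's real tokens. Handing each message to its own consumer gives one report per message, while handing the whole stream to one consumer gives a single combined report.

```python
def split_mbox(stream):
    """Split an mbox-style stream into messages on lines starting with 'From '."""
    messages, current = [], []
    for line in stream.splitlines():
        if line.startswith("From ") and current:
            messages.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("\n".join(current))
    return messages


def report(messages):
    """Stand-in for one filter process: summarise everything it was handed."""
    words = sum(len(m.split()) for m in messages)
    noun = "message" if len(messages) == 1 else "messages"
    return "# {} words, {} {}".format(words, len(messages), noun)


stream = (
    "From a@example.org Thu Jan  1 00:00:00 1970\nSubject: one\n\nbuy now\n"
    "From b@example.org Thu Jan  1 00:00:00 1970\nSubject: two\n\ncheap pills today\n"
)
messages = split_mbox(stream)

# One consumer per message, as when formail -s is given a command to run.
per_message = [report([m]) for m in messages]
# One consumer for the whole stream, as when formail -s writes to a pipe.
single = report(messages)

print(per_message)
print(single)
```

The first form prints one line per message and the second a single line covering both, mirroring the two shapes of diagnostic output seen here.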

This time a single instance of bogofilter processes all of the tagged messages passed from mutt. The final versions of the macros become:

# Save message(s) and mark as ham
macro index s "<enter-command>unset wait_key\n
               <tag-prefix><pipe-message>formail -d -s | bogofilter -M -Sn\n
               <enter-command>set wait_key\n
               <tag-prefix><save-message>"
               "Save messages and mark as ham"
macro pager s "<enter-command>unset wait_key\n
               <pipe-message>bogofilter -M -Sn\n
               <enter-command>set wait_key\n
               <save-message>"
               "Save message and mark as ham"

# Reply to message(s) and mark as ham
macro index r "<enter-command>unset wait_key\n
               <tag-prefix><pipe-message>formail -d -s | bogofilter -M -n\n
               <enter-command>set wait_key\n
               <tag-prefix><reply>"
               "Reply to messages and mark as ham"
macro pager r "<enter-command>unset wait_key\n
               <pipe-message>bogofilter -M -n\n
               <enter-command>set wait_key\n
               <reply>"
               "Reply to message and mark as ham"

# Group-reply to message(s) and mark as ham
macro index g "<enter-command>unset wait_key\n
               <tag-prefix><pipe-message>formail -d -s | bogofilter -M -n\n
               <enter-command>set wait_key\n
               <tag-prefix><group-reply>"
               "Group-reply to messages and mark as ham"
macro pager g "<enter-command>unset wait_key\n
               <pipe-message>bogofilter -M -n\n
               <enter-command>set wait_key\n
               <group-reply>"
               "Group-reply to message and mark as ham"

# List-reply to message(s) and mark as ham
macro index l "<enter-command>unset wait_key\n
               <tag-prefix><pipe-message>formail -d -s | bogofilter -M -n\n
               <enter-command>set wait_key\n
               <tag-prefix><list-reply>"
               "List-reply to messages and mark as ham"
macro pager l "<enter-command>unset wait_key\n
               <pipe-message>bogofilter -M -n\n
               <enter-command>set wait_key\n
               <list-reply>"
               "List-reply to message and mark as ham"

# Delete message(s) and mark as spam
# To remove statistics line and keypress after command, change first command
# to 'unset wait_key' and remove the '-v -D' arguments to bogofilter
macro index X "<enter-command>set wait_key\n
               <tag-prefix><pipe-message>formail -d -s | bogofilter -M -Ns -v -D\n
               <enter-command>set wait_key\n
               <tag-prefix><delete-message>"
               "Delete messages and mark as spam"
macro pager X "<enter-command>set wait_key\n
               <pipe-message>bogofilter -M -Ns -v -D\n
               <enter-command>set wait_key\n
               <delete-message>"
               "Delete message and mark as spam"

