Peter's Blog 18.8.2004

2004-08-18

Procmail and Spamassassin

I've set up procmail and spamassassin everywhere. For an easy life I have it set up thusly:

  • fetchmail scans POP servers
  • exim processes received mail
  • exim .forward file passes messages to procmail, e.g. (.forward; the first line marks it as an Exim filter file)
# Exim filter
pipe "/usr/bin/procmail"
  • procmail filters messages through spamassassin, e.g. (.procmailrc)
:0fw: spamassassin.lock
* < 100000
| spamassassin

# Mails with a score of 15 or higher are almost certainly spam (with 0.05%
# false positives according to rules/STATISTICS.txt). Let's put them in a
# different mbox. (This one is optional.)
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
.Spam.Definitely/

# All mail tagged as spam (i.e. with a score higher than the set threshold)
# is moved to the "possibly spam" maildir.
:0:
* ^X-Spam-Status: Yes
.Spam.Possibly/
  • procmail delivers messages to appropriate maildirs
  • dovecot provides IMAP
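
The two delivery recipes above boil down to a small decision on the headers that spamassassin adds. As a rough illustration only (this is not part of my setup, and the function name `classify` and the "default" return value are made up for the example), the same logic in Python looks something like this:

```python
from email import message_from_string


def classify(raw_message: str) -> str:
    """Return the maildir the procmail recipes above would pick.

    "default" is a placeholder meaning "no spam recipe matched,
    carry on with normal delivery".
    """
    msg = message_from_string(raw_message)
    level = msg.get("X-Spam-Level", "")
    status = msg.get("X-Spam-Status", "")

    # 15 or more stars: same test as the ^X-Spam-Level recipe.
    if level.startswith("*" * 15):
        return ".Spam.Definitely/"

    # Tagged as spam but below 15: same test as ^X-Spam-Status: Yes.
    # procmail conditions ignore case by default, so do the same here.
    if status.lower().startswith("yes"):
        return ".Spam.Possibly/"

    return "default"
```

The order matters in the same way it does in .procmailrc: the stricter test has to come first, because a message with 15 stars is also tagged "Yes".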

There are ways to get exim to live with spamassassin directly, without procmail, e.g.

  • hook it directly into the virus scanning hooks of exim.
  • put in a pipe so messages go through exim, out to spamassassin, back into exim via a second process and out through a different rule.

I put procmail into the equation for a few reasons:

  • I find many procmail recipes on the internet, not many exim filter recipes.
  • The exim online documentation is comprehensive but turgid, and you have to totally understand how an MTA works to grasp any of it. The procmail documentation I have found is far more accessible.
  • The .forward->procmail->spamassassin chain is simple and clean.

I haven't bothered setting up the spamassassin server daemon (spamd), which would speed up the email handling. Checking each email takes about 15 seconds, but this is OK: I can wait that long.

Observations:

  • DNS Block Lists (NOT RBLs) seem to work as DNS servers, so a check is just an ordinary DNS query. Nice for getting through firewalls.
  • Spamassassin takes advantage of Vipul's Razor and Pyzor, which are essentially databases of known spam messages, but unfortunately these don't work through firewalls :(
  • Procmail is nice: I've set it up to do all my delivery filtering.
  • For procmail delivery rules, it is important to remember to put a trailing / if the target is a maildir directory. This has caught me a couple of times.
  • procmail can handle spaces in maildir directory names if you use quotes, e.g.
:0
* ^From:.*Somebody
".Messages from somebody/"
  • Nice being able to test spamassassin from the command line:
spamassassin -D < mailfile
  • It's taken me a few hours to learn that there are four s's in assassin. It seems to take an eternity to type.
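
On the DNS Block List observation above: the reason these get through firewalls is that a lookup is just a normal DNS query for a specially built hostname. A minimal sketch, assuming an IPv4 address and a hypothetical list zone called `dnsbl.example.org` (substitute a real list; the function names are made up for the example):

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the hostname to look up: reversed octets plus the list zone."""
    # IPv4Address() also validates the input and raises ValueError if bad.
    octets = ipaddress.IPv4Address(ip).packed
    reversed_octets = ".".join(str(b) for b in reversed(octets))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """True if the list answers for this address.

    A listed address resolves to some A record; an unlisted one does
    not resolve at all. Note that a DNS failure also ends up here as
    "not listed", so this is only a rough check.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True
```

So checking 192.0.2.99 against the hypothetical zone means resolving 99.2.0.192.dnsbl.example.org, which any machine that can do DNS can manage.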
posted at 17:23:28
 

Gentoo emerge Error

For a while now my emerges have all been reporting an error:

--- !found obj /usr/share/doc/gcc-3.3.2-r5/i686-pc-linux-gnu/libstdc++-v3/html/ext/lwg-defects.html
Traceback (most recent call last):
  File "/usr/bin/emerge", line 2605, in ?
    unmerge("clean", ["world"])
  File "/usr/bin/emerge", line 1852, in unmerge
    retval=portage.unmerge(mysplit[0],mysplit[1],portage.root,mysettings,unmerge_action not in ["clean","prune"])
  File "/usr/lib/portage/pym/portage.py", line 2494, in unmerge
    mylink.unmerge(trimworld=mytrimworld,cleanup=1)
  File "/usr/lib/portage/pym/portage.py", line 5318, in unmerge
    mymd5=perform_md5(obj, calc_prelink=1)
  File "/usr/lib/portage/pym/portage.py", line 2485, in perform_md5
    return perform_checksum(x, calc_prelink)[0]
  File "/usr/lib/portage/pym/portage.py", line 354, in perform_checksum
    return fchksum.fmd5t(filename)
IOError: [Errno 5] Input/output error: '/usr/share/doc/gcc-3.3.2-r5/i686-pc-linux-gnu/libstdc++-v3/html/ext/lwg-active.html'

This error occurred at the end of emerge when it was trying to delete old packages so it was not too bad (it had already installed the new stuff).

Looking in detail, it transpired that the file in question, lwg-active.html, is indeed corrupt: catting it gives an I/O error as well.

After deleting the file, 'emerge clean' carried on past this point until it found another corrupt file. In all, about 20 files gave this error, mostly files called Changelog*.gz in various directories.
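
Finding the bad files one emerge run at a time is slow. A sketch of how they could be found in one pass, by trying to read every regular file under a directory and noting the ones that raise an error (the function name `find_unreadable` is made up for the example, and this is an illustration, not what I actually ran):

```python
import os


def find_unreadable(root: str, chunk_size: int = 1 << 20) -> list:
    """Return (path, errno) for every regular file that cannot be read.

    Each file is read to the end, since an I/O error may only show up
    part way through. Symlinks, FIFOs and devices are skipped.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    while fh.read(chunk_size):
                        pass
            except OSError as exc:
                # errno 5 (EIO) is the error in the traceback above;
                # permission errors are recorded here as well.
                bad.append((path, exc.errno))
    return bad
```

Calling `find_unreadable("/usr/share/doc")` as root would list the candidates up front; filtering the result for errno 5 leaves only the files with I/O errors.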

The next task is to run 'reiserfsck', but that involves booting from a floppy or CD and mounting the drive read-only. fsck gives no errors. Depending on what reiserfsck says, do I:

  • change filesystem
  • change hard disk

I cannot live with an unreliable file system. I'd rather use FAT.

posted at 10:59:28

A blog documenting Peter's dabblings with Python, Gentoo Linux and any other cool toys he comes across.


© 2004, Peter Wilkinson
