Current version: 1.2.15 (29 March 2021) [src]
Quick Spam Filter (QSF) is a small, fast, and accurate email filter that classifies incoming email as either spam or non-spam.
To recognise spam, QSF extracts the text from the email (using MIME decoding and HTML stripping) and then splits it into tokens (words, word pairs, URLs, and so on). These tokens are then looked up in a database and analysed with a Bayesian technique to decide whether the email should be classified as spam.
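As a rough illustration of the idea described above, the following Python sketch strips HTML, splits the text into words, word pairs, and URLs, and combines per-token counts with a plain naive Bayes calculation. It is not QSF's code: the function names, the regular expressions, the smoothing, and the in-memory dictionaries standing in for the database are all assumptions made for this example. QSF's actual tokenisation and scoring are described in the Technical Details section of the manual.

```python
import math
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (a crude stand-in for HTML stripping)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Return only the text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical token patterns; QSF's real rules are more involved.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
WORD_RE = re.compile(r"[a-z0-9]+")


def tokenise(text):
    """Return single words, adjacent word pairs, and URLs found in the text."""
    text = text.lower()
    urls = URL_RE.findall(text)
    words = WORD_RE.findall(URL_RE.sub(" ", text))
    pairs = [a + " " + b for a, b in zip(words, words[1:])]
    return words + pairs + urls


def spam_probability(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Naive Bayes estimate of P(spam) from per-token message counts.

    spam_counts / ham_counts map a token to the number of spam / non-spam
    messages it was seen in; n_spam / n_ham are the message totals.
    """
    # Prior odds, smoothed so that an empty database does not divide by zero.
    log_odds = math.log((n_spam + 1) / (n_ham + 1))
    for tok in set(tokens):
        # Add-one smoothing keeps unseen tokens from producing log(0).
        p_tok_spam = (spam_counts.get(tok, 0) + 1) / (n_spam + 2)
        p_tok_ham = (ham_counts.get(tok, 0) + 1) / (n_ham + 2)
        log_odds += math.log(p_tok_spam / p_tok_ham)
    # Clamp before exponentiating to avoid floating-point overflow.
    log_odds = max(-700.0, min(700.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))
```

A message would then be treated as spam when `spam_probability(tokenise(strip_html(body)), ...)` exceeds some chosen threshold, for example 0.9; the threshold is likewise an assumption of this sketch.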
The database is generated by a process of training: QSF is given two mailboxes to train itself on, one containing known spam and the other containing known non-spam. After training, if QSF misfiles an email, the message it got wrong can be fed back into the database so that QSF learns from its mistakes.
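The training and feedback loop can be pictured with a small Python sketch. Everything here is hypothetical: the `TokenDatabase` class, its method names, the whitespace tokeniser, and the averaged score are simplifications chosen for illustration, and the "mailboxes" are plain lists of strings rather than real mailbox files. It only shows the general shape of the process, not how QSF stores or weighs tokens.

```python
from collections import Counter


class TokenDatabase:
    """Toy in-memory token store (QSF's real database backends are not modelled)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "nonspam": Counter()}
        self.messages = {"spam": 0, "nonspam": 0}

    def learn(self, text, label):
        """Record each distinct token of one message under the given label."""
        self.counts[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def train(self, spam_mailbox, nonspam_mailbox):
        """Initial training from two collections of known messages."""
        for text in spam_mailbox:
            self.learn(text, "spam")
        for text in nonspam_mailbox:
            self.learn(text, "nonspam")

    def spam_score(self, text):
        """Average per-token spam ratio in [0, 1]; 0.5 means no evidence."""
        tokens = set(text.lower().split())
        if not tokens:
            return 0.5
        total = 0.0
        for tok in tokens:
            # Add-one smoothing so unseen tokens score a neutral 0.5.
            s = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            h = (self.counts["nonspam"][tok] + 1) / (self.messages["nonspam"] + 2)
            total += s / (s + h)
        return total / len(tokens)


db = TokenDatabase()
db.train(
    spam_mailbox=["win cash now", "cheap cash prize"],
    nonspam_mailbox=["lunch meeting today", "project meeting notes"],
)

# A spam message that resembles legitimate mail scores low at first...
misfiled = "cash for the project meeting"
score_before = db.spam_score(misfiled)

# ...so it is fed back under the correct label, which raises its score.
db.learn(misfiled, "spam")
score_after = db.spam_score(misfiled)
```

In this toy model, feeding back a misfiled message is the same operation as initial training, applied to a single message with the correct label.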
For a more in-depth look at the way in which QSF tokenises and classifies messages, please see the Technical Details section of the manual.
QSF is designed to be run by an MDA, such as procmail. See the FAQ for a quick-start guide.
This software is distributed under the terms of the Artistic License 2.0.
Recent Changes
- 1.2.15 - 29 March 2021
  - bugfix: correct exit status of "qsf -t" to 0 if memory limit exceeded (Zhengdao Wang)
  - bugfix: correct compiler warnings and toupper() misfire (Dr. David Alan Gilbert)
  - bugfix: report error if "qsf -T" is pointed at directories (Iain Calder)
  - cleanup: clean up compiler warnings related to unused vars and type mismatches
- 1.2.11 - 3 January 2015
  - bugfix: Debian #773546 - report error on malformed message (Jameson Graef Rollins)
  - bugfix: Debian #651881 - X-Spam-Level corruption on non-ASCII spam (Ian Zimmerman)
  - bugfix: MD5Final now correctly clears context (patch from David Binderman)
  - bugfix: removed "DESTDIR" / suffix to fix Cygwin installation
  - cleanup: mailbox code consolidated into single file
  - cleanup: moved acknowledgements out of manual page
  - cleanup: better "rpm" and "srpm" build targets
- 1.2.7 - 28 August 2007
  - license change to Artistic 2.0
  - pointless option "-l" removed
To Do
Things still to do:
- weight as personal if 1 database, use -d/-g for weight if more than 1 (Pavel Kolar)
- strip \r, re-add afterwards, if first line is \r\n (Nora Etukudo)
- autotrain option "-u", retrains based on classification
- try stripping double/triple letters, eg "fffinancce" → "finance"
- token for nonsense META tags (META NAME="blah blah blah")
- support MH/Maildir training folders
- environment variable for -d and -g
- more verbosity, with profiling data
- comma-separated training folders (-T spam1,spam2,spam3 nonspam1,2,3)
- generate a "%age new tokens" score (eg 90% new tokens)
- support for SQLite v3
- allow MySQL Unix socket connections (socket location configurable)
- remove pruning step from training if using 3-column database
- improve efficiency of token deletion from list backend
Any assistance would be appreciated.