How to spam-protect your Python-based blog with a Bayesian filter

Like several other people, I ran into spammers abusing my comment system to post backlinks (some of them even using rather funny tricks).
I already use a good email spam filter, SpamBayes, so I decided to test Bayesian filtering on this blog's comment spam too.
For that, I gave Reverend a try:
from reverend.thomas import Bayes

SPAM_DB = 'spam.bayes'
guesser = Bayes()

# Load the spam DB, or create an empty one on the first run
try:
    guesser.load(SPAM_DB)
except IOError:
    print("Creating a new spam filter database")
    guesser.save(SPAM_DB)

def train_spam(text):
    # Teach the filter that this text is spam, then persist the DB
    guesser.train('spam', text)
    guesser.save(SPAM_DB)

def train_ham(text):
    # Teach the filter that this text is legitimate, then persist the DB
    guesser.train('ham', text)
    guesser.save(SPAM_DB)

# Return the (ham, spam) scores of a text; a pool the filter
# has not been trained on yet scores 0
def guess(text):
    scores = dict(guesser.guess(text))  # guess() returns (pool, score) pairs
    return (scores.get('ham', 0), scores.get('spam', 0))
A small and really simple module, isn't it? The next step is simply to add 'spam' and 'ham' attributes to your comment posts, plus two methods to train the filter on a comment as spam or as ham. And of course, only display comments that have a good ham/spam ratio (> 1). This took me about an hour to implement.
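
Here is a minimal sketch of what that last step could look like. It is not the code running on this blog: the Comment class, its method names and visible_comments() are hypothetical, and the guess / train_spam / train_ham functions from the module above are passed in as arguments so the example stands on its own. Note that it compares ham > spam instead of computing ham/spam > 1, which is the same test but avoids dividing by zero when the spam score is 0. As a side effect, a comment that has not been scored yet stays hidden, which is an assumption of this sketch.

```python
class Comment(object):
    """Hypothetical comment model carrying the two scores."""

    def __init__(self, text):
        self.text = text
        self.ham = 0.0
        self.spam = 0.0

    def rescore(self, guess):
        # guess is expected to return a (ham, spam) tuple
        self.ham, self.spam = guess(self.text)

    def mark_as_spam(self, train_spam, guess):
        # Train the filter on this comment as spam, then refresh its scores
        train_spam(self.text)
        self.rescore(guess)

    def mark_as_ham(self, train_ham, guess):
        # Train the filter on this comment as ham, then refresh its scores
        train_ham(self.text)
        self.rescore(guess)

    def is_visible(self):
        # Same test as ham/spam > 1, without the division
        return self.ham > self.spam


def visible_comments(comments):
    # Keep only the comments the filter considers more ham than spam
    return [c for c in comments if c.is_visible()]
```

With the module above imported, a new comment would be scored with comment.rescore(guess), and a moderation link could call comment.mark_as_spam(train_spam, guess) when the filter gets one wrong.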

