Like several other people, I have run into spammers using my comment system to post spam and backlinks (some of it rather funny stuff).
I'm already using a good email spam filter, SpamBayes, so I wanted to test Bayesian filtering for the spam on this blog too.
I decided to give Reverend a try:
from reverend.thomas import Bayes

SPAM_DB = 'spam.bayes'
guesser = Bayes()

# load the spam DB, or create it on the first run
try:
    guesser.load(SPAM_DB)
except IOError:
    print("Creating a new spam filter database")
    guesser.save(SPAM_DB)

def train_spam(text):
    guesser.train('spam', text)
    guesser.save(SPAM_DB)

def train_ham(text):
    guesser.train('ham', text)
    guesser.save(SPAM_DB)

# try to guess the ham / spam scores of a text
def guess(text):
    # guesser.guess() returns a list of (pool, score) pairs
    scores = dict(guesser.guess(text))
    return (scores.get('ham', 0), scores.get('spam', 0))

A small and really simple module, no? The next step is simply to add 'spam' and 'ham' attributes to your comment posts, plus two methods to train the filter on a comment as spam or as ham. And of course, only display comments that have a good ham/spam ratio (> 1). This took me about an hour to implement…
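
A minimal sketch of that next step could look like the following. It is not the code running on this blog: the `Comment` class and the names `score`, `mark_spam`, `mark_ham`, `is_visible` and `visible_comments` are all hypothetical, and the `guess`, `train_spam` and `train_ham` arguments are assumed to be callables that behave like the functions of the module above. Passing them in keeps the sketch independent of Reverend itself.

```python
class Comment(object):
    """Hypothetical comment model; all names here are illustrative."""

    def __init__(self, text):
        self.text = text
        # scores stay at 0 until the comment has been scored
        self.ham = 0.0
        self.spam = 0.0

    def score(self, guess):
        """Store the (ham, spam) pair returned by a guess(text) callable."""
        self.ham, self.spam = guess(self.text)

    def mark_spam(self, train_spam, guess):
        """Train the filter on this comment as spam, then rescore it."""
        train_spam(self.text)
        self.score(guess)

    def mark_ham(self, train_ham, guess):
        """Train the filter on this comment as ham, then rescore it."""
        train_ham(self.text)
        self.score(guess)

    def is_visible(self):
        # ham / spam > 1 is the same test as ham > spam,
        # written this way to avoid dividing by zero when spam == 0
        return self.ham > self.spam


def visible_comments(comments):
    """Keep only the comments whose ham score beats their spam score."""
    return [c for c in comments if c.is_visible()]
```

One consequence of this choice: a comment that has never been scored, or that an untrained filter scores as (0, 0), stays hidden until it is trained as ham. Whether unknown comments should be shown or held back is a policy decision, and this sketch simply assumes the cautious option.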