How to create a cronjob to use sa-learn to teach spamassassin - Maildir



Last Modified: Jun 10, 2014, 12:19 am
This guide describes how to teach spamassassin what is and is not spam.  It covers the Maildir format.  The mbox format is covered in the separate guide "How to create a cronjob to use sa-learn to teach spamassassin - mbox".

This guide assumes that you've already installed spamassassin and have it running in its default state.
The examples use the domain "domain.com", the system account "username", and the email user "bob".
The folders will be called teach-isspam and teach-isnotspam, but you can use whatever names you want.

1) Create a new imap folder that you can place your spam into, and a new folder that you can place false positives into (so it doesn't tag them again)

cd /home/username/imap/domain.com/bob/Maildir
mkdir .INBOX.teach-isspam .INBOX.teach-isnotspam
chown -R username:mail .INBOX.teach-*
chmod 770 .INBOX.teach-*
echo INBOX.teach-isspam >> subscriptions
echo INBOX.teach-isnotspam >> subscriptions

Note that this can also be done via any imap client (eg: squirrelmail): just create the "teach-isspam" and "teach-isnotspam" folders under INBOX.
You can also create the 2 folders in /home/username/Maildir if you wish, to let the system account train spam.

Now you have 2 new mailboxes under user bob when accessed via IMAP (squirrelmail or roundcube).  Test them by moving a few messages into each one to make sure they work correctly.
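
To confirm from the command line that the folders from step 1 are in place, a small check along these lines may help. It is only a sketch: the function name check_teach_folders is made up for this example, and it simply reports 0 messages if the cur and new subdirectories do not exist yet.

```shell
#!/bin/sh
# check_teach_folders <Maildir path>
# Hypothetical helper: for each teach folder, report whether it exists,
# whether it is listed in the subscriptions file, and how many messages
# are waiting in its cur/ and new/ subdirectories.
check_teach_folders() {
    maildir=$1
    for name in teach-isspam teach-isnotspam; do
        folder="$maildir/.INBOX.$name"
        if [ ! -d "$folder" ]; then
            echo "$name: missing"
            continue
        fi
        # -F: fixed string, -x: whole line must match, -q: no output
        if grep -qxF "INBOX.$name" "$maildir/subscriptions" 2>/dev/null; then
            sub=subscribed
        else
            sub=unsubscribed
        fi
        count=0
        for d in "$folder/cur" "$folder/new"; do
            [ -d "$d" ] || continue
            n=$(find "$d" -type f | wc -l)
            count=$((count + n))
        done
        echo "$name: $sub, $count message(s)"
    done
}

# Example, using the path from step 1:
# check_teach_folders /home/username/imap/domain.com/bob/Maildir
```

Running it before and after dragging a message into a teach folder should show the message count for that folder go up by one.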


2) Now that you can put your spam and non-spam messages into their correct places, you'll need to set up a cronjob to check these locations with sa-learn.
Create a shell script at /home/username/.spamassassin/teach.sh.
In it, put:

#!/bin/sh

DA_USER=username
DA_HOME=/home/${DA_USER}

# Set this to 1 if you want the messages removed after each run
DELETE_TEACH_DATA=0

# learn_Maildir <Maildir path>: train sa-learn on the teach folders of one mailbox
learn_Maildir()
{
    FILESPAM=${1}/.INBOX.teach-isspam
    FILEHAM=${1}/.INBOX.teach-isnotspam

    # Pass cur and new separately; {cur,new} is not expanded by every /bin/sh
    for sub in cur new; do
        if [ -d "${FILESPAM}/${sub}" ]; then
            echo "learning spam via ${FILESPAM}/${sub}..."
            sa-learn --no-sync --spam "${FILESPAM}/${sub}"
        fi
    done

    for sub in cur new; do
        if [ -d "${FILEHAM}/${sub}" ]; then
            echo "learning ham via ${FILEHAM}/${sub}..."
            sa-learn --no-sync --ham "${FILEHAM}/${sub}"
        fi
    done

    if [ "$DELETE_TEACH_DATA" -eq 1 ]; then
        rm -f "${FILESPAM}"/new/* "${FILESPAM}"/cur/*
        rm -f "${FILEHAM}"/new/* "${FILEHAM}"/cur/*
    fi
}

# The system account's own mailbox, if it has one
if [ -e "$DA_HOME/Maildir" ]; then
    learn_Maildir "$DA_HOME/Maildir"
fi

# Every email user under every domain owned by this account
for DOMAIN_DIR in "$DA_HOME"/imap/*; do
    # Skip symlinked domain directories
    if [ -h "$DOMAIN_DIR" ]; then
        continue
    fi

    for maildir in "$DOMAIN_DIR"/*/Maildir; do
        if [ -d "$maildir" ]; then
            learn_Maildir "$maildir"
        fi
    done
done

echo ""
echo "syncing..."
sa-learn --sync

echo ""
echo "current status:"
sa-learn --dump magic

exit 0

Save the file, then chmod teach.sh to 700.
Don't forget to replace username with the name of the actual DA user (owner of the domain).

At this point, you should be able to run teach.sh manually to see if it works.  When you test it, run it as username rather than root, so that any files it writes are owned by username.
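
The "current status" section printed at the end of the script is the output of sa-learn --dump magic. The following sketch pulls the learned spam and ham message counts out of that output. It assumes the usual layout, in which the count is the third column of the lines ending in nspam and nham; the function name bayes_counts is invented for this example. By default spamassassin does not apply its Bayes rules until 200 spam and 200 ham messages have been learned (the bayes_min_spam_num and bayes_min_ham_num settings), so these two numbers give a rough idea of how far training has progressed.

```shell
#!/bin/sh
# bayes_counts: read `sa-learn --dump magic` output on stdin and print
# the number of spam and ham messages learned so far.
# Assumption: the count is the third whitespace-separated field on the
# lines whose last field is "nspam" or "nham".
bayes_counts() {
    awk '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham  = $3 }
        END { printf "spam=%d ham=%d\n", spam, ham }
    '
}

# Example (run as the user who owns the Bayes database):
# sa-learn --dump magic | bayes_counts
```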


3) Now automate teach.sh so you don't have to run it manually every time.
Log into DirectAdmin as username and go to the cronjobs section.  Enter the command

/home/username/.spamassassin/teach.sh

and for the times, put minute=0, hour=*/12 (twice a day), and a * character in all other fields.
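
For anyone editing the crontab directly rather than through the DirectAdmin form, the schedule above corresponds to the entry printed by this sketch. The helper name teach_cron_line is hypothetical; the five time fields are minute, hour, day of month, month and day of week.

```shell
#!/bin/sh
# teach_cron_line <system user>
# Hypothetical helper: print the crontab entry that runs teach.sh at
# minute 0 of every 12th hour (00:00 and 12:00).
teach_cron_line() {
    printf '0 */12 * * * /home/%s/.spamassassin/teach.sh\n' "$1"
}

teach_cron_line username
```

When run as username, something like (crontab -l 2>/dev/null; teach_cron_line username) | crontab - would append the entry to that user's crontab. Check the existing entries first to avoid adding it twice.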

That's it.
To use it, if you get email that wasn't tagged as spam but should have been, drag it into your teach-isspam folder.
If you get email that was tagged as spam but should not have been, move it to your teach-isnotspam folder.
You can delete the email you've placed there after a day or so, once you're sure it's been picked up by the sa-learn program.
Note that sa-learn can process the same email twice and it won't hurt anything.

If you want to empty these 2 folders after each run, set:

DELETE_TEACH_DATA=1

This will remove all messages from those folders after each run, so that you don't have to delete them by hand once they're processed.
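
A possible middle ground between deleting by hand and DELETE_TEACH_DATA=1 is to remove only messages that have been sitting in a teach folder for more than a day, so that each one is seen by at least one run. The following is a sketch, not part of the script above; prune_teach_folder is a made-up name. It uses the inode change time (-ctime) rather than the modification time, on the assumption that a message moved into the folder may keep its original modification time. Test it on a copy of a mailbox before pointing it at real mail.

```shell
#!/bin/sh
# prune_teach_folder <folder>
# Hypothetical helper: delete messages whose inode change time is at
# least 24 hours old from the cur/ and new/ subdirectories of one folder.
prune_teach_folder() {
    for d in "$1/cur" "$1/new"; do
        [ -d "$d" ] || continue
        # -ctime +0 matches files last changed at least one full day ago
        find "$d" -type f -ctime +0 -exec rm -f {} +
    done
}

# Example, using the paths from step 1:
# prune_teach_folder /home/username/imap/domain.com/bob/Maildir/.INBOX.teach-isspam
# prune_teach_folder /home/username/imap/domain.com/bob/Maildir/.INBOX.teach-isnotspam
```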
 