How can spam detection be improved (Rspamd configuration)?

Limezy · February 24, 2021, 9:22am

Hello everyone,

I finally took the time to organise my notes. Here I will present the actions and settings I applied on my instances to get a working anti-spam system.
Be warned: not everything is perfectly clean, and you follow this at your own risk! My hope is that it can serve as a basis for possible improvements to the default Yunohost configuration. Unfortunately, I have neither the knowledge nor the time to take care of that myself!

Credit where credit is due: the vast majority of my configuration is inspired by the tutorial "Filtering out spam with rspamd" on workaround.org, which is really well written. The only work was to understand how to adapt it to the few specificities of Yunohost. I therefore follow the structure of that tutorial and point out the small changes needed for the Yunohost case.

The tutorial itself is written in English so that it can be useful to as many people as possible.

1. Adding headers

Adding headers should be the first step when you try to improve your spam filtering with Rspamd. Without them you are basically “blind”: Rspamd will classify some emails as spam and others as “ham” (= not spam), and you will never know why.

Adding headers makes Rspamd write the score it gave to each email into that email's headers. They are invisible in everyday use, but whenever you have a doubt, or during the first weeks while you are fine-tuning your setup, they are very helpful to understand what is going on under the hood: you just have to read the headers (or the source) of the strangely processed email. How to do that depends on your email client.

Just create a new file in the correct location

sudo nano /etc/rspamd/override.d/milter_headers.conf

With the below content :

extended_spam_headers = true;

You can now restart Rspamd, send a test email to yourself and have a look at the headers

sudo yunohost service restart rspamd

You should see some lines close to the example below :

X-Spamd-Result: default: False [10.50 / 120.00];
MSBL_EBL(7.50)[odgnq78215@yahoo.co.jp,412457ff8e76de187013e526b71b1ce9d1d846f4];
HAS_REPLYTO(0.00)[odgnq78215@yahoo.co.jp];
R_SPF_ALLOW(0.00)[+ip4:82.57.200.0/24];
FREEMAIL_FROM(0.00)[alice.it];
REPLYTO_DN_EQ_FROM_DN(0.00);
TO_DN_NONE(0.00);
HAS_X_PRIO_THREE(0.00)[3];
FROM_EQ_ENVFROM(0.00);
RCVD_TLS_LAST(0.00);
R_DKIM_NA(0.00);
FREEMAIL_ENVFROM(0.00)[alice.it];
INTRODUCTION(2.00);
MID_RHS_MATCH_FROM(0.00);
ASN(0.00)[asn:20580, ipnet:82.57.200.0/21, country:IT];
ARC_NA(0.00);
FAKE_REPLY(1.00);
FROM_HAS_DN(0.00);
TO_MATCH_ENVRCPT_ALL(0.00);
MIME_GOOD(-0.10)[multipart/alternative,text/plain];
REPLYTO_DOM_NEQ_FROM_DOM(0.00);
FREEMAIL_REPLYTO(0.00)[yahoo.co.jp];
RCPT_COUNT_ONE(0.00)[1];
BAD_REP_POLICIES(0.10);
DMARC_NA(0.00)[alice.it];
RCVD_IN_DNSWL_NONE(0.00)[120.200.57.82.list.dnswl.org : 127.0.5.0];
RCVD_COUNT_TWO(0.00)[2]
X-Rspamd-Server: monserveur.fr
X-Spam: Yes

The most important one is the last one, which says that Rspamd has flagged this email as spam (we will use it later to move that email into the Junk folder automatically).

The other lines detail the scores that Rspamd gave to that email for each criterion (Rspamd calls them “symbols”). Here you can see that this email got a total score of 10.5, mainly from the “MSBL_EBL” criterion.
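
Reading that header by eye gets tedious, because most symbols score 0.00. Here is a minimal sketch that keeps only the symbols that actually moved the score. The function name spamd_symbols is made up for this example (it is not an Rspamd command), and it assumes the header looks like the example above.

```shell
# spamd_symbols: list the symbols that contributed to the score, highest first.
# Reads a raw email (or just its header block) on standard input.
# Steps: stop at the first empty line (end of headers), keep the NAME(score)
# tokens, drop the neutral 0.00 ones, print "score NAME", sort descending.
spamd_symbols() (
    export LC_ALL=C    # "." as decimal separator whatever the system locale
    sed '/^$/q' |
        grep -oE '[A-Z][A-Z0-9_]*\(-?[0-9]+\.[0-9]+\)' |
        grep -v '(0\.00)' |
        sed -E 's/^(.*)\((.*)\)$/\2 \1/' |
        sort -k1,1nr
)

# Usage (example path, adapt it to a message saved from your mail client):
#   spamd_symbols < message.eml
```

On the example above, MSBL_EBL (7.50) would come out first and MIME_GOOD (-0.10) last.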

Based on that score Rspamd takes different actions, which are the subject of the next section.

2. Adjust score metrics

Once a score has been given to an incoming email, Rspamd can take different actions :

Let the email directly go to Inbox
Put the email in “greylist”, which asks the sending server to wait and retry a little later. This was initially meant to defeat low-level spam servers that have no queue management. Those are becoming quite rare, and I find greylisting a bit useless nowadays
Flag the email, which adds the header X-Spam: Yes to the email so that you can process it differently if you want (for example with a filter that moves it automatically to the Junk folder)
Reject the email - in that case you will never see the email

The default values are not bad. I have just put a very high value on “reject” because I want to be able to see all incoming emails, at least for the first months. I may lower that value once I gain confidence in my system.

To change the values, just create a dedicated file in the correct location

sudo nano /etc/rspamd/override.d/metrics.conf

Copy the following content and adjust the values to your taste

actions {
    reject = 15;
    add_header = 6;
    greylist = 4;
}

Restart Rspamd for the new values to be taken into consideration.

sudo yunohost service restart rspamd

You can always do a configdump to double check (but beware, the config is very long and confusing)

rspamadm configdump
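
To make the thresholds concrete, here is a small sketch that tells which action a given score would trigger with the values used above. It only illustrates the logic and does not talk to Rspamd; the function name action_for_score is hypothetical, and I assume an action applies as soon as the score reaches its threshold.

```shell
# action_for_score: map a score to the action it triggers.
# Thresholds are the ones from metrics.conf above; change them to match yours.
# awk does the comparison because the shell cannot compare decimal numbers.
action_for_score() {
    LC_ALL=C awk -v score="$1" -v greylist=4 -v add_header=6 -v reject=15 'BEGIN {
        s = score + 0                       # force a numeric comparison
        if      (s >= reject)     print "reject"
        else if (s >= add_header) print "add header"
        else if (s >= greylist)   print "greylist"
        else                      print "no action"
    }'
}

# Usage:
#   action_for_score 10.5
```

With these values, the example email from section 1 (score 10.5) gets the “add header” action and is not rejected.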

3. Send spam automatically to the Junk folder

I recommend doing this with Rainloop rather than on the command line, as it is much faster. If for some reason you absolutely want to do it from the CLI, you can refer to the original tutorial.

Using Rainloop creates a filter at user level, not at server level.

Log in to Rainloop
Go to settings > filters
Add a new filter and create a new condition : If header X-Spam contains yes
Add an action : move to : Junk folder
You can also check “mark as read”

Save your filter. From now on, every incoming email that scores above the “add_header” threshold (6 in our example above) will automatically go to the Junk folder. Things are starting to take shape !

4. Learning with existing spam and ham

Among all the criteria that Rspamd uses to score an incoming email, one is a statistical (Bayesian) classifier that “learns” from your actions. Sadly, Yunohost is not configured to take advantage of that great feature (yet). The good news is that, unless your server is very new, you can still make use of your whole mail history: you can train the classifier to recognise spam from your Junk folder, and to recognise “ham” from your inbox.

Of course, first make sure that there are no legitimate emails in your Junk folder and no spam in your Inbox. It is worth spending a few minutes checking. It is important to train both spam and ham, not only one of them.

To train on spam :

rspamc learn_spam /var/mail/YNH_USER/.Junk/cur

(where YNH_USER is the user whose Junk folder you want to use as a training set)

To train on ham :

rspamc learn_ham /var/mail/YNH_USER/cur

(where YNH_USER is the user whose inbox you want to use as a training set)

Notes :

You may want to train spam and ham on several users if your instance has more than one Yunohost user. Rspamd configuration and training are shared across the whole server, so you cannot have different settings for different users (or it would be very complicated to set up, and it is not the purpose of this tutorial)
The first command only works if .Junk is indeed the folder where your spam is stored. Be careful: some email clients do not use that folder and may have created another one, for example .SPAM. In that case you can either change the path in the command to /.SPAM/cur, or set up your email client to use .Junk and move all your spam from the other folder to .Junk. I prefer the latter option because it is always better to keep Yunohost as close as possible to the default
The second command only trains on your inbox, not on emails stored in other folders. If your emails are spread across several folders, you may want to train on each of them in turn, using /YNH_USER/.Yourfolder/cur as the path
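
If you have several users or many folders, typing one command per folder quickly becomes boring. Below is a rough sketch that only prints the rspamc commands, so that you can review the list before running anything. The function name list_learn_commands is hypothetical. It assumes the /var/mail/USER/cur and /var/mail/USER/.Folder/cur layout described above, that spam lives in .Junk, and that folders named .Trash and .Drafts (names are my assumption) are better left out.

```shell
# list_learn_commands: print one rspamc command per mail folder found.
# Nothing is learned here; the output is meant to be reviewed first.
list_learn_commands() (
    root="${1:-/var/mail}"
    for dir in "$root"/*/cur "$root"/*/.[!.]*/cur; do
        [ -d "$dir" ] || continue        # skip patterns that matched nothing
        folder="$(basename "$(dirname "$dir")")"
        case "$folder" in
            .Junk)          echo "rspamc learn_spam '$dir'" ;;
            .Trash|.Drafts) ;;           # ambiguous content: leave it out
            *)              echo "rspamc learn_ham '$dir'" ;;
        esac
    done
)

# Usage (reading /var/mail usually needs root):
#   list_learn_commands            # review the list
#   list_learn_commands | sh       # then actually run it
```

Remember the warning above: everything outside .Junk is learned as ham, so make sure those folders contain no spam.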

As usual, restart Rspamd to take these changes into account.
You can always check how much your Bayes filter has been trained with the following command :

rspamc stat
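
The output of that command is quite verbose. As a sketch, the filter below keeps only the training counters. It assumes that the output contains lines of the form "Statfile: BAYES_SPAM ... learned: N" and "Total learns: N", which is what I see on my side but may differ between Rspamd versions. The function name bayes_counters is hypothetical.

```shell
# bayes_counters: reduce "rspamc stat" output (read on standard input)
# to one line per Bayes statfile plus the total number of learned messages.
bayes_counters() {
    sed -n -E \
        -e 's/^Statfile: (BAYES_[A-Z]+) .*learned: ([0-9]+).*/\1 \2/p' \
        -e 's/^Total learns: ([0-9]+).*/TOTAL \1/p'
}

# Usage:
#   rspamc stat | bayes_counters
```

If nothing is printed, the output format on your version is different and the patterns need adjusting.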

5. Turning on auto-learning

We will now turn on auto-learning. Auto-learning is quite basic but still useful: it trains the classifier as spam when an email is rejected (very high score) and as ham when an email gets a negative score (very good email). Let’s create a new file :

sudo nano /etc/rspamd/override.d/classifiers.conf

(Stock Rspamd includes the Bayes classifier overrides from override.d/classifier-bayes.conf. If the setting below seems to have no effect, try that file name instead.)

And write the following line in it before saving :

autolearn = true;

Instead of true, you can also define the boundaries at which auto-learning is triggered (based, as always, on the score given by Rspamd). In this example we ask Rspamd to learn an email as ham if it scores -5 or less, and as spam if it scores 5 or more :

autolearn = [-5, 5];

As usual, restart Rspamd for these changes to take effect
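
To illustrate what autolearn = [-5, 5] means, here is a tiny sketch of the decision rule as I understand it. It is not Rspamd code, the function name autolearn_decision is hypothetical, and whether the bounds themselves are included is an assumption on my side.

```shell
# autolearn_decision SCORE [HAM_MAX] [SPAM_MIN]
# Learn as ham at or below HAM_MAX, as spam at or above SPAM_MIN,
# and do nothing in between. Defaults match autolearn = [-5, 5].
autolearn_decision() {
    LC_ALL=C awk -v score="$1" -v ham_max="${2:--5}" -v spam_min="${3:-5}" 'BEGIN {
        s = score + 0                       # force a numeric comparison
        if      (s <= ham_max + 0)  print "learn ham"
        else if (s >= spam_min + 0) print "learn spam"
        else                        print "skip"
    }'
}

# Usage:
#   autolearn_decision 10.5
#   autolearn_decision 6 -2 8
```

Most emails land between the two bounds and are simply not learned, which is why manual training (previous section) and user-driven training (next section) still matter.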

6. Now the BIG thing : train spam / ham based on user action

This is definitely how we want things to work : I receive a Spam in my Inbox ? Then I mark it as a spam and hope my system will gain some “experience” from my action. I receive a ham in my spambox ? Then I place it back in my inbox and hope for the same. This is definitely possible with Yunohost + Rspamd, but there is a little work to do first.

The tutorial on which this Yunohost-flavored one is based explains perfectly how things work, so please refer to it if you want to understand what is happening. Here I will limit myself to minimal explanations.

6.A Enabling and configuring imap_sieve plugin

For this we will need to edit the dovecot conf file

sudo nano /etc/dovecot/dovecot.conf

Find the protocol imap {} part of the file and add imap_sieve as per below :

protocol imap {
  imap_client_workarounds =
  mail_plugins = $mail_plugins imap_quota antispam imap_sieve
}

Then you’ll find multiple plugin {} blocks. Find the one with a few lines starting with “sieve”.
Replace it with the lines below (the first 3 lines are unchanged) :

plugin {
  sieve = /var/mail/sievescript/%n/.dovecot.sieve
  sieve_dir = /var/mail/sievescript/%n/scripts/
  sieve_before = /etc/dovecot/global_script/
  sieve_plugins = sieve_imapsieve sieve_extprograms

  # From elsewhere to Junk folder
  imapsieve_mailbox1_name = Junk
  imapsieve_mailbox1_causes = COPY
  imapsieve_mailbox1_before = file:/etc/dovecot/sieve/learn-spam.sieve

  # From Junk folder to elsewhere
  imapsieve_mailbox2_name = *
  imapsieve_mailbox2_from = Junk
  imapsieve_mailbox2_causes = COPY
  imapsieve_mailbox2_before = file:/etc/dovecot/sieve/learn-ham.sieve
  sieve_pipe_bin_dir = /etc/dovecot/sieve
  sieve_global_extensions = +vnd.dovecot.pipe
}

And then save that dovecot.conf file.

6.B Creating and compiling the sieve filters

Create a new sieve directory inside dovecot folder

sudo mkdir /etc/dovecot/sieve

Create a new learn-spam script inside that folder :

sudo nano /etc/dovecot/sieve/learn-spam.sieve

Copy the following code inside and save the file.

require ["vnd.dovecot.pipe", "copy", "imapsieve"];
pipe :copy "rspamd-learn-spam.sh";

Do the same for the learn-ham script

sudo nano /etc/dovecot/sieve/learn-ham.sieve

Copy the following code inside and save the file.

require ["vnd.dovecot.pipe", "copy", "imapsieve"];
pipe :copy "rspamd-learn-ham.sh";

Compile these two scripts with sievec :

sudo sievec /etc/dovecot/sieve/learn-spam.sieve
sudo sievec /etc/dovecot/sieve/learn-ham.sieve

Double-check that these commands created two compiled scripts, learn-spam.svbin and learn-ham.svbin, inside the /etc/dovecot/sieve folder we just created

Fix the permissions for created files :

sudo chmod u=rw,go= /etc/dovecot/sieve/learn-{spam,ham}.sieve
sudo chown vmail:mail /etc/dovecot/sieve/learn-{spam,ham}.sieve

6.C Creating the shell scripts to be run by the above sieve filters

Create a new shell script file to learn spam

sudo nano /etc/dovecot/sieve/rspamd-learn-spam.sh

Copy the following code inside then save it.

#!/bin/sh
exec /usr/bin/rspamc learn_spam

Create a new shell script file to learn ham

sudo nano /etc/dovecot/sieve/rspamd-learn-ham.sh

Copy the following code inside then save it.

#!/bin/sh
exec /usr/bin/rspamc learn_ham

Let’s fix the permissions for the created files :

sudo chmod u=rwx,go= /etc/dovecot/sieve/rspamd-learn-{spam,ham}.sh
sudo chown vmail:mail /etc/dovecot/sieve/rspamd-learn-{spam,ham}.sh
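
Since steps 6.B and 6.C create quite a few files, a quick check can save some debugging time. This is only a sketch: the function name check_learn_files is hypothetical, and it checks nothing more than the presence of the files named above and the executable bit on the two scripts.

```shell
# check_learn_files [DIR]: report missing files in the sieve directory.
# Prints nothing when everything is in place.
check_learn_files() (
    dir="${1:-/etc/dovecot/sieve}"
    for kind in spam ham; do
        for f in "learn-$kind.sieve" "learn-$kind.svbin" "rspamd-learn-$kind.sh"; do
            [ -f "$dir/$f" ] || echo "missing: $dir/$f"
        done
        # the scripts called by the "pipe" action must be executable
        if [ -f "$dir/rspamd-learn-$kind.sh" ] && [ ! -x "$dir/rspamd-learn-$kind.sh" ]; then
            echo "not executable: $dir/rspamd-learn-$kind.sh"
        fi
    done
)

# Usage: define the function in a root shell (for example after "sudo -i"),
# because the files are only accessible to their owner, then run:
#   check_learn_files
```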

Now we should be ready for the testing phase ! But first, obviously, we need to restart Dovecot

sudo yunohost service restart dovecot

6.D Testing

Let’s watch in real time what is happening on the server :

sudo tail -f /var/log/mail.log

Then you can go to your email client, and try to move one email from your inbox to your Junk folder. You should see a line saying :

imap(ynh_user@example.org): sieve: pipe action: piped message to program `rspamd-learn-spam.sh'

You can then try to put back that same email (or another) from the spam folder to inbox (or any other folder) and you should see a line saying :

imap(ynh_user@example.org): sieve: pipe action: piped message to program `rspamd-learn-ham.sh'
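
If the log is busy, those lines are easy to miss. As a sketch, the filter below counts them instead. The function name learn_events is hypothetical, and it assumes the log lines look like the two examples above.

```shell
# learn_events: count the spam / ham learning events found in a mail log
# read on standard input.
learn_events() {
    awk '/pipe action: piped message to program/ {
        if      ($0 ~ /rspamd-learn-spam\.sh/) spam++
        else if ($0 ~ /rspamd-learn-ham\.sh/)  ham++
    }
    END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Usage:
#   sudo cat /var/log/mail.log | learn_events
```

Run it before and after moving an email: the matching counter should have increased by one.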

Alternatively, you can also check with

rspamc stat

The number of learned messages should go up each time you move an email from the Inbox to Junk or vice versa.

Conclusion

I hope this will help some of you deal better with spam. I also hope that this configuration (or an improved version of it) can be added to future versions of Yunohost.