Working on rspamd default configuration

zeroheure · March 24, 2022, 12:47pm

Hello
I am trying to understand the rspamd config in Yunohost and Freedombox. Both projects have left this part mostly untouched. The changes that were made look inspired by internet tutorials, without a full understanding of them (maybe I’m wrong).

As a sysadmin, I always begin with the documentation. Comparing the documentation with the current Yunohost config left me with doubts and questions. So I would like to work on this, contributing at least basic docs for users.

(Since this is technical, would it be better to discuss it in French?)

zeroheure · March 24, 2022, 1:39pm

First of all: why did you put additional config files into the /etc/rspamd/local.d/ directory?
local.d and override.d are directories for sysadmin changes, not for distro changes (local.d is for small changes and override.d for replacing a whole config file). As it stands, if a sysadmin changes values in these config files, they will be overwritten at the next package update.
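To illustrate the layering described above, here is a minimal Python sketch. It is a simplified model, not how Rspamd is implemented (Rspamd does this through UCL includes): it only assumes that values from local.d take precedence over the packaged defaults and that override.d takes precedence over both. The function name `effective_config` and the sample values are hypothetical.

```python
def effective_config(stock, local, override):
    """Layer three flat dicts; later layers win on conflicting keys."""
    merged = dict(stock)      # values shipped by the package
    merged.update(local)      # local.d: sysadmin adjustments
    merged.update(override)   # override.d: highest priority
    return merged


# Illustrative values only: "reject" = 15 is the default mentioned later in
# this thread, the "greylist" value is an assumption.
stock = {"reject": 15, "greylist": 4}
print(effective_config(stock, {"reject": 20}, {}))
```

If a distribution ships its own files in local.d, the sysadmin and the package end up writing to the same layer, which is why local edits there can be lost on update.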

zeroheure · March 25, 2022, 4:39pm

Why did Yunohost change the “reject” action threshold (rspamd/local.d/metrics.conf)?

Comparing the Debian and Rspamd deb packages, I don’t find significant differences in the selected modules and their config. Both packages share the same default value for the reject threshold (15).
A higher value doesn’t make sense if the module selection and module config are not changed.

BTW metrics.conf is obsolete (see note inside rspamd/metrics.conf), actions.conf is the right file to use.
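As a rough illustration of what the reject threshold does, here is a small Python sketch that maps a message score to an action. Only the reject value of 15 comes from this thread; the `add_header` and `greylist` values, the `>=` comparison, the function name `action_for` and the raised value of 20 are assumptions made for the example.

```python
def action_for(score, thresholds):
    """Return the most severe action whose threshold the score reaches."""
    # Check the highest threshold first so the most severe action wins.
    ordered = sorted(thresholds.items(), key=lambda item: item[1], reverse=True)
    for name, limit in ordered:
        if score >= limit:
            return name
    return "no action"


# reject = 15 is the default quoted above; the other two values are assumed.
DEFAULTS = {"reject": 15, "add_header": 6, "greylist": 4}
# Hypothetical raised reject threshold, for comparison only.
RAISED = {"reject": 20, "add_header": 6, "greylist": 4}

print(action_for(16, DEFAULTS))
print(action_for(16, RAISED))
```

With the same modules and weights, raising the threshold only moves messages from "reject" to a weaker action; it does not change how they are scored.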

zeroheure · March 28, 2022, 3:34pm

AFAIK the milter-headers.conf file defined in the yunohost repository is not installed; at least I didn’t find it in the scripts (in /hooks) nor on my server. Without this file, rspamd can’t inform the user that an email is potentially spam.

Aleks · March 28, 2022, 5:05pm

Hey there !

Not sure if that helps a lot, but I think generally speaking, our rspamd configuration hasn’t really been maintained by anybody in the last 4~5 years. Personally I do not have any real experience maintaining such software and don’t really understand the algorithms or workflows it uses. If you want to work on this, and assuming you sort of know what you’re doing, I would suggest you just trash our current config and start from scratch? If I remember correctly, there’s also a new major version of rspamd in Bullseye, so it probably makes more sense to work on that new version instead of the old one.

ljf · March 29, 2022, 3:38pm

Feel free to join us during a contributor meeting or on our dev chat on Matrix.
We can discuss these things, but as Aleks explained, we don’t have a lot of knowledge about rspamd.

Maybe some topics on this forum could be helpful too (I don’t know, I didn’t read them):
Comment améliorer la détection des spams (configuration rspam)?
Questions sur rspamd
Rspamd, filter and mail

And on issue Tracker:
[rspamd + clamav] add antivirus in the core! · Issue #1459 · YunoHost/issues · GitHub
[dovecot] spam mailbox is not created · Issue #1456 · YunoHost/issues · GitHub

zeroheure · March 29, 2022, 4:42pm

@Aleks @ljf I have started the work in a dedicated branch. I will add explanations for each change.
I keep the Rspamd defaults as much as possible, adding config options (in the main files, so that users can customize them as needed) only for Postfix and Dovecot integration, allowing auto-learning through the Spam folder. This config assumes that Yunohost is used for personal use or small teams only, with free RBL services. Commercial and hosting use would need paid RBLs and auto-learning on the Spam folder disabled.

While I am not an expert, I owned a hosting company in the 90’s and I understand a bit how spam fighting works.

Limezy · March 30, 2022, 4:13am

Great ! I’ll try to help if I can. I’ve learned a lot about spam last year…

zeroheure · April 3, 2022, 6:02pm

Hi Limezy

I’ve started to briefly document the base parameters; some of them are missing from the docs but can easily be found in the Rspamd source. The purpose is to produce some maintainer documentation about Rspamd’s interaction with Postfix and Dovecot. Also, these params could be changed if Yunohost is used for more than a family server.
I’ve also nearly finished the base configuration of Rspamd-Postfix-Dovecot interaction.

Spam fighting is another subject. In my opinion, some things can be done quickly:

Rspamd defines around 600 parameters to calculate a spam probability. Inexperienced users are not supposed to play with them. Understanding the main ones, and how the spam threshold score can be changed to cope with common situations, can result in usable sample configs.
While the Rspamd web interface is undocumented, its purpose is to help users fight spam. Currently it needs some love to fully work in Yunohost. And of course it needs to be documented.
Rspamd’s use of external RBLs. AFAIK, none of them is free, and users must pay for them as soon as they use a lot of resources. This needs to be checked and documented.

I will push my changes to Github and let you know. We can work on my branch and discuss here to keep track of this work (similarly, it is better to discuss in English, even if we both speak French).

Limezy · April 4, 2022, 6:56am

Yes I’ll have a look and work with you on your branch, with pleasure !

Limezy · April 12, 2022, 11:26am

@zeroheure did you already publish your work on Github ? Didn’t see any mention or notification ?

zeroheure · April 12, 2022, 6:30pm

Not yet. I was ill (after-effects of Covid) and busy.

Limezy · April 13, 2022, 2:45am

I wish you good luck…! Let’s keep in touch when you find time.

zeroheure · April 28, 2022, 8:56pm

Sorry for the delay. I’m back.
There are two places where I am working:

An rspamd etherpad which is used to document the Rspamd integration
The Github branch enh-rspamd-integration. It is quite empty for now because I have not yet carried over all the config I tested and validated.

Limezy · April 29, 2022, 1:14am

Hi @zeroheure great, I’ll have a look !
I suggest you make a draft PR using your branch so that we can both work on it.
I don’t have a dev environment for Yunohost itself yet, but I will probably have to set one up.

Limezy · April 29, 2022, 1:55am

Just starting here by writing down a few things and notes

What could be our goals for Yunohost default config :

Absolutely needed

Rspamd 2.7.1
Spam detection when receiving
Detected spams sent to “Junk” folder
Learning from user actions (Spam → Inbox or Inbox → Spam)
Initial learning from existing email database
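The “learning from user actions” goal above could be sketched as follows. This is only an illustration of the decision logic, in Python: the function name `learn_command`, the folder names and the choice to ignore moves from Junk to Trash are assumptions. `rspamc learn_spam` and `rspamc learn_ham` are the rspamc commands for training the classifier; in a real setup this decision would typically be made on the Dovecot side when a message is moved.

```python
def learn_command(source, dest, junk="Junk", trash="Trash"):
    """Pick the rspamc learning command for a message moved between folders."""
    if dest == junk and source != junk:
        # Anything moved into Junk is treated as spam.
        return ["rspamc", "learn_spam"]
    if source == junk and dest not in (junk, trash):
        # Rescued from Junk: treat as ham. Deleting spam (Junk to Trash)
        # is deliberately not counted as ham in this sketch.
        return ["rspamc", "learn_ham"]
    return None  # no learning for other moves


print(learn_command("INBOX", "Junk"))
print(learn_command("Junk", "INBOX"))
```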

Nice to have

Rspamd 3.2
Web interface
Redis configuration to get faster
Spam detection when sending
Autoexpunge Junk folder
Fuzzy feeds from Rspamd (enabled by default but if I carefully read the policy, we should talk to Rspamd team before enabling them by default within an open source project such as Yunohost)

To be discussed

Mileage varies, and not all Yunohost users have the same use case.
We should try to fine-tune the Rspamd config to be as “universal” as we can.
The two main sets of values to be defined are:

/etc/rspamd/actions.conf, which defines the score thresholds for an email to get rejected, greylisted or accepted
- Types of questions to ask ourselves: do we set the reject value so that no email ever gets rejected? Do we allow greylisting, which is useful but sometimes confusing for users (why has that email from my friend still not arrived?)…
The values defining the weight of each spam detection module in the final score (however, we may decide to keep the default values)
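To make the second set of values concrete, here is a minimal Python sketch of how per-symbol weights add up to the final score that is then compared against the thresholds. The symbol names and weights are invented for the example (they are not Rspamd defaults), and `total_score` is a hypothetical helper.

```python
def total_score(symbols, weights):
    """Sum the weights of the symbols that fired; unknown symbols count as 0."""
    return sum(weights.get(name, 0.0) for name in symbols)


# Hypothetical symbols and weights, for illustration only.
WEIGHTS = {
    "HYPOTHETICAL_BAYES_SPAM": 5.0,
    "HYPOTHETICAL_BAD_SPF": 1.0,
    "HYPOTHETICAL_KNOWN_SENDER": -2.0,  # negative weights lower the score
}

fired = ["HYPOTHETICAL_BAYES_SPAM", "HYPOTHETICAL_BAD_SPF"]
print(total_score(fired, WEIGHTS))
```

Tuning can therefore happen in two places: by moving the thresholds, or by changing how much each symbol contributes. Keeping the default weights and adjusting only the thresholds is the smaller change.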

Notes and tutorials

Workaround tutorial has been updated for Bullseye

My original tutorial for Yunohost

Limezy · April 29, 2022, 2:24am

I was able to run the webUI much easier than I thought
I think I’ll package Rspamd webUI as an app.
Or could we have it as part of the Yunohost web admin ?

zeroheure · April 29, 2022, 10:04am

Limezy wrote:
Just starting here by writing down a few things and notes

What could be our goals for Yunohost default config :

Absolutely needed

Rspamd 2.7.1

Yes, one needs to enable backports on Debian Buster.

Spam detection when receiving
Detected spams sent to “Junk” folder
Learning from user actions (Spam → Inbox or Inbox → Spam)

Agree, this is the basis of a good spam filter. Exactly what I’ve done so far.

Initial learning from existing email database

It is true that it is necessary, but how do we achieve it? I’ve seen a lot of users putting whatever email they don’t like in the Junk folder. This means that this email database should be external (such databases do exist).
NB: Rspamd needs to learn trusted emails too, in nearly the same quantity as spam.
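The point about learning both classes can be sketched like this in Python. It assumes the Bayes classifier is only used once a minimum number of messages has been learned for spam and for ham; the value of 200 is what I believe the stock `min_learns` setting to be, but treat it as an assumption. `bayes_ready` is a hypothetical helper.

```python
def bayes_ready(spam_learned, ham_learned, min_learns=200):
    """True once both classes have at least min_learns learned messages."""
    return spam_learned >= min_learns and ham_learned >= min_learns


# A mailbox with plenty of learned spam but almost no learned ham
# would not be ready under this assumption.
print(bayes_ready(spam_learned=1000, ham_learned=20))
print(bayes_ready(spam_learned=250, ham_learned=230))
```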

Related points that come to mind:

Some webmails also come with bayesian spam detection; is it enabled in Yunohost?
Desktop email clients come with spam detection too, where spam emails go into the Junk folder. Hence, shall we expunge the Junk folder from the server side? I don’t know.
What Postfix, Rspamd or milter error messages related to the Junk folder could appear if the user updates its content from a webmail, desktop or mobile client?

zeroheure · April 29, 2022, 10:10am

It’s already packaged but doesn’t work as is in Buster

Limezy · April 29, 2022, 11:41am

Wow great I’ll have a look

Sorry I didn’t understand

The learning can be set up as user-based or server-based. I think that if we set it up as user-based we don’t really care whether it’s “real” spam or just an email the user doesn’t want to receive. It’s the same action that we want in the end → go to Junk!
And yes, the learning is mainly about what the user wants, not what the user doesn’t want.
I think we should run it as a job that is run at every Yunohost update or something like that

Webmails will do their bayesian spam detection and then move the email to the Junk folder, but we don’t really care as we will run the Rspamd detection before it goes to the client anyway. I’m not sure I understand your question “is it enabled in Yunohost”.

Same, here we don’t really care whether the user empties his Junk box or not, or whether his desktop client does it on a regular basis → we do it server-side by default, and everything else is a second fence (used or not). I don’t like client-side actions, personally.

I didn’t understand that question. IMAP is designed to be a client-server mechanism, so if the server does some actions they will be mirrored by the client, and vice versa.