Thank you so much for sharing this @Josue! I just added this to my YunoHost configuration.
Nevertheless, I’d really like to go beyond simple copy-pasting and understand what this script does. I see that it matches known AI bots with a regex, but where does this go, and how does it work?
444 means no response from the server: nginx closes the connection without sending anything back. IMHO, the less information you give to attackers/abusers, the better.
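The hook appends an `if ($http_user_agent ~* "...")` block to the generated nginx config. For each request, nginx tests the User-Agent header against that regex case-insensitively (that is what `~*` means), and a match triggers `return 444`. Below is a minimal bash sketch that emulates this test. The bot names are a hypothetical short sample, not the real list. Bash also uses POSIX extended regexes rather than nginx's PCRE, so this only mirrors simple alternations like the one shown.

```bash
#!/usr/bin/env bash
# Sketch: emulate nginx's case-insensitive `if ($http_user_agent ~* "...")` test.
# Hypothetical sample pattern; the real AI-bot list is much longer.
bot_pattern='(GPTBot|ClaudeBot|CCBot|Bytespider)'

# Print what nginx would do for a given User-Agent string.
classify_ua() {
  local ua=$1
  local verdict=pass
  shopt -s nocasematch            # like nginx's ~* (case-insensitive match)
  if [[ $ua =~ $bot_pattern ]]; then
    verdict=444                   # nginx would close the connection, no response
  fi
  shopt -u nocasematch
  echo "$verdict"
}

classify_ua "Mozilla/5.0 (compatible; GPTBot/1.2)"
classify_ua "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
```

Running it prints `444` for the GPTBot string and `pass` for the Firefox one. Normal browsers fall through to the rest of the server block untouched.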
I stupidly wrote a bash script that downloads the file with wget, added it to my crontab, and then used the file in the hook with cat (in this example, the file is /home/ai-robots_list):
```bash
[...]
# Unquoted EOF: the shell expands the heredoc, so \$ and \\ pass a literal $ and \ to nginx
cat << EOF >> "$nginx_conf"
# Some really bad bot with legacy user agent
if (\$http_user_agent ~* "(iPod|MSIE|Trident/|Presto/|PPC Mac OS X|Gecko/\\d{4}-|C(?:riOS|hrome)/(?:\\d{1,2}|1[0-1]\\d|12[0-4])\\.|F(?:irefox|xiOS)/(?:[0-9]{1,2}|1[1-2][0-9]|130)\\.|Version/(?:[4-9]|1[0-6]).*Safari/)") {
    return 444;
}
# List from https://github.com/ai-robots-txt/ai.robots.txt/blob/main/nginx-block-ai-bots.conf
EOF
# Append the downloaded AI-bot rules verbatim
cat /home/ai-robots_list >> "$nginx_conf"
```
Oops, something went wrong… Now I’m getting tons of NGINX error messages (in the logs) in the diagnosis…
I deleted the file and successfully regenerated the conf, but everything is still unreachable via the web, and I can only access the server via SSH… Any ideas, @tituspijean?
I panicked, so I reset everything and went back to @Josue’s script without wget. It works well enough for me, and I don’t understand the details sufficiently to experiment with more elaborate settings.
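The thread doesn't show what actually broke. One possibility worth checking is the comment in the hook: it points at a GitHub `blob` URL, which serves an HTML page. If that page is what wget fetched, the HTML would end up inside the nginx config. Below is a minimal sketch of a more defensive cron script. It assumes the `raw.githubusercontent.com` form of the same path and the /home/ai-robots_list location from the post. The script downloads to a temporary file and replaces the list only if the result looks like an nginx user-agent rule. `looks_like_nginx_rule` and `refresh_list` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: refresh the AI-bot list for the hook without ever installing a broken file.
# Assumptions: the raw URL in the example and the /home/ai-robots_list path.
set -euo pipefail

# Accept a download only if it looks like the nginx snippet, not an HTML page.
looks_like_nginx_rule() {
  local file=$1
  grep -q 'http_user_agent' "$file" && ! grep -qi '<html' "$file"
}

refresh_list() {
  local url=$1 dest=$2 tmp
  tmp=$(mktemp)
  if wget -q -O "$tmp" "$url" && looks_like_nginx_rule "$tmp"; then
    mv "$tmp" "$dest"   # replace the old list only when the new one passed the check
  else
    rm -f "$tmp"
    echo "download rejected, keeping $dest unchanged" >&2
    return 1
  fi
}

# Example cron usage (not run here):
# refresh_list "https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/main/nginx-block-ai-bots.conf" /home/ai-robots_list
```

After the hook has regenerated the configuration, `nginx -t` checks its syntax before any reload. That may help catch this kind of breakage before the sites go down.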
Thanks! I want to implement this… or will this become a default in YunoHost?
If I want to implement it, should I just create /usr/share/yunohost/hooks/conf_regen/97-nginx_rebots-block with that code?
Just came across this fork of the nginx bad bot blocker, made specifically for Fediverse servers:
Why this fork?
The default configuration of this blocker prevents fedi software such as Mastodon/GoToSocial/IceShrimp/Akkoma from federating correctly.
It also blocks a lot of Tor exit nodes as a result of them getting caught up in bad traffic.
This semi-hard fork of the project exists to solve this, making it suitable for fedi admins and for people who want their services to be available to Tor users. It achieves this with a list of keywords for removal, and by retrieving the list of all Tor exit nodes from TorProject to remove matches.
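The fork's actual pipeline isn't shown here. As a rough sketch of the filtering idea only: given a blocklist with one entry per line, drop every line containing one of the keywords, then drop every line that exactly equals a Tor exit address. The file layouts and the `filter_blocklist` name are hypothetical. The sketch assumes that IP entries are bare addresses (not already wrapped in nginx syntax) and that a local copy of the Tor exit-node list already exists.

```bash
#!/usr/bin/env bash
# Sketch: remove keyword matches and Tor exit nodes from a one-entry-per-line blocklist.
# Hypothetical file formats; the fork's real implementation may differ.
set -euo pipefail

filter_blocklist() {
  local blocklist=$1 keywords=$2 tor_exits=$3
  # 1) -viF: drop lines containing any keyword (fixed strings, case-insensitive).
  #    Beware: an empty line in the keywords file matches everything.
  # 2) -vxF: drop lines that exactly equal a Tor exit address.
  # grep -v exits 1 when it prints nothing, so "|| true" keeps an empty result valid.
  { grep -viFf "$keywords" "$blocklist" || true; } |
    { grep -vxFf "$tor_exits" || true; }
}

# Example: filter_blocklist blocklist.txt keywords.txt tor-exits.txt > filtered.txt
```

Matching Tor addresses with `-x` (whole line) rather than as substrings avoids accidentally removing, say, `15.6.7.8` because `5.6.7.8` is an exit node.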
In addition to the above purposes, I’ve made deny.conf compatible with running Anubis or go-away behind this blocker.
And lastly, this is a semi-hard fork that can keep working and stay up to date even when upstream is broken. I used to just merge and comment out matches. Now I generate the blocklist independently from the lists provided by upstream plus my own, and, most importantly, I retrieve the list of the top 10,000 reported IPs directly from AbuseIPDB’s API. You should still follow the upstream installation instructions, though.
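The fork's own generator isn't shown in this post. The sketch below illustrates only the general idea of building one blocklist independently from several sources. It assumes plain input lists with one IPv4/IPv6 address or CIDR range per line, where `#` starts a comment. Entries are merged, de-duplicated, and written out as nginx `deny` directives, which accept a single address or a CIDR range. `build_deny_conf` is a hypothetical name, and no AbuseIPDB download is included here.

```bash
#!/usr/bin/env bash
# Sketch: merge several plain IP lists into one nginx deny.conf.
# Hypothetical inputs: one address or CIDR per line, '#' starts a comment.
set -euo pipefail

build_deny_conf() {
  # Strip comments and whitespace, drop blank lines, de-duplicate,
  # then emit one `deny` directive per entry.
  cat "$@" |
    sed -e 's/#.*//' -e 's/[[:space:]]//g' |
    { grep -v '^$' || true; } |
    LC_ALL=C sort -u |
    awk '{ printf "deny %s;\n", $0 }'
}

# Example: build_deny_conf upstream-ips.txt my-ips.txt > deny.conf
```

`LC_ALL=C` keeps the sort order byte-wise and reproducible across machines, so regenerating the file only produces a diff when the lists actually change. Running `nginx -t` after including the generated file checks that it parses.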