Hello YunoHost community.
I have an app proposal. The idea behind it is a bit radical, but it serves very practical purposes. I am wondering whether there is interest in the community for such an app, so I will explain the need I see for it.
What the app does is block entire IP ranges belonging to Amazon at the firewall level. Why? To block AI scrapers that put unbearable load on your server.
I have been maintaining a server for a dozen dozens of users for a couple of years now. It’s not a bad server. Usually the load oscillates between 2 and 3, but recently it often went over 20. After investigating why Gitea was taking up so much memory and CPU, I noticed a giant flood of requests coming from Amazon-owned IP addresses.
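For anyone who wants to check their own server, here is a minimal Python sketch of how such a flood can be spotted. It assumes an access log in which every line starts with the client address (as in nginx's default combined format). The function name `top_clients` is made up for illustration, and this is not necessarily the exact procedure used on our server.

```python
import ipaddress
from collections import Counter


def top_clients(lines, n=10):
    """Count requests per client address and return the n busiest clients.

    Assumption: the first whitespace-separated field of each log line
    is the client IP address. Lines that do not fit are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue  # empty line
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IPv4/IPv6 address
        counts[str(addr)] += 1
    return counts.most_common(n)
```

To use it, open your access log (the path depends on your setup) and pass the file object to `top_clients`. Looking up who owns the busiest addresses is a separate step, for example with `whois`.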
A lot of these seemed to be bots of AI companies scraping our forge. I am aware that a properly written robots.txt file can opt you out, but we still want our forge to be indexed by search engines, and I do not want to add an exception to robots.txt for every new private service that is melting our planet. I think such damage should be opt-in, not opt-out. We maintain our infrastructure for people to use, not for machines to help capital profit.
So the point of this app would be to offer a very easy way to improve the performance of smaller servers. One barrier is that Let’s Encrypt uses AWS EC2 instances, but that can be fixed via renewal hooks. The other problem is that Docker, which many people might be using, is also hosted on AWS. But a warning in the app description can let people know that functionality may be degraded after installing this app.
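To make the idea more concrete, here is a rough Python sketch of what the core of such an app could do. It is a sketch under assumptions, not a finished implementation. It relies on the list of address ranges that Amazon publishes as JSON at `https://ip-ranges.amazonaws.com/ip-ranges.json`, and it only produces plain `iptables`/`ip6tables` commands as text. The function names are made up, and how the rules would be plugged into YunoHost's own firewall handling is left open.

```python
import ipaddress
import json
import urllib.request

# Amazon publishes its current address ranges at this URL.
AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_aws_ranges(url=AWS_RANGES_URL):
    """Download and parse the published range list (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def aws_networks(data):
    """Return the IPv4 and IPv6 networks from the parsed list, merged
    so that adjacent or overlapping ranges become a single entry."""
    v4 = [ipaddress.ip_network(p["ip_prefix"])
          for p in data.get("prefixes", [])]
    v6 = [ipaddress.ip_network(p["ipv6_prefix"])
          for p in data.get("ipv6_prefixes", [])]
    return (list(ipaddress.collapse_addresses(v4))
            + list(ipaddress.collapse_addresses(v6)))


def drop_rules(networks):
    """Build one DROP rule (as a command string) per network."""
    rules = []
    for net in networks:
        tool = "iptables" if net.version == 4 else "ip6tables"
        rules.append(f"{tool} -I INPUT -s {net} -j DROP")
    return rules
```

Calling `drop_rules(aws_networks(fetch_aws_ranges()))` would give the list of commands to review before applying anything. The published list is long, so a real app would probably load the ranges into an ipset or an nftables set instead of adding one rule per range, and it would need to refresh the list regularly because it changes over time.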
After AWS was blocked, there was a smaller flood of requests coming in, most of them from Meta. After blocking Facebook, a much smaller flood remained, consisting of requests from Micro$oft (I’m not kidding). After blocking some M$ blocks, there was just a tiny trickle of requests coming from various other parts of the internet. And all of this was visible just from looking at our Gitea access log.
The app could also have options to block Microsoft, Meta and any other malignant actors. That would involve the maintenance work of curating moderation lists, and maybe categorizing networks based on why and how they are malignant.
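As a rough illustration of what such categorized lists could look like, here is a small Python sketch. Everything in it is hypothetical: the `BlockList` structure, its field names and the helper `enabled_networks` are made up. Real lists would need actual, maintained address ranges.

```python
from dataclasses import dataclass, field


@dataclass
class BlockList:
    """One moderation list: who is blocked, and why (hypothetical layout)."""
    name: str                 # e.g. the operator of the networks
    reason: str               # why and how the networks are malignant
    networks: list = field(default_factory=list)  # CIDR strings
    enabled: bool = False     # blocking stays opt-in per list


def enabled_networks(lists):
    """Collect the networks of all enabled lists, without duplicates."""
    selected = set()
    for block_list in lists:
        if block_list.enabled:
            selected.update(block_list.networks)
    return sorted(selected)
```

An admin would then switch on only the lists they agree with, and the app would block just the networks returned by `enabled_networks`.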
So, dear community, what do you think?
Kind regards, Jurij (from kompot)