Best way for disallowing robots with robots.txt from everything?

My YunoHost server

Hardware: Raspberry Pi 3b+ at home
YunoHost version: 11.0.10.2 (stable)
I have access to my server : Through SSH | through the webadmin
Are you in a special context or did you perform some particular tweaking on your YunoHost instance ? : no

Description of my issue

Related to Google flags my sites as dangerous (Deceptive site ahead): since I got a false-positive phishing warning from Google Safe Browsing, I want to add a robots.txt at the root of my server (example_notrealsite.nohost.me/robots.txt), so that Google and other well-behaved robots will not index the site (and, hopefully, so the warning gets removed…)

The only thing I got working was to stop the redirect to the SSO (via /etc/ssowat/conf.json.persistent) when going to example_notrealsite.nohost.me/robots.txt, but I don’t know where to put the robots.txt file so it can be accessed from that address.
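For anyone landing here later: assuming SSOwat's usual configuration keys, the bypass mentioned above can be done with a skipped_urls entry in /etc/ssowat/conf.json.persistent (the domain is this thread's placeholder; check your SSOwat version's documentation for the exact key name):

```json
{
    "skipped_urls": ["example_notrealsite.nohost.me/robots.txt"]
}
```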

Can somebody explain how to do this? In detail if possible, because I’m still a noob with Debian and such.

P.S. I hope someday there’s an option to do this right from the admin panel :wink:

robots.txt content:

User-agent: *
Disallow: /
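As a quick sanity check, the two-line file above can be generated and inspected locally; /tmp/robots.txt is just a scratch path for illustration, not where the file needs to live on the server:

```shell
# Write the blanket-disallow rules to a scratch file and print them back
printf 'User-agent: *\nDisallow: /\n' > /tmp/robots.txt
cat /tmp/robots.txt
```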

Add a new file with nano /etc/nginx/conf.d/example_notrealsite.nohost.me.d/robots.conf (YunoHost automatically includes any .conf file placed in the domain’s conf.d/<domain>.d/ directory).

It should contain:

location = /robots.txt {
   add_header Content-Type text/plain;
   return 200 "User-agent: *\nDisallow: /\n";
}
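A variant sketch, in case you would rather keep robots.txt as an editable file on disk instead of inlining its content in the nginx config (the /var/www/robots/ path is illustrative, not a YunoHost convention):

```nginx
location = /robots.txt {
    alias /var/www/robots/robots.txt;
}
```

Note the \n escapes in the inline version: inside an nginx double-quoted return string they expand to real newlines, so clients receive the same two-line body either way.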

Save and close with CTRL+O then CTRL+X.

Note that you may end up with nginx errors if you later install an app at the root of the domain, since the location blocks may clash. If so, move the whole location block inside the app’s location / block:

location / {

    # ... existing app configuration ...

    location = /robots.txt {
       add_header Content-Type text/plain;
       return 200 "User-agent: *\nDisallow: /\n";
    }

}

Before reloading NGINX, check that everything is alright with sudo nginx -t.
If so, reload with sudo systemctl reload nginx.

