Ahoy everyone,
maybe robots.txt is no longer a thing, but where should I put this file (the web server’s root doesn’t seem to work), and is its content according to
See – Step 9.4: Build the file robots.txt
still necessary?
http://www.google.com/robots.txt
Even they use it, so yeah… it can come in handy.
You can (dis)allow spiders to index certain folders (this can also be done in the .htaccess, as Kirby does).
And you can define the address of your sitemap.xml, etc.
I always use one, and sometimes a humans.txt as well.
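For example, a minimal robots.txt for a standard Kirby setup might look like this (the folder names and the sitemap URL are placeholders, adjust them to your site):

User-agent: *
Disallow: /kirby/
Disallow: /site/
Sitemap: https://example.com/sitemap.xml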
It goes into the folder where your Kirby installation lives, either the web root or a subfolder, depending on your setup.
Nope…
I suspect these lines in .htaccess prevent anybody from reading the robots.txt:
# block access to all unbrowsable files
RewriteRule ^.*\.(txt|md|mdown|yaml|yml|svn.*|git.*)$ - [NC,R=404]
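# note: the R=404 flag answers "not found" for every matching extension,
# so requests for robots.txt are caught by this rule as well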
(In another Kirby installation without these lines, robots.txt is visible)
Yes, you cannot block access to text files and have robots.txt accessible at the same time. You would need to adjust the rewrite rules to exclude robots.txt or any other files you want to be readable.
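As a sketch (not the official rule set), one way to make such an exception is a condition in front of the blocking rule:

# let robots.txt through, keep blocking the other unbrowsable files
RewriteCond %{REQUEST_URI} !robots\.txt$
RewriteRule ^.*\.(txt|md|mdown|yaml|yml|svn.*|git.*)$ - [NC,R=404]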
It is recommended to use the official Kirby .htaccess. It won’t block access to robots.txt.
Thanks for the link. But just to be clear: the two lines were taken from an official (though maybe beta) version of Kirby! With the latest .htaccess version, robots.txt works perfectly. So in sum, thanks a lot for the great support here.
Yes, that was from the Kirby 2.2 beta.
Hi,
is it safe to allow everything in robots.txt and let the block rules in the .htaccess take care of the rest?
I’m speaking of a standard installation. (Of course you have to adjust your robots.txt if there are other folders you don’t want crawled.)
User-agent: *
Allow: /
I’m asking because a small benefit might be that this way there is no direct hint that it’s a Kirby installation.
P.S. Since Kirby 2.2.0 you can even rename the panel folder and change your rewrite rules for the panel, if you want.
Securing the panel - good or bad practice? Post #9
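For illustration only (this assumes the default Kirby 2 rule that routes panel requests to the panel’s own index.php, and a hypothetical folder name admin; see the linked post for the real steps):

# default rule: route panel requests to the panel’s own index.php
# RewriteRule ^panel/(.*) panel/index.php [L]
# after renaming panel/ to admin/ (hypothetical name):
RewriteRule ^admin/(.*) admin/index.php [L]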
Yes, the other way around does not make sense. I have seen many sites where sensitive information is blocked in robots.txt but directly accessible without authentication. Putting those links in robots.txt makes it all even worse, as attackers now have a handy list of sensitive URLs.
TL;DR: Pages that shouldn’t be accessible at all, neither by visitors nor by crawlers, should be blocked directly using .htaccess. robots.txt is only for information that should be visible to users but not to search engines.
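A minimal sketch of such a direct block (assuming mod_rewrite and a hypothetical private/ folder):

# answer every request below /private/ with 403 Forbidden
RewriteRule ^private(/.*)?$ - [F,L]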
I think Yves already fixed his issue last year.