Ahoy everyone,
maybe robots.txt is no longer a thing, but where should I put this file (the web server’s root doesn’t seem to work), and is its content according to
See – Step 9.4: Build the file robots.txt
still necessary?
http://www.google.com/robots.txt
Even they use it, so yeah… it can come in handy.
You can (dis)allow spiders to index certain folders (this can also be done in the .htaccess, as Kirby does).
And you can define the address of your sitemap.xml, etc.
I always use one, and sometimes a humans.txt as well.
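For example, a minimal robots.txt for a standard Kirby setup might look like this (the folder names and the sitemap URL are placeholders, adjust them to your site):

User-agent: *
Disallow: /kirby/
Disallow: /site/
Sitemap: https://example.com/sitemap.xml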
It goes into the folder where your Kirby installation lives, either the web root or a subfolder, depending on your setup.
Nope…
I suspect these lines in .htaccess prevent anybody from reading the robots.txt:
# block access to all unbrowsable files
RewriteRule ^.*\.(txt|md|mdown|yaml|yml|svn.*|git.*)$ - [NC,R=404]
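# note: the R=404 flag answers "not found" for every matching extension,
# so requests for robots.txt are caught by this rule as well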
(In another Kirby installation without these lines, robots.txt is visible)
Yes, you cannot block access to text files and have robots.txt accessible at the same time. You would need to adjust the rewrite rules to exclude robots.txt or any other files you want to be readable.
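As a sketch (not the official rule set), one way to make such an exception is a condition in front of the blocking rule:

# let robots.txt through, keep blocking the other unbrowsable files
RewriteCond %{REQUEST_URI} !robots\.txt$
RewriteRule ^.*\.(txt|md|mdown|yaml|yml|svn.*|git.*)$ - [NC,R=404]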
It is recommended to use the official Kirby .htaccess. It won’t block access to robots.txt.
Thanks for the link. But just to be clear: the two lines were taken from an official (though maybe beta) version of Kirby! With the latest .htaccess version, robots.txt works perfectly. So in sum, thanks a lot for the great support here.
Yes, that was from the Kirby 2.2 beta.
Hi,
is it safe to allow everything in robots.txt and let the block rules in the .htaccess take care of the rest?
I’m speaking of a standard installation. (Of course you have to adjust your robots.txt if there are other folders you don’t want crawled.)
User-agent: *
Allow: /
I’m asking because a small benefit might be that this way there is no direct hint that it’s a Kirby installation.
P.S. Since Kirby 2.2.0 you can even rename the panel folder and change your rewrite rules for the panel, if you want.
Securing the panel - good or bad practice? Post #9
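For illustration only (this assumes the default Kirby 2 rule that routes panel requests to the panel’s own index.php, and a hypothetical folder name admin; see the linked post for the real steps):

# default rule: route panel requests to the panel’s own index.php
# RewriteRule ^panel/(.*) panel/index.php [L]
# after renaming panel/ to admin/ (hypothetical name):
RewriteRule ^admin/(.*) admin/index.php [L]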
Yes, the other way around does not make sense. I have seen many sites where sensitive information is blocked in robots.txt but directly accessible without authentication. Putting those links in robots.txt makes it all even worse, as attackers now have a handy list of sensitive URLs.
TL;DR: Pages that shouldn’t be accessible at all, neither by visitors nor by crawlers, should be blocked directly using .htaccess. robots.txt is only for information that should be visible to users but not to search engines.
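A minimal sketch of such a direct block (assuming mod_rewrite and a hypothetical private/ folder):

# answer every request below /private/ with 403 Forbidden
RewriteRule ^private(/.*)?$ - [F,L]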
I think Yves already fixed his issue last year.