Having different robots.txt info based on subdomain


#21

@lukasbestle I think I've figured it out. The staging server (which is a subdomain) forces www, and so does the live server. If I disable the www forcing, the robots file comes out properly.

So the question is… how can I keep the .htaccess rule that forces www AND get a robots file without the weirdness?

To be clear…

This works:

http://sub.mydomain.com/robots.txt

This gets the < on the first line:

http://www.sub.mydomain.com/robots.txt

Which I don't think is strictly legal, but the site does work. As I understand it, www is a subdomain in its own right, so I think the above is technically a subdomain of a subdomain.


#22

Why do you force www in .htaccess and not in the httpd.conf file? Or redirect www to non-www in the DNS settings?


#23

The answer there is that it wasn't me :slight_smile: This is a project I inherited; I didn't set it up or configure the server. It is a VPS, though, and I have access. Is that a better way?


#24

That’s always the better way. And in this case, you don’t run into strange side effects like forcing www on a subdomain.
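To illustrate why scoping matters: the side effect here comes from applying the "add www" rule to every host, including `sub.mydomain.com`. The proper place for the rule is the server config, as suggested above. The PHP sketch below only shows the scoping logic a correctly limited rule would follow. `shouldForceWww` and the apex domain value are hypothetical, not part of Kirby or Apache.

```php
<?php
// Hypothetical helper: should this request host be redirected to "www."?
// Only the bare apex domain (e.g. "mydomain.com") qualifies. Subdomains such as
// "sub.mydomain.com" and hosts that already start with "www." are left alone,
// which avoids producing hosts like "www.sub.mydomain.com".
function shouldForceWww(string $host, string $apexDomain): bool
{
    // Host names are case-insensitive, and the Host header may carry a port.
    $host = strtolower(explode(':', $host)[0]);
    return $host === strtolower($apexDomain);
}

var_dump(shouldForceWww('mydomain.com', 'mydomain.com'));     // bool(true)
var_dump(shouldForceWww('sub.mydomain.com', 'mydomain.com')); // bool(false)
var_dump(shouldForceWww('www.mydomain.com', 'mydomain.com')); // bool(false)
```

An Apache rule would express the same idea with a host condition that matches only the apex domain, instead of matching "anything not starting with www".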


#25

What’s interesting though is that forcing www has an effect on the output of your route (and only that route). That’s very strange.


#26

Sure is. If I get a minute, I might try to replicate it on my own server, which happens to be basically identical to my client's, but with a different web host entirely (VPS, CentOS 7, PHP 7.2, Apache, WHM/cPanel).

I'll look into setting it via httpd.conf. It's not something I can just remove, because the site has been live for a couple of years and I'd have to clean up Google's index if I did (I think).


#27

Have you tried what I wrote above (replacing the call to f::read() with a simple string)? The result could help with debugging.


#28

@lukasbestle I did, yes… I got the same result.


#29

Ah. Sorry, I missed that post. It's getting stranger and stranger, to be honest. :frowning:


#30

Just to narrow it down: do you use a cache or a minifier plugin on top? I know my HTML minifier plugin has caused similar issues in the past.


#31

:open_mouth::open_mouth::open_mouth: It was the @jenstornell minifier plugin! I switched it off in the config, and now the issue is gone, even with www forced.


#32

So we forgot to do the most important thing: Check if it happens in a fresh Starterkit as well :roll_eyes:


#33

Yes lol… the number of times I have told other people to try that. I haven't had any trouble from the plugin elsewhere, though, so it didn't occur to me.


#34

So it was my fault after all… crap! :frowning:

I wonder what the best way is to prevent issues like this in my plugin. Right now it does not know it's handling a robots file. Somehow it needs to know that the file should be excluded from minification.

Any ideas @texnixe @jimbobrjames @lukasbestle ?


#35

You seem to have two minifier plugins. Which of the two are we talking about here? The blacklist in the Kirby minifier seems to accept pages only. Maybe you could exclude URLs instead of pages? Or exclude response types like text/plain and only minify text/html?
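Here is a minimal sketch of the "only minify text/html" idea. It assumes the minifier can see the response's Content-Type as a string. How the plugin or Kirby exposes that value is not shown here, and `shouldMinify` is a hypothetical name.

```php
<?php
// Hypothetical gate for an HTML minifier: only minify responses whose media
// type is text/html. text/plain (robots.txt), JSON, XML, etc. pass through untouched.
function shouldMinify(?string $contentType): bool
{
    if ($contentType === null || $contentType === '') {
        // Unknown type: be conservative and leave the output alone.
        return false;
    }
    // Drop parameters such as "; charset=UTF-8" and compare the media type only.
    $mediaType = strtolower(trim(explode(';', $contentType)[0]));
    return $mediaType === 'text/html';
}

var_dump(shouldMinify('text/html; charset=UTF-8'));  // bool(true)
var_dump(shouldMinify('text/plain; charset=UTF-8')); // bool(false)
```

Gating on the response type means new plain-text or XML routes are safe by default, without anyone having to remember to add them to a list.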


#36

Well, I don't really know, to be honest. I'm not sure how the plugin works, and I'm not much of a PHP coder. A robots file is, after all, just a plain old text file. Perhaps a file whitelist/blacklist?
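A file or URL blacklist could look roughly like this. It assumes the minifier has access to the request URI. `isPathExcluded` and the pattern list are hypothetical, not options the plugin currently offers. The patterns use PHP's built-in `fnmatch` shell-style matching.

```php
<?php
// Hypothetical path-based exclusion list for a minifier: any request path that
// matches one of the shell-style patterns is served unminified.
function isPathExcluded(string $uri, array $patterns): bool
{
    // Ignore the query string and the leading slash: "/robots.txt?x=1" -> "robots.txt".
    $path = ltrim((string) parse_url($uri, PHP_URL_PATH), '/');
    foreach ($patterns as $pattern) {
        if (fnmatch($pattern, $path)) {
            return true;
        }
    }
    return false;
}

$excluded = ['robots.txt', '*.xml'];
var_dump(isPathExcluded('/robots.txt', $excluded)); // bool(true)
var_dump(isPathExcluded('/blog/post', $excluded));  // bool(false)
```

Unlike a page-based blacklist, this also covers routes such as robots.txt that don't correspond to a page in the content folder.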

@texnixe The plugin in play here is the Kirby Html Minifier. It looks like @jenstornell has updated it a bit. I'm using 0.5, and the current version is 0.8. I'll see if updating it helps.


#37

@jimbobrjames Your link or the name of the plugin seem to be incorrect. Please correct.


#38

https://github.com/jenstornell/kirby-html-minifier is the plugin.

This repo is the engine: https://github.com/jenstornell/tiny-html-minifier. It’s not bound to Kirby, but can be used with anything PHP based.


#39

Well… it gets worse. I just updated the plugin to version 0.8, and now the robots output looks like this… it's stripped to one line. Version 0.5 just added the < on the first line but didn't mess with the formatting.

```text
<User-agent: * Disallow: */modules Disallow: / Sitemap: https://www.domain.com/sitemap.xml
```
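The one-line result fits what happens when HTML-style whitespace collapsing is applied to a line-based file. The sketch below is an illustration only. It is NOT the plugin's actual code, and it assumes the original robots.txt had one directive per line. It does not explain the stray `<`.

```php
<?php
// Illustration only, not the plugin's code: HTML minifiers commonly treat runs
// of whitespace (including newlines) as insignificant and collapse them.
// robots.txt is line-based, so collapsing newlines merges all directives.
function collapseWhitespace(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Assumed original file: the same directives as the output above, one per line.
$robots = "User-agent: *\nDisallow: */modules\nDisallow: /\nSitemap: https://www.domain.com/sitemap.xml\n";

echo collapseWhitespace($robots), "\n";
// User-agent: * Disallow: */modules Disallow: / Sitemap: https://www.domain.com/sitemap.xml
```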

#40

@jenstornell You already have a blacklist for response formats:

```php
private function isBlacklisted($format, $blacklist = ['js', 'css'])
{
    return in_array($format, $blacklist);
}
```

It just doesn't seem to be documented? It should then be possible to exclude plain-text responses…
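As a standalone sketch, extending that check might look like the code below. It assumes the plugin passes the response format as a short extension-like string, and that a robots.txt route would report `'txt'`. Neither assumption has been checked against the plugin source, and the function here is a copy for illustration, not the plugin's method.

```php
<?php
// Standalone copy of the blacklist check quoted above, with 'txt' added.
// Assumption: formats arrive as short strings like 'js', 'css', 'txt'.
function isBlacklisted(string $format, array $blacklist = ['js', 'css', 'txt']): bool
{
    // Strict comparison so that only exact format strings match.
    return in_array($format, $blacklist, true);
}

var_dump(isBlacklisted('txt'));  // bool(true)
var_dump(isBlacklisted('html')); // bool(false)
```

If the list were exposed as a documented config option, users could add formats themselves without waiting for a plugin release.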