.html pages showing up in analytics

My old website consisted of static HTML pages. For example, the URL of my ‘about’ page was www.example.com/about.html. (I then used a .htaccess file to hide the .html suffix.)

My new website uses Kirby. So the new URL for my ‘about’ page is www.example.com/about

But when I look at my visitor stats in Plausible Analytics (similar to Google Analytics), it is recording visits to both /about and /about.html.

Which is weird because the old static site was replaced a few months ago.

I’ve done an experiment and typed the following URLs into my browser’s address bar:

www.example.com/about.php (get error page)
www.example.com/about.html (works as normal)

I’d be surprised if this many visitors had previously bookmarked my about.html and contact.html pages.

Why are a few of the old .html pages showing up in analytics and why do .html pages work (if anything I’d have thought it would have been the .php suffix that works)?

For some reason, your about.html actually returns a page, while the correct behaviour would be to redirect those links to about.

Don’t know what you mean when you talk about “hiding” those extensions in your .htaccess?

Hi and Happy New Year!

For some reason, your about.html actually returns a page, while the correct behaviour would be to redirect those links to about.

Yes, it does seem odd.

Don’t know what you mean when you talk about “hiding” those extensions in your .htaccess?

The old static site had the following code in a .htaccess file:

RewriteEngine On

# add www and turn on https in same rule
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} !on
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]

## hide .html extension
# To externally redirect /file.html to /file
RewriteCond %{THE_REQUEST} \s/+(.+?)\.html[\s?] [NC]
RewriteRule ^ /%1 [R=301,NE,L]

My new .htaccess file is modified from the standard Kirby code so that visitors are taken to the https version and ‘www’ is added to the URL, while https is not forced when I am working on the site locally.

But I don’t think this would account for .html pages being visited.

Here is the current code:

# Kirby .htaccess
# revision 2020-06-15

# rewrite rules
<IfModule mod_rewrite.c>

# enable awesome urls. i.e.:
# http://yourdomain.com/about-us/team
RewriteEngine on

RewriteCond %{HTTP_HOST} !^localhost(?::\d+)?$ [NC]
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} !on
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]

# make sure to set the RewriteBase correctly
# if you are running the site in a subfolder;
# otherwise links or the entire site will break.
#
# If your homepage is http://yourdomain.com/mysite,
# set the RewriteBase to:
#
# RewriteBase /mysite

# In some environments it's necessary to
# set the RewriteBase to:
#
# RewriteBase /

# block files and folders beginning with a dot, such as .git
# except for the .well-known folder, which is used for Let's Encrypt and security.txt
RewriteRule (^|/)\.(?!well-known\/) index.php [L]

# block all files in the content folder from being accessed directly
RewriteRule ^content/(.*) index.php [L]

# block all files in the site folder from being accessed directly
RewriteRule ^site/(.*) index.php [L]

# block direct access to Kirby and the Panel sources
RewriteRule ^kirby/(.*) index.php [L]

# make site links work
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*) index.php [L]

</IfModule>

# pass the Authorization header to PHP
SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1

# compress text file responses
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE application/json
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
</IfModule>

Exactly, that doesn’t take care of .html suffixes, so the page is still reachable. You can either do the redirect in your .htaccess or add a route in your config.

Should Kirby automatically redirect pages with a .html suffix to pages with no suffix? If so, why is this not working for me? And I guess the same question applies to pages with a .php suffix.

Or is this something I need to do in the .htaccess file? If so, do you know the code to do this?

No, Kirby shouldn’t redirect automatically. For Kirby, about.html is probably a valid URL (a content representation), and if anything it should go to the error page rather than redirect automatically.

Yes, you can do this in the .htaccess file, or use a route in your config as mentioned above.

No, I don’t know the code offhand; I would have to google that myself. But if your old redirect worked, you could reuse it in the new .htaccess.
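
A minimal sketch of reusing the old redirect in the Kirby .htaccess shown earlier in this thread. The two rule lines are copied unchanged from the old static site; the placement (inside the existing mod_rewrite block, after the https/www redirect and before the "make site links work" rule) is an assumption and has not been tested against this particular setup.

```apache
# Sketch only: place inside the existing <IfModule mod_rewrite.c> block,
# after the https/www redirect and before the "make site links work" rule,
# so the redirect is evaluated before requests are handed to index.php.

## hide .html extension
# Externally redirect /file.html to /file (rule taken from the old static site).
# THE_REQUEST holds the original request line, e.g. "GET /about.html HTTP/1.1",
# so internal rewrites to index.php do not trigger the rule again.
RewriteCond %{THE_REQUEST} \s/+(.+?)\.html[\s?] [NC]
RewriteRule ^ /%1 [R=301,NE,L]
```

Note that this rule matches every requested path ending in .html, so any real .html files that still need to be served directly would be redirected as well.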

Thanks.

Any idea how people are visiting pages with a .html suffix (pages that no longer exist)? And even when pages with a .html suffix did exist, the suffix was removed/hidden from the URL by the .htaccess file.

I’d be surprised if it is returning visitors who have bookmarked about.html or contact.html, as about one third of the visitors to the ‘About’ page are on the .html version.

I don’t understand how they are ending up here.

Well, I can’t tell you that. Maybe there are still backlinks out there in the wild? Maybe check Google Search Console.

Thanks, if that is all it is, I can live with that. I didn’t know if something very weird was happening with the code.