Remember this - Caching in Kirby

Author: Bruno Meilick (bnomei)

About Me

My name is Bruno. I am the guy with the death-metal-facepaint avatar and a burning passion for making the TTFB of my Kirby projects as low as possible. I have created a few plugins since I joined the Kirby community back in Summer 2015, and a lot of them are in the performance category.

The problem

This post is not about how to write performant PHP code. Under most circumstances Kirby already performs very well, owing to the fact that it is a flat-file, no-db CMS. This post is about getting the best performance out of Kirby when you have too much content - think thousands of records. Kirby is lightweight, fast and flexible but somehow all that content has made it slow. What happened?

  • You need to load a thousand or more content files within a single request
  • Maybe even from very different folders, because of relations
  • Calling site()->index() now takes forever
  • You are generating the same output over and over again with each request
  • And you want all that done within as few milliseconds as possible

Sure - no problem. Let’s take this step by step.

Please keep in mind that I am not an expert in managing servers, and all things stated here are my personal opinions. I will not provide proof for all statements made in this post, but encourage you to do your own research if in doubt.

Server

TIP: Use PHP 8 because it is faster.
TIP: Use a fast web server like Nginx. *
TIP: Pick a hosting service that suits the project’s needs.

Content Files

Since Kirby is a flat-file, no-db CMS, it stores all its content in plain files, and it is smart enough to read them only when data from within the file is needed. Things like $page->num() or $page->template()->name() can be determined from the file system itself, without reading the file. But as soon as any field value - like $page->title() - is accessed, the file will be read from disk. Sadly, that will be the case most of the time you access or process your data.

When your code requests content from one of these files, the server OS flags them as hot: it tries to keep them in RAM for faster access on the next request, but will limit the number of files and their duration in RAM. This is called disk cache. So if you have a lot of files, not all of them will be in the disk cache.

Also, when running your PHP code, all that content will be pulled into the RAM allocated to the PHP process. If you hit the memory limit assigned to that process, PHP usually aborts with a fatal "allowed memory size exhausted" error.

TIP: Server OS disk cache is not a guaranteed performance gain.
TIP: You must have enough RAM to fit your code and content.
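As a quick sanity check of that RAM tip, the current usage relative to the limit can be read from PHP itself - a minimal sketch, with helper names of my own:

```php
<?php
// Sketch: how close is the current request to the PHP memory limit?
// Relevant when loading thousands of content files pulls a lot into RAM.

function memoryLimitBytes(): int
{
    $limit = ini_get('memory_limit');
    if ($limit === '-1') {
        return PHP_INT_MAX; // no limit configured
    }
    $unit  = strtoupper(substr($limit, -1));
    $value = (int) $limit;
    return match ($unit) {
        'G' => $value * 1024 ** 3,
        'M' => $value * 1024 ** 2,
        'K' => $value * 1024,
        default => $value, // plain byte value
    };
}

function memoryUsageRatio(): float
{
    return memory_get_usage(true) / memoryLimitBytes();
}
```

Logging that ratio in a debug route makes it obvious when a request gets close to the limit.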

PHP Code Files

Kirby is lightweight and loads very fast. The size of the PHP code with all dependencies is small in comparison to other CMSs. Activating caching for your PHP code will speed this up even more.

TIP: Activate opcache and apcu cache. *
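Whether both caches are actually active can be verified from PHP - a small diagnostic sketch (best run through the web server, since opcache is usually disabled on the CLI):

```php
<?php
// Sketch: report whether opcache and APCu are available in this SAPI.

$opcacheActive = function_exists('opcache_get_status')
    && is_array(@opcache_get_status(false));

$apcuActive = extension_loaded('apcu')
    && function_exists('apcu_enabled')
    && apcu_enabled();

echo 'opcache: ' . ($opcacheActive ? 'on' : 'off') . PHP_EOL;
echo 'apcu:    ' . ($apcuActive ? 'on' : 'off') . PHP_EOL;
```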

The Root of all Evil

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

As this quote from Donald Knuth suggests, it can pay off to optimize, but one needs to know exactly what to optimize and how. From my experience, writing optimized code can sometimes make a huge difference. But the biggest benefit is gained by doing less when one had to do a lot. Specifically, less as in less reading of files, because that takes up a lot of time. And how do we do that? With caching.

Caching

Kirby does have various uses for caches, just like other CMSs do. But before we take a look at the different caches, we need to understand how a cache itself works.

Cache Drivers

A cache driver is a piece of code that defines where the get/set/remove commands for the key/value store of the cache are directed to. Kirby has built-in support for File, Apcu, Memcached and Memory drivers. In your site config file you can set a cache driver for each of Kirby's caches. You could set them all to use the same driver, or use a different one for each.
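To illustrate the contract a driver fulfills, here is a minimal plain-PHP sketch - not Kirby's actual Kirby\Cache\Cache class, just the get/set/remove idea reduced to an in-memory array:

```php
<?php
// Sketch of what a cache driver boils down to: a key/value store
// with get/set/remove. The real drivers differ only in where the
// data lives (files, APCu, Memcached, ...).

interface CacheDriverSketch
{
    public function get(string $key, mixed $default = null): mixed;
    public function set(string $key, mixed $value, int $minutes = 0): bool;
    public function remove(string $key): bool;
}

class ArrayDriver implements CacheDriverSketch
{
    private array $store = [];

    public function get(string $key, mixed $default = null): mixed
    {
        return $this->store[$key] ?? $default;
    }

    public function set(string $key, mixed $value, int $minutes = 0): bool
    {
        $this->store[$key] = $value;
        return true;
    }

    public function remove(string $key): bool
    {
        unset($this->store[$key]);
        return true;
    }
}
```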

site/config/config.php, set the Pages cache to memcached *

<?php
return [
  'cache' => [
    'pages' => [
      'active' => true,
      'type'   => 'memcached',
      'host'   => 'localhost',
      'port'   => '11211',
      'prefix' => 'pages',
      'ignore' => function ($page) {
        return $page->id() === 'something';
      }
    ]
  ]
];

I have created additional cache drivers for PHP, Redis and SQLite, and I am working on a MySQL-based one. Each driver has its pros and cons, which you can read about in my Boost plugin's README, but to cut a long story short:

  • Use APCu if you have enough RAM; unless you go past 20k content files, you should be fine with an entry-tier server (~10 EUR/month).
  • The PHP Cache driver is very fast but you need to have enough memory.
  • Otherwise use my Redis cache driver to connect to Redis on the same server because it has insanely fast preloading when using pipelining.
  • If you cannot use Redis, then use my SQLite cache driver because it’s still very fast using WAL and other pragmas.
  • Ignore the others. I know that it sounds a bit harsh, but the File cache driver will not help you much if you have a lot of data to load in a single request. Its performance will just keep getting worse and worse. The memcached driver is just not as fast, and the Memory cache driver is only a static PHP array, and of no use at all for caching data between requests.

Now that we know what drives a cache, let’s look at the different types of caches.

Pages Cache

Kirby has a Pages cache built-in that you can activate. It caches the HTML output rendered by your template. You can configure it to ignore certain pages by filtering. But if you change even a single file using the panel, the cache gets flushed. That solution works well to speed up smaller websites. My suggestion would be to keep the default File cache driver when using the Pages cache.
Personally, I’ve never used it, mostly because my websites use CSP headers with nonces, which would not work with cached HTML.

Configuration Cache

Kirby does not yet have a configuration cache. But I suspect that, with more and more people using blocks, and lots of them on a single page object, the parsing of multiple YAML files into the final PHP data must eventually be improved with some sort of caching. There are already some known related issues.

Plugin Cache

Each plugin can create its own cache. Which cache driver you want for your plugin depends on what kind of data you are caching. If the data is only used in a single request - like the Pages cache - then use the default File cache driver. If your data is requested together with another set of cached data, then make both of them use the same cache driver - for example, when creating a list of excerpts, put all excerpts into the same cache. But wait… if I use the File cache driver, will it still create two files? Unless you wrap them in an array, yes, it will. You can keep manually wrapping data in the same context, but you will end up with multiple files sooner or later anyway. So in the end it is easier and faster to use a cache driver like APCu, Redis or SQLite.

site/plugins/excerpts/index.php, create a cache

<?php
Kirby::plugin('bnomei/excerpts', [
  'options' => [
    'cache' => true
  ]
]);

site/config/config.php, set the cache to apcu

<?php
return [
    'bnomei.excerpts.cache' => [
        'type' => 'apcu',
    ],
];

site/models/post.php, a blog post has cached excerpt data

<?php
class PostPage extends \Kirby\Cms\Page {
	public function postExcerpt(): array
	{
		$cache = kirby()->cache('bnomei.excerpts');
		$key = $this->id();
		$excerptForThisPage = $cache->get($key, null);
		// exists and not expired
		if ($excerptForThisPage) {
			return $excerptForThisPage;
		}
		// else generate, set and return
		$value = [
			'title' => $this->title()->value(),
			'text' => $this->text()->excerpt()->value(),
			'url' => $this->url(),
		];
		$expire = 60*24; // in minutes
		$cache->set($key, $value, $expire);
		return $value;
	}
}

site/templates/blog.php, the blog reads cached excerpt data instead of the content files of the posts

<?php
// this will not cause the content files to be read
foreach ($page->children()->filterBy('template', 'post')->listed() as $post):
	$excerpt = $post->postExcerpt();
?>
<details>
	<summary><?= $excerpt['title'] ?></summary>
	<p><?= html($excerpt['text']) ?></p>
	<div><a href="<?= html($excerpt['url']) ?>">more</a></div>
</details>
<?php endforeach; ?>

It would be even more efficient for the blog model to cache all the excerpts into a single array. This would reduce the get requests to the cache to a single one. But there is a catch: what if we do not want to update the cache using the $expire variable, but on every change of a post instead? How should the blog know that a post has changed?

What I showed you above is an implementation of a partial cache.

Partial Cache

Quite a while ago I created a plugin called Lapse that makes it easier to use that type of cache. But what makes efficient partial caching difficult? You need to know when your cache became invalid - that is, when it expired. Remember the Pages cache that got flushed when a single page changed? That works, but it is not an efficient solution. And what about the excerpt data of our blog posts? How do we keep their cache up to date, other than just expiring them all after a fixed duration - like the $expire example above?

The Lapse plugin helps us with that. You call Lapse and forward all your Kirby objects - like page objects or file objects - relevant to the data you want to cache as its key. Lapse will then automatically use the modified timestamps of these objects to calculate a dynamic key, and set the value with an expiration of 0 - aka ‘forever’. If you change any object, the cache will be invalidated automatically, since the key is based on the modified timestamps and will thus be different. ‘Invalid’ as in ‘it has no value for that new key yet’, and the value needs to be generated.
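The key idea can be sketched in a few lines of plain PHP - plain arrays stand in for Kirby page/file objects here, and dynamicKey() is my own illustrative name, not Lapse's actual API:

```php
<?php
// Sketch of the idea behind Lapse's dynamic keys: hash the modified
// timestamps of all involved objects. Change one object and the key
// changes, so the stale cache entry is simply never hit again.

function dynamicKey(array $objects): string
{
    $parts = array_map(
        fn(array $obj) => $obj['id'] . '@' . $obj['modified'],
        $objects
    );
    return sha1(implode('|', $parts));
}
```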

Lapse will also help you to do that efficiently, since you define your data in a callback that will only be parsed if needed. The data you cache with Lapse is usually not HTML output, but stuff like all the excerpts - a PHP array of basic data types (string, integer, boolean, etc., but not objects). The only drawback with Lapse is that it cannot know when a cache is expired: it just keeps adding key-value-pairs, and the cache size will keep increasing. But that is nothing a monthly cron job triggering a flush cannot solve. It is still a better solution than having a fixed expiration time, or flushing everything on every change.

site/models/post.php,

<?php
class PostPage extends \Kirby\Cms\Page {
	// not needed anymore
	// public function postExcerpt(): array
}

site/templates/blog.php

<?php
// this will not cause the content files to be read
$collection = $page->children()->filterBy('template', 'post')->listed();
$data = Lapse::io(
    $collection, // will create a key from all modified timestamps
    function () use ($collection) {
        return array_values($collection->toArray(function ($page) {
            return [
                'title' => $page->title()->value(),
                'text'  => $page->text()->excerpt()->value(),
                'url'   => $page->url(),
            ];
        }));
    }
);
?>
<?php foreach($data as $postExcerpt): ?>
<details>
	<summary><?= $postExcerpt['title'] ?></summary>
	<p><?= html($postExcerpt['text']) ?></p>
	<div><a href="<?= $postExcerpt['url'] ?>">more</a></div>
</details>
<?php endforeach; ?>

Actually, Lapse can do even smarter things with the key - read the Lapse plugin's README if you want to know more about that.

Partial caches are great at reducing the number of files you have to load. But sometimes you just have to load files, and sometimes a lot of them. How can we make this faster if we cannot avoid it? By reading the content from a cache: that is faster than reading it from files.

Content Cache

Recently I created a plugin called Boost that will cache the content of files automatically. In a nutshell, this is like dumping your content into a single cache, and reading that instead of reading the content files. It is lightning fast. Well, actually about 2x-4x faster on average than reading from files. The Boost plugin will also automatically keep the cache up-to-date when you modify, add or remove content.

Another great feature of Boost is that it provides your page objects with a unique ID - pretty much like my AutoID plugin does. But in Boost I was able to make the resolution of relations, based on that unique ID, faster than in AutoID. That is because the relations lookup table is cached as well, and once loaded it all happens in memory.
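Reduced to its essence, such a lookup table is just a cached map from unique ID to the page's directory URI - a plain-PHP sketch (class and method names are mine, not Boost's actual API):

```php
<?php
// Sketch of a relations lookup table like the one Boost caches: once
// loaded, resolving a relation is a single in-memory array access
// instead of a search through the content folder.

class LookupTable
{
    public function __construct(private array $map = []) {}

    public function add(string $uniqueId, string $diruri): void
    {
        $this->map[$uniqueId] = $diruri;
    }

    public function resolve(string $uniqueId): ?string
    {
        return $this->map[$uniqueId] ?? null;
    }
}
```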

Luckily, using Boost is not complicated. You set the cache driver, add the fields and extend your model. Done.

site/config/config.php

<?php
return [
    // other options
    'bnomei.boost.cache' => [
        'type' => 'apcu',
    ],
];

site/blueprints/pages/default.yml

preset: page

fields:
  # visible field
  boostid:
    type: boostid

  # hidden field
  #boostid:
  #  extends: fields/boostid

  one_relation:
    extends: fields/boostidkv

  many_related:
    extends: fields/boostidkvs

site/models/default.php

<?php
class DefaultPage extends \Kirby\Cms\Page
{
    use \Bnomei\PageHasBoost;
}

// or

class DefaultPage extends \Bnomei\BoostPage
{

}

resolving relations

// one
$pageOrNull = $page->one_relation()->fromBoostID();

// many
$pagesCollectionOrNull = $page->many_related()->fromBoostIDs();

But even with Boost, once you reach 20,000 content files in a single request on an entry-tier server, you will see load times of 1,500 ms per request - and that cannot be considered performant anymore.

Beyond 20k

My Boost plugin will make loading the content faster, but with vast amounts of content the loading time will still increase linearly. So, what can we do now? How can we venture beyond 20k content files in a single request without spending a lot of money on our server? I suggest you build a headless website. You can code your frontend in Vue.js, Svelte, even with Inertia.js or whatever frontend framework you fancy most. What they all have in common is that they send their JSON requests to Kirby’s router, and that is where my ace up the sleeve and star of today’s show comes into play…

Headless with a dedicated HTTP Server

I have created a Composer package that will help you use the Swoole HTTP server to set up a persistent Kirby instance. Think of it as another web server that runs beside Nginx. But Swoole does not have to load the PHP code again with each request: it is a non-stop Kirby process, ready for action.

But what about changes to the content? The Boost plugin will take care of that, so files do not have to be loaded again with each request. This only works with cache drivers that have an in-memory store, like my Redis or SQLite drivers.

some rough numbers

30,000 content files loaded in a single request
[1x]    3,000 ms   for regular Kirby router site()->index()
[2x]    1,500 ms   for boosted Kirby router site()->index()
[3x]  < 1,000 ms   for proxied Swoole router with boosted in-memory store site()->index()

NOTE: The code of the package is not public yet since I am still figuring out how to solve some stuff related to the Kirby panel and memory leaks. Use the forum notification “bell”-button on the posts-navigation if you want to subscribe to updates to this thread.

I want proof!

Fair. Igor from Kirbyzone sponsored a server and helped me a great deal with the server setup so we can show you a demo today. Let me explain a few things about the demo before I give you the URL.

The initial render will be from PHP on the Nginx server, improved by the Boost plugin, displaying some stats in a yellow box - such as how many page objects were read, and how fast. The demo does that to give you insight into what is happening on the server. With a real headless setup you would not load all that content with that kind of request - it would just be your small frontend code.

The demo has a menu on the left that lets you send predefined queries, and a textarea in the middle to tinker with the KQL query JSON. You can just start typing, and as soon as the JSON is valid it will send a request to the API. You can switch between different cache drivers and API endpoints to see their impact on response times. The /proxy endpoint is special, since it will skip Kirby’s router using Nginx as a proxy, and directly request data from the Swoole http server instance.

You can also use tools like Curl, HTTPie, Insomnia, Paw or Postman to send requests. The Demo website and Boost plugin README have further instructions regarding this.

cache drivers
https://kirby3-boost-php.bnomei.com/
https://kirby3-boost-apcu.bnomei.com/
https://kirby3-boost-redis.bnomei.com/
https://kirby3-boost-sqlite.bnomei.com/

myquery.json

{
    "query": "site",
    "select": {
        "humans": "kirby.collection('humans').count",
        "planets": "kirby.collection('planets').count",
        "stars": "kirby.collection('stars').count"
    }
}

HTTPie examples

# get benchmark comparing the cache drivers
http POST https://kirby3-boost.bnomei.com/benchmark --json

# get boostmark for a specific cache driver
http POST https://kirby3-boost-apcu.bnomei.com/boostmark --json

# compare cache drivers: apcu and sqlite
http POST https://kirby3-boost-apcu.bnomei.com/api/query -a api@kirby3-boost.bnomei.com:kirby3boost < myquery.json
http POST https://kirby3-boost-sqlite.bnomei.com/api/query -a api@kirby3-boost.bnomei.com:kirby3boost < myquery.json

# compare endpoints: proxy, swoole and api
http POST https://kirby3-boost-redis.bnomei.com/proxy/query -a api@kirby3-boost.bnomei.com:kirby3boost < myquery.json
http POST https://kirby3-boost-redis.bnomei.com/swoole/query -a api@kirby3-boost.bnomei.com:kirby3boost < myquery.json
http POST https://kirby3-boost-redis.bnomei.com/api/query -a api@kirby3-boost.bnomei.com:kirby3boost < myquery.json

The End

That’s all folks! Hope you learned something new. Looking forward to your feedback.

__ Bruno

edited:
– 21/11/14 added php cache driver


Thanks for these in-depth insights, @bnomei, great job!


Awesome post. Thank you so much. I am at the point where my very first Kirby site is stable and running well, and I want to get the best out of server/site caching for my client.


How to make blueprints with query in multiselect option with fetch faster?

The problem with fetch

Let’s imagine articles where we want to reference related articles by their autoid or boostid.

site/blueprints/pages/article.yml

# other fields ...
category:
  label: Related articles
  type: multiselect
  options: query
  query:
    fetch: site.find('articles').childrenAndDrafts
    text: "{{ page.title }}"    # boosted
    value: "{{ page.boostid }}" # boosted

In your frontend you do something like this

<?php
$recentArticles = page('articles')->children()
    ->filterBy('date', '>=', strtotime('-1 week'));
foreach ($recentArticles as $article) {
    echo Html::a($article->url(), $article->title());
}

Using a multiselect with a query to fetch will cause Kirby to run that query for every object you create with that blueprint - not just in the Panel, but also in your own frontend, even if you never need that data. Sadly, that is nothing Kirby can fix easily. The example above will be okay-ish, since Kirby keeps loaded pages in memory.

But what if kirby does not cache the data for us?

site/blueprints/pages/article.yml

# other fields ...
category:
  label: Related twitter Posts from API
  type: multiselect
  options: query
  query:
    fetch: kirby.collection('twitterPosts') # needs a cache
    text: "{{ arrayItem.title }}"
    value: "{{ arrayItem.url }}"

We can make this even worse with a use case where we wrap it in a structure. Then it will be loaded again and again for every single row.

site/blueprints/pages/article.yml

myFavPostsWithCustomRating:
  type: structure
  fields:
    rating:
      type: number
      min: 0
      max: 5
    twitter_url:
      label: Titter Posts from API
      type: multiselect
      options: query
      query:
        fetch: kirby.collection('twitterPosts') # needs a cache
        text: "{{ arrayItem.title }}"
        value: "{{ arrayItem.url }}"

In the frontend:

<?php
$recentArticles = page('articles')->children()
    ->filterBy('date', '>=', strtotime('-1 week'));
foreach ($recentArticles as $article) {
    foreach ($article->myFavPostsWithCustomRating()->toStructure() as $post) {
        echo Html::a($post->twitter_url(), str_repeat('⭐', $post->rating()->toInt()));
    }
}

That might be a lot of calls to kirby.collection('twitterPosts').

Solution with a static cache

You could add the cache pretty much anywhere you like: a model, site methods, page methods… For the use case above I would suggest using a collection from a file definition. While you can use the static keyword in functions, there are some edge cases to consider, and I prefer using a wrapper class.
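For completeness, the static-keyword variant mentioned above looks like this - a sketch where the hardcoded array is a hypothetical stand-in for your own loading logic:

```php
<?php
// Sketch: a static local variable survives between calls within the
// same request, so the expensive work runs only once per request.

function loadTwitterPosts(): array
{
    static $cache = null;

    if ($cache !== null) {
        return $cache;
    }

    // expensive work happens only on the first call
    $cache = ['post-1', 'post-2']; // stand-in for an API call

    return $cache;
}
```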

site/collections/twitterPosts.php

<?php

class TwitterPosts
{
    static $cache = null;
    static function loadWithCache(): ?array
    {
        // if cached then return that
        if(static::$cache) return static::$cache;

        static::$cache = myLogicToLoadThePostsAndTransformThemToAnArray();
        return static::$cache;
    }
}

return function () {
    return TwitterPosts::loadWithCache();
};

But what if you have a pages collection that takes a while to build, and you do not want to build it again and again either? Simply put, anything with find, index, filter, sort or group might be a good place to add a cache.

 # needs a cache
fetch: site.index(true).filterBy('intendedTemplate', 'in', ['person', 'organisation', 'document', 'place'])
fetch: kirby.collection('pagesThatCanBeReferenced')

site/collections/pagesThatCanBeReferenced.php

<?php

class PagesThatCanBeReferenced
{
    static $cache = null;
    static function load(): ?\Kirby\Cms\Pages
    {
        // if cached then return that
        if(static::$cache) return static::$cache;

        $collection = site()->index(true)->filterBy('intendedTemplate', 'in', [
            'person',
            'organisation',
            'document',
            'place'
        ]);

        static::$cache = $collection;
        return static::$cache;
    }
}

return function () {
    return PagesThatCanBeReferenced::load();
};

If you have a really big index (like 10k or more pages) you might want to avoid calling index() on every request. You can do that like this, but it’s a bit advanced…

<?php

class PagesThatCanBeReferencedWithoutIndex
{
    static $cache = null;
    static function load(): ?\Kirby\Cms\Pages
    {
        // if cached then return that
        if(static::$cache) return static::$cache;

        // use lapse to cache the diruri
        // this will avoid index()
        $cachedDirUris = \Bnomei\Lapse::io(
            static::class, // a key for the cache
            function () {
                $collection = site()->index(true)->filterBy('intendedTemplate', 'in', [
                    'person',
                    'organisation',
                    'document',
                    'place'
                ]);
                return array_values($collection->toArray(function ($page) {
                    return $page->diruri();
                }));
            },
            10 // expire in 10 minutes
        );

        // use bolt from autoid/boost to get pages quickly
        $pages = array_map(function ($diruri) {
            return bolt($diruri);
        }, $cachedDirUris);
        // remove those that bolt did not find
        $pages = array_filter($pages, function ($page) {
            return $page !== null;
        });

        $collectionFromDirUris = new \Kirby\Cms\Pages($pages);

        static::$cache = $collectionFromDirUris;
        return static::$cache;
    }
}

return function () {
    return PagesThatCanBeReferencedWithoutIndex::load();
};

Hope you learned something new. Happy caching!


Some news… the collection still needs caching, but I created a PR where repeated calls to the same fetch statement will no longer cause the query to be resolved multiple times. A lot of repeated resolutions can happen because Kirby resolves related blueprints. The PR greatly improves performance when many-to-many relationships are built in Kirby.

@JonasHolfeld you might find this interesting.


Wow, thanks a lot. I will test this with my current project with approx. 8000+ pages and a lot of many-to-many relations!


Using PHP 8.1 instead of PHP 7.4 or 8.0 can improve performance even more.


Hi bnomei, thanks for all your information!

I have a question about your snippet for a static cache for a collection of pages for multiselect (exactly what my panel is slowed down by tremendously, see Slow panel due to fields with big queries)

Should the snippet above work without your Boost plugin, or is it required? I tailored it to my case (without the intendedTemplate filter) and activated the cache in the Kirby config, but it didn’t change anything?!

Thanks for your help.

my boost plugin is not needed but might improve performance nonetheless.

this is what i would do… use a cached array. to avoid calling the index on multiple requests by the panel (the static alone won’t help there, just within a single request) i would use a cache. personally i suggest using my lapse plugin because its syntax is quite simple for creating caches. a similar technique is used in boost for its kv collections.

link:
  type: multiselect
  max: 1
  options: query
  required: true
  query:
    fetch: kirby.collection('cachedIndexKeyValues')
    value: "{{ arrayItem.value }}"
    text: "{{ arrayItem.text }}"

site/collections/cachedIndexKeyValues.php

<?php
// use a static cache (multiple calls in same request)
return fn() => lapseStatic(__FILE__, function () {
    // use a 1 minute cache to store props for the multiselect field
    // and avoid crawling the index for 1 minute
    $data = lapse(__FILE__, function () {
        return site()->index(true)->toArray(function ($page) {
            return [
                'text'  => $page->title() . ' - ' . $page->uri(),
                'value' => $page->autoid(),
            ];
        });
    }, 1); // 1 minute

    // return a kirby Obj to make the panel multiselect arrayItem work
    return array_map(fn($item) => new \Kirby\Toolkit\Obj($item), $data);
});

Wow, thank you! I just tried it locally and the panel navigation feels faster (of course), so the caching in general works.

The multiselect field itself still feels sluggish (1,000 entries currently). When I change it to a simple select field it’s really fast, but I lose the filter search :( So that problem is probably not cache/server related, I guess…

Is there a way to trigger the renewal of the cache, let’s say in a hook (after creating or updating a page)? That way the cached data would always be the latest when the dropdown is opened, and I could reduce the timer.

Thanks again!

if you switch from __FILE__ to a string of your choice you can use \Bnomei\Lapse::rm('mycachekey') from anywhere you like. to track pages you would need to listen to a few hooks.
this would also allow you to set the expire to 0 (infinite) or at least a way longer duration than 1 minute.
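a sketch of what listening to those hooks could look like in site/config/config.php, assuming the cache key 'mycachekey' from above ('page.create:after' and 'page.update:after' are standard kirby hooks):

```php
<?php
// site/config/config.php - clear the Lapse entry whenever a page
// is created or updated, so the next request rebuilds it.
return [
    'hooks' => [
        'page.create:after' => function ($page) {
            \Bnomei\Lapse::rm('mycachekey');
        },
        'page.update:after' => function ($newPage, $oldPage) {
            \Bnomei\Lapse::rm('mycachekey');
        },
    ],
];
```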

edited: fixed ::remove to ::rm. thanks @stffr

\Bnomei\Lapse::remove('mycachekey') doesn’t work:

Error: Non-static method Bnomei\Lapse::remove() cannot be called statically

But \Bnomei\Lapse::rm('mycachekey') does (from the Lapse documentation). Is there a difference?

It seems to work - after a page is created/updated the cache file is deleted and rebuilt.

The only problem now is the sluggish multiselect.

Thanks very much!

EDIT:
Ok, I don’t know why I didn’t see this before, but there’s a search option in multiselect to reduce the results: Multiselect | Kirby CMS

This should do it.
