How I created a Kirby site with 15'000+ pages

Since the large Kirby site question seems to keep popping up, here’s what you can expect Kirby 2 to handle and what you can’t.

The website in question is about authors and their books.
Currently there are about 2’300 authors and 13’800 books published, in a “many to many relationship” (each author has many books, each book has many authors).
Each one of those is saved as a single page.

All books are subpages of a “books” page.
All authors are subpages of an “authors” page.

What Kirby does:

  • Editing the pages in the panel: opening and saving a specific page in the panel actually works surprisingly well. It opens only the file it needs and no more. However, I disabled the “status” option because I don’t want the panel to go and rename my folders.

What Kirby does not do:

  • Route to a single page: opening a page on the frontend is surprisingly slow because of how the routing works. I initially thought that having a route that exactly matches the folder name would be fast. It isn’t. Since page URLs can be localized, and that localization is saved in the content files, the default router needs to read all pages at each level of the path. E.g. a route to /books/1234:

    • reads all first level pages to find “books”
    • reads all subpages of books to find “1234” (and if you have 13000 subpages this becomes rather slow).

    To overcome this issue I added my own dumber version of a router that resolves to the text file on its own:

    // Build the page objects directly from the folder names instead of asking the router.
    $books = new Page(site(), 'books');
    $book = new Page($books, $id);
    

    Note that if you use a model for that page, you need to instantiate that instead:

    $books = new Page(site(), 'books');
    $model = Page::$models['book']; // class name of the model used for "book" pages
    $item = new $model($books, $id);
    

    (this code obviously leaves some stuff out for clarity)

  • The previous point means you can no longer localize those URLs, unless you come up with your own logic to map those URLs to your filesystem. It may also break other features that are implemented in the default router.

  • Panel UI: I’ve created panel widgets to find, open and create new pages. You wouldn’t want the default panel to handle so many pages. Therefore I’ve hidden the authors and books container pages (set hide: true on the parent blueprints).

  • Search: Can’t expect it to work.
    On the frontend you need your own search engine. I used a combination of MongoDB and Elasticsearch, and I use hooks to keep the index of my pages up to date.
    In the panel you need to replace the search route with your own logic, or your editors will probably kill your server while trying to use the search box.

  • Many to many relationships: the filesystem structure implies only a one to many relationship (“parent-child” if you prefer). So you need to save some kind of foreign keys in your content files. On my pages those are saved as arrays of ids on both sides of the relationship. E.g. adding a book to an author means adding the author id to the book and the book id to the author. I keep those in sync with hooks.
    Expect trouble when deleting pages. Deleting a book means going through all its authors and removing the book id. However, the delete hook runs after the page has been deleted, so you have no access to the page content. This means you have no straightforward way to get the list of authors from which it needs to be removed. I worked around this by searching through MongoDB to get the list of pages I need to update, before I sync the changes to the index.
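To illustrate the two-sided bookkeeping from the last point, here is a minimal sketch in plain PHP. It is not the code from the site: the arrays stand in for the content files, and the function names linkBookAndAuthor and unlinkDeletedBook are hypothetical, not part of Kirby. The list of author ids passed on deletion is assumed to come from an external index, since the page content is no longer available at that point.

```php
<?php
// Hypothetical in-memory stand-ins for the content files: id => related ids.
$books   = ['b1' => ['authors' => ['a1']]];
$authors = ['a1' => ['books' => ['b1']], 'a2' => ['books' => []]];

/** Store the link on both sides; array_unique keeps the id lists free of duplicates. */
function linkBookAndAuthor(array &$books, array &$authors, string $bookId, string $authorId): void
{
    $books[$bookId]['authors'] = array_values(array_unique(
        array_merge($books[$bookId]['authors'] ?? [], [$authorId])
    ));
    $authors[$authorId]['books'] = array_values(array_unique(
        array_merge($authors[$authorId]['books'] ?? [], [$bookId])
    ));
}

/**
 * Remove a deleted book from its authors. $authorIds must come from somewhere
 * other than the deleted page (assumed here: an external index).
 */
function unlinkDeletedBook(array &$books, array &$authors, string $bookId, array $authorIds): void
{
    foreach ($authorIds as $authorId) {
        if (!isset($authors[$authorId])) {
            continue; // author unknown or already gone
        }
        $authors[$authorId]['books'] = array_values(
            array_diff($authors[$authorId]['books'], [$bookId])
        );
    }
    unset($books[$bookId]);
}

linkBookAndAuthor($books, $authors, 'b1', 'a2');          // b1 now has two authors
unlinkDeletedBook($books, $authors, 'b1', ['a1', 'a2']);  // b1 removed from both
```

In a real setup the two functions would read and write the content files from hooks rather than work on arrays.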

There are of course some other limitations: you don’t want to use $site->index() or the like. Listing and paginating stuff might be slow; you want to get lists from a database. If you have many thumbnails (~100) on a single page, it might be slower than expected; something in the “check if a thumb is already created” logic is rather slow. I’ve worked around this by caching the parts that contain thumbnails in memcached.
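The thumbnail workaround is a plain cache-aside pattern. A minimal sketch, assuming a simple array as the cache store in place of memcached; cachedFragment, the cache key and the gallery markup are all made up for the example:

```php
<?php
/**
 * Return a cached fragment, or render it once and store it.
 * $cache is a plain array standing in for a real store such as memcached.
 */
function cachedFragment(array &$cache, string $key, callable $render): string
{
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $render(); // expensive part runs only on a cache miss
    }
    return $cache[$key];
}

$cache   = [];
$renders = 0;

// Stands in for the snippet that outputs ~100 thumbnails.
$renderGallery = function () use (&$renders): string {
    $renders++;
    return '<ul class="gallery">...</ul>';
};

$first  = cachedFragment($cache, 'book-1234-gallery', $renderGallery);
$second = cachedFragment($cache, 'book-1234-gallery', $renderGallery);
// $renders is 1 here: the second call is served from the cache.
```

The cached entry has to be dropped whenever the page or its images change, for example from the same hooks that update the search index.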

Conclusions

Would I do it again? Probably not.
The project started when I thought the panel would be flexible enough to handle content directly from a database. But it is not (or at least I didn’t manage to do it - the fields are way too tightly coupled to the filesystem, which was unexpected). Having the panel save to the filesystem was an undesired, but essential, workaround.

Maybe the Kirby 3 panel handles this better.

Nice read! I suppose this kind of stuff tends to be more database driven from the ground up. It would be awesome if Kirby could handle something like this… but on the other hand, I really love Kirby the way it is: clean and simple to use.

Wow! I really like the effort and all the info that went into this post. Many times I thought about connecting a MySQL database and somehow trying to get it to work with the panel. My idea was to use hooks to save a copy of everything to the database, but I still could not get rid of the page files, so it did not make sense. I would end up with both a database and files with content, which would be no gain, not for the panel at least.

What I did instead was use a database completely detached from the panel. In the database I stored all the automatic pages that were populated from APIs. For the static pages that I wrote myself, I used the panel.

The result is this:

https://skicka-presenter.com/

The products are loaded from a database

It has about 35000 products and I think it’s fast enough. I have not yet implemented search on the live site, but it’s fast as well. The category page itself is a page in the panel where I have manually added content.

https://skicka-presenter.com/tag/present-till-brollop

That approach worked really well. Think about what you need the panel for. If you don’t need the panel for the bulk of the pages, then use a database for those.

The database class is quite nice:

https://getkirby.com/docs/cookbook/database

Like you, I hope to work with database data in the panel at some point in the future.

Well, I didn’t expect Kirby to “just handle it”. My initial plan was to just use the “Form” classes from the panel to build views in the panel that are aesthetically indistinguishable from the other pages. So I could also use blueprints. I would then handle the form submission myself putting the data wherever I like.

Initial tests also showed that it worked with simpler fields. I could show database data in Kirby forms in the panel and then save it back. But while implementing the rest, there were other fields that didn’t want to work without having a “folder location”, like the structure field… not to mention fields like image, where you would expect it not to work.

Interesting post, and nice to see this write-up, as I was considering using Kirby for a large content website (I’ve only used it for smaller sites to date).

It would be nice if there was an option to not localize URLs, in order to avoid the issue you brought up. I personally think that in most cases I would want the URL to be identical (in the primary language) across all languages - especially if it would considerably speed up the site.

It would speed things up only in edge cases like mine, where there are thousands of pages on the same level. On normal sites, with say “dozens of pages” on a level, scanning through them is actually quite fast.

With the 13k pages, on my VM it created a delay of about 4 seconds to open a page. Meaning that if there had been “only” 500 pages, the delay would have been something like ~150ms, which could well be under the pain threshold of many.
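As a rough check of that number, assuming the delay grows linearly with the number of sibling pages (that is my assumption, I did not measure it at 500 pages), with estimatedDelayMs being a made-up helper:

```php
<?php
/** Scale a measured delay linearly to a different number of sibling pages. */
function estimatedDelayMs(float $measuredMs, int $measuredPages, int $pages): float
{
    return $measuredMs / $measuredPages * $pages;
}

// About 4 seconds for roughly 13000 pages, scaled down to 500 pages.
echo round(estimatedDelayMs(4000, 13000, 500)), " ms\n"; // prints "154 ms"
```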

I think the default behaviour is to not localize URLs?

That’s true, but nevertheless Kirby checks each time if the key exists. So it has to read the file no matter if the key exists or not.