Better organisation for large data sets

I have book records stored as about 1500 subpages of the books page. On the customer's server, queries take 3-4 seconds to return a result.

Is there a better way to organise the content if I still want to be able to search all 1500 records, or does Kirby reach a limit here if the server does not provide enough disk performance?

If you can categorize books, that’s a great first step towards better performance and better organisation in the panel. You could organise them by author, for example:

/books
  /author-a
    /book-a
    /book-b
  /author-b
    /book-c
    /book-d

For larger data sets, it is really worth looking into an external search service: something like a self-hosted Elasticsearch instance or a service like Algolia. We use this for getkirby.com as well. You don’t just get better performance, you also get a much better search experience and powerful query options.

I can categorise them by genre. Does this mean that searching through several smaller directories is faster than through one large directory?

If you want to search through all books, the reorganisation wouldn’t make the search queries faster. But you would gain massive performance benefits when navigating through the panel to edit books.

That’s why I suggested an additional search service. You would then create a complete search index for all your books with that search service and the search would be super fast, no matter how many books you store.

You could check out the plugins by @bnomei, which might be useful for a large collection of pages. The docs also have a lot of good hints for dealing with large collections.

Maybe you could cache the index of your books and only rebuild it when something is modified.

Otherwise, you could also try another cache driver such as APCu or SQLite.

Thank you for your suggestions @bastianallgeier @carstengrimm. I will look into it.

Bastian, in your video on the basic content folder you explain that blog entry directories can also be prefixed with a date, e.g. 20241001_.

Is it easy to set this up so that the date is used as the prefix when an entry is saved? In that case, I could filter large sets of records by this directory prefix, e.g. only books from the year 2024. I could also do without manual sorting and sort entirely by date.

Yes, you can use the num option in the books’ blueprint to set the prefix based on a date field: Page blueprint | Kirby CMS
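A minimal sketch of what such a blueprint could look like, assuming the books use a blueprint file called book.yml and have a date field named date (both names are assumptions, adjust them to your setup):

```yaml
# site/blueprints/pages/book.yml (hypothetical file name)
title: Book

# Use the value of the "date" field, formatted as Ymd, as the folder prefix,
# e.g. 20241001_my-book
num: '{{ page.date.toDate("Ymd") }}'

fields:
  date:
    type: date
    required: true
```

As far as I know, the prefix is only applied to listed pages, so drafts and unlisted books would not get a date prefix.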

I agree with Bastian that using an external service to speed up the search is a very efficient way to increase search performance. Alternatively, as Carsten suggested, you could speed up the Kirby instance using one of my plugins.

  1. Adding a wrapping cache around your front-end search results will speed up recurring searches.

Taken from one of my projects, this is how I cache search results:

site/templates/search.php

<?php

$query = get('q', '');
$results = null;

if (strlen($query) >= 3) {
    $language = kirby()->language()->code();
    $data = lapse('search' . $query . $language, function () use ($query) {
        $results = site()->index()->listed()->search($query, explode('|', 'title|text|blocks'))->limit(7);
        return array_values($results->toArray(function ($page) {
            $field = $page->text()->or($page->blocks());
            if (Str::startsWith($field->value(), '[{"')) {
                $field = $field->toBlocks();
            }
            return [
              'title' => $page->title()->value(),
              'url' => $page->url(),
              'excerpt' => Str::unhtml(markdown($field->excerpt(200))),
            ];
        }));
    }, 15); // in minutes

    $results = new \Kirby\Cms\Collection($data);
}

if ($results && $results->count()): ?>
  <ul>
    <?php foreach ($results as $result): ?>
      <li>
        <a href="<?= $result['url'] ?>"><?= esc($result['title']) ?></a>
        <p><?= $result['excerpt'] ?></p>
      </li>
    <?php endforeach ?>
  </ul>
<?php endif;
  2. The benefit of adding a plugin like Boost is that every query in your front-end and within the panel, not just searches, will be quicker (3x-4x), as the content is loaded from RAM (in the case of APCu) instead of from files.

I thought it was a bit too simple. In the blueprint, the num field in the filter doesn’t seem to be available? Is that correct?

query: page.children.filter('num', 'num <', '20241015')

Is there a way that I can filter pages by their sort number?

Use filterBy with just < as the operator:
query: page.children.filterBy('num', '<', '20241015')
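For context, this is roughly how that query could sit in a pages section of the parent blueprint. The section name and label are made up for illustration, and the sketch assumes a Kirby version whose pages section supports the query option:

```yaml
sections:
  olderBooks:            # hypothetical section name
    type: pages
    label: Books before 2024-10-15
    # Keep only children whose sort number is lower than 20241015
    query: page.children.filterBy('num', '<', '20241015')
```

With a date-based num in the Ymd format, the numeric comparison lines up with chronological order, so the section would only list books dated before 15 October 2024.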

That’s great! Can ‘20241015’ be set dynamically?

Sure. Any PHP variable containing a string or integer will work.