UUID loading / caching slow (e.g. with thousands of pages/files)


Hi,

I am looking for ideas and ways to optimize the UUID generation when creating pages programmatically, which currently feels like a bottleneck to me.

My current scenario: on my dev site, I am playing around with about 500+ pages in multiple child/grandchild folders, which include 50,000+ images.

  • Single pages are called via page:// in certain routes, for example (this works quickly once the UUID is populated).
  • Images (e.g. galleries) are iterated via their URI for faster search, but the UUID is also used in certain logic and is saved within pages (e.g. in structure fields).

While regular loading is very smooth once the UUID is already populated, it takes up to 15-20 seconds to get the initial UUID generated, apparently because it runs through all existing UUIDs to avoid collisions. So the more pages (and files?) exist, the slower the UUID population becomes?

When I observe the content file of a newly created child page (createChild), I see the following:

  • the content file and its data are generated quickly, with the UUID empty at first; it is only populated 15-20 seconds later. Until then, $page->changeStatus('listed') is also delayed.
  • when pre-setting the UUID, the UUID field already has a value and this value does not change later. But $page->changeStatus('listed') is still delayed.
  • using Page::create() I can skip the draft / changeStatus step and generate a page with my pre-set UUID; the file is finished very fast and the runtime continues with other tasks as long as I don't call $page->uuid(), which would also lead to the bottleneck:
            $p = Page::create([
                'parent'   => $parent,
                'isDraft'  => false,
                'status'   => 'listed',
                'num'      => $data['date'],
                'slug'     => $slug,
                'template' => 'some-template',
                'content'  => $data, // $data also includes $data['uuid']
            ]);

            // Example: if I later want to redirect to that page,
            // loading will be delayed until the UUID has been populated.

I have already made different attempts to tune this, e.g. by not calling $page->uuid() immediately or by pre-filling the UUID to try to force it not to run through the UUID index. But the result is still unacceptable when the population itself takes 10+ seconds. And even with a pre-set UUID, it still seems to run a cross-check once the page is accessed for the first time.
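Just to illustrate what I mean by pre-filling (Str::random() is only used here as an example of producing a random id; Kirby's own generator may use a different length/alphabet):

<?php

use Kirby\Cms\Page;
use Kirby\Toolkit\Str;

// generate the id myself and hand it to Page::create() as part of the content
$data['uuid'] = Str::lower(Str::random(16, 'alphaNum'));

$p = Page::create([
    'parent'   => $parent,
    'slug'     => $slug,
    'template' => 'some-template',
    'content'  => $data, // contains the pre-filled 'uuid' field
]);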

I have also tried putting the UUIDs in a cache (and warming it via a route like the one below):

  • (v5 RC-3) redis cache
  • sqlite cache
  • file cache
<?php

return [
    'pattern' => 'cache-uuids',
    'action'  => function () {
        // warm the UUID cache for all pages and their files
        foreach (site()->index(true) as $page) {
            foreach ($page->files() as $file) {
                if ($file->uuid()->isCached()) continue;
                $file->uuid()->populate();
            }

            if (!$page->uuid()->isCached()) {
                $page->uuid()->populate();
            }
        }
    }
];

Inside the file cache / SQLite / Redis DB, I can see that all 50k+ entries appear to be cached.

But it is still rather slow. Is the cache even being used to check for the next UUID?
Disabling content.uuid is also not really what I want.

To split it up into multiple steps, I have also tried to save the data in a JSON file first and then run a cron job in the background that calls createChild for all available JSON files. The JSON file is created very fast, but the cron job still has to do the population. This idea would work, but the application flow would be different and delayed.
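Roughly, the queue-and-cron idea looked like this (the paths, field names and the 'orders' parent are just placeholders from my setup):

<?php

// 1) In the controller: just dump the submitted data to a JSON file – this is fast.
$id = uniqid('order-', true);
file_put_contents(
    kirby()->root('site') . '/queue/' . $id . '.json',
    json_encode($data)
);

// 2) In a cron-triggered script: create the actual pages from the queued files.
kirby()->impersonate('kirby');

foreach (glob(kirby()->root('site') . '/queue/*.json') as $file) {
    $data = json_decode(file_get_contents($file), true);

    page('orders')->createChild([
        'slug'     => $data['slug'],
        'template' => 'shop-order',
        'content'  => $data,
    ]);

    unlink($file);
}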

Is there any way to skip running through the whole index and make Kirby just accept the pre-filled UUID as it is?
Are there other ways to decrease the time required for the UUID generation/crosscheck?

My K5+ performance plugin called "Turbo" has a single-file uuid cache driver which will solve that issue. You can use just that uuid cache driver without all the Redis-based features the plugin includes.

You can copy the cache file from one installation to another.
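Enabling just that driver is a one-liner in the config (the same option I post again further down in this thread):

// site/config/config.php
return [
    'cache' => [
        'uuid' => ['type' => 'turbo-uuid'],
    ],
];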

On a side note, K5 includes several performance-related improvements to the UUID classes and the array helper. These enhancements particularly benefit scenarios like yours, where 20,000 or more pages are resolved in a single request. Based on my profiling, K5 is expected to be up to 10% faster than K4 when dealing with these tasks.

Thanks Bruno,

I have already been testing different things, like your Boost/Lapse plugins, in earlier tests on similar scenarios months ago. While certain things got better in those scenarios, others got worse. Most of the time, it was safer to go without them. Lately, I also noticed you discontinued those two. Except for the case above, the UUID handling is usually good enough for me.

I certainly could give Turbo a test run to check out the performance. But I'd be happier using the tools that are already at my disposal.

  • I could implement an Ajax loader that shows a loading screen until the page is completely ready … as an in-between step.
  • or just change the flow so that a cron job runs behind the scenes, as described above.

Alternatively, I could use a page model to have those pages saved and edited via a database, as shown in the docs.

I still have the questions:

  • I am still confused: if I cache all UUIDs in Redis, this should be much more performant than reading thousands of UUID files or content pages. But judging by the loading time, it seems that when a new UUID gets created, it is not looked up in Redis/SQLite? After the page has been created, the UUID does show up in SQLite/Redis.
  • When I am creating a new page (e.g. via createChild), are the file UUIDs also scanned, or just the page UUIDs? I don't know whether it runs through just 500 pages, or 50,500 pages + files.
  • Since I can pre-fill a UUID which is not going to change anymore, is there not a way to skip whatever is causing the delay? Of course there's a very slim chance of a collision of identical UUIDs, but with UUID v4 that's very unlikely, I would say. If I were to use a database for those pages, I would also have to generate UUIDs and make them persistent.

Happy to hear how others tackle large numbers of pages in conjunction with page creation and those UUID delays.

To give Turbo a quick shot, I have added it via composer on my DEV server with Kirby V5 RC-3.

First error: I wonder if shell_exec will be available on most shared hosts … I could simply enable it on my development machine.
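For a quick check whether those functions are usable on a given host (just a generic PHP check, nothing Turbo-specific):

<?php
// functions disabled via php.ini
$disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));

var_dump(
    function_exists('exec')       && !in_array('exec', $disabled, true),
    function_exists('shell_exec') && !in_array('shell_exec', $disabled, true)
);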

Second error

<?php 
// Undefined array key "files"
// site/plugins/kirby-turbo/classes/Turbo.php Line 70
       return $this->data['files'];
?>

Basic Config:

    'cache.uuid' => [
        'active' => true,
        'type'   => 'turbo-uuid',
        // ... Redis host infos ...
    ],

Page Models

<?php // Example
class ProjectPage extends \Kirby\Cms\Page
{
    use \Bnomei\ModelWithTurbo;
}
?>

With that random error and no idea what's causing it, it's rather difficult to draw a performance comparison. Maybe the sed command isn't available, I am not sure … it would probably not be available on most shared hosts either.

Compared to Turbo - where the UUID cache is supposed to live in one file instead of hundreds - that was actually my idea with SQLite/Redis: instead of running through files, it could just store and check all UUIDs in SQLite and append the newly generated one.
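Very roughly, what I had in mind (plain PDO, not actual Kirby code; the table and example values are made up for illustration):

<?php
// one SQLite file holding all uuid => target pairs
$db = new PDO('sqlite:' . __DIR__ . '/uuids.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS uuids (uuid TEXT PRIMARY KEY, target TEXT NOT NULL)');

// append the newly generated uuid on page creation
$insert = $db->prepare('INSERT OR REPLACE INTO uuids (uuid, target) VALUES (?, ?)');
$insert->execute(['page://abc123def456', 'orders/2025/some-order']);

// later: look up the target without walking through content files
$select = $db->prepare('SELECT target FROM uuids WHERE uuid = ?');
$select->execute(['page://abc123def456']);
var_dump($select->fetchColumn());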

How do you generate the UUID when you create a page? I don't quite understand why you are doing it in the first place, because when you create a page, a UUID is generated automatically.

Yes…

But the process seems to first generate the content txt file with all values except for the UUID (this is quick).

Then, while I can already see that file in the filesystem, the script keeps loading for another 10-15 seconds, seemingly because the UUID is being generated. Only after that does the process continue (e.g. change status, send email, redirect).

After that, the page is accessible and the UUID field is filled.

(That’s all automatic and just as intended)

But with, let's say, 10,000 pages, creating another page may take 100 seconds just to process the UUID, until you eventually get timeouts.

So my idea was to

  • basically pre-fill a uuid
  • cache all uuids for faster lookup on the uuid generation.

All in all, to avoid loading times which continuously grow longer the more pages are created.

Let's say we have a few hundred one-pager-style pages (e.g. projects or portfolio) with a page template of:

  • text
  • gallery (200 photos)
  • contact
  • CTA

That's 400 page UUIDs and 20,000 file UUIDs.

Now that site features a shop which generates a page with an order template, and you are getting hundreds of orders. Using createChild, each order then becomes a little slower at generating the UUID, even with a page structure like year/month.

Of course, without UUIDs and using URIs, creation and lookup are very fast, but disabling UUIDs altogether would sometimes be less convenient with relations, e.g. when the order has the product or portfolio UUID saved.

That could be anything, really. Your users submit a request which creates a page; many requests equal many UUIDs.

Your blog posts have a comments function where each comment is a page.

  • Can UUIDs be disabled for certain pages while keeping them active for others?

I can't see this anywhere in the code, at least not in Kirby 4. The content including the UUID is created if not already passed, then the page is saved. Why would the UUID be generated AFTER the file is already saved? Have you been testing with multiple Kirby versions or just a particular one?

Right now I am running v5 RC-3, but things becoming slower with more UUIDs is an overall observation, also in v4/v3. That's why I have been toying a lot with @bnomei's Boost/Lapse plugins to try and "fix" it (neither is used here).

<?php
// impersonate Kirby
$order = $orders->createChild([
    'content'  => $data,
    'slug'     => $slug,
    'template' => 'shop-order',
], $kirby->defaultLanguage()->code());

$order = $order->changeStatus('listed');

echo json_encode([
    'code' => 200,
    'data' => snippet('form/error-messages', ['errors' => ['Test: Creation & ChangeStatus finished.']], true),
]);
// Exiting right after
?>

The UUID is clearly empty. NEXT: if I then access the page, e.g. via the Panel, or continue with e.g.:

$url = 'https://some-url/pay/' . $order->uuid()->id();
// It appears the UUID will be filled during this call, and the actual redirect
// is delayed for 10-15 seconds. The Panel is also slow on the first load of
// that page/its fields if I go there first.
go($url);
// Simplified example...

/* Actual process: I am using a fetch request to submit data to the controller.
The controller checks the data and creates the page (order),
and returns a redirect URL in the fetch response.
A JavaScript redirect is then initiated to that URL, e.g. https://somepage/pay/(uuid).
The payment page accesses the order via UUID to load totals and whatnot.

On the first load it's slow as hell; afterwards, there's no speed issue anymore.
*/

The UUID becomes available & filled once needed (that’s my assumption)

The turbo-uuid cache has nothing to do with Redis, and it does not need the models to have the Turbo trait. Also, it should be using exec, not shell_exec. The reason I built it that way is that if you have a website with 20k+ pages, you will most likely not run it on a 10-euro shared hosting anyway and will have enough control over the setup to get exec and sed running.

'cache' => ['uuid' => ['type' => 'turbo-uuid']],  // exec() + sed

SQLite and Redis will still perform thousands of queries. You will gain almost nothing compared to Kirby's default file driver. With the Turbo cache, it's accessing an in-memory PHP array – it cannot be done faster.

From my experience I can second what Sonja said about the UUID being added when creating a page. That process does not check if the UUID is available nor does it read all other UUIDs.

But since you need the parent page to add a child page, you (or Kirby internally) might end up loading many of those UUIDs, slowing things down.

Maybe the code adding a new page somehow triggers resolving more UUIDs than you had intended.

Based on the screenshot, you might be creating the page content with an enforced empty UUID string. Kirby might create a UUID and add it to the cache, but somehow it gets set to an empty string.

On the next request with that UUID, it will not find it, because the UUID in the content is empty. Maybe (I have not checked the code) Kirby then tries to scan/index all content files to find the page for the now-invalid UUID but finds nothing. Maybe it eventually decides the UUID is definitely wrong and assigns a new one.

But that's just me speculating on the behaviour you are experiencing.

I have been experimenting further, and let's say there are a few false statements above.

To observe the UUID within the Panel, I had a UUID field set up, so when the page was created, that UUID field was saved empty (instead of the UUID at the bottom of the content file) and was only filled at a later stage.

Without that field, the UUID is generated with the content.

The Page creation itself seems to be quick, and the redirection to a static page is fast.

Now, if I use $page->uuid()->id() anywhere, the very first load of that page via page://uuid is still extremely slow. I've also tried Ajax-loading details from a route onto a static payment page, and the fetch call takes 10+ seconds.

While the UUID has been there since the beginning, maybe the initial lookup to find that page (and cache the UUID destination) is what's slowing things down.

<?php

return [
    'pattern' => 'payment-selection/(:any)',
    'action'  => function ($uuid) {
        if ($order = site()->find('page://' . $uuid)) { // seems this is slowing it down now
            return [
                'code' => 200,
                'data' => snippet('ajax/payment-selection', ['data' => $order], true),
            ];
        }
    },
];

window.onload = function () {
    loadPaymentSelection();
}

async function loadPaymentSelection() {
    const response = await fetch("<?= site()->url() ?>/payment-selection/<?= $orderuuid ?>");
    const data = await response.json();

    if (data.code == 200) {
        document.getElementById('paymentSelection').innerHTML = data.data;
    }
}

From what I can tell, maybe the issue is that right after programmatically creating the page with its UUID, the page is not directly added to the UUID cache in /site/cache/yourdomain/uuid. This only seems to happen after one of the following (a minimal sketch of option 2 follows below the list):

  1. you access the page, e.g. in the Panel
  2. you call $uuid->populate()
  3. you generate the UUID cache via Uuids::generate()
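For completeness, a minimal sketch of option 2 applied to the order scenario (assuming an 'orders' parent page; this is essentially what turns out to remove the delay further down in this thread):

<?php

kirby()->impersonate('kirby');

$order = page('orders')->createChild([
    'slug'     => $slug,
    'template' => 'shop-order',
    'content'  => $data,
]);

// write the uuid => target entry into the uuid cache right away,
// for the page itself and (if needed) its files
$order->uuid()->populate();

foreach ($order->files() as $file) {
    $file->uuid()->populate();
}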

Can you try using an info-type field with info: {{ page.uuid }} instead of a regular field? Maybe having a real uuid field is messing with Kirby somehow?

I agree, my assumption is also that the caching is not happening yet. It's not going to write anything to file if my uuid cache is set to SQLite or Redis (my assumption was that with Redis the uuid cache would be served from memory, which should be faster than e.g. the file cache).

How long is the expected time when populating/generating that UUID cache?

The way it works is that it runs through the files looking for the UUID and then saves the UUID and the page location to the cache as a lookup reference. This would mean it's running through hundreds of pages.

When using createChild, I theoretically already have the UUID destination inside my $page object. Can I not feed that into the uuid cache directly, without having to search for the page at a later stage?

@bnomei
All sample data with the uuid field has been deleted,
the uuid field has been removed from the blueprint.

Still, caching takes way longer than I wish it would.

I would have expected the entry to be created on page creation, but if that's not the case, then I would consider it a performance-critical issue that should be reported.

Yeah, that would've been my assumption as well, because we already know where the content file is located as well as the uuid string.

Can we manually append/push a uuid/destination pair without doing a new lookup?

As I wrote above, if you do:

$p = Page::create([
    'parent'   => $page,
    'isDraft'  => false,
    'status'   => 'listed',
    'slug'     => 'some-page',
    'template' => 'default',
    'content'  => [
        'title' => 'Some title',
        'text'  => 'some text'
    ]
]);

$p->uuid()->populate();

the UUID is automatically added to the cache.

With that single $p->uuid()->populate() call, there is no delay anymore.
$p->uuid()->populate();