Kirby 3.9 get/fetch lock timeouts

Hello there,

my client and I have the following problem:

Kirby is checking the lock status of a page every 10 seconds. This is fine. When I change something on the page and the page gets ā€œlockedā€, everything still is fine. Every 10 seconds the lock status of the page gets requested and is 200 OK.

This is where it gets tricky:
After a while, changes still pending, the requests for the lock are running into timeoutsā€¦


And after some time ā€œJson response could not be parsed: Gateway timeoutā€ appears.

Shortly after this, I canā€™t access the page in the kirby panel (after a refresh) for a while (max 1 minute) (due to the lock file still there). The site just keeps loading.

Also during these timeouts, Kirby opens a new connection on nginx for each failed lock request while this is supposed to be 1 connection (since Iā€™m the only one accessing it).

We use php 8 and Kirby 3.9. NGINX says 499 client closed request for the /lock requests. PHP-FPM gives me 500 internal server error after a minute (nginx 15:30 499, php-fpm 15:31 500)

Iā€™m certain this has nothing to do with our nginx and php configuration but with how Kirby handles these lock requests.

Any ideas? Any suggestions?
I can provide more screenshots and details if requested (since Iā€™m a new user, I can only add 1 in this post)

bs
Marc

Changed your user status.

@MarcJansen Could you please check your PHP error logs and maybe provide more information. Team guesses that this might be due to a server firewallā€¦

Is this a new installation, or did this only happen after you made any changes?

This is not a new installation. This Kirby 3.9 got upgraded from Kirby 3.5.3

I try to provide more information. I have to make some changes to PHP Log Level. Also Iā€™ll go check any firewall rules. But after some further testing this lock issue only occurs after working in the panel (make some changes, donā€™t save them, change tab without saving, come back etc.). While idling and not having any pending changes, Kirby can request lock hundred times without any issues.

Edit:
Just checked the rules. There is no firewall rule which could intervene here. We also use traefik as Ingress Controller and there is also no rule or circuit breaker which could intervene here.

Steps:
Made a change
Left it pending (not saving)
tabbed out
tabbed in
scrolled down

see console

After rechecking now, I got a rule in our traefik ingress which will kill requests taking longer than 45 seconds. The question now is, why does Kirby take so long for an answer? I canā€™t remove the ruleā€¦ 45 seconds are already a long timespan. It would flood all connections with Kirby lock requests.

I also traced the call:

#Kirby\Cms\ContentLocks::handle is taking too long. (nearly 2 minutes)

Detailed stack:


I added max script duration to PHP.

[02-Mar-2023 12:14:05] WARNING: [pool www] child 48, script ā€˜/var/www/html/index.phpā€™ (request: ā€œPOST /index.phpā€) execution timed out (36.174289 sec), terminating

[02-Mar-2023 12:14:05] WARNING: [pool www] child 48 exited on signal 15 (SIGTERM) after 340.027994 seconds from start

[02-Mar-2023 12:14:05] NOTICE: [pool www] child 60 started

Which also indicates, that the script/php takes too long.

Thanks for the additional information. It suggests that the call to flock() blocks because another thread/PHP worker tries to access the lock file at the same time. We havenā€™t seen this behavior of timed-out requests before, but I can try to help you debugging it.

What I find interesting is that it only happens once the user tabs out of the Panel. A few questions:

  • Are there multiple users in your Kirby installation?
  • When tabbing out, was the other tab unrelated to the Kirby installation?
  • Does it also occur in different browsers?
  • Does it also occur when using a fresh Starterkit installation on the same server setup?

Hello, sorry for my late reply:

I can answer the first 3 questions:

  • There are multiple users in our Kirby installation but the timeout occures with every user. We have users where multiple persons can login and we have users only for specific persons.

  • When tabbing out, the tab is unrelated to Kirby.

  • It does occur in firefox, safari and chrome, as far as I tested it.

Thank you for your reply. This suggests that the issue is caused by the site code and/or server setup.

Could you please first try setting up a fresh Starterkit on the same server setup to rule out issues with your site code?

Will do.

Today we encountered a new problem regarding lock requests and .lock files.

Once again the lock reqeuests ran into timeouts. After that, we couldnā€™t reach the element in the panel anymore. HTTP Error 502.

I then checked the content folder for this element: /content/example/8_example-element
and there was a .lock file present. After I deleted the .lock file, the element was reachable in the panel again.

Something is not right with locks.

Here is the php slowlog trace I generated:

[0x00007feb7c614fb0] flock() /var/www/html/kirby/src/Cms/ContentLocks.php:147 
[0x00007feb7c614d90] handle() /var/www/html/kirby/src/Cms/ContentLocks.php:101 
[0x00007feb7c614ae0] get() /var/www/html/kirby/src/Cms/ContentLock.php:40 
[0x00007feb7c614a30] __construct() /var/www/html/kirby/src/Cms/ModelWithContent.php:320 
[0x00007feb7c614910] lock() /var/www/html/kirby/src/Panel/Model.php:238 
[0x00007feb7c614870] lock() /var/www/html/kirby/src/Panel/Model.php:307 
[0x00007feb7c614670] props() /var/www/html/kirby/src/Panel/Page.php:313 
[0x00007feb7c614450] props() /var/www/html/kirby/src/Panel/Page.php:348 
[0x00007feb7c614370] view() /var/www/html/kirby/config/areas/site/views.php:9 
[0x00007feb7c6142e0] {closure}() /var/www/html/kirby/src/Panel/Panel.php:300 
[0x00007feb7c614270] call() /var/www/html/kirby/src/Panel/Panel.php:300 
[0x00007feb7c613f70] Kirby\Panel\{closure}() /var/www/html/kirby/src/Http/Router.php:118 
[0x00007feb7c613cf0] call() /var/www/html/kirby/src/Http/Router.php:144 
[0x00007feb7c613c30] execute() /var/www/html/kirby/src/Panel/Panel.php:313 
[0x00007feb7c613aa0] router() /var/www/html/kirby/config/routes.php:102 
[0x00007feb7c613a30] {closure}() /var/www/html/kirby/src/Http/Router.php:120 
[0x00007feb7c6139c0] call() /var/www/html/kirby/src/Http/Router.php:120 
[0x00007feb7c613740] call() /var/www/html/kirby/src/Cms/App.php:341 
[0x00007feb7c613640] call() /var/www/html/kirby/src/Cms/App.php:1243 
[0x00007feb7c6130b0] render() /var/www/html/index.php:5

Thanks, that all looks normal. From your description, the root cause seems to be the same as before.

Do you guys know anything about php file_put_contens() and flock() on NFS? We use NetApp and Kirby runs on a NFS.

After working trough some php documentation and after observing every error we got with Kirby which comes down to file_put_contents with LOCK_EX:

file_put_contents() / flock() is permanently waiting for other processes to remove locks (as you can see in the stack traces I posted)

Ah, that could indeed play a role. NFS generally supports advisory locking, the underlying technology of PHP flock(). However it depends on the used NFS version and probably also configuration. And even if it does work sometimes (like in your case), it may not be as reliable as with a local filesystem.

In general I wouldnā€™t recommend using Kirby over NFS as the increased latency can make Kirby sites much slower. Remember that Kirby is fully file-based, so every read and write needs to wait for the network and protocol stack when NFS is used.

I can now replicate the behavior - I canā€™t open the panel element anymore, are .lock files in the content path supposed to look like this? Looks like there is an unlock AND a lock written in that.

  unlock:
    - WcqkVYZL
  lock:
    user: Da1vXXcR
    time: 1683644063

Thatā€™s a valid lock file. It means that one user (the one now listed under unlock) locked the page for editing but wasnā€™t active for a while, so another user unlocked the page and is now the new holder of the lock.