Replace / avoid reading from flat-file to increase performance

i know there have been a couple of performance threads which i also participated in, so i have a quick question… my folder structure is:

- parent 
-- Child (about 14 children pages)
--- Grandchildren (about 10 per child), each containing 100 to 300 images

(totals up to 42,000 files)

my local server is already extremely slow once i put in all the children and grandchildren…

i am using a small foreach to run through a session which contains 3-4 file URIs and get each file object like this:

$site->file($var); 

it seems that, for that number of objects, it’s running through all the siblings and whatever side information comes with them, so my loading times already go up to around 8 seconds.

what are the ways to cache it all but still be able to use all the methods? E.g. globally replace the flat-file access for all those children & files, because as soon as i use a file method it’s back to being slow.

so e.g. if i just save all those images into an sqlite / text table, is there a way i could get to the file object via its uri and have access to all methods like resize and so on, without paying the performance cost for all the siblings?

Edit:

another thing: when i check with Xdebug and it’s running through a list of files, it is also always looking for a file template (filename.txt), which might also waste some performance - just how much? - so is it possible to set a flag to skip the search for a file template in a certain scenario?

42,000 files does not seem that many. I have a site with ~140,000 files totalling about 3 GB, and Kirby copes just fine (a couple of seconds to scan the whole lot).

How are you actually searching for the files? site()->index() is the one to avoid.

I guess you could use the built in file cache to store the list upfront and use the file hooks to rebuild it if a file gets added or changed. I suspect though, there is something going on with the way you are querying stuff.
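A minimal plain-PHP sketch of that idea, assuming the list is kept in a JSON file rather than in Kirby’s own cache. `cachedFileList()` and `invalidateFileList()` are hypothetical helpers, and the invalidation call stands in for whatever a file hook would trigger:

```php
<?php
// Hypothetical helpers (not Kirby API): cache a directory listing in a JSON file.

function cachedFileList(string $dir, string $cacheFile): array
{
    // Serve the stored list if a readable cache exists.
    if (is_file($cacheFile)) {
        $cached = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($cached)) {
            return $cached;
        }
    }

    // Otherwise scan the folder once and remember the result.
    $files = [];
    foreach (scandir($dir) as $name) {
        if (is_file($dir . '/' . $name)) {
            $files[] = $name;
        }
    }
    file_put_contents($cacheFile, json_encode($files));

    return $files;
}

// Meant to be called whenever a file is added, changed or removed.
function invalidateFileList(string $cacheFile): void
{
    if (is_file($cacheFile)) {
        unlink($cacheFile);
    }
}
```

Until the cache is invalidated, the list is served as-is, so a file added in between only shows up after the next rebuild. The cache file should live outside the scanned folder.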

can you post the important bits of code in full?

i am aware of $site->index() and i am not using it anywhere

this project mostly consists of images, so the 42,000 images total approx. 75 GB…

to some extent i am also using caching where it’s possible.
the parent and children are cached; it seems that as soon as i have larger file collections it starts to eat resources…

  • usually the files get resized as thumbnails, so i guess it always checks whether the files exist and so on.

  • in Xdebug it seems it’s always searching for file templates, but with the amount of files i have, obviously these do not have templates at all…

  • i’d also like it to avoid the original file to be copied over to media (i just noticed that too)

if i were to not upload these as pages and just use them separately, e.g. as assets, would that make a difference?

the code in use is pretty basic…

i have commented out most things and i am still getting somewhat poor performance when using basic foreach queries that run through the parent and children and, inside a child, through the files…

In Kirby 2, when the thumbnails were not async, i had this problem as well. i ended up creating the thumbnails separately (which blocked rendering) and built / got the thumbnail / original urls manually without calling the page/file objects.

So in Kirby 3 i wanted to rely on async and (built-in) methods to make my life a bit easier… but that’s still a dream lol…

that’s concerning. Are these images not optimised for the web in the content folders? 75 GB seems extremely high. i think if you optimised those, it would take the load off the server when it decides it needs to generate a thumb. I take it you have maxed out the PHP ini file locally in terms of RAM and so on?

One thing you can try is this… in a plugin…

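<?php

Kirby::plugin('hashandsalt/kirby-skip', [
    'components' => [
        // Override how file URLs are built
        'file::url' => function (Kirby $kirby, $file) {
          $page = $file->parent();
          // Pages whose URI contains "tiles/" or ends in "tiles"
          if (preg_match('/tiles\/|tiles$/', $page->uri()) > 0) {
            // Link straight into the content folder, bypassing media
            return $kirby->url() . '/content/' . $page->diruri() . '/' . $file->filename();
          } else {
            // Everything else keeps the regular media URL
            return $file->mediaurl();
          }
        }
    ]
]);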

basically what the above does is, if the word tiles is somewhere in the url to the file, it will serve it directly from the content folder instead of processing it and moving it to the media folder, which skips generating thumbs etc. However, this approach is only useful where the image in the content folder is already optimised and the correct size. If it’s an image you need to resize, scale or whatever via kirby then it won’t fly, but maybe it’s useful to you.
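To see which URIs that pattern actually catches, here is a small stand-alone sketch. `isTilesUri()` and `isTilesSegment()` are hypothetical helpers, not part of the plugin; the second one is just one possible stricter variant that only matches a whole path segment named tiles:

```php
<?php
// The same pattern the plugin uses, pulled out into a hypothetical helper.
function isTilesUri(string $uri): bool
{
    return preg_match('/tiles\/|tiles$/', $uri) === 1;
}

// Assumed stricter variant: "tiles" must be a complete path segment.
function isTilesSegment(string $uri): bool
{
    return preg_match('#(^|/)tiles(/|$)#', $uri) === 1;
}

// isTilesUri('maps/tiles')            -> true
// isTilesUri('maps/tiles/europe')     -> true
// isTilesUri('maps/tilesets')         -> false
// isTilesUri('games/projectiles')     -> true  (ends in "tiles")
// isTilesSegment('games/projectiles') -> false
```

Whether the looser match matters depends on the page names in the site; with a folder called projectiles it would also be served from the content folder.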

the original image is already resized and with reduced quality (in terms of photoshop) but it still requires a somewhat high quality, since the original (uploaded) image is available as download and you wouldn’t want it to be as poor as possible…

on a live server the performance will be better obviously, but this will mean nothing when there are more users than just me…

Photoshop is really not the greatest, even when you do a save for web. I’m a big fan of this cli tool which can be run recursively from terminal. Give it a go on a copy of the images and see what a difference it makes. You can also try imageoptim.

My usual workflow is to save a jpeg from photoshop / affinity photo with 100% quality, then run that cli tool on them. It will reduce them without noticeable visual quality loss.

let’s say we don’t want to use pages and build watermarks just once…

I am experimenting with this kind of structure to keep it simple

and adding the most important image properties into an sqlite database (which will have about 38000 entries if “fully indexed”) to read them out when needed.

  • still experimental with improvements
  • will need to check how this will run once a few more watermark images are generated

    // Getting Images from same uri
    fileSQL($page->uri());
    $collection = new Collection(getImages('path = "assets/originals/'.$page->uri().'"'));
    $list = $collection->paginate(30);
    $pagination = $list->pagination();
    foreach($list as $image){
        createWatermark($image);
    }


    // function to get Images from SQL instead of FlatFile
    function getImages($where = null, $sort = null, $offset = 0, $limit = 1000){
        $query = Db::select('files','*',$where, $sort, $offset, $limit);
        $data = json_decode(json_encode($query->data), true);
        return $data;
    }


    // Create Watermarked images if not already happened and update urls to SQL
    function createWatermark($file){
        if($query = Db::first('files',['watermark','path','width','height','root','url'],'url = "'.$file['url'].'"')){
            $query = (array)$query;
            if($query['watermark'] == ''){
                $image = new \claviska\SimpleImage();
                try{
                    $image
                    ->fromFile($query['root'])
                    ->resize($query['width']/2,$query['height']/2)
                    ->overlay('assets/watermark.png','center')
                    ->toFile('assets/watermark/'.md5($query['url']).'.jpg',null,85);
                    Db::update('files',['watermark' => 'assets/watermark/'.md5($query['url']).'.jpg'],['url' => $file['url']]);
                    return $file;
                } catch(Exception $e){
                    dump($e);
                }
            }
            return $file;
        }
    }



    // Initially add the files to the database index after scanning through the folder with the same uri
    function fileSQL($path){
        if(!Db::count('files','path = "assets/originals/'.$path.'"')){
            foreach(Dir::read('assets/originals/'.$path) as $images){
                $file = new Asset('assets/originals/'.$path.'/'.$images);
                if($file->extension() == 'jpg'){
                    $collection = [
                        'url' => $file->url(),
                        'watermark' => '',
                        'file' => $file->filename(),
                        'width' => $file->width(),
                        'height' => $file->height(),
                        'modified' => $file->modified(),
                        'root' => $file->root(),
                        'type' => $file->type(),
                        'path' => $file->path(),
                        // 'id' => $file->id(), 
                        // 'mediaHash' => $file->mediaHash(),
                    ];
                    Db::insert('files',$collection);
                }
            } 
        }
    }
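The lookups above build the WHERE clause by concatenating the path into the SQL string, which breaks as soon as a folder name contains a quote. A hedged sketch of the same lookup with bound parameters follows. It uses plain PDO with an in-memory SQLite database as a stand-in for Kirby’s Db class, `imagesForPath()` is a hypothetical helper, and the table only has a few of the columns from above:

```php
<?php
// Assumption: plain PDO + in-memory SQLite instead of Kirby's Db class.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE files (url TEXT, path TEXT, watermark TEXT)');

// A few sample rows, one of them in a folder with a quote in its name.
$insert = $pdo->prepare('INSERT INTO files (url, path, watermark) VALUES (?, ?, ?)');
$insert->execute(['/a/1.jpg', 'assets/originals/a', '']);
$insert->execute(['/a/2.jpg', 'assets/originals/a', '']);
$insert->execute(['/b/1.jpg', 'assets/originals/o"brien', '']);

// Hypothetical helper: fetch the rows for one folder, with the path bound
// as a parameter instead of being concatenated into the SQL string.
function imagesForPath(PDO $pdo, string $path, int $limit = 1000, int $offset = 0): array
{
    $stmt = $pdo->prepare(
        'SELECT * FROM files WHERE path = :path ORDER BY url LIMIT :limit OFFSET :offset'
    );
    $stmt->bindValue(':path', $path, PDO::PARAM_STR);
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Passing the limit and offset to the query also means only the rows for the current page are read, instead of fetching up to 1000 rows and paginating them afterwards.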

just a quick update for anyone who’s interested in these kinds of things…
I further optimized the above code to suit my needs, and on my local test site i have approx. 35,000 image files (+ the same number of watermarks) within a handful of pages.

instead of using the $file methods to count, index and run through them, i am adding them to a database on the first call and outputting the urls for the images from there. Performance-wise it’s blazingly fast in comparison.

with pagination set to, let’s say, 300 or even 500 images per paginated page, the collection is calculated instantly. This adds a few challenges, such as checking whether a file still exists, but nothing extraordinarily difficult.
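One way to handle the existence check, sketched under the assumption that each indexed row carries the absolute file path in a `root` key as in the code above: only the rows of the current paginated page are checked against the disk. `pageSlice()` and `partitionByExistence()` are hypothetical helpers:

```php
<?php
// Hypothetical helpers for an index kept outside Kirby's file objects.

// Return the rows that belong to one paginated page (1-based).
function pageSlice(array $rows, int $page, int $perPage): array
{
    return array_slice($rows, ($page - 1) * $perPage, $perPage);
}

// Split rows into [existing, missing], based on the stored 'root' path.
function partitionByExistence(array $rows): array
{
    $existing = [];
    $missing  = [];
    foreach ($rows as $row) {
        if (is_file($row['root'])) {
            $existing[] = $row;
        } else {
            $missing[] = $row;
        }
    }

    return [$existing, $missing];
}
```

The missing rows could then be deleted from the index, so the cost of the check is limited to the 300 or 500 files actually being displayed.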

I hardly know any way to make the watermarking any more effective without exposing the files to the user. There’s something like:

https://brianium.github.io/watermarkjs/

where the watermarking happens client-side, but obviously the image needs to be accessible. If anyone has another good suggestion for watermarking without exposing the source images, i am happy to hear it.

Otherwise watermarks could also be generated and uploaded beforehand.

ImageMagick can add watermarks. I think you can do it fairly easily by extending the built-in thumb driver to add that functionality.

GD can also add watermarks…
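A rough sketch of what that could look like with GD’s own functions, outside of any Kirby thumb driver. `watermarkOffset()` and `applyWatermark()` are hypothetical helpers, the paths are placeholders, and the second function needs the GD extension:

```php
<?php
// Hypothetical helper: top-left corner that centres a watermark of
// $markW x $markH on an image of $imgW x $imgH.
function watermarkOffset(int $imgW, int $imgH, int $markW, int $markH): array
{
    return [intdiv($imgW - $markW, 2), intdiv($imgH - $markH, 2)];
}

// Sketch only: requires the GD extension; returns false if a file cannot be read.
function applyWatermark(string $source, string $mark, string $target, int $quality = 85): bool
{
    $image = imagecreatefromjpeg($source);
    $stamp = imagecreatefrompng($mark);
    if ($image === false || $stamp === false) {
        return false;
    }

    [$x, $y] = watermarkOffset(
        imagesx($image), imagesy($image),
        imagesx($stamp), imagesy($stamp)
    );

    // Alpha blending is on by default for truecolor images such as a
    // loaded JPEG, so the PNG's transparency should be respected here.
    imagecopy($image, $stamp, $x, $y, 0, 0, imagesx($stamp), imagesy($stamp));

    return imagejpeg($image, $target, $quality);
}
```

A call would look like `applyWatermark('original.jpg', 'assets/watermark.png', 'watermarked.jpg')`. Resizing is left out here, so the output keeps the source dimensions.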

i was trying with the thumb driver but couldn’t get that to work. and i have noticed a few … call it quirks … with what i have tested with the thumb driver.

e.g. sometimes it appeared that some thumbs were not watermarked and the source was outputted, might’ve been an error or my imagination though

the usual thumb driver also creates a copy of all file versions in the media folder. with

  • 75 gb source
  • 7 gb watermarked
    (took like 6 hours for i7 Desktop)

i found myself quite overwhelmed by the media folder, with all the job and image files. Especially when using more than one thumb size, quality or modifier.
currently those files are separate. I could move the watermark part out to a separate desktop application to keep server resources available. And this only has to be done once anyway.

did not read everything but disabling xdebug should improve performance.

yes.

having a pages tree of 40k like you described above should be no problem for kirby. the images are.

if you do not need the file meta data inside the kirby panel you could just store the images somewhere outside of content and fetch them lazily using the File class in your frontend logic (or when creating the watermark).

to have better control of the thumbs you could add a plugin to rewrite the url (like when using a cdn) and create one for a custom thumbs controller (blocking the copy of the original).