Hey,
I’ve written a little panel widget that scans all pages and determines whether links are broken (by checking whether the URL returns a 404). So far it seems to work. A few moments ago I even added support for excluding pages and for including fields other than ‘text’: https://github.com/wottpal/kirby-broken-links-widget
What I think is really missing to make this plugin bulletproof is making it work “async”, maybe a bit like ImageKit by @fabianmichael, with a start button and a little progress bar. But I have no clue where to start… I tried to dig through ImageKit’s source, which became overwhelming very quickly, so maybe somebody can sketch out a quick solution to my problem.
Excluding this link is a very good idea. Sonja, thank you very much for your hint.
But that is strange from my point of view.
The link to this page is not broken (I can reach that page directly), so for me it is wrong to show it in such a report!
Hey @anon77445132,
now in v0.3.0 you can exclude specific page-ids or absolute external links (if enabled). I even added /error as a default value. Please confirm if it’s working for you!
To make things async, ImageKit defers potentially expensive tasks until after the panel has loaded. I.e., instead of scanning the whole thumbs folder to compute the generated/pending thumbnail counts while the widget HTML is being generated, these statistics are lazy-loaded via AJAX.
To make your widget async, you have to provide an API for it that is accessible via JavaScript. Kirby’s router feature is your friend here, but you need to handle things like authentication and i18n yourself.
API code with authentication:
The actual crawler component that scans pages:
When ImageKit scans the whole site for thumbnails, it does so by first fetching a sitemap via the API. It then iterates over the sitemap and sends an HTTP request to every single page to trigger thumbnail job creation. The JavaScript API sends a custom header (X-ImageKit-Indexing: 1) to tell the server: generate the page like you normally would, but afterwards return a JSON result instead of the whole HTML page.
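The sitemap loop described above could look roughly like the following on the client side. This is only a sketch, not ImageKit’s actual code: the header name and value come from the description above, while `scanSitemap`, `indexingRequestInit`, the `Fetcher` type and the progress callback are hypothetical names chosen for illustration. The fetch function is passed in so the loop can be tried without a server.

```typescript
// Header name/value as described in the post; all other names are assumptions.
const INDEXING_HEADER = "X-ImageKit-Indexing";

// Minimal shape of a fetch-like function (the browser's fetch should fit).
type Fetcher = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Request options that ask the server for a JSON result instead of HTML.
function indexingRequestInit(): { headers: Record<string, string> } {
  return { headers: { [INDEXING_HEADER]: "1" } };
}

// Requests every sitemap URL one after another, reports progress after each
// page, and returns the URLs that could not be processed.
async function scanSitemap(
  urls: string[],
  fetcher: Fetcher,
  onProgress: (done: number, total: number) => void
): Promise<string[]> {
  const failed: string[] = [];
  let done = 0;
  for (const url of urls) {
    try {
      const res = await fetcher(url, indexingRequestInit());
      if (res.ok) {
        await res.json(); // the JSON body would carry the per-page result
      } else {
        failed.push(url);
      }
    } catch {
      failed.push(url); // network error or invalid JSON
    }
    done += 1;
    onProgress(done, urls.length);
  }
  return failed;
}
```

In the panel, the progress callback would update the progress bar, and the browser’s `fetch` could be passed as the second argument. Requesting pages sequentially keeps the load on the server low; whether a broken-links scan would benefit from a few parallel requests is a separate design decision.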
ImageKit is somewhat smart here, as every indexing request also searches the page for rel="prev" and rel="next" links to make paginated pages crawlable as well (this happens on the server). If such links are found, they are added to the API response object and then, on the JavaScript side, added to the scanning queue if they’re not already in it.
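A scanning queue with this “only add if not already in there” behaviour can be sketched in a few lines. Again, this is an illustration under assumptions rather than ImageKit’s implementation: `ScanQueue` and its methods are hypothetical names, and how the discovered links are named in the API response is not shown in this thread.

```typescript
// Hypothetical client-side queue: every URL is scanned at most once.
class ScanQueue {
  private pending: string[] = [];
  private seen = new Set<string>();

  // Adds only URLs that were never queued before; returns how many were new.
  add(urls: string[]): number {
    let added = 0;
    for (const url of urls) {
      if (!this.seen.has(url)) {
        this.seen.add(url);
        this.pending.push(url);
        added += 1;
      }
    }
    return added;
  }

  // Hands out the next URL to scan, or undefined when the queue is empty.
  next(): string | undefined {
    return this.pending.shift();
  }

  // Counts for a progress bar: URLs handed out so far vs. all known URLs.
  progress(): { processed: number; total: number } {
    return {
      processed: this.seen.size - this.pending.length,
      total: this.seen.size,
    };
  }
}

// Usage: seed with the sitemap, then feed in links found while scanning.
const queue = new ScanQueue();
queue.add(["/blog", "/about"]);
const current = queue.next(); // "/blog"
queue.add(["/blog/page:2", "/blog"]); // "/blog" is already known and skipped
```

Because the total can grow while the scan is running, a progress bar driven by `progress()` may occasionally move backwards. Keeping the set of seen URLs separate from the pending list is what prevents paginated pages that link to each other from being requested in an endless loop.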
Thanks a lot, Fabian, I’ll definitely look into this! I’m also thinking about not putting too much effort into this plugin and maybe concentrating on a nice Kirby 3 version.