[Help wanted] 🔗 Broken Links Checker Widget


#1

Hey,
I’ve written a little panel widget that scans all pages and checks whether links are broken (i.e. whether the URL returns a 404). So far it seems to work. A few moments ago I even added support for excluding pages and for scanning fields other than ‘text’: https://github.com/wottpal/kirby-broken-links-widget

What I think is really missing to make this plugin bulletproof is making it work “async”, maybe a bit like ImageKit by @fabianmichael, with a start button and a little progress bar. But I have no clue where to start… I tried to dig through ImageKit’s source, which became overwhelming very quickly, so maybe somebody can sketch out a quick solution to my problem.

Thanks!
Dennis


#2

Thanks,

If I have something like (link: /error text: Error-page popup: yes) in my text, I get a broken link message. Something is wrong…

Kirby 2.5.7


#3

@HeinerEF Haha, good joke. The error page does of course return a 404, so that is not surprising. But who would want to link to the error page?

Maybe you should exclude it if you want to send your users to the error page without a broken link:


#4

For the editors of Sportanglerverein Barchfeld e.V., I have added some internal pages about e.g. the use of the panel. On an extended copy of Kirby CMS Clientmanual, I have added this link.

To exclude this link is a very good idea. Sonja, thank you very much for your hint.

But that is strange from my point of view.
The link to this page is not broken (I can reach that page directly), so for me it is wrong to show it in such a report!


#5

I think this is an edge case. Exclude your internal pages and you are all set.


#6

@wottpal:

At the moment it seems that I cannot exclude that link (/error or error), I can only exclude a whole page from scanning…:cry::cry::cry:


#7

Hey @HeinerEF,
now in v0.3.0 you can exclude specific page-ids or absolute external links (if enabled). I even added /error as a default value. Please confirm if it’s working for you! :slight_smile:

PS.: Still help needed!


#8

To make things async, ImageKit defers potentially expensive tasks until after the panel has loaded. I.e., instead of scanning the whole thumbs folder to compute the generated/pending thumbnail counts while the widget HTML is generated, these statistics are lazy-loaded via AJAX.

To make things async, you have to provide an API for your widget that is accessible via JavaScript. Kirby’s router feature is your friend here, but you need to handle things like authentication and i18n yourself.

API code with authentication:

The actual crawler component that scans pages:

When ImageKit scans the whole site for thumbnails, it does so by first fetching a sitemap via the API. It then iterates over the sitemap and sends an HTTP request to every single page to trigger thumbnail job creation. The JavaScript API sends a custom header (X-ImageKit-Indexing: 1) to tell the server: generate the page like you normally would, but afterwards return a JSON result instead of the whole HTML page.
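To illustrate the flow described above, here is a minimal client-side sketch. It is not ImageKit’s actual code: the names `scanSitemap`, `indexingHeaders` and the `GetJson` transport type are made up for this example, and it assumes the sitemap endpoint returns a plain JSON array of page URLs. Only the header name and value come from the description above.

```typescript
// Header name and value as described in the post.
const INDEXING_HEADER = "X-ImageKit-Indexing";

// Hypothetical transport signature, so the sketch does not depend on a
// particular HTTP client (fetch, XMLHttpRequest, jQuery, ...).
type GetJson = (
  url: string,
  headers: Record<string, string>,
) => Promise<unknown>;

// Headers telling the server to answer with JSON instead of the HTML page.
function indexingHeaders(): Record<string, string> {
  return { [INDEXING_HEADER]: "1" };
}

// Fetch the sitemap, then request every page one after another and
// report progress after each request (e.g. to update a progress bar).
async function scanSitemap(
  sitemapUrl: string,
  getJson: GetJson,
  onProgress: (done: number, total: number) => void,
): Promise<void> {
  // Assumption: the sitemap endpoint returns a JSON array of page URLs.
  const pages = (await getJson(sitemapUrl, {})) as string[];
  let done = 0;
  for (const page of pages) {
    await getJson(page, indexingHeaders());
    done += 1;
    onProgress(done, pages.length);
  }
}
```

For a broken-links widget, the per-page request would presumably go to the widget’s own API route and return the broken links found on that page, but the loop and the progress callback would look much the same.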

ImageKit is somewhat smart here, as every indexing request also searches the page for rel="prev" and rel="next" links to make paginated pages crawlable as well (this happens on the server). If such links are found, they are included in the API response object and appended to the scanning queue (JavaScript), unless they are already in there.
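The “add it to the queue if it’s not already in there” part could be sketched roughly like this. Again, this is only an illustration and not ImageKit’s real implementation; the class name `CrawlQueue` and its methods are invented for the example.

```typescript
// A small scanning queue that never enqueues the same URL twice.
class CrawlQueue {
  private pending: string[] = [];
  private seen = new Set<string>();
  private done = 0;

  // Seed the queue with the URLs from the sitemap.
  constructor(sitemap: string[]) {
    for (const url of sitemap) this.add(url);
  }

  // Enqueue a URL unless it was already queued or scanned.
  // Returns true if the URL was new.
  add(url: string): boolean {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.pending.push(url);
    return true;
  }

  // Take the next URL to scan, or undefined when nothing is left.
  next(): string | undefined {
    const url = this.pending.shift();
    if (url !== undefined) this.done += 1;
    return url;
  }

  // Fraction of all known URLs already handed out, for a progress bar.
  progress(): number {
    return this.seen.size === 0 ? 1 : this.done / this.seen.size;
  }
}
```

After each indexing response, the client would call `add()` for every prev/next link in the result. Because the total can grow while scanning, the progress value may briefly move backwards when new paginated pages are discovered.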

I hope this helps a bit.


#9

Thanks a lot Fabian, I’ll definitely look into this! I’m also thinking about not putting too much effort into this plugin and maybe concentrating on a nice Kirby 3 version instead :slight_smile:

Also for reference, I opened an issue about that (https://github.com/wottpal/kirby-broken-links-widget/issues/1), and if anybody is up for implementing this, a PR would be very welcome.