Plugin to automatically submit pages to Internet Archive


#1

Just tossing this idea out there … as it’s been on my wish list for a while.

I’d like to have a Kirby plugin that automatically submits pages to the Internet Archive’s Wayback Machine every time a page is created or updated.

I assume that this would require the use of the panel.page.create or panel.page.update hooks, but coding something like this is a bit out of my current realm of expertise.

The URI format for saving a page to the Internet Archive is pretty simple:

https://web.archive.org/save/https://forum.getkirby.com/

Can anyone provide some guidance around this idea? Any concerns?

Thanks!


#2

How often do your pages change? I’m guessing not all that often. This feels like it’s better off as a monthly cron job via a bash script.


#3

All you would have to do is to do a remote request from your hook, no plugin needed:

// Submit the page's URL to the Wayback Machine whenever it is created or updated in the panel
kirby()->hook(['panel.page.create', 'panel.page.update'], function($page) {
  remote::request('https://web.archive.org/save/' . $page->url());
});

Don’t see why a cron job would be better suited for the purpose.


#4

Because you can batch it and submit the whole site in one go rather than pinging the Wayback Machine every time something small changes.
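
A rough sketch of what such a batch run could look like as a PHP CLI script called from cron. The names `submitAll` and `$curlSubmit` are made up for illustration, the 5-second pause and the "status below 400 means success" rule are assumptions, and inside Kirby the URL list could probably be collected by looping over `site()->index()` and calling `$page->url()` on each page:

```php
<?php
// Sketch of a batch submit, meant to be run from cron via the PHP CLI.
// submitAll() and its $submit parameter are hypothetical names for illustration.

/**
 * Submit each URL to the Wayback Machine and return the save URLs that failed.
 *
 * @param string[] $pageUrls Absolute URLs of the pages to archive.
 * @param callable $submit   Receives a save URL, returns true on success.
 */
function submitAll(array $pageUrls, callable $submit): array
{
    $failed = [];
    foreach ($pageUrls as $pageUrl) {
        $saveUrl = 'https://web.archive.org/save/' . $pageUrl;
        if (!$submit($saveUrl)) {
            $failed[] = $saveUrl;
        }
    }
    return $failed;
}

// Assumed submitter using PHP's curl extension: a 2xx or 3xx status counts as success.
$curlSubmit = function (string $saveUrl): bool {
    $ch = curl_init($saveUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    sleep(5); // assumed pause between requests to stay polite; tune as needed
    return $status >= 200 && $status < 400;
};

// Usage (not run here): $failed = submitAll($urls, $curlSubmit);
```

Passing the submitter in as a callable keeps the loop easy to dry-run with a fake before pointing it at the real archive.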


#5

Yes, but if that is exactly the purpose, then hooks are a better solution. Doesn’t make sense if you save after every comma, of course.

As always, there’s more than one solution, and the best fit depends on the use case.


#6

True, my pages don’t change often, but when they do I would like the latest revision to be added to the archive.

Using a cron job to schedule a submission would work, but I’d need a means to walk through every single page in my site and submit them individually. The Internet Archive does not provide a direct way to submit and crawl an entire site.

Exactly. Saving after every minor edit doesn’t make sense.

The code snippet you provided gives me something to tinker with. Perhaps I just need to add a checkbox to the editing panel, so I can choose whether or not to submit an update to the archive when a change is being published.

I appreciate the feedback on this.


#7

It’s not all that hard to get all the URLs… you can use the wget terminal command for this kind of thing.

It’s not directly obvious, but from googling around earlier, I read that you can submit a text file with a list of links in that submission form and it will add them all. How true that is, I don’t know.


#8

As an alternative, you could also put a button on each page to trigger saving to the machine manually, or a button on the dashboard to trigger saving of the whole site.

The hook alternative has the downside that it only reacts to content changes. If your content stays the same but your design changes, then a hook is of no use and a cron job or a manual trigger would be more useful.


#9

I used something similar to this in one of my projects.
If checked, when you hit save it pings the Twitter API.
This way there’s no need to add an extra button.


#10

The downside with that checkbox is that you have to uncheck it the next time you want to save and then check again if you want to submit the next version. Otherwise, you would again submit every comma change.


#11

Or, you can uncheck it using the same hook after you’re done connecting to the API :wink:
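
Roughly like this, as a sketch. The field name `archive`, the helper name `archiveIfChecked`, and the assumption that a ticked checkbox is stored as the string '1' are all mine, so adjust them to your blueprint:

```php
<?php
// Sketch only: the field name "archive" and the helper name are assumptions.

/**
 * If the (assumed) "archive" checkbox is ticked, submit the page and return
 * the field values to write back so the box is unticked again.
 * Returns an empty array when nothing needs to change.
 */
function archiveIfChecked(array $fields, string $pageUrl, callable $submit): array
{
    // Assumption: a ticked checkbox is stored as the string '1'.
    if (($fields['archive'] ?? '0') !== '1') {
        return [];
    }
    $submit('https://web.archive.org/save/' . $pageUrl);
    return ['archive' => '0'];
}

// Inside Kirby this might be wired up roughly like so (untested, assumes the
// hook from post #3 and that $page->update() does not re-trigger the hook):
//
// kirby()->hook('panel.page.update', function ($page) {
//     $reset = archiveIfChecked($page->content()->toArray(), $page->url(), function ($saveUrl) {
//         remote::request($saveUrl);
//     });
//     if ($reset) $page->update($reset);
// });
```

That way the box only stays ticked for the one save where you actually want a snapshot.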


#12

Clever :slight_smile: and some more chars…


#13

This is very clever. Thanks for all the ideas.