Migrating a large number of blog posts

I wonder if anyone has migrated large amounts of data from a blog to Kirby. Most of the legacy blog posts are in HTML, some are Markdown interspersed with Ruby (from Jekyll).

I assume just dumping HTML into a field works as long as I don’t want to edit the blog post. In that case I would probably need to edit the HTML directly or recreate the blog post in a non-legacy format, both of which would be acceptable, I think.

I wonder if anyone has done something like this; the blog posts I need to migrate go back to 2008!

So the posts don’t come from a database?

With stuff dating back that far, it’s probably rather unlikely that it will ever need to be edited. So just storing the HTML as-is is probably the best option. Then use a different template/blueprint for new posts so that you can take full advantage of Kirby’s editing options.

Or you would have to parse the HTML and store it either as Markdown or as JSON for the new blocks fields.
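
For the store-as-is option, here is a minimal sketch of what a migration script could write for each legacy post. It assumes Kirby's plain-text content format (fields written as `Name: value`, separated by a line containing `----`). The field name `Html`, the template name `legacy-post`, and the function names are made up for illustration, and the escaping of `----` at the start of a line is an assumption about how Kirby protects field values, so check it against your Kirby version.

```python
import re
from pathlib import Path


def encode_field(value: str) -> str:
    # Escape lines that start with the field separator so legacy HTML
    # cannot be mistaken for the beginning of a new field.
    # (Assumed to mirror Kirby's own escaping; verify before relying on it.)
    return re.sub(r"(?m)^----", r"\\----", value.strip())


def write_legacy_post(content_root, slug, title, date, html,
                      template="legacy-post"):
    """Write one legacy post as <content_root>/<slug>/<template>.txt."""
    page_dir = Path(content_root) / slug
    page_dir.mkdir(parents=True, exist_ok=True)

    # "Html" is a hypothetical field name holding the untouched markup.
    fields = {"Title": title, "Date": date, "Html": html}
    body = "\n\n----\n\n".join(
        f"{name}: {encode_field(value)}" for name, value in fields.items()
    )

    target = page_dir / f"{template}.txt"
    target.write_text(body + "\n", encoding="utf-8")
    return target
```

The template for these pages would then output the `Html` field raw instead of running it through Markdown, while new posts get their own template and blueprint.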

Yes, probably no editing ever. The posts migrated through multiple WordPress versions, and then to Jekyll. I get pretty good HTML output from the latter, but the really old posts are still WordPress-rendered HTML, which is probably not suitable for parsing into JSON.

Markdown (through Pandoc) might work, but I don’t think the additional work is really worth the benefits.

Thanks for indicating that I’m on the right path :smiley:

Hi @yatil :wave:

When I migrated my blog from WP to Kirby two years ago, I wrote myself a custom-tailored import script (it looked like it came straight from Frankenstein’s laboratory, dealing with a decade of all kinds of custom fields, shortcodes, extensions, comments, webmentions, etc.). As it created the Kirby pages, it rendered the WordPress HTML (exported as a WordPress XML file) to Markdown using Markdownify (Elephant418/Markdownify on GitHub, an HTML to Markdown converter for PHP).

This worked reasonably well, but required quite some adaptations to both the input and the output (mostly automated with regexes, but also some manual touches) to get it right. In hindsight, it probably wasn’t worth the effort, but I do have all my old content in clean(ish) Markdown, which I consider best for long-term archiving. And it’s of course cleaner markup in the rendered pages.

…that said, I still encounter errors in old posts occasionally, so perfect this wasn’t :wink:
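
As a rough illustration of the first half of such an import script (reading posts out of a WordPress export), here is a minimal sketch. It is not the script described above and it does not do the Markdown conversion. It assumes the export uses the `wp` namespace for export version 1.2 (older exports use a different version number in that URL), and `iter_posts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    # The wp namespace URL carries the export version; 1.2 is assumed here.
    "wp": "http://wordpress.org/export/1.2/",
}


def iter_posts(wxr_xml: str):
    """Yield title, slug, date and raw body for each post in the export."""
    root = ET.fromstring(wxr_xml)
    for item in root.iterfind("./channel/item"):
        # Exports also contain pages, attachments, etc.; keep posts only.
        post_type = item.findtext("wp:post_type", default="post", namespaces=NS)
        if post_type != "post":
            continue
        yield {
            "title": item.findtext("title", default=""),
            "slug": item.findtext("wp:post_name", default="", namespaces=NS),
            "date": item.findtext("wp:post_date", default="", namespaces=NS),
            # Stored post body; it may still contain shortcodes.
            "html": item.findtext("content:encoded", default="", namespaces=NS),
        }
```

Each yielded dictionary could then be handed to an HTML to Markdown converter, or written out unchanged, while the Kirby page folders are created.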


Thanks for sharing!

I recently migrated a site with many custom-created HTML pages, so no real reusable data pattern existed. I wrote myself a script that just extracted the actual main markup of each page and stored it in separate files within the Kirby page folder structure, which are then included in the templates on page request. This way you can still have an up-to-date page setup around that, with your up-to-date scripts, styles, head, foot and so on.
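
The extraction step could look roughly like the sketch below. It assumes the main markup of each legacy page sits in a `<main>` element (pass another tag name if yours differ). `MainExtractor` and `extract_main` are hypothetical names, and this is not the script mentioned above. It only uses the standard library and drops HTML comments.

```python
from html.parser import HTMLParser


class MainExtractor(HTMLParser):
    """Collects the raw inner markup of the first <main> element."""

    def __init__(self, tag="main"):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.tag = tag
        self.depth = 0      # nesting depth of the target tag; 0 = outside
        self.done = False   # set once the first target element has closed
        self.parts = []

    def _inside(self):
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.parts.append(self.get_starttag_text())
        if tag == self.tag:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied through unchanged.
        if self._inside():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._inside():
            return
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._inside():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._inside():
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._inside():
            self.parts.append(f"&#{name};")


def extract_main(html: str, tag: str = "main") -> str:
    parser = MainExtractor(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

The returned fragment can be written to a file inside the matching page folder, and the template includes that file between the current head and foot.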
