Clean up `excerpt($page->text()->kt() etc)` from kirbytags and md headers


I need to use the excerpt() function to set a text preview of a blog post. But in my client’s use case, they usually start off the blog post with inline images and a header right after.

When the excerpt() function finds something like this (image: file.jpg caption: text), the outcome is that the text after caption: is printed, together with the header and the rest of the text.

Would making a pre-hook kirbytext filter be the way to go to strip out both the image caption and h3 tag?

(image: file.jpg caption: text)

### title

thanks, af

Yep, some regex voodoo in a pre filter could help.

If they always do it like that, why not use more fields for the images/subtitle?

Indeed, I thought about point 2 as well, which would just be easier for them and mean fewer maintenance problems in the future.

Though I was thinking that if I’d go with a kirbytext filter, a post hook would probably let me easily catch the first <figure> and any <h1> etc. in the HTML output.


OK, so having this at the moment:

kirbytext::$post[] = function($kirbytext, $text) {

  // 1. catch first-only instance of <figure> and any of <h1-h6>
  $match = '!(<figure(.*))|(<*.(h1|h2|h3|h4|h5|h6).*>)!is';

  $text = preg_replace_callback($match, '', $kirbytext, 1);

  return $text;

};

which is throwing an error because argument 2 is not valid (→ ''). I am not following the Kirby column example of looping through every instance of the matched result, because I only need to match the first instance of the regex (so I put 1 at the end of the preg_replace_callback call).

What am I missing? :smiley:

The second argument must be a callback, not just a string:
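
A minimal sketch of what that could look like in plain PHP, assuming the goal is simply to drop the first match. The sample HTML and the simplified heading pattern are illustrative assumptions, not taken from the posts above; the callback returns an empty string as the replacement:

```php
<?php
// Illustrative sample text, not the poster's actual content.
$html = "<h3>Title</h3>\n<p>Body</p>\n<h3>Second</h3>";

// Simplified pattern (an assumption): any h1-h6 element, matched lazily.
$pattern = '!<h[1-6][^>]*>.*?</h[1-6]>!is';

// The second argument must be callable. It receives the match array
// and returns the replacement string.
$result = preg_replace_callback($pattern, function ($matches) {
    return '';
}, $html, 1); // limit 1: only the first match is replaced

echo $result;
```

If the replacement is always an empty string, plain preg_replace with the same limit argument does the same job without a callback.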

OK, I tried to convert it from preg_replace_callback to preg_replace, but then the PHP server just crashes (segmentation fault: 11).

As I simply want to match and delete those matches, it makes more sense to use preg_replace without a callback, right?

Current version with preg_replace instead of preg_replace_callback:

kirbytext::$post[] = function($kirbytext, $text) {

  // 1. catch <figure> and any <h1-h6>
  $match = '!(<figure(.*))|(<*.(h1|h2|h3|h4|h5|h6).*>)!is';
  // 2. replace $match w/ an empty string
  $replace = '';

  // replace first instance-only of the match by adding 1
  $text = preg_replace($match, $replace, $kirbytext, 1);

  return $text;

};

This still gives segmentation fault: 11 and crashes the PHP server.

Have you tried something like this (in the template):



Come to think about it, a filter is not really that useful, because it will always be applied. You don’t want that. It’s probably better to run the replacement on the kirbytext output first and then get an excerpt.

That will still echo the title, though without the tags.

How about this?


// an optional figure tag followed by optional line breaks and optional headers
$pattern = '!(<figure.*<\/figure>)?(\n*)?(<h[1-6].*<\/h[1-6]>)?!is';

// let's fetch some text from the starter kit with an image and h3
$kirbytext = page('projects/project-a')->text()->kirbytext();

$result = preg_replace($pattern, '', $kirbytext);
echo excerpt($result);

I was about to post my non-working code saying I moved everything in the template, and this works perfectly @texnixe!

I’m studying your regex, and realised only now there is no need to set a flag for how many times the regex should work when using preg_replace, right?

EDIT: it does not matter, as I am using excerpt anyway. Dumb.

Thank you!

Sorry, I misinterpreted his original question where he talked about “stripping out the tags” :wink:

Glad he got it figured out!

I’m not so sure this works reliably if there are more figure/header tags, think you have to test it some more.
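
A quick way to test that concern in plain PHP, as a sketch on made-up HTML with two figures (the sample string and the lazy variant are assumptions, not from the posts above). With the `s` flag, the greedy `.*` runs from the first `<figure` to the last `</figure>` and swallows everything in between, while the lazy `.*?` stops at the nearest closing tag:

```php
<?php
// Illustrative HTML with two figures (not from the thread).
$html = "<figure>a</figure>\n<h3>Title</h3>\n<p>Body</p>\n<figure>b</figure>";

// Pattern as posted above: greedy .* combined with the s flag.
$greedy = '!(<figure.*<\/figure>)?(\n*)?(<h[1-6].*<\/h[1-6]>)?!is';

// Same pattern with lazy quantifiers (an assumed variation).
$lazy = '!(<figure.*?<\/figure>)?(\n*)?(<h[1-6].*?<\/h[1-6]>)?!is';

// Greedy: the first match spans from the first <figure to the last </figure>.
$greedyResult = preg_replace($greedy, '', $html);

// Lazy, limited to the first match: only the opening figure and heading go.
$lazyResult = preg_replace($lazy, '', $html, 1);

var_dump($greedyResult, $lazyResult);
```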

I think the pattern can be improved like this:

$pattern = '!((<figure.*<\/figure>)?(\n*)?(<h[1-6].*<\/h[1-6]>)?)?!';

And then you can use the limit argument (the fourth parameter of preg_replace), depending on whether you want to remove only the first match of the pattern or all of them.

$result = preg_replace($pattern, '', $kirbytext, 1);
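
One way this could be packaged for the template, as a sketch only: `stripLeadingFigureAndHeading()` is a hypothetical helper name, and the anchored, lazy pattern is an assumed variation rather than the one posted above. Anchoring at the start means only a figure and heading that open the text are removed, and text that starts with a paragraph is left untouched:

```php
<?php
// Hypothetical helper: removes one leading <figure> and one leading heading.
function stripLeadingFigureAndHeading(string $html): string
{
    // ^\s* anchors at the start of the text; .*? stops at the nearest closing tag.
    $pattern = '!^\s*(?:<figure\b.*?</figure>)?\s*(?:<h[1-6]\b.*?</h[1-6]>)?\s*!is';

    // Limit 1 mirrors the call above; fall back to the input if PCRE reports an error.
    return preg_replace($pattern, '', $html, 1) ?? $html;
}

// Illustrative inputs, not taken from the thread.
$withIntro = "<figure><img src=\"a.jpg\"><figcaption>cap</figcaption></figure>\n<h3>Title</h3>\n<p>Body text.</p>\n<figure>b</figure>";
$plain = "<p>Just text.</p>\n<figure>b</figure>";

echo stripLeadingFigureAndHeading($withIntro), "\n";
echo stripLeadingFigureAndHeading($plain), "\n";
```

In the template, the cleaned string would then be passed to excerpt() in place of the raw kirbytext output.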

But does it really matter whether I add the flag if I am then using the excerpt() function?

EDIT: I see, it does, as it might pick up the caption of the next figure, which is within the range of 50 words…