Multilingual search problem with capitals and accents


I am adding search functionality to a multilingual site (English | Greek). I notice that when searching with Greek words, capitalization and diacritics/accents change the results. For example, the same word with different capitalization is treated as a different word. I also ran a test in German: the word ‘Glück’, for example, is not matched if entered without the umlaut. A similar thing happens in Greek.

Can this be solved?

Not out of the box, but it can be done with a custom search component (either your own implementation or a third-party provider like Algolia).

Ok, this makes sense for the accents/diacritics issue, but shouldn’t capitalization be handled correctly by default? Is it a language-specific issue? Testing English and German words in different capitalization configurations works correctly, but Greek words do not.

Kirby relies on PHP’s PCRE2 regex engine to find matches. It passes the i flag to the regex to make the search case insensitive, but in PCRE2, without Unicode mode, i by default only works for the range [a-zA-Z] (that’s only the 52 letters of the English alphabet). For example, it also doesn’t match ü with Ü.

I guess Kirby could change that regular expression to also include the u (Unicode) flag, which would fix the upper/lower case issue. You could probably open a GitHub issue for this.
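
To illustrate the difference the Unicode flag makes, here is a small sketch in Python rather than PHP, so it is only an analogy and not Kirby’s actual code. Python’s re.ASCII flag restricts case-insensitive matching to ASCII letters, which roughly mirrors a PCRE pattern with i but without u. The helper name matches is made up for this example.

```python
import re


def matches(query: str, text: str, unicode_aware: bool) -> bool:
    """Case-insensitive literal search (hypothetical helper, not Kirby's API).

    With unicode_aware=False, case folding is limited to ASCII letters,
    similar in spirit to a regex that has the i flag but no Unicode flag.
    """
    flags = re.IGNORECASE
    if not unicode_aware:
        flags |= re.ASCII  # only a-z / A-Z are folded
    return re.search(re.escape(query), text, flags) is not None


# ASCII letters fold in both modes.
print(matches("hello", "HELLO", unicode_aware=False))

# Non-ASCII letters only fold in Unicode-aware mode.
print(matches("glück", "GLÜCK", unicode_aware=False))
print(matches("glück", "GLÜCK", unicode_aware=True))
print(matches("ελλάδα", "ΕΛΛΆΔΑ", unicode_aware=True))
```

In PHP the equivalent change would be adding the u modifier next to i on the pattern. Whether that is safe for Kirby’s search in every case is something the GitHub issue would have to sort out.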

The diacritics issue is a bit more complex: you’d have to transliterate every piece of content before running the search on it, which is not very efficient. You would normally do that step when indexing (so only when the content changes), but Kirby, as a flat-file CMS, doesn’t have an index, so this is better handled by a dedicated search engine.
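
For a rough idea of what “transliterate when indexing” means, here is a minimal Python sketch. It is not how Kirby or Algolia work internally, and the function names (fold, build_index, search) and the sample pages are invented for illustration. The approach is an assumption on my part: decompose each character with Unicode NFD, drop the combining marks, then case-fold, and apply the same folding to the query.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip diacritics and fold case so 'Glück' and 'gluck' compare equal."""
    # NFD splits 'ü' into 'u' plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Category 'Mn' covers the combining marks (accents, umlauts, tonos).
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() is a more aggressive lower() meant for caseless comparison.
    return stripped.casefold()


def build_index(pages: dict[str, str]) -> dict[str, str]:
    """Fold every page once; rerun only when the content changes."""
    return {page_id: fold(body) for page_id, body in pages.items()}


def search(index: dict[str, str], query: str) -> list[str]:
    """Return ids of pages whose folded text contains the folded query."""
    needle = fold(query)
    return [page_id for page_id, folded in index.items() if needle in folded]


# Hypothetical sample content, keyed by a made-up page id.
pages = {"de": "Viel Glück!", "el": "Η Ελλάδα"}
index = build_index(pages)

print(search(index, "gluck"))   # ['de']
print(search(index, "ΕΛΛΑΔΑ"))  # ['el']
```

The expensive part (folding the content) happens in build_index, so a search only has to fold the query. Without a stored index, that folding would have to run over all content on every search, which is the inefficiency described above.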