Some slug transliterations / romanizations are not working

Hey all,

I’m working on a site that has some titles in non-latin characters. However, the transliteration feature doesn’t seem to be working consistently.

For example, “hiragana” written in hiragana (ひらがな) returns nothing when you click “Create from title”.


However, “Russia” in Russian Cyrillic (Россия) returns what I assume is a correct transliteration.


I’m trying to wrap my head around this. What part of the code actually executes this feature? I know that for Japanese there are rules for each hiragana character in kirby/i18n/rules/ja.json, but these don’t seem to be applied. Any ideas?


Extra Info

Kirby 3.6.1.1
PHP 7.4.19
Unmodified StarterKit

So this is not a multilanguage site, but single language (English) with just some titles in non-latin characters?

For the moment this is just a test environment: the default StarterKit. So yes, single language.

However, the final website will be bilingual (English & French) with some titles in non-latin characters.

If I set 'slugs' => 'ja' in config, it works.
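For reference, a minimal sketch of that setting, assuming it goes into the standard site/config/config.php of the StarterKit with no other options set:

```php
<?php

// site/config/config.php
// Single-language site: load the Japanese slug rules globally.
return [
    'slugs' => 'ja'
];
```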

So in a multilanguage site you shouldn’t run into these issues. Maybe you can then add the slug rules to the English/French slugs to make the occasional non-latin slug work.

Oh, this feature is behind a config option? I didn’t expect that.

I’ll look into the slugs option then.

In a single language site, yes.

In a multilanguage configuration, the slug rules should be applied automatically for each language, and you can set individual slugs per language.

But your use case is an edge case, where you want to apply the non-latin slug rules to your latin character languages. So I would assume you would have to set them in those language definitions manually. But no idea why it works for Russian but not for Japanese.


Ohhh, so if I was working on a website with Japanese as a language, the slug rules for Japanese would work for the Japanese pages, but not for the pages in other languages.

Ok, so I dug into the source code. I think the magic starts here: kirby/Str.php at 3.6.1.1 · getkirby/kirby · GitHub

The ascii() method transforms a string through both the language rules and the rules in the $ascii property, which contains some Cyrillic characters but mostly accented Latin characters. I guess that is why Russian always gets transliterated to Latin characters.

I understand why ascii() exists and does what it does, but frankly it’s a bit strange that it has some Cyrillic in the mix.
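To illustrate the behaviour described above, here is a rough standalone sketch. It is not Kirby's actual code: toAscii(), $defaultTable and $japaneseRules are hypothetical names, and the tables are tiny hand-written stand-ins for the $ascii property and ja.json. It only shows why a string survives when some table covers its characters and comes back empty when none does.

```php
<?php

// Hypothetical, heavily truncated stand-in for the built-in default table
// (a few Cyrillic letters and one accented Latin letter).
$defaultTable = [
    'Р' => 'R', 'о' => 'o', 'с' => 's', 'и' => 'i', 'я' => 'ya', 'é' => 'e',
];

// Hypothetical stand-in for the Japanese language rules.
$japaneseRules = ['ひ' => 'hi', 'ら' => 'ra', 'が' => 'ga', 'な' => 'na'];

function toAscii(string $text, array $languageRules, array $defaultTable): string
{
    // Apply the language-specific rules first, then the default table.
    $text = strtr($text, $languageRules);
    $text = strtr($text, $defaultTable);

    // Drop every character that is still outside printable ASCII.
    return preg_replace('/[^\x20-\x7E]/u', '', $text);
}

echo toAscii('Россия', [], $defaultTable), PHP_EOL;               // Rossiya
echo toAscii('ひらがな', [], $defaultTable), PHP_EOL;               // (empty line)
echo toAscii('ひらがな', $japaneseRules, $defaultTable), PHP_EOL;   // hiragana
```

With no language rules loaded, the Cyrillic input is still covered by the default table, while the hiragana input has no mapping at all and is stripped completely, which matches the empty slug seen in the Panel.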

Ok, I figured it out for my use case. Essentially I’ve loaded the entire set of Japanese slug rules into the slugs array inside my language file.

For example, here’s my site/languages/en.php file:

<?php

use Kirby\Cms\Language;

return [
  'code' => 'en',
  'default' => true,
  'direction' => 'ltr',
  'locale' => 'en_US',
  'name' => 'English',
  'slugs' => Language::loadRules('ja')
];

It works!

Thanks Sonja for the guidance.

Exactly, you must define the rules yourself or define a specific language so the rules for that language get loaded. This is done because different languages have different rules for the same character.
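For the bilingual setup mentioned earlier, the French language file would presumably follow the same pattern as the English one. A sketch, assuming a site/languages/fr.php file; the locale and name values here are assumptions, not taken from the thread:

```php
<?php

use Kirby\Cms\Language;

// site/languages/fr.php (assumed file name)
return [
  'code' => 'fr',
  'default' => false,
  'direction' => 'ltr',
  'locale' => 'fr_FR',   // assumed locale
  'name' => 'Français',  // assumed display name
  // Same approach as in en.php: add the Japanese rules for this language
  'slugs' => Language::loadRules('ja')
];
```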