Strange slug translation of "ä"

Strange slug characters

I’m from Sweden and we have strange characters like… åäö. When adding a page title in the panel it translates…

läs

to…

laes

It’s probably translated that way because the character sounds like neither a nor e, but translating it to ae feels very weird.

What I prefer

I would suggest making it work more like (sorry, folks) WordPress, which translates it to this…

las

A longer page title test

Page title

ät gärna läckra päron i gräset

Kirby slug

aet-gaerna-laeckra-paeron-i-graeset

WordPress slug

at-garna-lackra-paron-i-graset

As a Swedish guy, I like the last one much more. It’s more readable, shorter, and probably more SEO-friendly.

No big deal?

I know I can change the slug if the suggestion does not fit my needs, but when creating a lot of pages it helps a lot if this works well out of the box.

Choose slug translation engine?

I know there are tons of things you can do and change in Kirby. Is there a way to choose the slug translation engine?

I looked at the code a little, and the replacement happens in the Str class of the toolkit: https://github.com/getkirby/toolkit/blob/master/lib/str.php

Through the following array:

static public $ascii = array(
    '/Ä/' => 'Ae',
    '/æ|ǽ|ä/' => 'ae',
    '/œ|ö/' => 'oe',
    '/À|Á|Â|Ã|Å|Ǻ|Ā|Ă|Ą|Ǎ|А/' => 'A',
    '/à|á|â|ã|å|ǻ|ā|ă|ą|ǎ|ª|а/' => 'a',
    '/Б/' => 'B',
    '/б/' => 'b',
    '/Ç|Ć|Ĉ|Ċ|Č|Ц/' => 'C',
    '/ç|ć|ĉ|ċ|č|ц/' => 'c',
    '/Ð|Ď|Đ/' => 'Dj',
    '/ð|ď|đ/' => 'dj',
    '/Д/' => 'D',
    '/д/' => 'd',
    '/È|É|Ê|Ë|Ē|Ĕ|Ė|Ę|Ě|Е|Ё|Э/' => 'E',
    '/è|é|ê|ë|ē|ĕ|ė|ę|ě|е|ё|э/' => 'e',
    '/Ф/' => 'F',
    '/ƒ|ф/' => 'f',
    '/Ĝ|Ğ|Ġ|Ģ|Г/' => 'G',
    '/ĝ|ğ|ġ|ģ|г/' => 'g',
    '/Ĥ|Ħ|Х/' => 'H',
    '/ĥ|ħ|х/' => 'h',
    '/Ì|Í|Î|Ï|Ĩ|Ī|Ĭ|Ǐ|Į|İ|И/' => 'I',
    '/ì|í|î|ï|ĩ|ī|ĭ|ǐ|į|ı|и/' => 'i',
    '/Ĵ|Й/' => 'J',
    '/ĵ|й/' => 'j',
    '/Ķ|К/' => 'K',
    '/ķ|к/' => 'k',
    '/Ĺ|Ļ|Ľ|Ŀ|Ł|Л/' => 'L',
    '/ĺ|ļ|ľ|ŀ|ł|л/' => 'l',
    '/М/' => 'M',
    '/м/' => 'm',
    '/Ñ|Ń|Ņ|Ň|Н/' => 'N',
    '/ñ|ń|ņ|ň|ʼn|н/' => 'n',
    '/Ö/' => 'Oe',
    '/ö/' => 'oe',
    '/Ò|Ó|Ô|Õ|Ō|Ŏ|Ǒ|Ő|Ơ|Ø|Ǿ|О/' => 'O',
    '/ò|ó|ô|õ|ō|ŏ|ǒ|ő|ơ|ø|ǿ|º|о/' => 'o',
    '/П/' => 'P',
    '/п/' => 'p',
    '/Ŕ|Ŗ|Ř|Р/' => 'R',
    '/ŕ|ŗ|ř|р/' => 'r',
    '/Ś|Ŝ|Ş|Ș|Š|С/' => 'S',
    '/ś|ŝ|ş|ș|š|ſ|с/' => 's',
    '/Ţ|Ț|Ť|Ŧ|Т/' => 'T',
    '/ţ|ț|ť|ŧ|т/' => 't',
    '/Ü/' => 'Ue',
    '/ü/' => 'ue',
    '/Ù|Ú|Û|Ũ|Ū|Ŭ|Ů|Ű|Ų|Ư|Ǔ|Ǖ|Ǘ|Ǚ|Ǜ|У/' => 'U',
    '/ù|ú|û|ũ|ū|ŭ|ů|ű|ų|ư|ǔ|ǖ|ǘ|ǚ|ǜ|у/' => 'u',
    '/В/' => 'V',
    '/в/' => 'v',
    '/Ý|Ÿ|Ŷ|Ы/' => 'Y',
    '/ý|ÿ|ŷ|ы/' => 'y',
    '/Ŵ/' => 'W',
    '/ŵ/' => 'w',
    '/Ź|Ż|Ž|З/' => 'Z',
    '/ź|ż|ž|з/' => 'z',
    '/Æ|Ǽ/' => 'AE',
    '/ß/'=> 'ss',
    '/IJ/' => 'IJ',
    '/ij/' => 'ij',
    '/Œ/' => 'OE',
    '/Ч/' => 'Ch',
    '/ч/' => 'ch',
    '/Ю/' => 'Ju',
    '/ю/' => 'ju',
    '/Я/' => 'Ja',
    '/я/' => 'ja',
    '/Ш/' => 'Sh',
    '/ш/' => 'sh',
    '/Щ/' => 'Shch',
    '/щ/' => 'shch',
    '/Ж/' => 'Zh',
    '/ж/' => 'zh',
  );
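To make the mechanism concrete, here is a minimal standalone sketch of how an ordered pattern => replacement table like this one can be applied. It is not the toolkit's actual str::ascii code: the function name ascii_sketch and the trimmed-down table are illustrative assumptions. What it shows is that preg_replace() with arrays applies each pattern in turn, in array order.

```php
<?php
// Hypothetical, trimmed-down copy of a few entries from the table above.
$table = array(
    '/æ|ǽ|ä/' => 'ae',
    '/œ|ö/'   => 'oe',
    '/à|á|â|ã|å|ǻ|ā|ă|ą|ǎ|ª|а/' => 'a',
);

// Hypothetical helper (not part of Kirby): preg_replace() accepts parallel
// arrays of patterns and replacements and runs them one after another.
function ascii_sketch(string $text, array $table): string
{
    return preg_replace(array_keys($table), array_values($table), $text);
}
```

With this table, ascii_sketch('läs', $table) gives 'laes', the default behavior described at the top of the thread.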

I haven’t tested anything, but since it is a public static variable, maybe you could change it:

str::$ascii = array(); // your array

I’m not sure whether this really works, given the order and scope involved and when it would get called if you put it, e.g., in config.php.

Edit: Okay, I tried it, but it doesn’t seem to be as simple as just redefining it in config.php. Maybe someone else has an idea where to put it.

Edit 2: No, it actually does work. It just treats ‘Ä’ as ‘ä’ when replacing, because all characters are lowercased before they are replaced anyway.
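Because the text is lowercased before the table is applied, an uppercase entry like '/Ä/' never gets a chance to match. A minimal sketch of such a slug pipeline, assuming mb_strtolower() for lowercasing and a simple hyphenation step (slug_sketch and both steps are illustrative assumptions, not the toolkit's exact str::slug implementation):

```php
<?php
// Hypothetical helper (not part of Kirby), sketching lowercase-then-replace.
function slug_sketch(string $text, array $table): string
{
    // 1. Lowercase first (multibyte-safe), so 'Ä' becomes 'ä'.
    $text = mb_strtolower(trim($text), 'UTF-8');
    // 2. Transliterate with the ordered pattern => replacement table.
    $text = preg_replace(array_keys($table), array_values($table), $text);
    // 3. Collapse anything that is not a-z or 0-9 into single hyphens.
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);
    return trim($text, '-');
}

$table = array(
    '/Ä/'     => 'A',  // never matches: the input is already lowercase
    '/æ|ǽ|ä/' => 'ae',
);
```

So slug_sketch('Läs', $table) yields 'laes': the '/Ä/' => 'A' entry is skipped and the lowercase rule wins.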


It DOES work, really! I just tried it. Amazing!

In my config.php I put this (just to test):

str::$ascii = array(
    '/Ä/' => 'Ae',
    '/æ|ǽ|ä/' => 'JENS',
    '/œ|ö/' => 'oe',
    '/À|Á|Â|Ã|Å|Ǻ|Ā|Ă|Ą|Ǎ|А/' => 'A',
    '/à|á|â|ã|å|ǻ|ā|ă|ą|ǎ|ª|а/' => 'a',
    '/Б/' => 'B',
    '/б/' => 'b',
    '/Ç|Ć|Ĉ|Ċ|Č|Ц/' => 'C',
    '/ç|ć|ĉ|ċ|č|ц/' => 'c',
    '/Ð|Ď|Đ/' => 'Dj',
    '/ð|ď|đ/' => 'dj',
    '/Д/' => 'D',
    '/д/' => 'd',
    '/È|É|Ê|Ë|Ē|Ĕ|Ė|Ę|Ě|Е|Ё|Э/' => 'E',
    '/è|é|ê|ë|ē|ĕ|ė|ę|ě|е|ё|э/' => 'e',
    '/Ф/' => 'F',
    '/ƒ|ф/' => 'f',
    '/Ĝ|Ğ|Ġ|Ģ|Г/' => 'G',
    '/ĝ|ğ|ġ|ģ|г/' => 'g',
    '/Ĥ|Ħ|Х/' => 'H',
    '/ĥ|ħ|х/' => 'h',
    '/Ì|Í|Î|Ï|Ĩ|Ī|Ĭ|Ǐ|Į|İ|И/' => 'I',
    '/ì|í|î|ï|ĩ|ī|ĭ|ǐ|į|ı|и/' => 'i',
    '/Ĵ|Й/' => 'J',
    '/ĵ|й/' => 'j',
    '/Ķ|К/' => 'K',
    '/ķ|к/' => 'k',
    '/Ĺ|Ļ|Ľ|Ŀ|Ł|Л/' => 'L',
    '/ĺ|ļ|ľ|ŀ|ł|л/' => 'l',
    '/М/' => 'M',
    '/м/' => 'm',
    '/Ñ|Ń|Ņ|Ň|Н/' => 'N',
    '/ñ|ń|ņ|ň|ʼn|н/' => 'n',
    '/Ö/' => 'Oe',
    '/ö/' => 'oe',
    '/Ò|Ó|Ô|Õ|Ō|Ŏ|Ǒ|Ő|Ơ|Ø|Ǿ|О/' => 'O',
    '/ò|ó|ô|õ|ō|ŏ|ǒ|ő|ơ|ø|ǿ|º|о/' => 'o',
    '/П/' => 'P',
    '/п/' => 'p',
    '/Ŕ|Ŗ|Ř|Р/' => 'R',
    '/ŕ|ŗ|ř|р/' => 'r',
    '/Ś|Ŝ|Ş|Ș|Š|С/' => 'S',
    '/ś|ŝ|ş|ș|š|ſ|с/' => 's',
    '/Ţ|Ț|Ť|Ŧ|Т/' => 'T',
    '/ţ|ț|ť|ŧ|т/' => 't',
    '/Ü/' => 'Ue',
    '/ü/' => 'ue',
    '/Ù|Ú|Û|Ũ|Ū|Ŭ|Ů|Ű|Ų|Ư|Ǔ|Ǖ|Ǘ|Ǚ|Ǜ|У/' => 'U',
    '/ù|ú|û|ũ|ū|ŭ|ů|ű|ų|ư|ǔ|ǖ|ǘ|ǚ|ǜ|у/' => 'u',
    '/В/' => 'V',
    '/в/' => 'v',
    '/Ý|Ÿ|Ŷ|Ы/' => 'Y',
    '/ý|ÿ|ŷ|ы/' => 'y',
    '/Ŵ/' => 'W',
    '/ŵ/' => 'w',
    '/Ź|Ż|Ž|З/' => 'Z',
    '/ź|ż|ž|з/' => 'z',
    '/Æ|Ǽ/' => 'AE',
    '/ß/'=> 'ss',
    '/IJ/' => 'IJ',
    '/ij/' => 'ij',
    '/Œ/' => 'OE',
    '/Ч/' => 'Ch',
    '/ч/' => 'ch',
    '/Ю/' => 'Ju',
    '/ю/' => 'ju',
    '/Я/' => 'Ja',
    '/я/' => 'ja',
    '/Ш/' => 'Sh',
    '/ш/' => 'sh',
    '/Щ/' => 'Shch',
    '/щ/' => 'shch',
    '/Ж/' => 'Zh',
    '/ж/' => 'zh',
);

Since this does work, you might not want the whole array in your config, just the changes you need:

str::$ascii = a::merge(str::$ascii,
  array(
    '/Ä/' => 'A',
    '/æ|ǽ|ä/' => 'a',
    '/œ|ö/' => 'o',
    '/Ö/' => 'O',
    '/ö/' => 'o',
    '/Ü/' => 'U',
    '/ü/' => 'u',
  )
);

Note for everyone: I haven’t tested whether this has any unwanted side effects on other parts of Kirby, so use it carefully.


Thanks! I just figured out another way of not using the whole array…

str::$ascii['/Ä/'] = 'A';
str::$ascii['/æ|ǽ|ä/'] = 'a';
str::$ascii['/œ|ö/'] = 'o';

Your way is more beautiful, especially if you need to change more.
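Both approaches end up with the same table. A small plain-PHP check, assuming a::merge behaves like array_merge() for a flat array with string keys (an assumption; array_merge() stands in for it here so the snippet runs without Kirby, and $core is a trimmed-down illustration of str::$ascii):

```php
<?php
// Trimmed-down stand-in for str::$ascii.
$core = array(
    '/Ä/'     => 'Ae',
    '/æ|ǽ|ä/' => 'ae',
    '/œ|ö/'   => 'oe',
    '/ö/'     => 'oe',
);

// Approach 1: merge an array of overrides over the core table.
// With string keys, array_merge() overwrites existing keys in place,
// so each overridden entry keeps its original position in the order.
$merged = array_merge($core, array(
    '/Ä/'     => 'A',
    '/æ|ǽ|ä/' => 'a',
    '/œ|ö/'   => 'o',
));

// Approach 2: assign the same keys one by one.
$assigned = $core;
$assigned['/Ä/']     = 'A';
$assigned['/æ|ǽ|ä/'] = 'a';
$assigned['/œ|ö/']   = 'o';
```

Either way, entries you don't mention (here '/ö/') keep their core value.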

To all the Swedish people

So, Swedish people (if any), here is all you need to get it the WordPress way. All credit to @distantnative.

Put it in config.php.

str::$ascii = a::merge(str::$ascii,
	array(
		'/Ä/' => 'A',
		'/æ|ǽ|ä/' => 'a',
		'/œ|ö/' => 'o',
		'/Ö/' => 'O',
		'/ö/' => 'o',
	)
);

Works with ÅÄÖ and åäö.
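To check the result against the example at the top of the thread, here is a standalone sketch that runs the page title through a slug pipeline twice: once with a trimmed-down version of the default table and once with the Swedish overrides merged in. The helper make_slug, its hyphenation step, and the trimmed tables are illustrative assumptions; array_merge() stands in for a::merge.

```php
<?php
// Hypothetical helper (not part of Kirby): lowercase, transliterate, hyphenate.
function make_slug(string $title, array $table): string
{
    $text = mb_strtolower($title, 'UTF-8');
    $text = preg_replace(array_keys($table), array_values($table), $text);
    return trim(preg_replace('/[^a-z0-9]+/', '-', $text), '-');
}

// A few default entries (German-style transliteration).
$default = array(
    '/Ä/'     => 'Ae',
    '/æ|ǽ|ä/' => 'ae',
    '/œ|ö/'   => 'oe',
    '/à|á|â|ã|å|ǻ|ā|ă|ą|ǎ|ª|а/' => 'a',
    '/Ö/'     => 'Oe',
    '/ö/'     => 'oe',
);

// The Swedish overrides from the snippet above.
$swedish = array_merge($default, array(
    '/Ä/'     => 'A',
    '/æ|ǽ|ä/' => 'a',
    '/œ|ö/'   => 'o',
    '/Ö/'     => 'O',
    '/ö/'     => 'o',
));

$title = 'ät gärna läckra päron i gräset';
```

make_slug($title, $default) reproduces the Kirby slug from the first post, and make_slug($title, $swedish) reproduces the WordPress one.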

I was thinking a bit too German when I created the translations for Ä and ä. In German these are umlauts, which are transliterated as ä = ae, ö = oe, ü = ue. Maybe we should make those translatable somehow, but I’m not entirely sure how yet. In practice, the translation could be added to the panel translation files, but then it would not be available to anything outside the panel. Difficult!

Is there anything against changing the translation in the config as we described (besides it not really being an official option)? I think it’s a rather rare case. Of course, you could also consider making ‘Ä’ => ‘A’ the default instead.

Nitpick: the translation for ö is redundant, with both '/œ|ö/' => 'oe' and '/ö/' => 'oe' :wink:

For German, I think it would be odd to have “A” instead of “Ae”. I also looked at Danish special characters and how they are handled in Danish URLs, and that is similar to the German way. So I guess there is no one-size-fits-all solution, and having an option in the config sounds like a good idea.

I kind of like the not-so-official solution by @distantnative. The only thing is that it does not look like a native config option.

Maybe something like this could make it more standard?

c::set('ascii', array(
    '/Ä/' => 'A',
    '/æ|ǽ|ä/' => 'a',
    '/œ|ö/' => 'o',
    '/Ö/' => 'O',
    '/ö/' => 'o',
));

It would only change the matched characters; the rest would still use the core character translation.

Maybe rename ascii to something more descriptive.
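A sketch of how such an option could behave: read the user's overrides and merge them over the core table, so only the matched keys change. This is a proposal, not an existing Kirby option; the option name 'ascii' comes from the post above, while the plain $config array and the function resolve_table() are hypothetical stand-ins for c::get() and toolkit internals.

```php
<?php
// Hypothetical: merge a user-supplied override table over the core table.
// Keys present in the overrides replace the core value in place; every
// other core entry is left untouched.
function resolve_table(array $core, array $config): array
{
    $overrides = $config['ascii'] ?? array();
    return array_merge($core, $overrides);
}

// Trimmed-down stand-in for the core table.
$core = array(
    '/æ|ǽ|ä/' => 'ae',
    '/ü/'     => 'ue',
);

// Hypothetical stand-in for what c::set('ascii', ...) would store.
$config = array('ascii' => array('/æ|ǽ|ä/' => 'a'));

$table = resolve_table($core, $config);
```

Without an 'ascii' entry in the config, the core table is returned unchanged.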

I fully agree with @texnixe regarding German.
We need “Ae” rather than “A” for “Ä”, and likewise for all six umlauts, plus “ss” for the German “ß”.
For details, see the Duden and its volumes. For the term Umlaut, see the article Umlaut (in German).

I fully agree with @bastianallgeier:
This could become part of Kirby’s translation system.
In German, only seven characters need to be translated correctly.
But the Swiss variant of German has no “ß”!

I totally agree regarding German; I was just wondering what the best default for Kirby would be. Ä isn’t only used in German, and personally I have no idea what the appropriate transliteration would be in other languages that use it. It might be the same, or it might differ.

Good that this is coming up. Sooner or later people will want to run Swedish and German umlauts at the same time (or, as @texnixe mentioned, two variants of German), so it should probably be possible to override the set per language.
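One possible shape for per-language sets, sketched in plain PHP: a base table plus override sets keyed by language code, merged when a slug is built. The language codes, $languageOverrides, table_for() and transliterate() are hypothetical names for illustration; nothing here is an existing Kirby API.

```php
<?php
// Base table: German-style umlaut transliteration (trimmed down).
$base = array(
    '/æ|ǽ|ä/' => 'ae',
    '/œ|ö/'   => 'oe',
    '/ü/'     => 'ue',
    '/ß/'     => 'ss',
);

// Hypothetical per-language override sets, keyed by language code.
$languageOverrides = array(
    'de' => array(),  // German: keep the base table as-is
    'sv' => array(    // Swedish: no trailing "e"
        '/æ|ǽ|ä/' => 'a',
        '/œ|ö/'   => 'o',
    ),
);

// Hypothetical: languages without an override set fall back to the base table.
function table_for(string $lang, array $base, array $overrides): array
{
    return array_merge($base, $overrides[$lang] ?? array());
}

// Hypothetical: apply the ordered table to a string.
function transliterate(string $text, array $table): string
{
    return preg_replace(array_keys($table), array_values($table), $text);
}
```

The same word then comes out as 'gaerna' for 'de' and 'garna' for 'sv'.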

Systems like Drupal handle these edge cases pretty well, but they also have a much longer history in tackling those.

Does the panel read the translation strings in site/languages, which handle template translations?
Maybe a default YAML file (or something similar) with per-language overrides would be an option.

And ASCII conversions would be handy in template context as well (how does the file normalizer do it?).

In this code:

<?php
str::$ascii = a::merge(str::$ascii,
	array(
		'/Ä/' => 'A',
		'/æ|ǽ|ä/' => 'a',
		'/œ|ö/' => 'o',
		'/Ö/' => 'O',
		'/ö/' => 'o',
	)
);

Do you know why I need to have /ö/ in there? To me it looks like /œ|ö/ translates œ or ö. Maybe | doesn’t mean “or” in this case?

If I remove the last item in the array the ö is not translated to o.

I agree with you, and @distantnative pointed out the same thing above. Removing it should actually work…
Does ä have the same issue? It is defined the same way as ö, but without the second, specific entry.

Edit: I know why: If you don’t overwrite it, the variant from the Toolkit is still there. So you basically need unset(str::$ascii['/ö/']) to remove it completely. I have created a PR to fix this directly in the Toolkit.
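Two quick checks in plain PHP, using a trimmed-down copy of the table instead of str::$ascii so it runs without Kirby: the | in these patterns is indeed regex alternation (“or”), and an entry you don’t override stays in the merged table until you unset() it, as described in the edit above. array_merge() stands in for a::merge here.

```php
<?php
// '|' is regex alternation: this pattern matches either 'œ' or 'ö'.
$alternation = preg_replace('/œ|ö/', 'o', 'œ ö');

// Trimmed-down stand-in for the core table.
$core = array(
    '/œ|ö/' => 'oe',
    '/ö/'   => 'oe',
);

// Overriding only '/œ|ö/' leaves the separate '/ö/' entry in place.
$merged = array_merge($core, array('/œ|ö/' => 'o'));

// Remove the leftover entry entirely.
$cleaned = $merged;
unset($cleaned['/ö/']);
```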


As a complement to the above, there is now also a plugin to make it even simpler.

Oh Jens, why do you have to create a plugin for everything…? :smile:

That’s obvious, isn’t it? Jens wants to install at least 12 plugins in one go, so he has to produce all these plugins to have some choice … and it will make him feel more at home (think WP) :joy: :stuck_out_tongue_winking_eye:


Learning and DRY, I guess. :wink:


So you think WP is my home? :wink: I feel WP is like my old messy apartment. Kirby is my new clean one.

In WP I have around 40 plugins installed for larger sites. 20 of them are there to remove stuff I don’t need, 10 are for security (yes, I need them all; WP is not safe), and the other 10 are for other features.

In Kirby I use at most 15 plugins, but many of them are core stuff, so that number is a bit misleading.

What I like about plugins

  • It’s kind of an isolated feature.
  • It’s often better coded than code written for a single project.
  • It can be used for more than one project.
  • It can be shared with others and hopefully get feedback or help to continue development.
  • I like the idea of plug and play. Just drag in the features you need and save time.
  • I think it should solve one problem and do it well.