utf8 character problem with kirbytext

In the head of the file I have this:

<meta charset="utf-8" />

Text1 - On the page I have this:

echo $page->main_text()->kirbytext();

Text 2 - Down on the page I have this:

echo $p->text()->kirbytext();

It prints one ä incorrectly, even though ä is displayed correctly in the rest of the text:

Det innebär att vi är inne i den första beta-perioden och vi hoppas på att kunna sl��ppa den

However if I change the Text1 (main_text) to…

$page->main_text();

…without kirbytext(), then Text2 above works again. Very strange, because these two texts have no connection.

In the config I have this:

c::set('languages', array(
	array(
		'name'      => 'Svenska',
		'code'      => 'sv',
		'locale'    => 'sv_SE.utf-8',
		'default'   => true,
		'url'       => '/',
		'direction' => 'ltr'
	),
));

I also tried with no language setup but with the same results.

I use Kirby 2.06 with no plugins.

Update

In Text1 I had this text:

Main text

When changing it to…

Main text2

…Text2 works. How strange is that? Why?

I once had a similar problem and solved it by sending a content type header in my header.php, right before anything else. No idea if that will solve your problem or if it is the best way, but it worked for me…

<?php header('Content-type: text/html; charset=utf-8'); ?>

I tried it now and the problem remains. Look at my update.

My tip (not tested to solve your problem, but I think so):

Save all files in “UTF-8 without BOM” format. Then you should have no problems, provided your webserver is properly configured.

Hint: If you run Windows on your computer, use Notepad++ to edit these files.
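If you would rather check from PHP than from an editor, something like the following should tell you whether a file starts with a BOM. This is only a sketch: `has_utf8_bom` is a helper name I made up for illustration, it is not part of Kirby.

```php
<?php
// Hypothetical helper (not a Kirby function): returns true when the
// given bytes start with the three-byte UTF-8 byte order mark EF BB BF.
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Usage with in-memory strings. For a real content file you would pass
// file_get_contents($path) instead.
var_dump(has_utf8_bom("\xEF\xBB\xBFTitle: Home"));
var_dump(has_utf8_bom("Title: Home"));
```

The first call should report true and the second false.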

I just double-checked and the files are UTF8 without BOM. I use WAMP as webserver.

Can you give us a link to the webpage?
Or post the whole HEAD of that webpage (from the beginning up to </head>) to us.
Can you post your template?

Are you sure that your webserver is configured to serve UTF-8 correctly? I use XAMPP on Win7 locally as my DEV system without any problems in German (with the letters ß, ö, ü, ä, Ö, Ä and Ü).

If you haven’t resolved this yet, you could add the utf-8 default charset either in the Apache httpd.conf file or in your .htaccess

AddDefaultCharset UTF-8

Other than that, check your HTTP requests and responses …

But since part of the text is displayed correctly, it might not even be a problem of character encoding. I wonder if there might be a strange hidden character somewhere that is causing this behaviour. Have you tried to delete the text and save it again?
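To look for such a hidden character without deleting the text, a small scan like this might help. It is a sketch under my own assumptions: `find_suspicious_chars` is a made-up helper name, the list of "suspicious" characters is my own choice (control characters, non-breaking space, zero-width characters, BOM), and it needs the mbstring extension.

```php
<?php
// Hypothetical helper (not part of Kirby): reports invalid UTF-8 and
// invisible characters that are easy to paste into a content file by accident.
function find_suspicious_chars(string $text): array
{
    // The /u regex below refuses invalid UTF-8, so check validity first.
    if (!mb_check_encoding($text, 'UTF-8')) {
        return ['invalid UTF-8 byte sequence'];
    }

    $found = [];
    // Control characters (except tab, LF, CR), non-breaking space,
    // zero-width characters, word joiner and BOM.
    $pattern = '/[\x{00}-\x{08}\x{0B}\x{0C}\x{0E}-\x{1F}\x{7F}\x{A0}'
             . '\x{200B}-\x{200D}\x{2060}\x{FEFF}]/u';

    if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as [$char, $offset]) {
            // Report the code point and its byte offset in the string.
            $found[] = sprintf('U+%04X at byte %d', mb_ord($char, 'UTF-8'), $offset);
        }
    }
    return $found;
}

// Usage: a zero-width space hidden in front of the "ä" (written as bytes).
print_r(find_suspicious_chars("sl\u{200B}\xC3\xA4ppa"));
```

An empty array means nothing on my list was found, which of course does not rule out other causes.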

I found out that I don’t use WAMP for this project. It runs through Live Preview, a function in Prepros, through a port. When using WAMP it works as expected. When using it on a live server it also works correctly.

I have read that Prepros Live Preview is using utf8 so I still think it’s a mystery.

The Live HTTP Headers Chrome extension prints the headers below. The charset is utf-8. Anything else to look for?

GET /lanera.se.prototype/ HTTP/1.1
Host: localhost:8006
Accept-Encoding: gzip, deflate, sdch
Accept-Language: sv-SE,sv;q=0.8,en-US;q=0.6,en;q=0.4,de;q=0.2
Cookie: (redacted)
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.65 Safari/537.36
X-Client-Data: CJa2yQEIpLbJAQiptskBCMG2yQEI7IjKAQieksoB

HTTP/1.1 200 OK
Cache-Control: must-revalidate
connection: close
content-type: text/html; charset=utf-8
date: Tue, 26 May 2015 06:08:28 GMT
Expires: 0
Pragma: no-cache
server: Apache/2.4.9 (Win64) PHP/5.5.12
transfer-encoding: chunked
x-powered-by: PHP/5.5.12

I also tried to add AddDefaultCharset UTF-8 in the htaccess, no difference.

@jenstornell:

I’ve been thinking about how I would approach this issue in your place.
Since I do not know your installed system “Prepros3 Live Preview”, I have chosen a global approach.
I assume that you are working on a Windows PC (you mentioned WAMP).
Please do use Notepad++ (or Notepad++Portable) wherever I mention it! It could be crucial to solving this issue.

Step 1:

Upload and save the following lines as the file `testhtml5.html` in the root of your server:


<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8">
    <!--[if lt IE 9]>
      <script src="//html5shim.googlecode.com/svn/trunk/html5.js"></script>
    <![endif]-->  
    <title>HTML5-Page in &quot;UTF-8 without BOM&quot;-Format</title>
  </head>
   
  <body>
    <h1>HTML5-Page in &quot;UTF-8 without BOM&quot;-Format</h1>
    <p>This line one is with &quot;HTML named entities&quot;: &auml; &ouml; &uuml; &szlig; &Auml; &Ouml; &Uuml; :end.</p>
    <p>This line two is with &quot;UTF-8 chars&quot;: ä ö ü ß Ä Ö Ü :end.</p>
    <br>
    <p>Good luck!</p>
  </body>
</html>

Go to step 2.

Step 2:

Download the file to your PC using FTP and then examine the download in Notepad++ (or Notepad++Portable). If your server runs on your PC, open the file from there directly using Notepad++ (or Notepad++Portable, http://portableapps.com/apps/development/notepadpp_portable).

Are you sure it is saved there in “UTF-8 without BOM” format? Look at the Notepad++ status bar on the right (the fifth section) for “UTF-8 w/o BOM”!

If yes, go to step 3.
If no, correct this using Notepad++ (or Notepad++Portable) and go to step 1.

Step 3:

Open this file in your browser by accessing something like “http://127.0.0.1/testhtml5.html”.
It may be that you need to adjust “127.0.0.1” or need to supplement the port.

Do you see something like:


This line one is with "HTML named entities": ä ö ü ß Ä Ö Ü :end.

This line two is with "UTF-8 chars": ä ö ü ß Ä Ö Ü :end.

If everything matches, go to the last step.
Otherwise go to step 4.

Step 4:

Are you using an up-to-date browser version?

If no, install a newer browser, e.g. FirefoxPortable ( http://portableapps.com/apps/internet/firefox_portable/localization ), in the newest version. After that go to step 3.
If yes, go to step 5.

Step 5:

Your server configuration has a problem, and unfortunately I cannot help you with that.
I would look at the help or FAQ pages for your server!

This issue is resolved once my simple page is displayed completely correctly, as in step 3. Then go to the last step.
Otherwise repeat step 5.

Last Step:

Please give us a detailed description of what the problem was, where it was located (in which step?) and what you changed.
After that you can close this issue, if you want.

Thanks!

I followed all your steps. In all of them the content is totally correct (and UTF-8 without BOM). I guess it has something to do with Live Preview then.

As I wrote before, my problem only affects a few characters and only under special circumstances. I also noticed that the Live Preview was showing the correct text for a few seconds until I refreshed. An odd problem that feels kind of random and not very stable.

Now I changed the text a little bit and the problem has not come back.

Result based on what we know: Live Preview can sometimes create UTF-8 character problems.
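Nobody in this thread confirmed what Live Preview actually does internally, so the following is only an illustration of one way a byte-oriented step between PHP and the browser could turn a single ä into two broken characters. It is my assumption, not a diagnosis. In UTF-8 the letter ä is two bytes; if a string is cut between them and the pieces are handled separately, neither piece is valid UTF-8 on its own, although the bytes are all still there. (The ä is written as `\xC3\xA4` so the example does not depend on how the file itself is saved.)

```php
<?php
// Illustration only: this is NOT taken from Prepros. It just shows what
// happens when a UTF-8 string is cut between the two bytes of "ä".
$text = "sl\xC3\xA4ppa";       // "släppa"; "ä" is the bytes 0xC3 0xA4

$first  = substr($text, 0, 3); // "sl" plus the first byte of "ä"
$second = substr($text, 3);    // second byte of "ä" plus "ppa"

var_dump(mb_check_encoding($text, 'UTF-8'));   // the whole string
var_dump(mb_check_encoding($first, 'UTF-8'));  // first piece on its own
var_dump(mb_check_encoding($second, 'UTF-8')); // second piece on its own
var_dump($first . $second === $text);          // pieces glued back together
```

The whole string should validate, the two pieces should not, and gluing them back together should give the original bytes. A cut like this would also move when the length of earlier content changes, but that is speculation on my part.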

I have a similar real issue with Kirbytext formatting of ‘title’ and ‘description’ fields. My txt files, generated via the panel or via an editor that saves UTF-8 without BOM (Mousepad, Gedit), contain Romanian special characters (ș, ț, ă, î, â, lower and uppercase).

My issue is ONLY with the î/Î and â/Â characters (&icirc, &Icirc, &acirc, &Acirc). Browsers render and display them correctly, but the HTML page source contains the HTML character code (&icirc) instead of î, and the same goes for the uppercase forms and for â.

The textarea, which is Markdown, is displayed correctly, and the page source also shows the actual characters instead of HTML character codes.

I left out the semicolon (;) in the HTML codes above. You get it.
My test machine uses the PHP web server (5.6.9) and my hosting provider's webserver uses nginx and PHP 5.5.x. Same issue.

After 2-3 hours of investigating, I came to the conclusion that making the following replacements in various site snippets and templates solves my problem:

<?php echo $site->description()->html() ?> with <?php echo $site->description() ?>
<?php echo $site->keywords()->html() ?> with <?php echo $site->keywords() ?>
<?php echo $page->title()->html() ?> with <?php echo $page->title() ?>

and so on.
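The pattern described here (only î and â converted, ș, ț and ă left alone) is what PHP's own `htmlentities()` does with its default HTML 4.01 entity table, which has names for î and â but not for the other three. I am assuming, without having checked the Kirby source, that `html()` ends up calling something like `htmlentities()`. The sketch below uses plain PHP only and assumes the file is saved as UTF-8:

```php
<?php
// Plain PHP, no Kirby involved. Assumption: the field's html() method
// behaves roughly like htmlentities().
$title = 'șțăîâ';

// Converts every character that has a named entity in the HTML 4.01 table.
$entities = htmlentities($title, ENT_QUOTES, 'UTF-8');

// Converts only the HTML special characters & < > " and '.
$special = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

echo $entities, "\n"; // șță&icirc;&acirc;
echo $special, "\n";  // șțăîâ
```

If escaping is still wanted for these fields, `htmlspecialchars()` keeps the characters readable in the page source while still escaping markup.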

Hi, I have a similar but different problem. I am writing here because it is also about UTF-8 encoding. If I should start a new thread, let me know.

Checking my code in the browser, I see this:

<h3 id='optische-qualit�tsanforderungen-f�r-acrylfenster'>Optische Qualitätsanforderungen für Acrylfenster</h3>

So, German umlauts are not shown correctly, but only in the id attribute of the h3 headline. The id attribute seems to be generated by Kirby itself (we haven’t changed that behaviour).

I checked: all files (content, kirby, site…) are UTF-8 encoded. header(utf-8…) does not change the behaviour. So it is probably best to find the line of code where this output is produced and repair it. Can anybody point me to the responsible line and file?

We are using Kirby 2.4.1. I hope we don’t have to update for this.

M.

Where is the ID generated? Could you post that piece of code please? Kirby does not auto-generate any IDs. Are you maybe using a plugin or a theme? I can’t find any IDs in H tags in Kirby’s Starterkit.

Thanks for pointing that out, texnixe.
The code is actually generated by a plugin called headid.
I commented it out, as we do not use its functionality any more.

Yes, that plugin doesn’t handle special chars in general, only a given set of characters.

If you need it again, check out the TOC plugin, which might give you an idea how to handle this in a better way.
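For illustration, here is a minimal sketch of a multibyte-safe way to build such an id. It is not the code of the headid or TOC plugin (I have not reproduced either); `heading_id` and its small German replacement map are made up for this example, and it assumes the mbstring extension and a UTF-8 source file.

```php
<?php
// Hypothetical helper: turns a heading into an ASCII-friendly id without
// ever cutting a multibyte character in half.
function heading_id(string $heading): string
{
    // Lowercase with the multibyte-aware function so "Ä" becomes "ä".
    $id = mb_strtolower($heading, 'UTF-8');

    // Transliterate the German special characters explicitly.
    $id = strtr($id, ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss']);

    // Replace every run of non-letters/non-digits with a single hyphen.
    // The /u modifier makes the regex work on characters, not bytes.
    $id = preg_replace('/[^\p{L}\p{N}]+/u', '-', $id);

    return trim($id, '-');
}

echo heading_id('Optische Qualitätsanforderungen für Acrylfenster'), "\n";
// optische-qualitaetsanforderungen-fuer-acrylfenster
```

Characters outside the map (for example é) are kept as they are, which is still valid in an HTML5 id as long as the page is served as UTF-8.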