Correct way to set up multi-language sitemap?


#1

Hey Everyone,

I have a Kirby site up and running, and I am using this solution to create a sitemap.

This works well for the default translation, but somehow I am getting a lot of crawler errors from Google. I believe it is because Google thinks that the URLs I specified exist in English as well (and they do), but with a different link / URI.

The google example for a multi-language site is this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/english/</loc>
    <xhtml:link 
                 rel="alternate"
                 hreflang="de"
                 href="http://www.example.com/deutsch/"
                 />
    <xhtml:link 
                 rel="alternate"
                 hreflang="de-ch"
                 href="http://www.example.com/schweiz-deutsch/"
                 />
    <xhtml:link 
                 rel="alternate"
                 hreflang="en"
                 href="http://www.example.com/english/"
                 />
  </url>
  
  <url>
    <loc>http://www.example.com/deutsch/</loc>
    <xhtml:link 
                 rel="alternate"
                 hreflang="en"
                 href="http://www.example.com/english/"
                 />
     <xhtml:link 
                 rel="alternate"
                 hreflang="de-ch"
                 href="http://www.example.com/schweiz-deutsch/"
                 />
     <xhtml:link 
                 rel="alternate"
                 hreflang="de"
                 href="http://www.example.com/deutsch/"
                 />
  </url>
  
  <url>
    <loc>http://www.example.com/schweiz-deutsch/</loc>
     <xhtml:link 
                 rel="alternate"
                 hreflang="de"
                 href="http://www.example.com/deutsch/"
                 />
     <xhtml:link 
                 rel="alternate"
                 hreflang="en"
                 href="http://www.example.com/english/"
                 />
<xhtml:link 
                 rel="alternate"
                 hreflang="de-ch"
                 href="http://www.example.com/schweiz-deutsch/"
                 />
  </url>
  
</urlset>

But this is only for one page, the home page? How would I set this up correctly with Kirby? Another problem is that, because I don't have the same site root URL for every language (site.com and site.com/en), the route does not reroute /sitemap to /sitemap.xml properly for the English version.

My Code at the Moment:

<?php

$ignore = array('sitemap', 'error');

// send the right header
header('Content-type: text/xml; charset="utf-8"');

// echo the doctype
echo '<?xml version="1.0" encoding="utf-8"?>';

?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <?php foreach($site->languages() as $language): ?>
  <?php foreach($pages->index() as $p): ?>
  <?php if((in_array($p->uri(), $ignore)) || $p->IsInvisible()) continue ?>
  <url>
      <?php foreach($site->languages() as $language): ?>
    <loc><?php echo html($p->url()) ?></loc>
  <xhtml:link 
                 rel="alternate"
                 hreflang="<?php echo $language->code() ?>"
                 href="<?php echo $p->url($language->code()) ?>"
                 />
  <?php endforeach ?>
  </url>
  <?php endforeach ?>
  <?php endforeach ?>
</urlset>

Is it correct that a multi-language sitemap does not need priority and modified tags? And where does the sitemap have to be placed (for all languages)? E.g. at both .com/en/sitemap.xml and .com/sitemap.xml, or is it OK to just have one sitemap at .com/sitemap.xml?

Thanks for your help :smiley:

@lukasbestle I have not yet managed to test the multi-site setup using multiple domains, since I don't have a development environment on our Strato FTP server. If it breaks, it’s broken :frowning: Not sure if testing this functionality works locally without a proper Vagrant environment :smiley:


#2

To analyse this question I added some code to a template of a freshly installed Kirby langkit (version 2.3.2) on Win10 with XAMPP 7.0.8, which previously ran without any problem:

    <ul>
<?php foreach($site->languages() as $p1): ?>
      <li>language p1 = "<?php echo html($p1) ?>"
        <ul>
<?php foreach($site->languages() as $p2): ?>
          <li>language p1 - p2 = "<?php echo html($p1) ?> - <?php echo html($p2) ?>"</li>
<?php endforeach ?>
        </ul>
      </li>
<?php endforeach ?>
    </ul>

If I run the corresponding webpage it only shows the first foreach item “p1” with the default language:

Something is wrong.


#3

<?php

$ignore = array('sitemap', 'error');
$sitemap = $pages->index();
$languages = $site->languages();

// send the right header
header('Content-type: text/xml; charset="utf-8"');

// echo the doctype
echo '<?xml version="1.0" encoding="utf-8"?>';

?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
<?php foreach($sitemap as $p): ?>
    <?php if( (in_array($p->uri(), $ignore)) || ($p->IsInvisible() && !$p->IsHomePage())) continue ?>
    <?php foreach($languages as $sitelang): ?>
      <url>
        <loc><?php echo $p->url($sitelang->code()) ?></loc>
        <?php foreach($languages as $pagelang): ?>
          <xhtml:link
             rel="alternate"
             hreflang="<?php echo $pagelang->code() ?>"
             href="<?php echo $p->url($pagelang->code()) ?>"
             />
        <?php endforeach ?>
      </url>
    <?php endforeach ?>
  <?php endforeach ?>
</urlset>

This should give us the right structure. But when I view the sitemap, I only see one translation for the <?php foreach($languages as $sitelang): ?> loop. Am I missing something here? I don't see the ‘objects’ for the English translation…

Thanks. Also: Is it enough for Google to make this sitemap available under site.com/sitemap?
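
For reference, the structure from the Google example can be sketched in plain PHP without Kirby: one <url> entry per language version of a page, each listing every language version (including itself) as an alternate. The helper name `sitemap_entries` and the example URLs are made up for illustration; in a Kirby template the URLs would come from `$p->url($code)` as in the template above.

```php
<?php
// Hypothetical helper, not part of Kirby. $translations maps a language
// code to the URL of the same page in that language.
function sitemap_entries(array $translations)
{
    $xml = '';
    foreach ($translations as $locUrl) {
        $xml .= "  <url>\n";
        $xml .= '    <loc>' . htmlspecialchars($locUrl, ENT_QUOTES, 'UTF-8') . "</loc>\n";
        // Every entry lists all language versions, including its own.
        foreach ($translations as $code => $altUrl) {
            $xml .= sprintf(
                "    <xhtml:link rel=\"alternate\" hreflang=\"%s\" href=\"%s\"/>\n",
                htmlspecialchars($code, ENT_QUOTES, 'UTF-8'),
                htmlspecialchars($altUrl, ENT_QUOTES, 'UTF-8')
            );
        }
        $xml .= "  </url>\n";
    }
    return $xml;
}

// Made-up example data: two pages, each available in de and en.
$examplePages = array(
    array('de' => 'http://www.example.com/', 'en' => 'http://www.example.com/en'),
    array('de' => 'http://www.example.com/kontakt', 'en' => 'http://www.example.com/en/contact'),
);

$body = '';
foreach ($examplePages as $translations) {
    $body .= sitemap_entries($translations);
}

echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"' . "\n";
echo '  xmlns:xhtml="http://www.w3.org/1999/xhtml">' . "\n";
echo $body;
echo "</urlset>\n";
```

With two pages in two languages this yields four <url> entries with two alternates each, all in a single file.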


#4

It’s recommended to make it available at site.com/sitemap.xml as well as to point to it in your robots.txt file. Google and other search engines even have a page where you can tell them where the sitemap is.
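
A minimal sketch of the robots.txt pointer, assuming the sitemap lives at site.com/sitemap.xml. The helper name `robots_txt` is made up, and the scheme in the URL is a placeholder; the `Sitemap:` line expects an absolute URL.

```php
<?php
// Hypothetical helper: robots.txt content that allows all crawling and
// advertises the sitemap location via the "Sitemap:" directive.
function robots_txt($sitemapUrl)
{
    return "User-agent: *\nDisallow:\n\nSitemap: " . $sitemapUrl . "\n";
}

echo robots_txt('http://site.com/sitemap.xml');
```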


#5

Hey @Thiousi, do you think I need a sitemap for the English translation as well, or is one sitemap enough since it includes the values for the de and en URLs? The sitemap above, as suggested by Google, doesn't have a modified or priority attribute, which confuses me a little :slight_smile:

Do I need a second URL for the sitemap, e.g. site.com/en/sitemap.xml? Or is site.com/sitemap.xml enough with the template shown above?

For a single-language website it’s straightforward to set up, but I haven't found much information about multi-language setups :slight_smile:


#6

It’s my understanding that cross-site submissions of sitemaps are allowed by Google only if the sites are verified in the Search Console; I don’t know how that affects a multi-lang site. (source)

I think for your use case, I would do the following:

  1. A single language sitemap for your default language
  2. Using hreflang attributes to indicate variations of the same page in all languages.

Why? Because hreflangs are used by robots when crawling to build the equivalent of a multi-lang sitemap. We built a tutorial on how to use hreflang on macotuts.com along with pre-built code. I’d be interested to hear whether the returned hreflang tags work in your setup. Feel free to start a PM chat if you want to :wink:
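
As a rough illustration of point 2 (this is a generic sketch, not the pre-built code from the tutorial), the hreflang tags for a page's <head> could be generated like this. The helper name `hreflang_links` and the URLs are made up; in Kirby the URLs would presumably come from `$page->url($code)` for each language.

```php
<?php
// Hypothetical helper: one <link rel="alternate"> tag per language version.
// $translations maps a language code to the URL of that version.
function hreflang_links(array $translations)
{
    $tags = array();
    foreach ($translations as $code => $url) {
        $tags[] = sprintf(
            '<link rel="alternate" hreflang="%s" href="%s">',
            htmlspecialchars($code, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($url, ENT_QUOTES, 'UTF-8')
        );
    }
    return implode("\n", $tags);
}

// Made-up example: the same page in German and English.
echo hreflang_links(array(
    'de' => 'http://site.ch/kontakt',
    'en' => 'http://site.ch/en/contact',
)), "\n";
```

Each language version of the page would output the same full set of tags, including the one pointing at itself.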


#7

Reading your tutorial right now :blush: reporting in later :slight_smile: !

PS: The site is verified in the webmaster console as site.ch, with the English translation under site.ch/en. But with Kirby 2.4 on the horizon I might just implement the new multi-domain setup with site.ch and site.com and verify both domains in the Google console.


#8

@distantnative, @texnixe, @lukasbestle:
It would be nice to get a statement on this problem:

and


#9

The structure of your code is a bit weird… But I don’t see why it wouldn’t work for you.

I’ve used this plugin in the context of a multi-lang setup and it works well:

Note: it returns a minimized sitemap which may look weird to you but makes no difference for search engines


#10

Would be great :slight_smile: but if it doesn’t work out, I am testing @Thiousi’s solution in the meantime. (Thanks)

Now I have to wait for Google to crawl our site to see if any errors come up … :slight_smile:


#11

Yes, I can confirm this, the double foreach loop does not work, but I can’t figure out why that is. If you exclude the first language from the second loop, it does work though, but then you would have to repeat the code in the second foreach loop for the first language. If I get round to it, I’ll test this in an older Kirby version.
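
One possible explanation, which is an assumption and not verified against the Kirby source: if the languages collection implements PHP's `Iterator` interface directly, the object has a single internal cursor, and a nested foreach over the same object resets and exhausts that cursor, so the outer loop ends after its first item. The class `SharedCursorList` below is a made-up stand-in, not Kirby's actual class; it only demonstrates the general PHP behaviour and a workaround (copying to a plain array first).

```php
<?php
// Hypothetical stand-in for a collection with ONE shared cursor.
// This is NOT Kirby's class; it only reproduces the symptom.
class SharedCursorList implements Iterator
{
    private $items;
    private $pos = 0;

    public function __construct(array $items) { $this->items = array_values($items); }

    #[\ReturnTypeWillChange]
    public function rewind() { $this->pos = 0; }
    #[\ReturnTypeWillChange]
    public function valid() { return $this->pos < count($this->items); }
    #[\ReturnTypeWillChange]
    public function current() { return $this->items[$this->pos]; }
    #[\ReturnTypeWillChange]
    public function key() { return $this->pos; }
    #[\ReturnTypeWillChange]
    public function next() { $this->pos++; }
}

// Collects "outer-inner" pairs from a nested loop.
function pairs($outer, $inner)
{
    $out = array();
    foreach ($outer as $a) {
        foreach ($inner as $b) {
            $out[] = $a . '-' . $b;
        }
    }
    return $out;
}

$languages = new SharedCursorList(array('de', 'en'));

// Same object in both loops: the inner loop uses up the shared cursor,
// so the outer loop stops after its first item.
$broken = pairs($languages, $languages);

// Workaround: copy to a plain array first; arrays iterate independently.
$copy  = iterator_to_array($languages);
$fixed = pairs($copy, $copy);

echo implode(' ', $broken), "\n"; // de-de de-en
echo implode(' ', $fixed), "\n";  // de-de de-en en-de en-en
```

If this is what happens in Kirby, iterating over a copy of the collection in the inner loop should avoid having to repeat the code for the first language.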


#12

I’ve been fiddling around with this as well and I got it working, but the issue is that it’s painfully slow. For 288 URLs in total it takes ~40s to generate the sitemap.xml here locally :snail:

Features

  • Works for both single- and multi-language sites
  • Generates alternate hreflang entries for multi-language
  • Exclude by template
  • Mark certain pages (contact) important for higher priority
  • Style it with an assets/css/sitemap.xsl if present

<?php

// Based on https://github.com/thgh/kirby-plugins/tree/master/sitemap

kirby()->routes(array(
    array(
        'pattern' => 'sitemap.xml',
        'action'  => function() {

            $sitemap = '<?xml version="1.0" encoding="UTF-8"?>';
            if( file_exists(kirby()->roots()->assets() . DS . "css" . DS . "sitemap.xsl") ) {
                // URLs always use forward slashes, so DS (a backslash on Windows) must not be used here
                $sitemap .= '<?xml-stylesheet type="text/xsl" href="' . kirby()->urls()->assets() . '/css/sitemap.xsl' . '"?>';
            }
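            // the xhtml namespace must be declared because <xhtml:link> elements are emitted below
            $sitemap .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">';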

            if(!empty(site()->languages())) {
                foreach(site()->languages() as $language) {
                    foreach(site()->pages()->index() as $page){
                        $sitemap .= add_sitemap_url($page, $language);
                    }
                }
            } else {
                foreach(site()->pages()->index() as $page){
                    $sitemap .= add_sitemap_url($page);
                }
            }
            $sitemap .= '</urlset>';

            return new Response($sitemap, 'xml');

        }
    )
));

function add_sitemap_url($page, $language = null){

    $exclude = c::get('sitemap.exclude', array('error'));
    // default to an empty array so in_array() below never receives null
    $excludeTemplates = c::get('sitemap.exclude.templates', array());
    $important = c::get('sitemap.important', array('contact'));


    if($page->isHomePage() or !in_array($page->uri(), $exclude) and !in_array($page->template(), $excludeTemplates)){
        $url  = '<url>';
        $url .=     '<loc>' . ($language ? html($page->url($language->code())) : html($page->url())) . '</loc>';
        $url .=     '<lastmod>' . $page->modified('c') . '</lastmod>';
        $url .=     '<priority>' . (($page->isHomePage()||in_array($page->uri(), $important)) ? 1 : 0.95/$page->depth()) . '</priority>';

        if($language){
            foreach (site()->languages()->keys() as $lang => $code){
                $url .=     '<xhtml:link rel="alternate" hreflang="' . $code . '" href="' . html($page->url($code)) . '"/>';
            }
        }

        $url .= '</url>';
        return $url;
    }
}

But as I mentioned above, it’s so slow that it’s unusable.
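
A common mitigation for slow generation is to cache the generated XML in a file and only rebuild it after a certain time. The sketch below is plain PHP and assumes nothing about Kirby; `cached_sitemap`, the cache path and the one-hour lifetime are made up for illustration. It does not address why the generation itself is slow.

```php
<?php
// Hypothetical helper: returns the cached content if the cache file is
// younger than $ttl seconds, otherwise calls $build and stores the result.
function cached_sitemap($cacheFile, $ttl, $build)
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }
    $xml = call_user_func($build);
    file_put_contents($cacheFile, $xml, LOCK_EX);
    return $xml;
}

// Usage with a made-up builder that counts how often it actually runs.
$calls = 0;
$build = function () use (&$calls) {
    $calls++;
    return '<urlset></urlset>';
};
$file = sys_get_temp_dir() . '/sitemap-cache-demo-' . uniqid() . '.xml';

$first  = cached_sitemap($file, 3600, $build); // builds and writes the file
$second = cached_sitemap($file, 3600, $build); // served from the cache file
unlink($file);                                 // clean up the demo file
```

In the route above, the string concatenation that builds `$sitemap` would take the place of the builder callback.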


#13

Update on the initial problem. In the past the sitemap worked flawlessly. Here is the code:

<?php

$ignore = array('sitemap', 'error');
$sitemap = $pages->index();
$languages = $site->languages();
$language = (string) $site->language();

// send the right header
header('Content-type: text/xml; charset="utf-8"');

// echo the doctype
echo '<?xml version="1.0" encoding="utf-8"?>';

?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <?php foreach($pages->index() as $p): ?>
    <?php if( (in_array($p->uri($language), $ignore)) || ($p->IsInvisible() && !$p->IsHomePage())) continue ?>

  <url>
    <loc><?php echo html($p->url($language)); ?></loc>
    <lastmod><?php echo $p->modified('%Y-%m-%d') ?></lastmod>
    <priority><?php echo ($p->isHomePage()) ? 1 : number_format(0.5/$p->depth(), 1) ?></priority>
  </url>
  <?php endforeach ?>
</urlset>

But today I received numerous messages and warnings from Google regarding a few errors. One of them was this:

Using the link /sitemap or /sitemap.xml works as expected and displays proper XML. Any thoughts / help?

Thanks!! :slight_smile:


#14

I’d be interested in this, too.