How to extend SXA sitemaps and a few reasons why you'd want to

by Derek Hunziker

I often tell people that the least interesting part of the Sitecore Experience Accelerator is the toolbox. At least, it is to me. Yet, despite SXA’s wide array of functionality, the toolbox of reusable components is what most people think of when they think of SXA. Sadly, I feel that much of what SXA offers is overlooked. One such feature is SEO: SXA sets you up with a solid foundation for a search-engine-friendly website with almost no developer involvement required.

I say “almost” no developer involvement because there are circumstances where a packaged solution won’t work for everyone. Case in point: my team needed to customize the way the XML sitemap was generated to meet our specific needs. Thankfully, the process was a breeze, and it’s one I hope to share with you today.

Why Customize?

The decision to customize a Sitecore solution should be carefully considered. Our team had a few reasons for customizing the output of our sitemap, and we didn’t make the decision lightly. That said, these reasons won’t apply to everyone, since they pertain specifically to our implementation and requirements, but they’re still worth sharing.

  1. We wanted to exclude pages that were marked as NOINDEX

  2. We wanted to exclude pages that were falling back to English

  3. SXA’s architecture provided a clean, straightforward process for applying customizations

For reason #1, one of our developers found that SXA already provides a mechanism to exclude pages from the sitemap at the item level. It’s not immediately apparent, but you can set the “Change frequency” field to “do not include” on any page item, and it will be excluded from the sitemap. With a bit of author training, it’s a setting that can be incorporated into most authoring workflows. However, it’s also common for page templates to include additional SEO settings, such as the ability to mark a page as “NOINDEX” for search crawlers. In that case, when an author selects the NOINDEX option, we wanted the item excluded from the sitemap without any additional steps. As it turns out, search engines don’t like being pointed to a page in the sitemap, only to find that the page is marked as NOINDEX. It sends mixed messages, and it’s just not cool.

Reason #2 is along the same lines as #1. In this case, we didn’t want to advertise URLs in the sitemap for pages that weren’t fully translated. If a page is falling back to English, there’s really no reason to ask search engines to include it in their indexes. SXA 1.6+ already contains logic to include all language versions of a page in the sitemap as “hreflang” entries; however, it needed some tweaking to meet our requirements.
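For readers who haven’t worked with hreflang entries before, here is roughly what one looks like in the standard sitemap format that Google documents. The example.com URLs are placeholders, and SXA’s exact markup may differ. If the French version of this page were only falling back to English, the fr alternate is the kind of entry we wanted to stop advertising.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/en/about</loc>
    <!-- Each language version is listed as an alternate, including the page itself. -->
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/about"/>
    <!-- Placeholder French URL: if this version only falls back to English, it shouldn't be listed. -->
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/about"/>
  </url>
</urlset>
```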

As for reason #3, it was clear to us that SXA was designed with customization in mind. With a bit of research into how things worked, we were confident we could apply these customizations in a way that wouldn’t hurt us down the road.

How to Customize?

The default SXA sitemap generator implementation is registered under the sitecore/services configuration node. If this is new to you, I encourage you to learn more about Sitecore’s dependency injection features.

<register serviceType="Sitecore.XA.Feature.SiteMetadata.Sitemap.ISitemapGenerator, Sitecore.XA.Feature.SiteMetadata" implementationType="Sitecore.XA.Feature.SiteMetadata.Sitemap.SitemapGenerator, Sitecore.XA.Feature.SiteMetadata" lifetime="Transient" patch:source="Sitecore.XA.Feature.SiteMetadata.config"/>

Conveniently, this provides a clean way to point to your own custom implementation: as long as your sitemap class implements ISitemapGenerator, you’re in business.
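A minimal patch file for swapping in your own class might look like the following sketch. It uses Sitecore’s standard patch:instead attribute to replace the registration shown above. The type and assembly name MyProject.Feature.Sitemap.CustomSitemapGenerator, MyProject.Feature.Sitemap are hypothetical placeholders for your own class. The patch file also needs to load after Sitecore.XA.Feature.SiteMetadata.config so that the original registration exists when the replacement is applied.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <services>
      <!-- Replace SXA's default ISitemapGenerator registration with a custom class.
           MyProject.Feature.Sitemap.CustomSitemapGenerator is a hypothetical type name. -->
      <register serviceType="Sitecore.XA.Feature.SiteMetadata.Sitemap.ISitemapGenerator, Sitecore.XA.Feature.SiteMetadata"
                implementationType="MyProject.Feature.Sitemap.CustomSitemapGenerator, MyProject.Feature.Sitemap"
                lifetime="Transient"
                patch:instead="register[@serviceType='Sitecore.XA.Feature.SiteMetadata.Sitemap.ISitemapGenerator, Sitecore.XA.Feature.SiteMetadata']" />
    </services>
  </sitecore>
</configuration>
```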

In our case, the changes we needed didn’t require a completely new implementation. We only needed to subclass the existing Sitecore.XA.Feature.SiteMetadata.Sitemap.SitemapGenerator class and override a couple of methods. It’s worth noting that the SXA developers made almost every method of this class virtual, so let’s all pause to give the SXA development team a round of applause!

Next, to exclude items from the sitemap based on our new criteria, we needed to override the ShouldBeSkipped(Item item) method, which returns a Boolean indicating whether to skip adding the item to the sitemap. Simple! Before calling the base implementation, we added our custom logic to ensure that pages marked as NOINDEX are skipped.

protected override bool ShouldBeSkipped(Item item)
{
    // Custom logic goes here
    
    return base.ShouldBeSkipped(item);
}
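The stub above leaves the custom logic out. One way to fill it in is sketched below, assuming your page templates carry a checkbox field for NOINDEX. The field name "NoIndex" and the namespace are hypothetical, and the sketch assumes SitemapGenerator can be constructed without arguments in your SXA version. It relies only on Sitecore’s string field indexer, since a checked checkbox field is stored as "1".

```csharp
using Sitecore.Data.Items;
using Sitecore.XA.Feature.SiteMetadata.Sitemap;

namespace MyProject.Feature.Sitemap // hypothetical namespace
{
    // Subclass SXA's generator so every method we don't override keeps its stock behavior.
    // Assumes SitemapGenerator has a parameterless constructor in your SXA version.
    public class CustomSitemapGenerator : SitemapGenerator
    {
        // Hypothetical checkbox field on page templates that marks a page as NOINDEX.
        private const string NoIndexFieldName = "NoIndex";

        protected override bool ShouldBeSkipped(Item item)
        {
            // Sitecore stores a checked checkbox field as the string "1".
            if (item[NoIndexFieldName] == "1")
            {
                // A NOINDEX page should not be advertised in the sitemap.
                return true;
            }

            // Otherwise defer to SXA's own skip rules.
            return base.ShouldBeSkipped(item);
        }
    }
}
```

Checking the custom rule first means a NOINDEX page is skipped regardless of what the base implementation would decide, while everything else still goes through SXA’s logic.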

Lastly, to control which language versions appear in the sitemap, we overrode the GetItemsForOtherLanguages(Item item) method, which produces the list of language-specific items to include alongside the default URL. The default SXA implementation already checked whether a language version existed, using item.Versions.Count > 0. However, that check did not handle language fallback, so we added a simple check for item.IsFallback to account for it, and then we were off to the races.
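A sketch of that override follows. It filters whatever the base implementation returns rather than rebuilding the list, so SXA’s own selection logic stays intact. The return type IEnumerable&lt;Item&gt; is an assumption; match the signature your SXA version declares. The class is shown on its own here, but in practice this override and the ShouldBeSkipped override would live in the same subclass. The namespace is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.Data.Items;
using Sitecore.XA.Feature.SiteMetadata.Sitemap;

namespace MyProject.Feature.Sitemap // hypothetical namespace
{
    public class CustomSitemapGenerator : SitemapGenerator
    {
        // Assumption: the base method returns IEnumerable<Item>; adjust to your SXA version.
        protected override IEnumerable<Item> GetItemsForOtherLanguages(Item item)
        {
            return base.GetItemsForOtherLanguages(item)
                .Where(languageItem =>
                    languageItem != null
                    // The language version must actually exist...
                    && languageItem.Versions.Count > 0
                    // ...and must not just be showing fallback (e.g. English) content.
                    && !languageItem.IsFallback);
        }
    }
}
```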

Hopefully this shows just how easy it is to extend the default SXA sitemap generator. Happy overriding!