sitemap.xml in a wagtail-bakery Site

Note: This site is no longer built with the technique described below.

sitemap.xml is an important file for managing how search engines understand your website. I won't include any background on the file itself here.

For my static site (see Baking a Static Site from Wagtail CMS) I've decided to maintain a sitemap including all of my pages.


sitemap.xml in a Dynamic Wagtail Site

Wagtail comes with a thin wrapper around Django's sitemap framework, both of which dynamically create a sitemap on request. It's easy enough to set up. Just follow the Wagtail docs to add a URL pattern that uses a built-in view:

# urls.py
from django.urls import path
from wagtail.contrib.sitemaps.views import sitemap

urlpatterns = [
    # ...
    path('sitemap.xml', sitemap),
    # ...
]

Now you can query the development server for a sitemap:

$ # in one terminal, start the development server
$ ./manage.py runserver

$ # in another terminal, fetch the sitemap
$ curl http://127.0.0.1:8000/sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://www.joelsleppy.com/</loc><lastmod>2021-01-02</lastmod></url><url><loc>https://www.joelsleppy.com/blog/</loc><lastmod>2021-01-01</lastmod></url><url><loc>https://www.joelsleppy.com/blog/baking-a-static-site-from-wagtail-cms/</loc><lastmod>2021-01-02</lastmod></url><url><loc>https://www.joelsleppy.com/blog/syntax-highlighted-code-blocks-with-wagtail-cms/</loc><lastmod>2021-01-02</lastmod></url>
</urlset>

It's intelligent enough to find all the Wagtail pages and filter only the published ones (the unpublished draft I have of this blog post doesn't appear in the sitemap).
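
If you'd rather not read that one-line XML by eye, a few lines of standard-library Python can list the entries. This is only a minimal sketch: the function name list_sitemap_entries and the SAMPLE string are my own illustrations (the sample is a trimmed copy of the output above), not part of Wagtail.

```python
import xml.etree.ElementTree as ET

# Namespace declared by the <urlset> element in the sitemap output above.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_entries(xml_text):
    """Return a (loc, lastmod) tuple for every <url> in a sitemap document."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS)
        # lastmod is optional in the sitemap protocol, so it may be None.
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc.strip(), lastmod))
    return entries


# Hypothetical sample: a shortened version of the response shown above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://www.joelsleppy.com/</loc><lastmod>2021-01-02</lastmod></url>
<url><loc>https://www.joelsleppy.com/blog/</loc><lastmod>2021-01-01</lastmod></url>
</urlset>
"""

for loc, lastmod in list_sitemap_entries(SAMPLE):
    print(lastmod, loc)
```

Any unpublished draft should simply be absent from the returned list.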

sitemap.xml in a Static wagtail-bakery Site

The only issue is that wagtail-bakery doesn't include sitemap.xml in our static site:

$ ./manage.py build
Build started
Build finished
$ ls build/
blog  index.html  media  static

This is because, following the wagtail-bakery README, you might have this in your project settings:

# settings.py
BAKERY_VIEWS = (
    'wagtailbakery.views.AllPublishedPagesView',
)

This picks up all published pages, but there is no Wagtail Page model for our sitemap so it gets left out of the build. Are there any other views that look like they might do the trick?

$ grep -e "^class" /path/to/your/virtual/env/lib/python3.8/site-packages/wagtailbakery/views.py
class WagtailBakeryView(BuildableDetailView):
class AllPagesView(WagtailBakeryView):
class AllPublishedPagesView(AllPagesView):

None of those look promising: they all have to do with Pages. It turns out that the BAKERY_VIEWS setting doesn't belong to wagtail-bakery; it belongs to django-bakery. Unfortunately, django-bakery doesn't have an out-of-the-box solution, or even extension points that make it easy to roll your own sitemap support (here's the open GitHub issue for that feature). Not to worry, we can draw inspiration from this pull request discussion:

# myproject/views.py
from bakery.views import BuildableTemplateView
from wagtail.contrib.sitemaps.views import sitemap


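class SitemapTemplateView(BuildableTemplateView):
    """Bake Wagtail's dynamic sitemap view into a static sitemap.xml file."""

    build_path = "sitemap.xml"  # output path, relative to the build directory
    template_path = "anything"  # this is not used

    def get(self, request):
        # Delegate to Wagtail's sitemap view; django-bakery writes out the response
        return sitemap(request)

# settings.py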
BAKERY_VIEWS = (
    "wagtailbakery.views.AllPublishedPagesView",
    "myproject.views.SitemapTemplateView",
)

Now our build includes the sitemap:

$ ./build.sh
...
$ ls build
blog  index.html  media  sitemap.xml  static
$ cat build/sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
...
</urlset>
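
Since the sitemap and the pages now come from the same build, it can be worth checking that every URL in the sitemap really has a baked file behind it. Here is a rough sketch using only the standard library. The function name missing_pages is hypothetical, and it assumes that a page at /blog/ is baked to blog/index.html inside the build directory, which matches the listing above but may not hold for every setup.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace declared by the <urlset> element of the baked sitemap.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def missing_pages(build_dir):
    """Return sitemap URLs that have no matching index.html under build_dir."""
    tree = ET.parse(os.path.join(build_dir, "sitemap.xml"))
    missing = []
    for loc in tree.getroot().findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Assumption: "/blog/" is baked to "<build_dir>/blog/index.html",
        # and the site root "/" is baked to "<build_dir>/index.html".
        relative = urlparse(url).path.strip("/")
        candidate = os.path.join(build_dir, relative, "index.html")
        if not os.path.isfile(candidate):
            missing.append(url)
    return missing
```

Calling missing_pages("build") after a build should return an empty list if everything lines up; anything it returns is a URL you are advertising to search engines without a page to back it.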

Telling Google about your sitemap.xml

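Submitting the sitemap to Google couldn't be easier. Google has a web interface for this (see all the options for submitting your sitemap), but to automate it in your deployment script you can add a line like the one below. Note that Google has since announced the deprecation of this ping endpoint, so check that it still responds before relying on it: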

# deploy.sh
curl 'http://www.google.com/ping?sitemap=https://www.joelsleppy.com/sitemap.xml'
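
If your deployment script is Python rather than shell, the same URL can be assembled with the standard library, which also takes care of percent-encoding the sitemap address. This is just a sketch: build_ping_url is a name I made up, and the endpoint is copied from the curl line above rather than from any official client.

```python
from urllib.parse import urlencode

# Endpoint copied from the curl line above; treat it as an assumption.
PING_ENDPOINT = "http://www.google.com/ping"


def build_ping_url(sitemap_url, endpoint=PING_ENDPOINT):
    """Return the ping URL with the sitemap address percent-encoded."""
    return endpoint + "?" + urlencode({"sitemap": sitemap_url})


print(build_ping_url("https://www.joelsleppy.com/sitemap.xml"))
```

The resulting string can be passed to urllib.request.urlopen or any HTTP client you already use. Encoding matters mostly when the sitemap URL itself contains characters like & or ?, which would otherwise be read as part of the ping URL's own query string.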

Cheers!