Note: This site is no longer built with the technique described below.
sitemap.xml is a very important file for managing how search engines understand your web site. I won't include any background on the file itself, but here are some links I find helpful:
- the Sitemap protocol describes the format of sitemap.xml
- Google gives some reasons why you may want a sitemap
For my static site (see Baking a Static Site from Wagtail CMS) I've decided to maintain a sitemap including all of my pages.
sitemap.xml in a Dynamic Wagtail Site
Wagtail comes with a thin wrapper around Django's sitemap framework, both of which dynamically create a sitemap on request. It's easy enough to set up. Just follow the Wagtail docs to add a URL pattern that uses a built-in view:
# urls.py
from django.urls import path
from wagtail.contrib.sitemaps.views import sitemap

urlpatterns = [
    # ...
    path('sitemap.xml', sitemap),
    # ...
]
Now you can query the development server for a sitemap:
$ # in one terminal, start the development server
$ ./manage.py runserver
$ # in another terminal, fetch the sitemap
$ curl http://127.0.0.1:8000/sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://www.joelsleppy.com/</loc><lastmod>2021-01-02</lastmod></url><url><loc>https://www.joelsleppy.com/blog/</loc><lastmod>2021-01-01</lastmod></url><url><loc>https://www.joelsleppy.com/blog/baking-a-static-site-from-wagtail-cms/</loc><lastmod>2021-01-02</lastmod></url><url><loc>https://www.joelsleppy.com/blog/syntax-highlighted-code-blocks-with-wagtail-cms/</loc><lastmod>2021-01-02</lastmod></url>
</urlset>
It's intelligent enough to find all the Wagtail pages and filter only the published ones (the unpublished draft I have of this blog post doesn't appear in the sitemap).
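If you want to check the generated sitemap programmatically rather than by eye, a small parser helps. The following is a minimal sketch using only the Python standard library; `parse_sitemap` and `SAMPLE` are hypothetical names of mine, and the sample is a shortened copy of the output above rather than a live response.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol; "sm" is just a local prefix.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Shortened copy of the response shown above (hypothetical test data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://www.joelsleppy.com/</loc><lastmod>2021-01-02</lastmod></url>
<url><loc>https://www.joelsleppy.com/blog/</loc><lastmod>2021-01-01</lastmod></url>
</urlset>
"""


def parse_sitemap(xml_bytes):
    """Return a list of (loc, lastmod) tuples from a sitemap document."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        # lastmod is optional in the protocol, so this may be None.
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


entries = parse_sitemap(SAMPLE)
for loc, lastmod in entries:
    print(lastmod, loc)
```

To run it against the development server instead, you could read the bytes with `urllib.request.urlopen("http://127.0.0.1:8000/sitemap.xml").read()` and pass them to `parse_sitemap`.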
sitemap.xml in a Static wagtail-bakery Site
The only issue is that wagtail-bakery doesn't include sitemap.xml in our static site:
$ ./manage.py build
Build started
Build finished
$ ls build/
blog index.html media static
This is because, following the wagtail-bakery README, you might have this in your project settings:
# settings.py
BAKERY_VIEWS = (
'wagtailbakery.views.AllPublishedPagesView',
)
This picks up all published pages, but there is no Wagtail Page model for our sitemap so it gets left out of the build. Are there any other views that look like they might do the trick?
$ grep -e "^class" /path/to/your/virtual/env/lib/python3.8/site-packages/wagtailbakery/views.py
class WagtailBakeryView(BuildableDetailView):
class AllPagesView(WagtailBakeryView):
class AllPublishedPagesView(AllPagesView):
None of those look promising: they all have to do with Pages. It turns out that the BAKERY_VIEWS setting doesn't belong to wagtail-bakery, it belongs to django-bakery. Unfortunately django-bakery doesn't have any out-of-the-box solution or even any extension points to easily roll your own sitemap support (here's the open GitHub issue for that feature). Not to worry, we can draw inspiration from this pull request discussion:
# myproject/views.py
from bakery.views import BuildableTemplateView
from wagtail.contrib.sitemaps.views import sitemap


class SitemapTemplateView(BuildableTemplateView):
    build_path = "sitemap.xml"
    template_path = "anything"  # this is not used

    def get(self, request):
        return sitemap(request)

# settings.py
BAKERY_VIEWS = (
    "wagtailbakery.views.AllPublishedPagesView",
    "myproject.views.SitemapTemplateView",
)
Now our build includes the sitemap:
$ ./build.sh
...
$ ls build
blog index.html media sitemap.xml static
$ cat build/sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
...
</urlset>
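For intuition about why one extra entry in BAKERY_VIEWS is enough: conceptually, the build step walks the configured views and asks each one to write its own output under the build directory. Here is a minimal, Django-free sketch of that idea. `StubSitemapView` and `build_all` are hypothetical names, and this is a simplification for illustration, not django-bakery's actual code.

```python
import os
import tempfile


class StubSitemapView:
    """Hypothetical stand-in for a buildable view (not a django-bakery class)."""

    build_path = "sitemap.xml"

    def get_content(self):
        # A real view would render the sitemap; this stub returns fixed bytes.
        return b'<?xml version="1.0" encoding="UTF-8"?>\n<urlset></urlset>\n'

    def build(self, build_dir):
        # Write the content to <build_dir>/<build_path> and report where it went.
        target = os.path.join(build_dir, self.build_path)
        with open(target, "wb") as f:
            f.write(self.get_content())
        return target


def build_all(view_classes, build_dir):
    """Instantiate every configured view and let it write its own file."""
    return [view_class().build(build_dir) for view_class in view_classes]


# Build into a throwaway directory and list what was produced.
with tempfile.TemporaryDirectory() as build_dir:
    written = build_all([StubSitemapView], build_dir)
    built_names = sorted(os.listdir(build_dir))

print(built_names)
```

The point of the sketch is that the view owns both its output path and its content, which is why overriding get on a view with a fixed build_path is all the sitemap needs.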
Telling Google about your sitemap.xml
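Submitting the sitemap to Google is straightforward. They have a web interface for this (see all the options for submitting your sitemap). When this was written you could also automate it from a deployment script with a ping request; Google announced in 2023 that it was retiring that endpoint, so check the current submission options before relying on a line like this: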
# deploy.sh
curl 'http://www.google.com/ping?sitemap=https://www.joelsleppy.com/sitemap.xml'
Cheers!