So you’ve just written your magnum opus of web applications, a scalable wonder with dynamic content everywhere. Some of this content may only be accessible through AJAX requests, POST submissions, etc., and yet you want your entire site to be indexable by search engines. What do you do? You write a sitemap.xml file.
Django is a very powerful Python web framework with an overabundance of available third-party extensions. One of them, conveniently built into Django itself, is a tool that generates specification-conformant XML files for consumption by Google, Bing, Yahoo!, and all of the other popular search engines. It allows you to automatically detect static views within your Django application, manually specify entries, and even dynamically pull entries from a datastore. All you have to write is an adapter for each type of thing you wish to publish, then hook these into your URL map.
Getting Started
To use the site map framework, register it in your settings.py file under INSTALLED_APPS and make sure the required template loader is registered. Here’s the relevant snippet of code:
# encoding: utf-8

DEBUG = True
ADMINS = [('Alice Bevan-McGregor', 'alice@matchfwd.com')]

# ... SNIP ...

TEMPLATE_LOADERS = (
    'django.template.loaders.filesystem.Loader',
    'django.template.loaders.app_directories.Loader',  # --- add this
)

INSTALLED_APPS = [
    # ...
    'django.contrib.sitemaps'  # --- add this
]
Mapping the Site Map View
With the site map application registered, you now need to tell your application to route /sitemap.xml requests to it. First, create a new module in your application (we called ours matchfwd.sitemap), then modify your root urls.py file to add the lines:
# encoding: utf-8

from django.conf import settings
from django.conf.urls import patterns, url, include

from matchfwd.sitemap import sitemaps  # --- add this with your own module name

# ... SNIP ...

urlpatterns = patterns('',
    # ...
    (  # --- add this tuple
        r'^sitemap\.xml$',
        'django.contrib.sitemaps.views.sitemap',
        {'sitemaps': sitemaps}
    )
)
Defining a Site Map
Now that our sitemaps object is being imported from our (currently empty) sitemap module, let’s backtrack and fill in the missing details of that module. First let’s write two Sitemap subclasses: one for manually configured static views and the other for some dynamic content. The header of our sitemap module should look something like:
# encoding: utf-8

from django.contrib.sitemaps import Sitemap

# import your own data structures here
from matchfwd.company.models import Company
Now we define our first Sitemap subclass:
class StaticSitemap(Sitemap):
    priority = 0.5
    lastmod = None

    def items(self):
        return [
            "/",
            "/about",
            "/press",
            # ...
            ("/opportunities", "daily"),
            ("/people", "daily")
        ]

    def location(self, obj):
        return obj[0] if isinstance(obj, tuple) else obj

    def changefreq(self, obj):
        return obj[1] if isinstance(obj, tuple) else "monthly"
There are quite a number of class attributes used to generate the final XML. Below I describe the ones we’ve seen so far; a complete reference is available in the Django documentation. Note that each of these may be a static attribute, such as priority, or a method computed per item, such as location.
- priority: A floating-point number between 0.0 and 1.0 indicating relative priority of the pages. Here we hard-code the default priority of 0.5 for all static pages.
- lastmod: The last modification date of the pages. Determining the last modification date of the template files themselves is left as an exercise for the reader. (Hint: os.stat)
- location: Because our objects are strings or tuples the location is either the string itself or the first value from the tuple. This lets us override the next value more easily.
- changefreq: This represents roughly how often the page in question is updated. We default to monthly, but allow individual pages to override this default such as the people page which is updated daily.
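Following the os.stat hint for lastmod, here is a minimal sketch of turning a file’s modification time into a datetime. The helper name file_lastmod is our own invention for illustration, and how you map a URL to its template file on disk is left entirely up to you.

```python
import os
from datetime import datetime, timezone


def file_lastmod(path):
    """Return the last modification time of `path` as an aware UTC datetime.

    `file_lastmod` is a hypothetical helper, not part of Django.
    """
    mtime = os.stat(path).st_mtime  # seconds since the epoch, as a float
    return datetime.fromtimestamp(mtime, tz=timezone.utc)


if __name__ == "__main__":
    # Demonstrate on the current directory, which always exists.
    print(file_lastmod("."))
```

A lastmod(self, obj) method on StaticSitemap could call a helper like this once it knows which template file backs each entry.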
The Django site map framework can work out your static pages for you, but then you can’t define lastmod, changefreq, or priority. Now let’s create that dynamic, data-driven sitemap:
# JobOpportunity is one of your own models; import it at the top of this module.

class BaseSitemap(Sitemap):
    priority = 0.5
    changefreq = "weekly"

    def location(self, obj):
        # False asks our models for the path only, not a full URL
        return obj.get_absolute_url(False)

    def lastmod(self, obj):
        return obj.modified or obj.created


class JobOpportunitySitemap(BaseSitemap):
    changefreq = "daily"

    def items(self):
        return JobOpportunity.objects.filter(
            is_private=False,
            state=u'CONFIRMED'
        )
Because matchFWD has a lot of dynamic data we want indexed, we use the BaseSitemap class above so that we only have to override the values that change from one database model to the next; usually that is just the query! Finding the correct URL for an object is delegated to our model classes, each of which defines a get_absolute_url method. The first argument to this method determines whether we generate a full URL with protocol and domain or only the path part of the URL. Django’s Sitemap.location must return a path, not a full URL; the framework adds the protocol and domain itself. We also have modification and creation dates on every model, so factoring these out makes sense. The second class is a site map specific to job opportunities posted on the site.
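To make the path-versus-full-URL switch concrete, here is a rough sketch of what such a get_absolute_url method might look like. This is not our actual model code: the class is a plain Python stand-in rather than a Django model, and the parameter name full, the BASE_URL constant, and the URL layout are all assumptions made for illustration.

```python
class Opportunity(object):
    """Plain stand-in for a model class; a real one would be a Django model."""

    BASE_URL = "http://matchfwd.com"  # assumed protocol and domain

    def __init__(self, slug):
        self.slug = slug

    def get_absolute_url(self, full=True):
        # Hypothetical URL layout, for illustration only.
        path = "/opportunities/%s" % self.slug
        # With full=False only the path is returned, which is what a
        # Sitemap.location method needs.
        return self.BASE_URL + path if full else path


if __name__ == "__main__":
    job = Opportunity("python-developer")
    print(job.get_absolute_url())       # full URL
    print(job.get_absolute_url(False))  # path only
```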
Now that we have our site maps, let’s combine them.
sitemaps = dict(
    static=StaticSitemap,
    jobs=JobOpportunitySitemap,
    # ...
)
This is the object being imported in our urls.py file to be passed to the sitemap view. After saving everything and starting your site locally you should be able to visit http://localhost:8000/sitemap.xml and see the result. Ours starts off something like:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://matchfwd.com/</loc>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>
<!-- ... -->
</urlset>
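If you would rather not eyeball the XML, one option is to parse it with the standard library and list the entries. The sketch below assumes you already have the sitemap as a string (here a trimmed copy of the output above); the function name sitemap_entries is ours, not something Django provides.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol, as seen on the <urlset> element.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_entries(xml_text):
    """Return a (loc, changefreq, priority) tuple for each <url> element."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        entries.append((
            url.findtext("sm:loc", namespaces=SITEMAP_NS),
            url.findtext("sm:changefreq", namespaces=SITEMAP_NS),
            url.findtext("sm:priority", namespaces=SITEMAP_NS),
        ))
    return entries


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://matchfwd.com/</loc>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>"""

if __name__ == "__main__":
    for entry in sitemap_entries(SAMPLE):
        print(entry)
```

Values come back as strings, and any element missing from an entry (such as lastmod on our static pages) is simply reported as None.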
One last thing…
Well, there are three things, actually:
- You’ll need to tell the search engines about your sitemap.xml file. You can manually register with Google, but there is a better way… and one that works across search engines.
- Add a reference to the site map in your robots.txt file. The syntax is very simple; just add a line like the following:
  Sitemap: http://matchfwd.com/sitemap.xml
- Now that you have a site map, you might want to let Google know when things get updated. You can do this using the ping_google function. Remember you’ll have to register with Google Webmaster Tools first!
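To double-check the robots.txt step, you can read the file back the way a crawler might. This is a small sketch using Python’s standard urllib.robotparser module (its site_maps method needs Python 3.8 or newer); the robots.txt content is a made-up example and the function name advertised_sitemaps is our own.

```python
from urllib.robotparser import RobotFileParser


def advertised_sitemaps(robots_txt):
    """Return the list of Sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Example robots.txt content, assumed for illustration.
ROBOTS_TXT = """User-agent: *
Disallow:

Sitemap: http://matchfwd.com/sitemap.xml
"""

if __name__ == "__main__":
    print(advertised_sitemaps(ROBOTS_TXT))
```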