In this post, we’re going to cover XML sitemap basics specifically: what is an XML sitemap, why do I need one, how do I create one, and what do I do with it once I have it?
There are other types of sitemaps that are useful for SEO, visitor usability and accessibility, such as HTML sitemaps and Hreflang sitemaps. These have different purposes and benefits (from each other, as well as from XML sitemaps), so they will be addressed in related articles.
Let’s first define what XML is, and why it is the XML aspect of the sitemap that is key to the point of the exercise.
XML in this sense means a file formatted in Extensible Markup Language: a format that, in this case, contains tags that mean something explicit to a search engine crawler.
What is an XML Sitemap?
In its simplest form, it is a list of the URLs on your website that you would like a search engine crawler to know about and index. For a simple blog type of website, it might look like this…
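As a rough illustration only, here is a minimal Python sketch that builds that kind of simple sitemap and prints the resulting XML. The example.com URLs and the build_sitemap helper are our own placeholders, not part of any particular tool; the urlset, url and loc tags and the namespace come from the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as text) listing the given URLs."""
    ET.register_namespace("", SITEMAP_NS)  # write tags without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    ET.indent(urlset)  # pretty-print; needs Python 3.9 or later
    raw = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return raw.decode("utf-8")


# Placeholder URLs for a hypothetical small blog.
print(build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://www.example.com/blog/xml-sitemap-basics/",
]))
```

Each page you want indexed gets its own url entry, and the loc tag inside it holds the full address of the page.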
Using XML, we are able to add markup that describes attributes of the content within the map, providing more detail to search engine crawlers in a way that can be processed quickly. These attributes could include the URL, the date the content was last modified, and the content type.
If you have different types of content, architecture and mark-up on your site, it can often be helpful to have a specific XML sitemap for each group of similar items. In such cases, you can use the same protocol to create multiple sitemaps and an index that lists them, such as:
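To give a feel for the structure, the sketch below builds a small sitemap index in Python. The two child sitemap file names are hypothetical examples of splitting by content type; the sitemapindex, sitemap and loc tags are the ones the Sitemap protocol defines for index files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (as text) pointing at several sitemaps."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    ET.indent(index)  # pretty-print; needs Python 3.9 or later
    raw = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return raw.decode("utf-8")


# Hypothetical per-type sitemaps for posts and pages.
print(build_sitemap_index([
    "https://www.example.com/post-sitemap.xml",
    "https://www.example.com/page-sitemap.xml",
]))
```

The index itself contains no page URLs; it only tells the crawler where each individual sitemap lives.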
Why Do I Need One?
There are several reasons to have an XML sitemap, and the benefits become greater the bigger and more complex the set of items on your website that you want indexed.
The Sitemap protocol is utilised by most search engines, including Google, Bing and Yandex. Using a conventional URL suffix for your sitemap, such as /sitemap.xml, aids easy discovery of your sitemap and all that it contains, contributing to your chances of content discovery and prompt indexing.
Search engine crawlers are directed to identify new content to add to their respective indices. Using a recognised format like XML sitemaps makes the crawler’s objective somewhat simpler. Ensure that your sitemap only contains items that you want to be indexed. Including items that are no-indexed at page level, or that are disallowed in robots.txt, sends crawlers conflicting signals and may open a margin for error, such as:
- Content/URLs in the index that should not be there
- Crawl loops from redirections
- At scale, inhibited crawl routes and less chance of full indexing
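One way to catch the robots.txt side of this before a sitemap goes live is to test each candidate URL against your robots.txt rules. The following is a minimal sketch using Python's standard urllib.robotparser module; the robots.txt content and URLs are made-up examples, and it does not check page-level noindex tags, which would need a separate check.

```python
from urllib.robotparser import RobotFileParser


def crawlable_urls(urls, robots_txt, user_agent="*"):
    """Keep only the URLs that the given robots.txt text allows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Made-up robots.txt rules for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

candidates = [
    "https://www.example.com/",
    "https://www.example.com/blog/xml-sitemaps/",
    "https://www.example.com/private/drafts/",
]

# The /private/ URL is disallowed, so it is left out of the result.
print(crawlable_urls(candidates, ROBOTS_TXT))
```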
Using the <lastmod> tag allows us to provide information on when items in our sitemap were last changed. As well as identifying new content, search engines want to pick up changes to content they have already indexed, because more current content adds value for search engine users. This can be illustrated by thinking about news sites, which may continually update developing news stories. Utilising <lastmod> (provided you do change your content frequently) can increase the frequency with which our pages are crawled and therefore updated in indices.
There are multiple tools available, including Google Search Console and Bing Webmaster Tools, as well as professional SEO tools such as RYTE, that can analyse your site and provide feedback for improvement, including any disparity between the contents of your XML sitemap and the items on your site that are actually crawlable and indexable. More on this in the section below… “What Do I Do With My XML Sitemap?”
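The Sitemap protocol expects the <lastmod> value in W3C Datetime format, either a plain date (YYYY-MM-DD) or a full timestamp with a time zone. As a hedged sketch, this is how a single entry with a <lastmod> could be produced in Python; the URL and the modification time are invented for the example, and url_entry is our own helper name.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def url_entry(loc, last_modified):
    """Build one <url> element with <loc> and <lastmod>.

    last_modified must be a timezone-aware datetime; it is converted
    to UTC and written as a W3C Datetime timestamp.
    """
    entry = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
    stamp = last_modified.astimezone(timezone.utc).isoformat(timespec="seconds")
    ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = stamp
    return entry


ET.register_namespace("", SITEMAP_NS)
# Invented example: a news story last edited on 11 April 2024 at 09:30 UTC.
story = url_entry(
    "https://www.example.com/news/developing-story/",
    datetime(2024, 4, 11, 9, 30, tzinfo=timezone.utc),
)
print(ET.tostring(story, encoding="unicode"))
```

The value should reflect when the content genuinely changed, not when the sitemap file happened to be regenerated.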
How Do I Create an XML Sitemap?
When creating an XML sitemap (or sitemaps), it is most efficient to create a dynamic rather than a static version where possible.
Dynamic XML Sitemaps
Dynamic XML sitemaps update automatically as changes such as updates or additions occur on the site, and tend to be driven by the capabilities of the CMS. A dynamic sitemap is preferable because, if configured correctly from the get-go, it tends to require little maintenance or attention from you as SEO or webmaster.
Our favourite dynamic sitemap generator for WordPress, for example, is the one in the Yoast SEO plugin. However, many widely used Content Management Systems have a dynamic sitemap feature you can utilise easily.
If your CMS does not have this functionality, you can still create a static XML sitemap yourself. If this is your only option, you should also establish a regular maintenance practice for the sitemap, at a frequency that matches how often your site content changes.
Use a crawling tool like Screaming Frog to identify all the indexable URLs on your site, and use its Sitemaps feature to convert them to XML.
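If you would rather script this step yourself, the idea can be sketched in a few lines of Python. The crawl rows below are invented, and the 'url', 'status' and 'indexable' keys are our own assumption about what a crawl export might be reshaped into; they are not Screaming Frog's actual column names. The file is written to a temporary folder so the example leaves nothing behind.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def write_static_sitemap(crawl_rows, path):
    """Write a sitemap containing only live, indexable URLs.

    crawl_rows: dicts with hypothetical keys 'url', 'status', 'indexable'.
    Returns the number of URLs written.
    """
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for row in crawl_rows:
        if row["status"] == 200 and row["indexable"]:
            entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = row["url"]
    ET.ElementTree(urlset).write(str(path), encoding="utf-8", xml_declaration=True)
    return len(urlset)  # number of <url> children


# Invented crawl results: one good page, one redirect, one noindexed page.
rows = [
    {"url": "https://www.example.com/", "status": 200, "indexable": True},
    {"url": "https://www.example.com/old/", "status": 301, "indexable": False},
    {"url": "https://www.example.com/thanks/", "status": 200, "indexable": False},
]

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "sitemap.xml"
    written = write_static_sitemap(rows, target)
    content = target.read_text(encoding="utf-8")

print(written, "URL(s) written")
print(content)
```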
What Do I Do With My XML Sitemap?
Once you have generated your XML sitemap, you need to upload it to your website so that it is accessible to search engine crawlers. To aid indexing, utilise SEO tools to help index, process and monitor the health of your URLs.
Where Do I Put My XML Sitemap On My Website?
Upload your XML sitemap to the root of your site, so that its address is your domain followed by something like the /sitemap.xml suffix mentioned earlier.
It is also a good idea to reference your sitemap URL in your robots.txt file, as this file is frequently visited by crawlers.
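The reference is a single Sitemap: line in robots.txt containing the full URL of the sitemap. As a quick sanity check, the sketch below parses an example robots.txt with Python's standard urllib.robotparser module and reads the sitemap reference back out (site_maps() needs Python 3.8 or later). The robots.txt content and the example.com address are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt that allows everything and names one sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
declared_sitemaps = parser.site_maps()
print(declared_sitemaps)
```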
What Can I Do to Monitor My XML Sitemap?
Once you have uploaded your sitemap to the root, there are a number of ways to monitor its health. By health, we mean the number of URLs that continue to be live and return a 200 response code.
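To make that definition of health concrete, here is a small Python sketch that reads the URLs out of a sitemap and reports any that do not return a 200. The sample sitemap and the canned status codes are invented so the example runs without touching the network; http_status shows one possible way to do the real lookup, and is an assumption rather than how any of the tools below work.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_locs(sitemap_xml):
    """Return every <loc> value found in the sitemap text."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{{{SITEMAP_NS}}}loc") if loc.text]


def http_status(url):
    """Fetch a URL with a HEAD request and return its status code.

    urlopen follows redirects, so a redirected URL reports the status
    of its final destination. Connection failures are not handled here.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def unhealthy_urls(sitemap_xml, status_of=http_status):
    """Map each sitemap URL that does not return 200 to its status."""
    statuses = {url: status_of(url) for url in sitemap_locs(sitemap_xml)}
    return {url: code for url, code in statuses.items() if code != 200}


SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/old-page/</loc></url>
</urlset>
"""

# Canned responses stand in for real requests in this example.
canned = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-page/": 404,
}
problems = unhealthy_urls(SAMPLE_SITEMAP, status_of=canned.get)
print(problems)
```

Leaving out the status_of argument makes the function perform real requests against the URLs in the sitemap.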
Google Search Console
In the left-hand menu, under the Index section, click on Sitemaps. Here you can provide the URL of your sitemap and notify Google of any others, such as sitemaps split by category or type.
Even if your sitemap is dynamic, you can still re-submit it here if you’ve added new content since the last time Google accessed your sitemap. This is one of several ways to help get your content indexed quickly. However, as with many aspects of how Google handles requests and instructions, other criteria contribute to how soon our request may be processed. As you can see in the image above, we re-submitted our sitemap on April 11th, but it has not been read since April 9th!
By clicking the little bar graph icon on the far left, you can access further detail on how many of your URLs are valid, and whether there are any errors (which may be URLs that return a 404) or other exclusions.
RYTE is a professional SEO tool and a paid-for solution perfect for agencies and larger sites. Whilst we use many tools at Erudite, we particularly like the sitemaps monitoring and insight provided here.
Getting specific information from Google Search Console can be a little opaque; RYTE, however, provides detailed analysis to help you pinpoint exactly which aspects of your sitemap(s) should be corrected or updated.
All of the aforementioned tools can help you keep on top of both static and dynamic sitemap types. However, if your sitemap is static, you must also establish a more formal manual process.
Depending on content and change frequency, set a rigid process, get it diarised and stick to it. This could be a weekly re-run of the Screaming Frog sitemap generator mentioned previously. Just remember to re-upload the new file to replace the existing one on your site, and then re-submit it in Search Console.
If you’re not sure how to complete the XML sitemap process above, or if you could use a little expert help, then please get in touch. At Erudite we have worked with thousands of sites for over a decade and can take quick and easy jobs like this off your hands, allowing you to focus on driving your online business.