What Is XML Sitemap? How to Create and Validate It

An XML sitemap is a file that lists the important pages of your website so that search engine robots know what to crawl. It also helps search robots understand your website structure. A missing or broken XML sitemap can cause problems with content indexing, especially on large websites.

Further down in this blog post, we explain the fundamental technical requirements for XML sitemaps, show how to create them in Netpeak Spider, and show how to quickly check existing files for errors.

  • 1. What Is an XML Sitemap?
  • 2. Technical Requirements for XML Sitemap
  • 3. How to Create XML Sitemap
  • 4. How to Make Sitemap Available to Google
  • 5. How to Check if Your Sitemap Is Working
  • To Recap
1. What Is an XML Sitemap?

An XML sitemap provides crawlers with a list of web pages to index. It keeps search robots informed about updated content so they can crawl it as soon as possible. A sitemap can improve the crawling of your website, especially if it’s large and complex.

Before creating one, make sure you actually need an XML sitemap: for the majority of websites, an HTML sitemap is enough.

A simple HTML sitemap together with a robots.txt file will do for small websites (500 pages or fewer) with rarely updated content and few media files, news pages, or JavaScript-driven pages. Also, if your website is well linked internally, you don’t have to worry that search robots will fail to reach important pages from the homepage.

Let’s look more closely at the cases when an XML sitemap is important for the project:

  1. Your website is large and frequently updated, so search engine bots can easily miss newly added content.
  2. The site has a complex structure, with many individual pages at a click depth of 3 or more.
  3. The website is built with technologies such as JS, Flash, or Ajax that create obstacles to indexing.

The sitemap file, designed for such projects, contains the following information:

  • The opening <urlset> tag, which wraps all the entries in the file.
  • One entry per page, containing a URL and, optionally, an update date.
  • The URL in the <loc> tag must be written in full; the other tags are optional.
  • The <lastmod> value holds the date the page was last changed. Make sure this value is formatted correctly, since Google search robots are sensitive to errors in sitemaps.
  • Note that Google currently ignores the <priority> and <changefreq> values in sitemaps, which were used to mark priority pages and to set the change frequency (how often a page is expected to be updated with new content).
  • At the end, the file is closed with the </urlset> tag.
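
To make the structure described above concrete, here is a minimal Python sketch that assembles a sitemap from the <urlset>, <url>, <loc>, and <lastmod> tags. It is only an illustration: the helper name build_sitemap, the example.com URLs, and the dates are assumptions, not something taken from a real project.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (bytes) for an iterable of (url, lastmod) pairs.

    `lastmod` may be None, in which case the optional tag is left out.
    """
    ET.register_namespace("", SITEMAP_NS)  # emit <urlset xmlns="..."> without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages, used only for illustration.
sitemap_xml = build_sitemap([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/blog", None),
])
print(sitemap_xml.decode("utf-8"))
```

The optional <priority> and <changefreq> tags are left out on purpose, since Google does not use them.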

2. Technical Requirements for XML Sitemap

  1. The sitemap can be built in either XML or TXT format. Either way, the file must be UTF-8 encoded; in an XML file, this is declared with encoding="UTF-8".
  2. Don’t block the pages listed in the sitemap from crawling in robots.txt.
  3. Pay attention to the server response status code: every URL in the sitemap should return 200 OK. To quickly check the server response codes, use Netpeak Spider.
  4. Don’t block the XML sitemap itself in the robots.txt file.
  5. A single XML sitemap is limited to 50,000 URLs and 50 MB (uncompressed). Google applies these limits. So if your project is larger, split its sitemap into several files that each stay within the limits and list them in a sitemap index file.
  6. The sitemap should contain only the canonical version of each URL (to avoid duplicates).
  7. Mention alternate language versions of a URL using hreflang annotations.
  8. The URLs can be written in an arbitrary order. Google won’t follow the order anyway.
  9. Since images and video content are difficult to crawl, Google suggests using extensions to describe them to improve indexing.
  10. The sitemap is usually stored in the root folder of your domain, so that it is available at an address like http://site.ru/sitemap.xml.
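
When a site has more URLs than one file may hold, the list can be split across several sitemap files that are referenced from a sitemap index file. The Python sketch below shows one way to do the splitting. The function names, the example.com URLs, and the sitemap-N.xml file naming are assumptions made for illustration, and the sketch checks only the URL count, not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the requirements above


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (bytes) that points at each child sitemap."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical site with 120,000 pages.
urls = [f"https://example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
# Assumed naming scheme for the child files: sitemap-1.xml, sitemap-2.xml, ...
child_sitemaps = [f"https://example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))]
index_xml = build_sitemap_index(child_sitemaps)

print(len(chunks), [len(chunk) for chunk in chunks])  # prints: 3 [50000, 50000, 20000]
```

Each chunk would then be written to its own sitemap file, and only the index file needs to be submitted to the search engine.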

3. How to Create XML Sitemap

There are several ways to create an XML sitemap:

  • Decide which URLs of your website should be crawled by Google and create the file manually.
  • Use third-party tools or specialized plugins. We will show how to build an XML sitemap in Netpeak Spider.

You can generate and validate an XML sitemap even in the free version of the Netpeak Spider crawler, which has no limit on the term of use or the number of analyzed URLs. Other basic features are also available in the Freemium version of the program.

To get access to free Netpeak Spider, you just need to sign up, download, and launch the program 😉

P.S. Right after signup, you'll also have the opportunity to try all paid functionality and then compare all our plans and pick the most suitable for you.

3.1. How to Create a Sitemap Manually

To build a small XML sitemap, use any text editor. Create a file named ‘sitemap,’ change its extension to .xml, and fill it with the absolute URLs that should be indexed. An absolute URL includes the protocol and the domain, for example https://netpeaksoftware.com/blog. The URLs must also match your site’s actual address: if your site is at https://www.lovepugs.com/, don’t specify a URL as https://lovepugs.com/ (without www).
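
Mistakes like a missing www are easy to make in a hand-written file, so a quick automated check can help. The Python sketch below flags URLs that are not absolute or whose protocol or host differs from the site root. The function name find_url_problems and the page paths (/food, /toys, /about) are hypothetical and serve only as an illustration.

```python
from urllib.parse import urlsplit


def find_url_problems(site_root, urls):
    """Return (url, reason) pairs for URLs that are relative or on another host."""
    root = urlsplit(site_root)
    problems = []
    for url in urls:
        parts = urlsplit(url)
        if not parts.scheme or not parts.netloc:
            problems.append((url, "not an absolute URL"))
        elif (parts.scheme, parts.netloc.lower()) != (root.scheme, root.netloc.lower()):
            problems.append((url, "scheme or host differs from the site root"))
    return problems


# Hypothetical URL list for a site hosted at https://www.lovepugs.com/.
problems = find_url_problems(
    "https://www.lovepugs.com/",
    ["https://www.lovepugs.com/food", "https://lovepugs.com/toys", "/about"],
)
for url, reason in problems:
    print(f"{url}: {reason}")
```

In this example the second URL is reported because it lacks www, and the third because it is relative.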

To build the sitemap, use the mandatory XML tags described earlier. If needed, fill in the optional ones.

To build extensive XML sitemaps, some coding skills are required. In this case, a sitemap generator script is usually scheduled with cron, a tool that runs tasks automatically at regular intervals.

However, if you want to build an XML sitemap much faster, use third-party generators designed for this purpose.

3.2. How to Generate Sitemap in Netpeak Spider

Netpeak Spider is a powerful software specialized in solving various tasks of marketing, SEO, and web development teams. To generate an XML sitemap:

  1. Go to ‘Tools’ → ‘Sitemap generator.’
  2. Tick the ‘XML sitemap’ box and configure the sitemap parameters: last modified date, change frequency, and priority (yes, you can still generate these values). Then hit the ‘Generate’ button and choose or create the target folder to save the sitemap.

You'll find more on how to work with the tool in the user guide: 'Sitemap Generator.'

4. How to Make Sitemap Available to Google

Once you have generated the file, upload it to the website’s root folder. It will then be accessible at domain.com/sitemap.xml.

It’s also good practice to add the XML sitemap’s URL to the robots.txt file. If you have several sitemaps, add them all, each on a new line.
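
As a rough illustration, the Python sketch below parses a sample robots.txt, lists the sitemaps declared in it, and checks that none of them is blocked by a Disallow rule. The robots.txt content and the example.com URLs are made up for the example, and site_maps() requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one Sitemap line per sitemap file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap-1.xml
Sitemap: https://example.com/sitemap-2.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when the file has no Sitemap line.
sitemaps = parser.site_maps() or []
for sitemap_url in sitemaps:
    allowed = parser.can_fetch("*", sitemap_url)
    print(sitemap_url, "crawlable" if allowed else "BLOCKED by robots.txt")
```

To check a live site instead of a string, the same parser can load the file with set_url() and read().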

The next step is to submit sitemap.xml in Google Search Console, or simply ‘tell’ Google where to find the sitemap on your website. To do this, go to the ‘Index’ tab and select ‘Sitemaps.’ Then enter ‘sitemap.xml’ after your website domain and click the ‘Submit’ button.

5. How to Check if Your Sitemap Is Working

If your website already has an XML sitemap but issues occur, check the existing file for errors in Netpeak Spider.

  1. Go to ‘Tools’ → ‘XML sitemap validator.’ Insert the sitemap URL and press the ‘Start’ button.

In case of errors, the tool will display them, marking the severity level.

Netpeak Spider analyses the sitemap for compliance with the standard sitemap protocol, which is supported by the major search engines. The tool checks for more than 30 potential errors. If Netpeak Spider detects any errors in the sitemap, fix them and validate the file again.
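
To give a sense of what such checks involve, here is a small Python sketch covering a handful of them: well-formed XML, the expected root tag, the 50,000 URL limit, absolute <loc> values, and the <lastmod> date format. It is not how Netpeak Spider works internally and covers far fewer cases. The function name validate_sitemap and the sample data are assumptions, and the date pattern accepts only a common subset of the W3C Datetime format.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
# YYYY-MM-DD, optionally followed by a time and a UTC offset (subset of W3C Datetime).
LASTMOD_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def validate_sitemap(xml_bytes):
    """Return a list of human-readable problems found in a sitemap document."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]

    problems = []
    entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(entries) > MAX_URLS:
        problems.append(f"{len(entries)} URLs exceed the {MAX_URLS} limit")
    for position, entry in enumerate(entries, start=1):
        loc = entry.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if not loc.startswith(("http://", "https://")):
            problems.append(f"entry {position}: missing or relative <loc>")
        lastmod = entry.findtext(f"{{{SITEMAP_NS}}}lastmod")
        if lastmod is not None and not LASTMOD_RE.match(lastmod.strip()):
            problems.append(f"entry {position}: malformed <lastmod> {lastmod!r}")
    return problems


# Hypothetical sitemap whose second entry contains two mistakes.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>/blog</loc><lastmod>15.01.2024</lastmod></url>
</urlset>"""

for problem in validate_sitemap(sample):
    print(problem)
```

For the sample above, the sketch reports the relative URL and the wrongly formatted date in the second entry.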

Useful tip: you can notify search engines about the updates in your sitemap directly from the ‘XML Sitemap validator’ tool.

If you’re unsure how to fix the errors, use the generator: it is easier to create a new sitemap and upload it to the hosting as described earlier.

To Recap

  • An XML sitemap is important for large projects with a complex hierarchical structure: it guides search engine robots to the links that matter for indexing.
  • To generate the XML sitemap, you are free to choose any approach you like: write the file yourself or involve developers, install plugins, or use third-party tools and sitemap generators such as Netpeak Spider.
  • To have the website content indexed regularly and quickly, check the sitemap for issues.