XML Sitemap Integration

Devin Poole shared this idea 4 months ago
Planned

Is there a way to integrate the website crawl and the XML sitemap crawl? I'm looking to compare the data to identify issues such as:

1. Pages only found in the crawl (not in the sitemap)

2. Pages only found in the sitemap (not in the crawl)

3. Non-indexable pages that are in the sitemap (blocked by robots.txt or carrying a noindex tag)

4. Broken links in the sitemap

5. Redirects in the sitemap
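
For anyone who wants to try the first two comparisons by hand in the meantime, here is a minimal sketch in Python. It is not a Netpeak Spider feature: the `normalize` and `compare_url_sets` helpers and the example.com URLs are made up for illustration, and it assumes both URL lists have already been exported from the tool.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase the scheme and host, default an empty path to "/", drop fragments."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def compare_url_sets(crawled, in_sitemap):
    """Split URLs into crawl-only, sitemap-only, and shared groups."""
    crawl_set = {normalize(u) for u in crawled}
    sitemap_set = {normalize(u) for u in in_sitemap}
    return {
        "only_in_crawl": sorted(crawl_set - sitemap_set),
        "only_in_sitemap": sorted(sitemap_set - crawl_set),
        "in_both": sorted(crawl_set & sitemap_set),
    }


# Hypothetical exports: one list from the site crawl, one from the sitemap crawl.
crawled = ["https://example.com/", "https://example.com/about", "https://example.com/blog"]
in_sitemap = ["https://EXAMPLE.com", "https://example.com/about", "https://example.com/old-page"]

report = compare_url_sets(crawled, in_sitemap)
for group, urls in report.items():
    print(group, urls)
```

Normalizing before comparing matters because the same page is often written slightly differently in the sitemap and in internal links; how aggressive the normalization should be (trailing slashes, query strings) depends on the site.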

Comments (4)

Also, is there a way to crawl multiple sitemaps now (for sites that use index files)?

Hey Devin,

Thanks for your message! Let me address your questions below:

1. The current version can't compare the results of a full website crawl with those of a sitemap crawl. We plan to add this in v. 2.2.

2. When you choose the 'XML Sitemap' crawling mode, the results open in a separate window where you can see issues with the sitemap file itself. URLs in the sitemap that are not disallowed are then crawled for the common issues, and you can see those results by closing the 'XML Sitemap Overview' window.

For instance, I crawled Apple's sitemap file. When I closed the sitemap overview window, I saw the crawl results for all the allowed URLs in the file.

In this results table, you can see all the issues detected by Netpeak Spider, including the ones you mentioned: broken links, redirected pages, and non-indexable pages.
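
To make those three categories concrete, here is a rough sketch of how sitemap URLs could be labeled outside the tool. It only illustrates the idea and is not how Netpeak Spider works internally: the `classify` function, the robots.txt rules, and the sample rows (URL, HTTP status, noindex flag) are all assumptions. The standard-library `urllib.robotparser` handles the robots.txt check.

```python
from urllib.robotparser import RobotFileParser


def classify(url, status, robots, has_noindex=False, agent="*"):
    """Return a simple issue label for one sitemap URL."""
    if not robots.can_fetch(agent, url):
        return "blocked by robots.txt"
    if 300 <= status < 400:
        return "redirect"
    if status >= 400:
        return "broken"
    if has_noindex:
        return "noindex"
    return "ok"


# Hypothetical robots.txt rules, parsed from lines instead of fetched over the network.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

# Hypothetical rows: (URL from the sitemap, HTTP status, has a noindex tag).
rows = [
    ("https://example.com/", 200, False),
    ("https://example.com/private/a", 200, False),
    ("https://example.com/old", 301, False),
    ("https://example.com/gone", 404, False),
    ("https://example.com/thanks", 200, True),
]

labels = {url: classify(url, status, robots, noindex) for url, status, noindex in rows}
for url, label in labels.items():
    print(label, url)
```

The robots.txt check comes first because a disallowed URL would not normally be fetched at all, so its status code would be unknown in a real crawl.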

As for sitemap index files, the present version of Netpeak Spider can already crawl them. The tool checks all the sitemap files for issues and then checks all the URLs in those files. The 'Data Type' column in the XML Sitemap overview shows whether an entry is a URL, a sitemap, or a sitemap index file. There is also a 'Parent URL' column that shows which sitemap a particular URL belongs to.
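
If it helps to picture what the 'Data Type' and 'Parent URL' columns represent, the sketch below walks a sitemap index and records the same two pieces of information for each entry. It is an illustration under assumptions, not the tool's implementation: `walk_sitemap`, the `fetch` callback, and the in-memory example.com documents are hypothetical, and a real script would download the XML instead.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def walk_sitemap(url, fetch, parent=None):
    """Yield (data_type, url, parent_url) rows for a sitemap or sitemap index."""
    root = ET.fromstring(fetch(url))
    if root.tag.endswith("sitemapindex"):
        yield ("sitemap index", url, parent)
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            yield from walk_sitemap(loc.text.strip(), fetch, parent=url)
    else:
        yield ("sitemap", url, parent)
        for loc in root.findall("sm:url/sm:loc", NS):
            yield ("url", loc.text.strip(), url)


# Hypothetical documents standing in for real HTTP responses.
DOCS = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/sitemap-pages.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    ),
}

rows = list(walk_sitemap("https://example.com/sitemap.xml", DOCS.__getitem__))
for data_type, url, parent_url in rows:
    print(data_type, url, parent_url)
```

Passing `fetch` in as a parameter keeps the parsing logic separate from the download step, so the same function can be pointed at local files or live URLs.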

Hope this helps.

If there are any questions left, just drop me a line and I'll be glad to help.

Thanks Amber.

I really appreciate the detailed info. Very helpful!

Any idea when v. 2.2 will be released?

Hey Devin,

Thanks for your response.

We expect to release Netpeak Spider 2.2 in January / February 2017. We'll inform you the moment it happens.

If there is anything else I can help you with, feel free to contact me.

Have a great day!