How to Improve Your Crawl Budget
Search traffic is commonly described as the best free traffic source available. You create great, optimized content, search engines crawl it, and it eventually starts to rank. On the surface the process is straightforward, and most people never need to worry about the nuances of how a website is crawled. The question is: when should you start worrying about how Google crawls your website?
That’s a good question. In this article, I’ll look at what crawl budget is, when and why it matters, and a few ways you can improve your crawl budget, so the right pages are indexed.
1. What Is a Crawl Budget?
Crawl budget is the number of URLs on your website that Googlebot can and wants to crawl. Two factors determine your website’s crawl budget:
- The crawl rate limit, which is how quickly Google can crawl your website without overloading your server.
- The crawl demand, which is a measure of how important Google thinks your URLs are.
Taken together, these factors make up your crawl budget. If both the crawl rate limit and the crawl demand are very high, you may run into server resource issues, and intense Googlebot activity can create a poor experience for your website’s visitors. Note that this shouldn’t be an issue if your website has fewer than 1,000 URLs available to be crawled.
Crawl budget becomes important when you have a large website or one that generates many URLs, such as an e-commerce store whose filtering (faceted search) creates a new URL for each combination of filters.
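To see how quickly filters multiply URLs, here is a back-of-the-envelope calculation in Python. The facet names and value counts are hypothetical, and the model assumes each facet is either unset or set to exactly one value, with every combination reachable at its own URL.

```python
from math import prod

# Hypothetical facets on one category page, mapped to the number of values each offers.
facets = {
    "color": 12,
    "size": 8,
    "brand": 25,
    "price_range": 6,
    "sort": 4,
}


def filter_url_count(facet_values: dict[str, int]) -> int:
    """Count distinct filter URLs for one category page.

    Assumes each facet is either unset or set to exactly one value,
    so a facet with n values contributes (n + 1) choices.
    """
    return prod(n + 1 for n in facet_values.values())


print(filter_url_count(facets))
```

With these made-up numbers, a single category page already has 106,470 filter combinations (including the unfiltered page), far above the 1,000-URL figure. Real stores often allow multi-select filters and varying parameter order (`?color=red&size=m` versus `?size=m&color=red`), which only pushes the count higher.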
2. Ways to Improve Your Crawl Budget
You don’t have control over Googlebot, but you do have control over how it interacts with your website. Several factors affect your crawl budget, and optimizing them helps make sure the right pages get crawled and indexed.
2.1. Speed Up Your Website
This is something you should be doing regardless of crawl budget. A faster website gives your visitors a better experience. Googlebot tries to do its work as quickly as possible without using too many server resources. If it notices that a website is slowing down, it crawls less or stops altogether, and important pages could be skipped entirely.
There are a few quick fixes for a slow website:
- Implement a CDN, which helps Googlebot access resources faster by serving them from a location close to it.
- Compress files and images, so pages are smaller and load more quickly.
- Invest in better hosting. It’s possible to make all the optimizations in the world but not see significant gains because of subpar hosting.
- Set up browser caching.
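A quick way to check the compression and browser-caching items on this list is to look at a page’s HTTP response headers. The sketch below only inspects a dictionary of headers you have already collected (for example, from your browser’s developer tools); the function names and the checks are illustrative assumptions, not an official Google test. The checks matter most for static assets such as CSS, JavaScript, and images, since HTML pages often use short cache lifetimes on purpose.

```python
from typing import Optional


def _max_age(directives: list[str]) -> Optional[int]:
    """Return the numeric max-age value from Cache-Control directives, if any."""
    for directive in directives:
        if directive.startswith("max-age="):
            value = directive.split("=", 1)[1]
            if value.isdigit():
                return int(value)
    return None


def audit_speed_headers(headers: dict[str, str]) -> list[str]:
    """Return warnings about missing compression or browser caching."""
    # HTTP header names are case-insensitive, so normalize names and values.
    h = {name.lower(): value.lower() for name, value in headers.items()}
    warnings = []

    # Compression: gzip and Brotli ("br") are the common content encodings.
    encodings = [token.strip() for token in h.get("content-encoding", "").split(",")]
    if not any(token in ("gzip", "br") for token in encodings):
        warnings.append("Response is not compressed with gzip or Brotli.")

    # Browser caching: no-store disables caching; otherwise expect a positive max-age.
    directives = [d.strip() for d in h.get("cache-control", "").split(",") if d.strip()]
    if "no-store" in directives:
        warnings.append("Cache-Control: no-store prevents browser caching.")
    else:
        max_age = _max_age(directives)
        if max_age is None or max_age == 0:
            warnings.append("No positive Cache-Control max-age is set.")
    return warnings


# Example: headers copied from a static CSS file's response.
print(audit_speed_headers({
    "Content-Type": "text/css",
    "Content-Encoding": "gzip",
    "Cache-Control": "public, max-age=31536000",
}))  # []
```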
2.2. Make Important Pages More In-Depth
It’s common knowledge by now that Google gives preference to in-depth content, because its job is to show searchers the best content possible.
A piece of content is more likely to satisfy a searcher if it’s longer and includes images and other rich media. Of course, it still needs to be good.
The higher the word count on a page, the more likely it is to be crawled and indexed. Thin pages, under 300 words, are indexed less than 20% of the time. There are many fixes for thin content, but one viable option is to noindex thin pages entirely, especially when they’re not essential. For important content, make it longer and more detailed.
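To find candidates for expanding or noindexing, you can count the visible words on each page. This sketch uses Python’s standard-library `HTMLParser`, skips the contents of script, style, and noscript tags, and flags pages below the 300-word figure mentioned above. The threshold and the simple whitespace-based word count are rough assumptions, not how Google measures content.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts words in text nodes, ignoring <script>, <style> and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a tag whose text is not visible
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.words += len(data.split())


def count_visible_words(html: str) -> int:
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.words


def is_thin(html: str, min_words: int = 300) -> bool:
    """Flag a page whose visible word count is below min_words."""
    return count_visible_words(html) < min_words
```

Run it over the HTML of each page from your crawl, then review the flagged URLs by hand: some short pages (contact forms, checkout steps) are fine as they are.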
2.3. Remove Infinite Spaces
Infinite spaces are areas of your website that can, technically, go on forever. For example, on a booking page with a calendar, a visitor can click the “next” button as many times as they want. For Googlebot, this is harmful: it burns your crawl budget on unimportant pages.
To prevent this, use your robots.txt file to block dynamic URL patterns that can go on forever, and add nofollow to links into assets such as a calendar. Google’s URL Parameters tool in Search Console used to let you tell Googlebot which parameters to ignore, but Google retired it in 2022, so robots.txt rules and clean internal linking are now the main levers.
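As an illustration, the robots.txt rules below block a hypothetical calendar path and an internal-search path, and Python’s standard-library `urllib.robotparser` checks which URLs Googlebot would be allowed to fetch. The paths are made up. Note that `urllib.robotparser` only does simple prefix matching and applies the first matching rule, so it does not reproduce Google’s `*` and `$` wildcards or its most-specific-rule precedence; keep that in mind when testing more complex files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of endless calendar pages and
# internal search results, while leaving the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://www.example.com/calendar/2031/05/",
    "https://www.example.com/search?q=shoes&page=40",
    "https://www.example.com/products/red-shoes",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
```

The first two URLs come out blocked and the product page stays allowed. Keep in mind that robots.txt controls crawling, so check that no page you want indexed sits under a blocked prefix.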
2.4. Eliminate Soft 404 Errors
A soft 404 occurs when a server returns a 200 OK response code for a page that no longer exists, when it should send back a 404 Not Found response code.
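On the server side, the fix is to return a real 404 status whenever a requested page does not exist, instead of rendering a “not found” message with 200 OK. Here is a minimal sketch as a plain WSGI application using only the standard library; the in-memory `PAGES` dictionary is a hypothetical stand-in for your real content store, and most web frameworks offer their own way to raise a 404.

```python
# Hypothetical content store: URL path -> HTML body.
PAGES = {
    "/": "<h1>Home</h1>",
    "/products/red-shoes": "<h1>Red shoes</h1>",
}


def app(environ, start_response):
    """Minimal WSGI app that returns 404 (not 200) for unknown paths."""
    path = environ.get("PATH_INFO", "/")
    body = PAGES.get(path)
    if body is None:
        # A missing page must carry a 404 status, even though we still
        # show a friendly message to human visitors.
        status, body = "404 Not Found", "<h1>Page not found</h1>"
    else:
        status = "200 OK"
    data = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```

To try it locally, you could serve it with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and request a missing path: the page still shows a friendly message, but the status line says 404.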
The problem with soft 404 errors is that Googlebot spends time crawling and trying to index these pages instead of the parts of your website with unique content. Eventually, Googlebot moves on to another website, and your important pages may not be indexed in a timely manner.
Crawl your website with Netpeak Spider to detect soft 404 errors. If you find any, either redirect each URL to the correct page or fix the response code the server returns. When you’re finished, use the URL Inspection tool in Google Search Console (the successor to Fetch as Google) to confirm that Google sees the change.
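If you also want a rough in-house check alongside a crawler, one approach is to flag pages that return 200 OK but whose text reads like an error message. The sketch below covers only that classification step, on a status code and page text you have already fetched; the phrase list is a hypothetical example you would tune to your own “not found” and “out of stock” templates, and it will miss soft 404s that don’t use those phrases.

```python
import re

# Hypothetical phrases your error or empty-result templates might contain.
NOT_FOUND_PATTERNS = [
    r"page not found",
    r"no longer (?:exists|available)",
    r"0 results found",
    r"product (?:is )?unavailable",
]
_NOT_FOUND_RE = re.compile("|".join(NOT_FOUND_PATTERNS), re.IGNORECASE)


def classify_response(status: int, text: str) -> str:
    """Label a fetched page as 'hard 404', 'possible soft 404', 'ok', or 'other'."""
    if status in (404, 410):
        return "hard 404"           # Correct: the server admits the page is gone.
    if status == 200 and _NOT_FOUND_RE.search(text):
        return "possible soft 404"  # 200 OK, but the content reads like an error page.
    if status == 200:
        return "ok"
    return "other"                  # Redirects, 5xx errors, etc. need a separate look.
```

Treat “possible soft 404” results as a review list rather than a verdict, and fix confirmed cases with a redirect or a proper 404 status as described above.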
Crawl budget is an important factor in your overall SEO strategy, but it’s not something to lose sleep over early on. Take it into consideration once you’ve built a large collection of URLs or your website generates dynamic content during normal operation. If that’s you, implement the strategies outlined in this post to improve your crawl budget and make sure Googlebot crawls and indexes your most important pages.