You have added new pages and content to your website, but you can’t find them in search results.
You know that it takes time and resources for Google to crawl and index your website, and that you should allow at least a week. Before assuming there’s a problem, make sure Google can locate and understand your page.
If you have done your part by submitting your sitemap and requesting indexing, but your page is still missing, remember that Google does not index (let alone rank) every page you publish. On large websites in particular, a significant share of pages may never make it into the index.
What is Crawl Budget
A crawl budget is the maximum number of pages that search engines can and want to crawl on your website. It varies from day to day depending on your site’s health, its size, and the links pointing to your content.
Google weighs your crawl rate limit and your crawl demand to determine your crawl budget. If your website has more pages than your crawl budget covers, some pages will be left unindexed, which means they will struggle to rank.
How Do Search Engines Work?
Google, Bing, Baidu, Yahoo, and Yandex are some of the most popular search engines we have now.
But how do these search engines actually do their job for your page?
Search engines have three crucial functions: crawling, indexing, and ranking. They use their own crawlers, or spiders, to discover hundreds of billions of pages, which they then index and rank.
The search engine’s algorithm aims to present high-quality, valuable search results that satisfy users’ queries as fast as possible.
When users find a reliable source of information among the search results, their behaviour influences how search engines rank pages in the future.
In SEO, not all search engines are equal.
You may be asking yourself: do I also need to optimise my pages for Bing, Yahoo, and others?
The answer is that the SEO community pays far more attention to Google. Google has the largest market share, handling roughly 90% of online searches.
Searches for images, maps, and even videos on YouTube, the video-sharing platform owned by Google, all happen within Google’s ecosystem.
If you want your site to perform better than your competitors’, we recommend that you think like Google.
How Crawler Works
Spiders, robots, and bots are all names for web crawlers; use whichever term you are comfortable with. They crawl across the World Wide Web to index pages for search engines, which otherwise have no way of knowing what websites exist.
Search engines like Google need web crawlers to browse the web and ensure that accurate information (the keywords and phrases users search for) is accessible and useful to online users.
Crawlers discover pages through links. They work from a list of URLs and periodically fetch your site’s robots.txt file.
Policies guide them in selecting which pages to crawl, how frequently to crawl them (especially when pages are updated), and in what order.
What are some examples of crawlers?
Googlebot is Google’s main crawler for both mobile and desktop crawling. You may also come across these spiders from other search engines:
- Baidu: Baiduspider
- Bing: Bingbot
- DuckDuckGo: DuckDuckBot
- Yahoo!: Yahoo! Slurp
- Yandex: YandexBot
You may also encounter other, less common crawlers that are not associated with any search engine.
Why do search engines assign crawl budgets to websites?
The internet is home to hundreds of millions of websites. Search engines assign a crawl budget to every website because there are simply too many websites and too few resources to crawl them all at once.
The ratio of websites to resources is overwhelming, so search engines allocate budgets to ensure that every website gets the crawling it needs.
One of the determining factors of a crawl budget is crawl demand or crawl scheduling.
Crawl demand reflects how much search engines want to crawl or recrawl your pages, both new and existing ones.
Popularity, relevancy, and the type of pages influence this factor because it signals which URLs are worth crawling or recrawling to prevent them from becoming stale in the index.
Crawl Rate Limit
One of the two factors that comprise a crawl budget is the crawl rate limit or host load limit.
Search engine crawlers limit their requests so they don’t overload your server, which could cause harm in the long run. As Google puts it, the crawl rate limit represents the number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it has to wait between the fetches, making it an essential part of the crawl budget.
These are some of the factors that influence a crawl rate limit:
- Running your website on a shared hosting platform alongside a hundred other websites: the crawl limit is set at the host level, so you share the host’s crawl limit with every other site on it.
- Running your desktop and mobile sites on the same host: both share the same crawl limit, so consider hosting them separately.
- Frequent server errors or requested URLs timing out.
Caveat! Increasing your crawl limit will not improve your ranking position. The crawl limit is not a ranking factor; as Google has stated, an increased crawl rate will not necessarily lead to better positions in search results.
How Crawl Budget Works
Crawl budget is determined by two factors: the crawl rate limit and crawl demand.
Many factors drive the crawl budget. It can be a combination of your website’s health, size, and popularity, meaning it is not the same for all websites.
What comprises a crawl budget?
- All of your site’s URLs
- All of your site’s subdomains
- All server requests made to your site
- CSS, JavaScript, and XHR requests
- Language version pages with the hreflang tag
- All AMP and m-dot pages (this includes your mobile pages in your crawling budget)
Why is the Crawl Budget significant?
The crawl budget is essential to your SEO campaign.
If you run an eCommerce store, be aware that you may end up managing more than 10,000 pages.
A large website requires more careful prioritisation of your SEO work and your crawl budget. You must help search engines understand what to crawl on your website and when, if you don’t want them to treat your pages as spammy or not relevant enough.
However, if you are maintaining a small website, you don’t need to worry that much about this one, as Google and other search engines can efficiently crawl your website most of the time.
Whether you have a large, medium, or small website, knowing how to evaluate your crawl budget helps you make sure that no issues are hurting your ranking, user experience, or traffic.
Why care about the crawl budget for your site?
If you own a website, you should care about its SEO performance, and that means understanding how search engines work.
You want to be sure that search engines pick up new or updated content as soon as possible. The sooner pages get indexed, the sooner you benefit from them.
Googlebot won’t crawl your website efficiently if you waste your crawl budget. It will spend time on parts of your site that don’t matter, which can leave crucial parts undiscovered. If Googlebot doesn’t know about your pages, it won’t crawl and index them, and they won’t show up in search results to bring in visitors.
How To Check Your Crawl Budget
Google and other search engines can’t do all the work for your website if you want to ensure a successful SEO campaign run.
We have mentioned that search engines rely on crawl budgets because they don’t have unlimited resources.
That makes it difficult for them to keep up with millions of websites, so they assign crawl budgets. This is why you must take matters into your own hands and evaluate your website.
Here are some of the ways to check your Crawl Budget.
- Use Server Log File Analysis
Server log analysis is the best place to start checking your crawl budget. It helps you determine how frequently spiders visit your website.
You will see which pages bots crawl more often than others. You can analyse log files manually, or automatically with one of the commercial log analysers available.
Through the server log file report, you will:
- Gather relevant information about bot activity on your website
- Know how frequently spiders crawl your website
- See which pages get crawled the most
- Spot the types of errors crawlers often encounter on your site
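To illustrate the manual route, here is a minimal Python sketch that counts Googlebot requests per URL and per status code. It assumes your server writes the Apache/Nginx “combined” log format. The function name `summarise_bot_hits` is hypothetical, and the regular expression will need adjusting for other formats.

```python
import re
from collections import Counter

# Assumed log format: Apache/Nginx "combined". Adjust the pattern for other formats.
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarise_bot_hits(lines, bot="Googlebot"):
    """Count requests whose user agent contains `bot`, per URL path and per status code."""
    paths, statuses = Counter(), Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match is None or bot not in match.group("agent"):
            continue  # unparseable line, or a visitor other than the bot we care about
        paths[match.group("path")] += 1
        statuses[match.group("status")] += 1
    return paths, statuses
```

To try it, pass an open log file, for example `summarise_bot_hits(open("access.log"))` (the file name is hypothetical). Then check `paths.most_common(10)` for the most-crawled URLs and `statuses` for error counts. Anyone can fake a Googlebot user agent. Confirming that a request really came from Google requires a reverse DNS lookup, which this sketch skips.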
Here are some log analysers available online you may want to try:
1. Sematext Logs
2. SolarWinds Loggly
3. Logentries (now Rapid7 InsightOps)
4. Sumo Logic
5. SolarWinds Log & Event Manager (now Security Event Manager)
6. ManageEngine EventLog Analyzer
- Using Google Search Console
Google Search Console-verified websites can get insights into their crawl budget for Google.
You may follow these steps.
- First, determine the number of pages on your website. Your XML sitemap is a good reference, as it lists your URLs.
- Go to Google Search Console.
- Click on “Settings”, then “Crawl Stats”.
- Note the average number of pages crawled per day. If the report only shows total crawl requests, divide that total by the number of days it covers.
- Then do the math: divide the number of pages by the average number of pages crawled per day.
- If your result is:
  - Higher than 10, you have more than ten times as many pages as Google crawls each day, and you must OPTIMISE your crawl budget.
  - Lower than 3, keep doing what you are doing.
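The arithmetic above is simple, but a small Python sketch makes each step explicit. It assumes a single `<urlset>` sitemap rather than a sitemap index, and it applies the thresholds of 10 and 3 given above. The function names are hypothetical. The rule of thumb says nothing about results between 3 and 10, so the sketch simply labels that band “monitor”.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <url> entries in a single <urlset> sitemap (not a sitemap index file)."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall(f"{SITEMAP_NS}url"))


def average_crawled_per_day(total_crawl_requests: int, days: int) -> float:
    """Turn a report total into a daily average."""
    return total_crawl_requests / days


def crawl_budget_verdict(total_pages: int, crawled_per_day: float) -> str:
    """Apply the rule of thumb to the ratio pages / average pages crawled per day."""
    ratio = total_pages / crawled_per_day
    if ratio > 10:
        return "optimise"  # more than ten times as many pages as are crawled daily
    if ratio < 3:
        return "fine"  # under 3: the rule of thumb says no action is needed
    return "monitor"  # 3 to 10 is not covered by the rule of thumb
```

For example, a 2,000-page site that averages 150 crawled pages per day gives a ratio of about 13, which falls in the “optimise” band.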
Understanding Google Search Console
Google Search Console is a tool that helps you check and analyse how your website performs in search results.
This tool is critical if you want to stay competitive and measure how your SEO implementations are performing on the web, as it gives you a record of your site’s traffic and helps fix issues present on your website.
Knowing how you perform online (impressions, clicks, and position) gives you an advantage.
These metrics help you decide what content to produce, see which keywords perform best, and analyse your link reports.
Google Search Console also lets you analyse your website’s indexing status. It reports the errors Googlebot encounters on your site so you can fix them and keep your content visible.
These are the common crawl response categories you can use to identify why some pages return an error:
- robots.txt not available
- Unauthorised (including codes 401 and 407)
- Server error (5XX)
- Other client error (4XX)
- DNS unresponsive
- DNS error
- Fetch error
- Page could not be reached
- Page timeout
- Redirect error
- Other error
Factors Affecting Crawl Budget
These are some of the elements affecting your website’s crawl budget. Take note of these as some might have been troubling you and your website.
Server and Hosting Set-Up
We mentioned earlier that hosting your website on a server shared with other websites means sharing that server’s crawl limit.
If you entrust your website to hosting and servers that are not stable enough, crashes and errors can disrupt crawling.
Navigation and Session Identifiers
Faceted navigation, or faceted search, is an in-page navigation system commonly used by eCommerce stores, listings, and other websites that contain large numbers of results.
Because of the volume of information these sites carry, faceted search can be unfriendly to search engines. It creates many URL variations that serve the same or near-identical content.
This is where the trouble starts. Search engines may struggle to crawl websites with duplicate content or URLs. They may treat each variation as a new or updated page and fail to index the right version.
Faceted navigation gives rise to the following SEO problems:
- Duplicate content: multiple URL versions exist for the same page on your website.
- Crawl waste: bots spend time crawling duplicate pages instead of focusing on valuable ones.
- Weak link equity: instead of concentrating on one page, the value of your internal links is spread across hundreds of duplicates.
- Crawl trap: faceted navigation can generate a near-endless number of variations of your core URLs, trapping spiders as they crawl each new combination.
Session identifiers, also called session IDs or session tokens, are unique tags that a website assigns to a user to track their activity during a set period or session.
Session IDs are typically used for user authentication and shopping cart activity. Cookies are a common way of delivering and storing session tokens, but session IDs can also be embedded in URLs as a query string.
The problem is that both cookies and URL parameters can prevent search engines from crawling and indexing your website properly. Search engines ignore cookies, and URL parameters can lead to duplicate content.
If you rely on cookies to present your site’s content, provide alternative ways for search engines to reach your pages, such as direct links or sitemaps.
Canonicalisation, meanwhile, can solve the duplicate content issues caused by session IDs in URL parameters. It tells search engines which URL version to index.
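To make canonicalisation concrete, here is a minimal Python sketch that strips session and facet parameters from a URL so that every variant maps to one canonical address. It then emits a `rel="canonical"` link tag. The parameter names in `IGNORED_PARAMS` are hypothetical placeholders. Which facets deserve their own indexable URLs is a decision specific to each site.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names: replace with the session and facet parameters your site uses.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid", "sort", "color", "size"}


def canonical_url(url: str) -> str:
    """Drop ignored query parameters, sort the rest, lowercase the host, drop any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" URL
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url), quote=True)}">'
```

With these settings, `/shoes?sessionid=abc123&color=red&page=2` and `/shoes?page=2` both point search engines to the same canonical URL. Pagination survives because `page` is not in the ignored set.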
Another factor affecting your crawl budget is duplicate content. Be careful with what you publish, as duplication can affect how search engines interpret and index your content.
Google may treat your pages as low value or spammy if it detects widespread duplication on your site. Make sure your content provides value to online users, and check regularly for duplicate content.
Aside from duplicate content, low-quality or spammy content can also affect your crawl budget.
Google looks for valuable resources. Provide a good user experience through your website’s pages, and avoid constantly publishing thin, short pieces of content.
Crawlers slow down or reduce crawling on your website if your site and hosting aren’t fast enough. This is why it matters to have a stable hosting provider.
Pages that take too long to load disappoint online users and can lower your crawl budget.
What Is Crawl Budget Optimisation
Crawl budget optimisation is the process of making the most of the limited time search engines spend finding, crawling, and indexing your site’s critical pages and content.
How to Optimise Crawl Budget
Now that we have identified the factors that limit your crawl budget, let’s look at how to optimise it in the best way possible.
Prevent Crawler from Crawling Pages with Low SEO Value
Prioritisation is the key. If you run large websites, we recommend you prioritise pages to crawl. Let’s face it. As owners, you know which of your pages have the potential to rank or the ones with low SEO value.
Using Google Search Console or Google Analytics, you can identify your top-performing pages by clicks and impressions and prioritise them. You can then keep crawlers away from pages with low SEO value, for example by disallowing them in your robots.txt file.
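One way to check which URLs your robots.txt keeps crawlers away from is Python’s standard `urllib.robotparser` module. The rules below are a hypothetical example that blocks cart, internal search, and tag pages. This parser does simple prefix matching and does not implement Google’s `*` and `$` wildcard extensions, so the sketch sticks to plain path prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep all crawlers out of cart, internal search, and tag pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
Disallow: /tag/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """True if the rules allow `agent` to fetch `url` (prefix matching only)."""
    return parser.can_fetch(agent, url)
```

Robots.txt controls crawling, not indexing. A disallowed URL can still appear in search results if other pages link to it.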
Reduce Redirect Chain
You need to watch out for redirect chains on your website. Part of keeping your website healthy is making sure there isn’t a single redirect chain on your domain.
Redirect chains are common on large websites, where 301 and 302 redirects are hard to avoid. Each extra hop uses up crawl budget, and crawlers may stop following a chain that is too long, leaving the destination page uncrawled. Fix chains promptly so that crawling and indexing your site doesn’t become too difficult.
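If you can export your redirects from a crawler as pairs of source and target URLs, a short Python sketch can flag the chains. It also works out where each source should point instead. The redirect map in the example is hypothetical sample data, and the function names are illustrative.

```python
def follow_redirects(redirects: dict[str, str], start: str) -> list[str]:
    """Return every URL visited from `start` until reaching one that does not redirect."""
    chain = [start]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError("Redirect loop: " + " -> ".join(chain + [target]))
        chain.append(target)
    return chain


def find_chains(redirects: dict[str, str]) -> dict[str, list[str]]:
    """Sources that need more than one hop to reach their final URL."""
    chains = {}
    for source in redirects:
        chain = follow_redirects(redirects, source)
        if len(chain) > 2:  # source plus two or more targets means more than one hop
            chains[source] = chain
    return chains


def flatten(redirects: dict[str, str]) -> dict[str, str]:
    """Point every source straight at its final destination (one hop each)."""
    return {source: follow_redirects(redirects, source)[-1] for source in redirects}
```

For example, with `{"/a": "/b", "/b": "/c", "/c": "/d"}`, `flatten` redirects `/a`, `/b`, and `/c` straight to `/d`. The sketch raises an error on a loop rather than silently stopping, because a redirect loop is a bug worth fixing in its own right.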
Refresh Stale Content with Good Performance
One of the quickest wins for SEO is refreshing your stale content.
Producing and publishing new content every day is time-consuming, and it adds little if the content has no value. Instead, check your old articles and update them.
Relevance and freshness encourage search engines to crawl your content. Updated content can improve your site’s performance, lift your ranking position, and prompt bots to recrawl your site.
Who knows? Search engines may even increase your crawl budget as you improve your website’s content.
Review internal link structure
How you link your pages to one another is very important in optimising your crawl budget. Aside from backlinks, pages with few internal links get less attention from Googlebot than pages linked from many other pages.
Avoid a very deep hierarchical link structure. Pages buried many levels down tend not to get crawled frequently, which doesn’t help them. Make sure every essential page has sufficient internal links.
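To audit your link structure, you can treat internal links as a graph and run a breadth-first search from the homepage. A page’s search depth is its click depth, and pages the search never reaches are orphans. This minimal Python sketch assumes you already have a hypothetical `links` map of each page to the pages it links to, for example from a site crawl export.

```python
from collections import Counter, deque


def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search from the homepage: minimum number of clicks to reach each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_pages(links: dict[str, list[str]], home: str = "/") -> set[str]:
    """Known pages that cannot be reached by following links from the homepage."""
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return all_pages - set(click_depths(links, home))


def inlink_counts(links: dict[str, list[str]]) -> Counter:
    """Number of distinct pages linking to each page (self-links ignored)."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts
```

Pages with a high click depth or few inlinks are candidates for extra internal links. How deep is too deep is a judgment call for each site.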
Remove inactive or broken links
Inactive links, or links returning 4xx errors, hurt more than just your crawl budget.
As your website continues to undergo updates, you may have internal links on your site that point to URLs that are inactive, broken, or no longer have value.
Optimising your crawl budget will lead you to better understand your website as it pushes you to audit your content more and see which needs updating and pages you need to let go of.
How to Check Broken Links
To find broken and redirecting links easily, you can use Screaming Frog. Here’s how:
- Open Screaming Frog, enter your site URL, and click “Start.”
- Check the “Status Code” of each link to see which ones redirect or are broken.
- Update internal links that return 301 or 302 redirects so they point directly to the final URL, and fix or remove links that return 4xx errors.
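If you export the crawl results from Screaming Frog as CSV, a few lines of Python can group the URLs that need attention. This sketch assumes the export has columns named `Address` and `Status Code`. Check the headers in your own file, as they may differ between versions and export types.

```python
import csv
import io


def links_to_fix(csv_text: str) -> dict[str, list[str]]:
    """Group crawled URLs by the action they need, based on their HTTP status code."""
    actions = {"redirect": [], "broken": []}
    # Assumed column names: "Address" and "Status Code"; adjust to match your export.
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = int(row["Status Code"] or 0)  # treat a blank status as 0 (no response)
        if 300 <= code < 400:
            actions["redirect"].append(row["Address"])  # update links to the final URL
        elif 400 <= code < 500:
            actions["broken"].append(row["Address"])  # fix or remove links to these
    return actions
```

To use it, read the exported file into a string, for example `links_to_fix(open("internal_all.csv", encoding="utf-8").read())`. The file name is hypothetical.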
As an online user, you know how critical page speed is in determining user experience and securing one’s ranking position in search engines.
Improving Crawling Can Improve Your Revenue
You are what bots eat. If your site contains valuable and relevant content, spiders can crawl it more easily. You are on a budget, and that means using it wisely.
So make sure to get rid of junk pages on your website. Every moment spiders spend crawling low-ranking pages that generate no revenue is a lost opportunity.
If you have heard of the SEO funnel principle, you will understand that crawl budget optimisation has downstream advantages, not just for your site’s ranking position but also for revenue.
Keywords, long-form content, and reputable links are not the end-all-be-all of your SEO campaign. There are many areas that you have to cover and processes that you need to uncover to ensure the overall health of your website.
Crawl budget optimisation can seem mind-boggling because it is part of technical SEO, but maintaining your site well is a good start.
The more you pay attention to how your website looks, the more care you should give to what is happening in the background.
Some people neglect crawl budget when evaluating their site’s performance, but understanding why search engines don’t crawl and index some of your pages can work wonders. It shows you which pages you need to take down and which URLs need updating.
Now that you have an idea of what a crawl budget is – how to optimise it, and why you need it for your SEO, are you ready to do some spider hunting?
Kickstart Your Website Growth With Our SEO Services
If you want to begin optimising your website but don’t have the time, Roots Digital can help! Roots Digital is a digital marketing agency that specialises in helping e-commerce and B2B businesses to increase their revenue with digital marketing.
Our team of SEO professionals knows how to create SEO campaigns that grow our clients’ businesses.
Ready to boost your website’s ranking in search results to drive more revenue? Contact us online or speak with a strategist about our SEO services today!
Frequently Asked Questions
What is a crawl budget?
Crawl budget refers to the number of pages that Google crawls on a website on any given day.
This is generally determined by how big your site is, how healthy it is (for example, how often Google runs into server errors), and how many links lead to your site. Some of these factors are things you can influence.
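Is the crawl budget just about pages?
No. Any URL that search engines crawl counts towards your crawl budget. This includes alternate versions of pages (such as AMP, m-dot, and hreflang language versions) and resources such as CSS, JavaScript, and XHR requests.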
Does Google crawl all websites?
Google doesn’t always crawl every page on your site instantly. In fact, sometimes, it can take days and even weeks.
This happens because Google has a Crawl Budget.
The crawl budget is the number of URLs Googlebot can and wants to crawl on a website.
This is one of the key factors in determining how visible your website is on the SERP or Search Engine Results Page.
If your pages don’t get crawled, they won’t be indexed and displayed in the search results.
What affects the crawl budget?
Different factors affect your crawl budget. Server and hosting setup, navigation and session identifiers, duplicate content, low-quality content, and rendering are some of the factors that influence your crawl budget.
These factors can prevent search engines from crawling your website properly. You don’t have to stress too much, as you can improve and update your site based on the points raised in this article.
Can I check my website’s crawl activity?
Yes. It is not a secret. You can track and monitor your website’s crawl activity.
If your website is verified in Google Search Console, you can quickly review crawling activity across your pages in the Crawl Stats report.
The report breaks down crawl requests by:
– response,
– file type,
– purpose, and
– Googlebot type.
These crawl statistics show which types of Googlebot visit your site and give you the insights you need to understand how crawling works for your website. With them, you can see which areas you can improve and which ones you fully control.
How can I improve my crawl budget?
There are many ways you can improve or increase your crawl budget. You may do the following:
– Prevent Crawler from crawling pages with low SEO value,
– Reduce redirect chain,
– Refresh stale content with good performance,
– Review website’s internal link structure,
– Remove inactive content or pages.
Improving your crawl budget also means checking your website’s health. This article has covered some simple ways to build a better website through fresh, relevant, and valuable content.
When should I worry about the crawl budget?
You don’t have to worry much if you have valuable and popular pages on your website.
The challenge comes when you have new pages, pages without strong links, or content you no longer update. Expect Google to crawl these less often.
Crawl budget can be a concern for newer sites, especially those with many pages. Google and other search engines may be slow to crawl them because they don’t yet know whether the pages are worth indexing.
Large websites with millions of frequently updated pages can also run into crawl budget limits.
Crawling for these types of websites can be slow due to constant updates that may require bots to recrawl or reindex.
Is crawling a ranking factor?
Google said it best: an increased crawl rate will not necessarily lead to better positions in search results. Increasing your crawl limit will not improve your ranking position.
Even though crawling is not a ranking factor, keep in mind that it is the first step in getting search engines to find, understand, and index your content so they can rank it.