Robots Meta Tag and Its Role in SEO


SEOs have a wide range of means to tell search robots how to crawl and index a website. It starts with the robots.txt file, which makes this or that part of the website visible to robots, and continues with the robots meta tag and the X-Robots-Tag HTTP header, which were designed for firmer rules and directives: they guide the robot through a page and tell it how to treat it. It’s a mighty tool that requires a skilful approach.

  • 1. What Exactly Does Meta Robots Do?
  • 2. What Is the Purpose of Robots Meta Tag in SEO?
  • 3. Now Let’s Chat about the ‘Content’ Instructions
  • 4. How to Check Meta Robots in Netpeak Spider
  • Wrap It Up

1. What Exactly Does Meta Robots Do?

The robots meta tag gives search engines instructions on how to crawl or index website pages the way you’d like them to. Roughly speaking, it’s not a strict instruction but a ‘recommended’ directive that some crawlers may ignore. However, Google has stressed that these directives are firmer than the rules set in the robots.txt file, which isn’t a blocking mechanism by nature. The robots meta tag is commonly used to protect certain pages from indexing, which gives you the power to choose which pages to show and which to hide.

Learn more about directives:

1. In robots.txt: 'What Is Robots.txt, and How to Create It'.

2. In X-Robots-Tag: 'X-Robots-Tag in SEO Optimization'.

Robots meta tag is placed into the <head> section of a web page and looks like this:

<!DOCTYPE html>
<html>
  <head>
    <meta name="robots" content="noindex" />
    (…)
  </head>
  <body>(…)</body>
</html>

The code consists of two parts: name and content. The ‘name’ attribute specifies the crawler that should follow your instructions. Its value is also known as the User Agent, since crawlers identify themselves with their User Agent when requesting a page. For example, Google’s crawler is called Googlebot. Imagine that our task is to prevent this robot from indexing a page, so we change the tag as shown below:

<meta name="googlebot" content="noindex" />

Apart from Googlebot, there’s a slew of other robots, and their names vary with their scope of activity. For instance, Googlebot-image is responsible for image search, so if you don’t want to appear there, mention this crawler in the ‘name’ value:

<meta name="googlebot-image" content="noindex" />

When you want to apply a directive to all robots, simply use ‘robots’:

<meta name="robots" content="noindex" />
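If you ever need to audit such tags in bulk, they can be extracted with a short script. Below is a minimal sketch in Python using only the standard library; the class name and the sample HTML are illustrative, not part of any real crawler’s API:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects robots meta directives, keyed by the targeted user agent."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": ["noindex", "nofollow"]}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        # 'robots' applies to all crawlers; bot-specific names
        # (googlebot, googlebot-image, bingbot, …) target one crawler
        if name == "robots" or "bot" in name:
            content = attrs.get("content") or ""
            values = [v.strip().lower() for v in content.split(",") if v.strip()]
            self.directives.setdefault(name, []).extend(values)

html = '<head><meta name="robots" content="noindex, nofollow" /></head>'
parser = RobotsMetaParser()
parser.feed(html)
print(parser.directives)  # {'robots': ['noindex', 'nofollow']}
```

Feeding it the Googlebot-specific tag from above would yield a 'googlebot' key instead of 'robots', so you can see at a glance which crawler each directive targets.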

2. What Is the Purpose of Robots Meta Tag in SEO?

Meta robots allows you to control indexing by managing search crawlers’ behaviour. Website owners often don’t want certain pages to be indexed, e.g. sitemap pages or pages containing irrelevant content.

You’d typically want to put such pages out of sight:

  • pages with thin content
  • pages with confidential information
  • admin or thank-you pages
  • drafts or incomplete pages
  • internal search results
  • duplicate content (it’s better to use canonical in this case)
  • PPC landing pages
  • pages about upcoming campaigns (Black Friday, contests, etc.)

As your website grows, you should manage it by carefully minding indexing and crawlability. To do it efficiently, balance the directives in the meta robots tag, the robots.txt file, the X-Robots-Tag, and the sitemap.

3. Now Let’s Chat about the ‘Content’ Instructions

If the ‘name’ attribute tells which robot should follow the instruction, the ‘content’ attribute tells how it should do it. The value may be one of the following: noindex, index, follow, nofollow, noimageindex, none, noarchive, nocache, nosnippet, unavailable_after, notranslate, max-snippet, max-image-preview, max-video-preview. Most SEOs and webmasters rarely go beyond the noindex and nofollow instructions, but it’s good to know that you have a bundle of them to choose from.

There are four commonly used instructions:

  • noindex: tells search robots not to index the page, which prevents it from showing in the SERP.
    <meta name="robots" content="noindex" />
  • index: tells robots to index the page. There’s no need to set it, though, since it’s the default behaviour.
    <meta name="robots" content="index" />
  • follow: tells robots to crawl all the links on the page.
    <meta name="robots" content="follow" />
  • nofollow: stops robots from crawling the links on the page.
    <meta name="robots" content="nofollow" />

The additional instructions are:

  • noimageindex: used in case the images on the page should not be indexed.
    <meta name="robots" content="noimageindex" />
  • none: combination of ‘noindex’+‘nofollow’.
    <meta name="robots" content="none" />
  • noarchive: prevents the page from being cached in Google.
    <meta name="robots" content="noarchive" />
  • nocache: identical to ‘noarchive’, but recognized by Microsoft’s search engine rather than by Google.
  • nosnippet: prevents a text snippet (e.g. the meta description) from being shown for the page in search results. It also works as noarchive.
  • unavailable_after: closes a page for indexing after a particular date.
    <meta name="robots" content="unavailable_after: Monday, 20-Jan-20 17:22:32 GMT" />
  • notranslate: stands for ‘don’t offer translation of this page in search results.’
    <meta name="robots" content="notranslate" />
  • max-snippet: maximum number of characters Google can show in snippets. Note that this doesn’t affect image and video previews.
    <meta name="robots" content="max-snippet:260" />
  • max-image-preview: sets the maximum size of an image preview for this page in search results. There are three possible values: none, standard, large.
  • max-video-preview: sets up a maximum number of seconds for a video snippet.

Combinations of several values may be used to achieve one goal. For example, ‘noindex, nofollow’ and ‘none’ both prevent indexing and stop robots from following the links on the page.
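A combined ‘content’ value is simply the directives joined with commas. A tiny illustrative Python helper (the function name robots_meta is ours, not any real API) shows how such a tag could be assembled:

```python
def robots_meta(*directives):
    """Build a robots meta tag from one or more directive values
    (illustrative helper, not part of any library)."""
    return '<meta name="robots" content="%s" />' % ", ".join(directives)

print(robots_meta("noarchive", "max-snippet:260"))
# <meta name="robots" content="noarchive, max-snippet:260" />
```

A helper like this also makes it easy to keep combined values consistent across templates instead of hand-editing each page.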

Don’t use values that contradict each other, e.g. ‘noindex, index’, or values that duplicate each other, e.g. ‘noindex, noarchive’. In case of a conflict, Google will follow the more restrictive instruction.
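As a rough model of the ‘more restrictive wins’ behaviour, here is a hedged Python sketch; the resolution logic is a simplification of the documented conflict handling, not Google’s actual code:

```python
def resolve(directives):
    """Pick the effective indexing/following state from possibly
    conflicting robots directives (most restrictive wins)."""
    values = {v.strip().lower() for v in directives}
    if "none" in values:               # 'none' == 'noindex, nofollow'
        values |= {"noindex", "nofollow"}
    index = "noindex" not in values    # noindex beats index
    follow = "nofollow" not in values  # nofollow beats follow
    return {"index": index, "follow": follow}

print(resolve(["noindex", "index"]))  # {'index': False, 'follow': True}
print(resolve(["none"]))              # {'index': False, 'follow': False}
```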

4. How to Check Meta Robots in Netpeak Spider

To check all meta robots settings on a website, follow these steps in Netpeak Spider.

  1. Enter the domain URL into the ‘Initial URL’ field. Make sure that you ticked ‘Crawl all subdomains’ in the ‘General’ settings tab.

  2. Go to a sidebar, and select ‘Meta Robots’ parameter in the ‘Crawling and indexing’ head group. Hit the ‘Start’ button to start crawling.

  3. When the crawling is completed, examine the results. You can filter the table by the ‘Blocked by Meta Robots’ and ‘Nofollowed by Meta Robots’ issues and analyze each list separately.

Why does Netpeak Spider consider the ‘nofollow’ and ‘none’ values a potential threat to your website? Sometimes webmasters close pages from crawling and indexing by mistake, or forget to open them again once all optimization work is done. Bear in mind that only pages unimportant for indexing, such as shopping cart, login, and sign-up pages (they are important for users, of course), should be kept out of the index. For other cases, there are other means to tune your website, such as canonicals and redirects.

You can check meta robots settings and use other basic features even in the free version of the Netpeak Spider crawler, which has no time limit and no cap on the number of analyzed URLs.

To get access to free Netpeak Spider, you just need to sign up, download, and launch the program 😉

Sign Up and Download Freemium Version of Netpeak Spider

P.S. Right after signup, you’ll also get the opportunity to try all the paid functionality, compare our plans, and pick the one that suits you best.

Wrap It Up

The meta robots tag is worth using if you want to improve your on-site SEO, since it lets you manage the pages you don’t want to get indexed:

  • pages with thin content
  • admin, sign up, shopping cart pages
  • pages with confidential information
  • drafts or incomplete pages

Remember that rogue crawlers may ignore all your instructions and directives and do whatever they want, so don’t treat meta robots as a 100% guaranteed security mechanism. If you want to be sure that all your private data stays hidden, opt for a more secure approach.