
Why Robots.txt is Such a Vital Element

Posted by Mike

Did you know there is a file that can single-handedly prevent your entire site from ever ranking on Google? While your first reaction may be to avoid it like the plague, that same file can also be used to help your SEO campaigns. That file is called robots.txt.

What is Robots.txt?

Robots.txt is the first file search engine bots look at when they crawl a website. It gives bots crawling instructions through a set of directives known as the Robots Exclusion Protocol. Robots.txt is typically used to prevent bots from crawling parts of a site and including that content in search results.

While many people would like every page of their site to be prominently displayed on the first page of Google, there are several instances in which you need to hide pages from search engines.

For example (a sample file covering these cases follows the list):
• Parts of a site may not be fully developed
• Email and/or ad campaigns may use copies of pages that would be considered duplicate content
• A website may share a piece of content that was published by another site
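
To make these scenarios concrete, here is a sketch of a robots.txt that blocks one directory for each case. The paths (/dev/, /landing-pages/, and /syndicated/) are hypothetical; substitute whatever your own site actually uses:

User-agent: *
# An unfinished section still in development
Disallow: /dev/
# Duplicate landing pages used for email and ad campaigns
Disallow: /landing-pages/
# Republished content that originally appeared on another site
Disallow: /syndicated/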

Using Robots.txt

Websites are not required to have a robots.txt file, and many site owners and webmasters never need to create one. Without one, bots are allowed to crawl the entire site. If your site does need a robots.txt file, whoever creates it MUST know what they are doing. You never want to risk blocking the wrong part of your site, or all of it, from search engines. Fortunately, the code is pretty straightforward.

Suppose http://www.examplewebsite.com/ wants to block all bots from the contact page. Let’s call this contact.php. The code would be:

User-agent: *
Disallow: /contact.php
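
Be aware that the Disallow value is matched as a path prefix, so a small change can have big consequences. A lone slash blocks the entire site from all bots:

User-agent: *
Disallow: /

while an empty value blocks nothing and lets bots crawl everything:

User-agent: *
Disallow: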

Robots.txt can also allow certain bots to crawl a site while blocking others. If that same site wanted to block only Bing's crawler from the contact page, it would enter:

User-agent: bingbot
Disallow: /contact.php 
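
Groups can also be combined in a single file. To shut Bing's crawler out of the contact page while explicitly giving every other bot free rein, the same site could write the following (again using the hypothetical contact.php). Crawlers obey the most specific group that matches them, so bingbot follows its own rules and ignores the catch-all:

User-agent: bingbot
Disallow: /contact.php

User-agent: *
Disallow: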

Using Meta Robots

If your needs are a little more complex, you can use meta robots tags. These are written in the <head> of individual pages and can include more detailed instructions. For example, suppose http://www.examplewebsite.com/ still doesn't want its contact page included in search results. However, the contact page has some cool features that have generated valuable backlinks. To keep the SEO value of those backlinks while blocking the page from search results, the site would add the following meta robots tag:

<head>
<title>Contact Us</title>
<meta name="description" content="Interested in our services? Contact us today!" />
<meta name="robots" content="noindex, follow">
</head>

You can use different combinations of "noindex", "index", "nofollow", and "follow" to tell the bots what to do.
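
As a quick reference, the four combinations look like this. The values are case-insensitive, and "index, follow" is the default behavior, so it rarely needs to be stated explicitly:

<meta name="robots" content="index, follow">     <!-- default: index the page and follow its links -->
<meta name="robots" content="noindex, follow">   <!-- keep the page out of results, but follow its links -->
<meta name="robots" content="index, nofollow">   <!-- index the page, but don't pass on its links -->
<meta name="robots" content="noindex, nofollow"> <!-- hide the page and ignore its links -->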

Be Careful What You Block

Google wants access to everything it needs in order to fully understand your site, and if you want good rankings, you want to please Google. This means robots.txt should only block the files that really need to be blocked. The robots.txt files of many WordPress sites used to automatically block several types of files and plugins until Yoast founder and CEO Joost de Valk asked for the feature to be removed.

It's also important to note that all robots.txt files are public. In fact, you can reach any site's robots.txt just by adding /robots.txt to the end of its domain. Because of this, you need to make sure any files containing sensitive information have real security measures in place to protect the data; listing them in robots.txt hides them from well-behaved bots, not from people.
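
For example, the robots.txt for the example site used above would be visible to anyone at:

http://www.examplewebsite.com/robots.txt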

A Powerful but Delicate Tool

Robots.txt and meta robots tags can be very useful, but you MUST be sure to use them properly or risk losing rankings. If you have any questions about robots.txt or want to add one to your site, be sure to talk to your webmaster before you write any code.
