
What to Include in a Magento Robots.txt File

Images used under Creative Commons licences from Flickr: https://www.flickr.com/photos/littlestuffme/8962393633 and https://www.flickr.com/photos/legoalbert/8868875522

Magento is an industry-leading e-commerce software platform used by 240,000 businesses worldwide. Its popularity amongst online retailers is built on solid performance and a wide range of features out of the box. You can add further features by purchasing readily available extensions, or by developing your own, as Magento is an open source platform. Meanwhile, features such as a dynamic sitemap and canonical tags mean that Magento also happens to be great for SEO.

However, Magento doesn’t include a robots.txt file with a default installation. A robots file tells search engines which parts of your site to crawl and which to ignore. It can help you address duplicate content issues by instructing search engines to ignore certain versions of pages and directories.

In this post I will outline the sort of things you should look to include in a Magento robots.txt file.

Robots.txt File Basics

Before we look in more detail at what should go in a Magento robots file, we shall first look at some basic facts about robots files:

  • A robots.txt file is simply a text file containing a series of commands for search engines to interpret. A text file can be created using Notepad or another basic text editor.
  • A robots file should go in the root of your website. This is usually in the same location as your homepage. For example, www.yoursiteaddress.co.uk/robots.txt.
  • The most common mistake people make in robots files is to disallow the entire site and prevent their site from being indexed by the search engines. This often happens when a new site is developed and the web developer forgets to update the robots file on launch.
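The last mistake in the list is easy to test for before launch. The sketch below uses Python's standard `urllib.robotparser` module; the function name and the two sample rule sets are my own illustrations, and it only shows how this one parser reads the file, not how any particular search engine does.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt text stops user_agent fetching the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# Illustrative rule sets: a forgotten development file and a typical live file.
development_rules = "User-agent: *\nDisallow: /\n"
live_rules = "User-agent: *\nDisallow: /checkout/\n"

print(blocks_entire_site(development_rules))
print(blocks_entire_site(live_rules))
```

Running a check like this against the file you are about to upload is a cheap way to catch a leftover development robots file.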

Read more on what to include in a general site robots file.

What to Include in Your Magento Robots File

Define Which Search Engines Can Index Your Site

At the very top of your file you need to define which search engine crawlers your rules apply to. These are known as “User-agents”. In the example below, the asterisk applies the rules to all search engines:

# Crawlers Setup
User-agent: *

Include a Reference to Your XML Sitemap

All websites should include an XML sitemap. Settings in Magento allow you to set up a dynamic sitemap to include products and pages as and when they are added. In your robots file you need to include the following line to tell search engines where to find your sitemap:

Sitemap: http://yourwebsiteaddress.com/sitemap.xml
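If you want to confirm that the sitemap line can be read, Python's standard `urllib.robotparser` module offers a quick check. Treat this as a sketch only: it uses the placeholder address from above, the function name is my own, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in(robots_txt: str):
    """Return the Sitemap URLs listed in the robots.txt text, or None if there are none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps()


# Placeholder address taken from the example above.
rules = "User-agent: *\nSitemap: http://yourwebsiteaddress.com/sitemap.xml\n"
print(sitemaps_in(rules))
```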

How to Exclude Magento Search Filters

Magento sites allow users to filter products by individual product attributes, or by combinations of them. This can lead to hundreds, or even thousands, of URLs that could be added to Google’s index. I recommend that you exclude the search filters in your robots file so that search engines do not waste their time crawling them. The Magento search filters to exclude are below:

# Search Filters and product comparison

Disallow: /catalogsearch/
Disallow: /Shopby/
Disallow: /*price=
Disallow: /*sizefilter=
Disallow: /*colour=

The exact filter parameters for your site may vary depending on how your site has been built. Crawl your site using a tool such as Screaming Frog, look at the search filter patterns for your site, and add any additional lines to your robots file.
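Once you have a list of crawled URLs, you can test them against your filter rules. The sketch below is a hand-written matcher that follows the wildcard behaviour Google documents (`*` matches any run of characters, `$` marks the end of the URL). The function name and the sample URLs are hypothetical, and other search engines may interpret patterns differently.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern matches a URL path (including any query string)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and let every * stand for any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so a rule acts as a prefix unless it ends in $.
    return re.match(regex, path) is not None


# Hypothetical URLs of the kind a crawl might report.
print(rule_matches("/*price=", "/shoes?price=10-20"))
print(rule_matches("/catalogsearch/", "/catalogsearch/result/?q=boots"))
print(rule_matches("/Shopby/", "/shopby/red"))
```

The last call returns False because paths in a robots file are case-sensitive, so check that the capitalisation in your rules matches the URLs your site actually produces.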

Magento Directories to Disallow

The basic installation of a Magento site includes numerous directories that offer little value to the search engines. The directories below can be excluded:

# Directories and default URLs

Disallow: /home/
Disallow: /checkout/
Disallow: /contacts/
Disallow: /sendfriend/
Disallow: /downloader/
Disallow: /catalog/product_compare/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /wishlist/
Disallow:

There are many sites that suggest you disallow:

Disallow: /control/
Disallow: /cgi-bin/
Disallow: /shell/
Disallow: /skin/
Disallow:

Feel free to do this, but note that a robots file does not make these areas secure: it only asks search engines not to crawl them. To protect your site from would-be hackers, I recommend that you restrict access to these areas using your website configuration settings in either your .htaccess file (Apache servers), your web.config file (IIS servers), or your server settings. For more information about securing your Magento site, speak to your website developer.

Never disallow the /js/ directory. This contains script files that Google needs to be able to access.
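A simple way to guard against this is to keep a short list of paths that must stay crawlable and test your rules against it. The following is a minimal sketch using Python's standard `urllib.robotparser`; the rule set, the script path and the function name are illustrative assumptions, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules containing the mistake: /js/ has been disallowed.
RULES = """\
User-agent: *
Disallow: /checkout/
Disallow: /wishlist/
Disallow: /js/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths from the list that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical paths that should always remain open to search engines.
must_stay_crawlable = ["/", "/js/prototype/prototype.js"]
print(blocked_paths(RULES, must_stay_crawlable))
```

An empty list means none of the paths you care about are blocked. Here the script file is reported because of the `/js/` rule.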

Magento Files to Exclude in Your Robots.txt File

The following Magento files should also be excluded:

# Files

Disallow: /cron.php
Disallow: /cron.sh
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
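If you maintain several stores, you may prefer to generate the file rather than edit it by hand. The sketch below assembles the sections covered in this post into one file. The function name is my own, the sitemap address is the placeholder used earlier, and the rule list is shortened for illustration.

```python
def build_robots_txt(disallow_rules, sitemap_url, user_agent="*"):
    """Assemble robots.txt text from a list of Disallow paths and a sitemap URL."""
    lines = ["# Crawlers Setup", f"User-agent: {user_agent}"]
    # An empty Disallow value blocks nothing, so empty entries are skipped.
    lines += [f"Disallow: {path}" for path in disallow_rules if path]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Shortened rule list drawn from the sections above.
rules = ["/catalogsearch/", "/checkout/", "/wishlist/", "/cron.php", "/install.php"]
print(build_robots_txt(rules, "http://yourwebsiteaddress.com/sitemap.xml"))
```

Save the output as robots.txt in the root of your site, then check it as described in the conclusion.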

Conclusion

It is important that you check your site’s robots file to ensure that it includes all of the above elements. At the very minimum, you need to make sure that it includes a reference to your sitemap, and that it doesn’t simply contain the Disallow: / directive, which blocks search engines from every page on your site.

If you do make changes to your robots file on your Magento site, check that you are not preventing Google from indexing your site. You can do this by using the site command in Google (site:yourwebsiteaddress.co.uk) to see which pages are in the index. You can also use your Google Webmaster Tools account (now Google Search Console) to check for indexing issues.

If you have any questions concerning Magento, ecommerce, or any aspect of technical SEO, feel free to get in touch.

3 responses to “What to Include in a Magento Robots.txt File”

  1. Anson says:

    Great stuff, really. I have put it to my website using this tutorial.

    • Pete Keyworth says:

      Hi Anson

      I’m pleased that you found the post useful.

      I’ve looked at your site and some of the characters within your robots file seem to be corrupted. I recommend that you take a look at this. See below:

      User-agent: *
      Disallow: /customer/
      Disallow: /admin/
      Disallow: /checkout/
      Disallow: /downloader/
      Disallow: /errors/
      Disallow: /media/c​atalog/product/cache/
      Disallow: /wishlist​/
      Disallow: /404/
      Disallow: /app/
      Disallow: /cgi-bin/
      Disallow: /report/
      Disallow: /var/
      Disallow: /catalogsearch/

      Cheers

      Pete
