Canonical tags: the complete guide

Canonical tags are an essential component of SEO. This following guide aims to help you use canonical tags in the right way for your website so that the most beneficial pages rank highest in search results and get found by your customers.

What are canonical tags?

A canonical tag is an HTML tag which signals to search engines which version of a URL is to be considered the primary (canonical) version.

It suggests to Google which version of a URL you’d like to rank, consolidates ranking signals into that preferred version and helps prevent problems that can arise from having duplicate versions of a page.

Canonical tags are particularly important when it comes to ecommerce SEO, but are still relevant on other types of websites.

It’s worth noting that a canonical tag is a hint, not a directive: so while Google tends to respect them, in some cases it might not. If it’s absolutely essential that certain URLs aren’t crawled and indexed, it would serve you better to use another solution such as disallowing in robots.txt.

What does a canonical tag look like in HTML?

A canonical tag is found in the <head> of HTML and is structured like the following:

<link rel=“canonical” href=“https://www.hallaminternet.com/” />

Using canonical tags to resolve duplicate content issues

When most search engine bots come across two or more pages with very similar content they often choose just one of the pages to index, effectively ignoring the others.

Whichever duplicate page URL they decide to choose could be based on a number of factors including whichever one was first crawled, which one has the most internal links or which one has the most external links – ultimately meaning that if duplicate content is not resolved properly, you could end up ranking a less preferable version of your page.

There will often be situations where you have a number of pages vital to the infrastructure of your website but contain either similar or identical content. Adding a rel=canonical tag can tell crawlers which is your preferred page and avoid the issues, shown in the examples below.

Product page duplication

Let’s look at an example of ecommerce product page duplication below:

duplicate-web-pages

All of these product pages for the red toy truck have exactly the same content besides only minor variations in the top breadcrumb links.

The way this website has been structured means that there are three pages for one product: canonical tags are required to indicate which one is the preferred version to display in search.

canonical-example

In this case, it’s wise to choose the permanent product page as the preferred version, as it isn’t in the “sale” category (the sale is likely to end one day) and isn’t in the “items under £10” category (the price may rise one day to over £10).

Using canonical tags will signal to search engines that the primary product page is located at [http://www.example.com/toys/trucks/red] and that the other two URLs are duplicate versions of this canonical page.

It’s not just product pages on ecommerce sites that can benefit from use of a canonical tag – let’s look at some more examples.

Trailing slash/non-trailing slash versions of URLs

Check if your URLs are accessible both with and without a trailing slash. For example:

https://www.example.com/
https://www.example.com

URLs with and without a trailing slash are treated as two separate URLs, so this situation requires resolving.

A canonical tag is one solution here. If you want the trailing slash version to be the canonical version then add a canonical tag onto https://www.example.com that points to https://www.example.com/.

This should resolve any issues but a 301 redirect to the trailing slash version would be the best option because it simply cannot be ignored by Google.

URL parameters

Page URLs can have added information at the end of them in the form of parameters. These are always shown after a question mark in the URL:

Non Parameter URL – https://www.example.com/blog
Parameter URL resulting in completely different page content – https://www.example.com/blog?page=2

URL parameters can be used as part of tracking, in which case they won’t modify the contents of the page. Or, they can be used to modify the content of the page: they could present the same content in a different order, show completely different page content or filter out certain things.

As URL parameters often present the same content in a different way they can lead to duplicate content issues – using canonical tags is one potential solution to resolving these issues.

URL parameters and product sorting

URL parameters are commonly seen across ecommerce websites when filters or sorting is used. For example, on a category page we might want to sort the displayed products in order of ascending price, in which case we might see a URL similar to:

https://www.example.com/mens-shirts?price=ascending

As this has simply reordered the same content, using a canonical tag on this parameter URL pointing to https://www.example.com/mens-shirts would be one way of resolving potential duplicate content issues.

URL parameters and product filtering

Filtering an ecommerce category page by colour might introduce the following new URL: https://www.example.com/mens-shirts?colour=red

Note that this isn’t always an issue – it might be useful to have one level of filtering options on a category page indexable for search opportunities. In this case, that would be a ‘mens’ red shirts’ page.

This kind of parameter starts to become problematic if we already have a static, indexable category page for mens’ red shirts, i.e example.com/mens-shirts/red, because it introduces a duplicate page listing mens’ red shirts.

In this case, using a canonical from /mens-shirts?colour=red to /mens-shirts will help prevent the parameter URL from competing with the existing red colour subcategory page.

Tracking parameters

Tracking parameters might look something like this:

http://www.example.com/contact-us?trackingID=123456

Or contain a UTM code to show where users on your site have come from, for example: http://www.example.com/contact-us/?utm_source=newsletter

In cases where tracking parameters are used, a canonical tag would be a good solution to avoid duplicate content issues. In the above examples, the URLs containing the parameter would both canonicalise to /contact-us.

Bear in mind that Google still needs to crawl URLs that contain a canonical tag. If you’re concerned about crawl budget, a better solution may be to block your required query parameters in robots.txt.

How to implement canonical tags

When implemented in HTML, the canonical tag needs to appear in the <head> of your page’s HTML.

Once you understand which URL is the canonical version, the duplicate versions of that URL would have the canonical tag applied pointing to the canonical version.

For example, if https://www.example.com/mens-shirts?price=ascending is the duplicate URL and https://www.example.com/mens-shirts is the canonical URL, on https://www.example.com/mens-shirts?price=ascending we would need to see in the <head> the following canonical tag:

How to implement canonical tags on PDFs

On non-HTML documents such as PDFs, there’s of course no way to add the canonical tag to the <head>. There may be occasions when you’d want to implement a canonical though. For example, if you have PDF versions of content which is a duplicate of an HTML page.

In these cases, it’s possible to add the canonical tag via the HTTP header.

For example :

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://www.example.com/preferred-url>; rel=canonicals

Common canonical tag mistakes

Non-200 status URLs in canonical tags

URLs referenced in your canonical tags should only point to indexable, 200 status URLs, so make sure there’s no 301 redirects or 404 URLs within your canonical tags. This sends conflicting signals to Google, as we’re saying that the preferred version of the page which we want indexed, actually redirects somewhere else or is broken!

Non-200 URLs could manifest themselves in several ways, like the http and trailing slash examples below.

Canonical tag pointing to http

Occasionally you might see canonical tags on https sites that are correctly self-referencing, but pointing to the http version of the URL.

This is a mistake, as that will likely result in 301 redirect to the https URL. URLs within canonical tags should always deliver a 200 status.

Canonical tag pointing to incorrect trailing slash version

As outlined above, canonical tags can be used to resolve duplication caused by the same URL being accessible with and without a trailing slash.

But even if both versions aren’t accessible, make sure to check that canonical tags are pointing to your preferred trailing slash URL, otherwise it’ll lead to a non-200 status.

Relative canonical tags

Absolute URLs containing the entire address should be used in the canonical tag rather than relative URLs. For example,

<link rel=“canonical” href=“https://www.example.com/mens-shirts/” />

rather than

<link rel=“canonical” href=“/mens-shirts/” />

Non-canonical URLs in sitemaps

The only URLs in your xml sitemaps should be your canonical URLs, and these should be indexable, 200 status URLs.

The canonical tag indicates the preferred version of a page, and if non-canonical URLs are in your sitemap, it introduces some confusion over what the preferred version of the page actually is.

Keep things consistent everywhere you can: xml sitemaps, internal linking structure, canonical tags and any other links to URLs such as hreflang tags should all reference the same, canonical version of a URL. This keeps things as clear as possible for Google regarding crawling and indexing.

Canonical tags and pagination

Pagination needs to be properly handled to minimise SEO issues. In a numbered sequence of pages, each page in the sequence should have a self-referencing canonical tag.

A common mistake in paginated sequences is adding a canonical tag to each numbered page in the sequence pointing to the first page in the sequence. For example, https://www.example.com/blog/page/2/ having a canonical tag pointing to https://www.example.com/blog/.

Google will be less likely to crawl URLs that canonicalise to a different URL. If those paginated URLs contain any valuable content (i.e older blog articles), we want to ensure Google is crawling and indexing these, this can be achieved by ensuring there’s a self-referencing canonical on all paginated pages.

Final thoughts

Remember these key points when it comes to implementing canonical tags:

Canonical tags tell Google which version of a URL is the preferred version you want to see in search results. This helps avoid duplicate content issues.
Canonical tags are often required on ecommerce sites when filtering of category or product pages is used – otherwise you can end up with multiple versions of the same page.
Ensure that URLs in canonical tags are always absolute, 200 status indexable URLs and that there’s consistency between the URLs linked across your site.
Finally, remember that they’re a hint, not a directive. Other solutions such as a noindex tag or a robots.txt disallow would be more appropriate if it’s essential that content is not displayed in search.

For any further technical SEO advice or guidance, get in touch with us or connect with me on LinkedIn.

Canonical tags: the complete guide

What are canonical tags?

What does a canonical tag look like in HTML?