Not surprisingly, website crawl errors such as Soft 404s often go unnoticed by many marketers in favour of less technological aspects of SEO. We often discuss topics such as the creation of quality content, and the importance of building authority in your website’s domain, while it’s all too easy for marketers to throw technical SEO issues back over the fence and onto the desk of their web developer.
However, whilst many web developers know exactly how to build an aesthetically pleasing website, they are often oblivious to how the website is served to web crawlers such as Googlebot (Google’s Web Crawler). This article will focus on one particular category of crawl error, one that, if left unresolved, can hugely reduce the number of pages search engines such as Google crawl and index in their search results: ‘Soft 404’ Errors.
What’s a Soft 404 Error?
You’ve no doubt previously encountered a 404 “page not found” error message when trying to visit a certain page of a website. Whenever the 404 or Not Found error message is displayed on a page, the server should also return a HTTP 404 standard response code. The 404 HTTP response code indicates that a website’s server could not find the page (URL) that was requested by the user, which informs both browsers and search engines that the page doesn’t exist. As a result, the content of the page (if any) won’t be crawled or indexed by search engines.
For example, let’s imagine a user attempts to visit a URL on our site and they are served this message:
In the example above, our server is responding to a request for a page that doesn’t exist by displaying a 404 page. On other websites, this is often a standard “File Not Found” message. However, we have developed a custom page designed to provide the user with additional options with the aim of keeping them on our website.
What most people don’t understand is that the content of the page – the ‘page not found’ message – is entirely unrelated to the HTTP response returned by the server. Just because a page displays a 404 File Not Found message, it does not mean that this page is automatically defined as a 404 page.
In Google’s own words: “This is like a giraffe wearing a name tag that says ‘dog’. Just because the name tag says it’s a dog, doesn’t mean it’s actually a dog. Similarly, just because a page says 404, doesn’t mean it’s returning a 404 status code.”
A ‘Soft 404’ error occurs when a non-existent page (a page that has been deleted/removed) displays a ‘page not found’ message to anyone trying to access it, but fails to return a HTTP 404 status code. They can also occur when the non-existent page redirects users to an irrelevant page, such as the homepage, instead of returning a HTTP 404 status code. The important thing to remember here is that the content of a web page is entirely unrelated to the HTTP response returned by the server.
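To make the distinction concrete, here is a minimal Python sketch of the first case described above. It assumes you already have the status code and the HTML body of a response to hand. The function name and the list of phrases are our own illustration, not anything Google publishes, and a simple phrase match like this will miss error pages that are worded differently.

```python
import re

# Hypothetical phrases that suggest an error page; adjust these for your own site.
NOT_FOUND_PATTERN = re.compile(r"(page|file) not found", re.IGNORECASE)


def looks_like_soft_404(status_code: int, body: str) -> bool:
    """Return True when the body reads like an error page but the status says 200 (OK)."""
    says_not_found = bool(NOT_FOUND_PATTERN.search(body))
    return status_code == 200 and says_not_found


print(looks_like_soft_404(200, "<h1>Page not found</h1>"))  # True: message without the status code
print(looks_like_soft_404(404, "<h1>Page not found</h1>"))  # False: a genuine 404
print(looks_like_soft_404(200, "<h1>Blue widgets</h1>"))    # False: a normal page
```

The check looks at the status code and the body separately, which mirrors the point above: the two are independent, and a Soft 404 is simply the case where they disagree.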
The Problem with Soft 404 Errors
If your website returns a HTTP status code other than 404 (or 410) for a non-existent page, it can negatively impact the website’s performance in organic search. Firstly, by failing to serve a 404 status code, your website is telling search engines that there’s a real page at the URL they’re attempting to access. As a result, the URL you’ve deleted (with no content) will be crawled and indexed, thus wasting valuable crawl budget.
Crawl budget is the concept that Google only allocates a certain period of time to crawling a website before it stops the process and moves on to a different site. Google doesn’t want to waste endless time crawling content on the same website, so it makes sense for them to assign a time limit to their web crawls before moving on to another website.
Sticking with the idea of crawl budgets, if a website has a high proportion of Soft 404 errors, then those pages will be crawled. The process of crawling these non-existent pages will invariably take up needless amounts of the crawl budget assigned to the site. Because of the time Googlebot spends crawling Soft 404s, your unique URLs may therefore not be discovered as quickly, or crawled as frequently – thus reducing the visibility of the important content on your site. It should therefore come as no surprise that when Soft 404 errors are resolved, the performance of a website in organic search results tends to improve.
To explain how you’d assess the extent of a Soft 404 issue, let’s take a look at an example of a website that is displaying a number of Soft 404 errors in Google Webmaster Tools. In the example below, we see 439 Soft 404 errors being reported for the website in question. This may well set alarm bells ringing, but we firstly need to place that figure in the right context.
To do this, you’ll want to check how many pages the website actually has that you want Google to crawl and index. For this task, we’d take a look at the XML sitemap for the website in question – which is a key indicator of how many pages a website has.
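If you would rather count the sitemap entries with a script than by eye, something along these lines should do it. This is a rough sketch that assumes a single, standard XML sitemap using the sitemaps.org namespace. The sample URLs are placeholders, and a sitemap index file (one that points to other sitemaps) would need an extra step that isn't shown here.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Placeholder sitemap for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blue-widgets</loc></url>
</urlset>"""


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <loc> entries listed under <url> elements in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url/sm:loc", SITEMAP_NS))


print(count_sitemap_urls(SAMPLE_SITEMAP))  # 2
```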
Looking at the data above, we can see that this website has around 4,200 pages, and the 439 Soft 404 errors now start to seem a little less ominous. Still, at over 10% of the site’s total pages, the 439 Soft 404 errors will be wasting a considerable chunk of the crawl budget assigned to this website. In this case, Google will be spending too much time crawling URLs that simply don’t exist.
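The sum behind that “over 10%” figure is simple enough, but it is worth writing down so you can repeat it for your own site. The sketch below uses the two numbers from this example; the function name is our own.

```python
def soft_404_ratio(soft_404_count: int, indexable_pages: int) -> float:
    """Share of a site's indexable pages that are being reported as Soft 404s."""
    if indexable_pages <= 0:
        raise ValueError("indexable_pages must be a positive number")
    return soft_404_count / indexable_pages


ratio = soft_404_ratio(439, 4200)
print(f"{ratio:.1%}")  # 10.5%
```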
How Do I Resolve These Issues?
Google only lets you export a maximum of 1,000 URLs in Webmaster Tools. In the example above, there are under 1,000 errors being reported, so these can be downloaded directly via Google Webmaster Tools. Once you’ve exported the list of URLs, you’ll need to assess why those pages are being reported as Soft 404s. Google provide somewhat limited information on the URLs they highlight as “Soft 404s”, as you can see in the example below:
In most cases, you will find that a website will be serving a 200 (OK) status code on pages that return a “page not found” message. Therefore, the first thing you’d need to do would be to run a selection of the Soft 404 error pages through a HTTP status code checker such as httpstatus.io, to assess which status codes those pages are returning.
Let’s say the example domain below was displaying a 404 page to the user trying to access it, but when we checked the response code using a HTTP status code checker, it returned a HTTP 200 response. This is a prime example of a Soft 404 error, as the HTTP response code is indicating to search engine robots that the page exists and should be crawled. However, there is no content on the page that’s returned by the server.
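If you have more than a handful of URLs to check, you can do the same job as an online status code checker with a short script. The sketch below uses Python’s standard `http.client` module, which reports the status code of the first response and does not follow redirects, so a 301 or 302 shows up as exactly that. The function name is ours, and real-world use would want retries and error handling that we have left out.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of the first response for a URL, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling `fetch_status()` on each URL in your exported list gives you the status code column you need. Any URL that returns 200 while the page itself shows a “page not found” message is a Soft 404 of the kind described above.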
The other issue you might encounter when diagnosing the root cause of Soft 404 errors is inappropriate 301/302 redirects. Some webmasters choose to redirect all deleted pages to the website’s homepage instead of serving a 404 error, which is not at all appropriate and will confuse and annoy search engine robots. The key thing to remember here is that deleted pages or out of stock products should only be redirected to a direct replacement – if a direct replacement doesn’t exist then you should serve a custom 404 error page to display alternative options or products to the user.
I have highlighted an example of inappropriate redirects triggering Soft 404 Errors below. In this case, the webmaster is using 302 redirects to redirect anyone trying to access a page that’s been deleted, and redirecting those users to a custom 404 page – one which doesn’t actually serve a HTTP 404 status code. This will hugely impact how search engines crawl the website in question, as search engines are being instructed to look elsewhere for pages that have actually been deleted, via a 302 redirect. If a search engine robot follows those instructions, it will eventually be served a HTTP 200 (OK) status code for a page that displays a 404 error message, which is a whole ’nother level of bad practice.
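The pattern described here, a 302 followed by a 200 on a page that says “not found”, can be spotted from the list of responses a crawler sees along the way. The sketch below is purely illustrative: it assumes you have already recorded each hop as a (status code, URL) pair, and the labels it returns are our own wording rather than anything Google reports.

```python
from typing import List, Tuple

REDIRECT_CODES = {301, 302, 303, 307, 308}


def diagnose_chain(hops: List[Tuple[int, str]], final_body: str) -> str:
    """Label a recorded chain of (status_code, url) responses, ordered first to last."""
    if not hops:
        raise ValueError("hops must contain at least one response")
    final_status = hops[-1][0]
    was_redirected = any(status in REDIRECT_CODES for status, _url in hops[:-1])
    shows_error = "not found" in final_body.lower()

    if final_status == 200 and shows_error:
        return "soft 404 via redirect" if was_redirected else "soft 404"
    if final_status in (404, 410):
        return "404 served after a redirect" if was_redirected else "correct 404"
    return "ok"


# Placeholder URLs mirroring the scenario above: a 302 to a custom error page that returns 200.
chain = [(302, "https://example.com/deleted-product"), (200, "https://example.com/404")]
print(diagnose_chain(chain, "<h1>Page not found</h1>"))  # soft 404 via redirect
```

Note that a redirect to an irrelevant page such as the homepage would come back as “ok” here, because nothing in the response itself says the page is wrong. That case still needs a human to judge whether the destination is a genuine replacement.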
You should never use redirects to serve a 404 error page. Instead, serve a HTTP 404 response code when any pages you remove or delete from your website are requested. This will prevent your website triggering a huge number of Soft 404 Errors, and will ensure search engines only crawl and index the pages you want to rank.
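How you do this depends entirely on your platform, so treat the following as a sketch of the principle rather than a recipe. It uses Python’s built-in `http.server` module with a made-up set of pages. The point to notice is that the custom “page not found” content and the 404 status code are sent together, in the same response, at the URL that was originally requested.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical site content, for illustration only.
EXISTING_PAGES = {
    "/": "<h1>Home</h1>",
    "/blue-widgets": "<h1>Blue widgets</h1>",
}
NOT_FOUND_HTML = "<h1>Page not found</h1><p>Try our <a href='/'>homepage</a> instead.</p>"


def build_response(path: str) -> Tuple[int, str]:
    """Return (status_code, html) for a path: 200 for known pages, 404 for anything else."""
    if path in EXISTING_PAGES:
        return 200, EXISTING_PAGES[path]
    return 404, NOT_FOUND_HTML


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, html = build_response(self.path)
        payload = html.encode("utf-8")
        self.send_response(status)  # the status code travels with the custom page, no redirect
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a local server using the handler above."""
    return HTTPServer(("127.0.0.1", port), SiteHandler)
```

Running `make_server().serve_forever()` and requesting a path that isn’t in `EXISTING_PAGES` should show the friendly error page in the browser, while a status code checker reports 404 for the same URL.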
Will Solving Soft 404 Errors Increase Traffic to My Website?
The reason I always take note of soft 404s whenever I encounter them on a client’s website is down to the results of a technical SEO project we worked on back in 2013 for an ecommerce client. Having noticed the client in question had an extremely high proportion of soft 404 errors compared to the total number of pages on the site, I discovered that their website was serving 404 messages without returning HTTP 404 status codes for any of their deleted products, of which there were thousands.
Once we’d diagnosed the issue, we liaised with the client’s developer to ensure their web server returned HTTP 404 status codes alongside the ‘page not found’ messages for any products they’d removed from their website. The developer kindly implemented the fix as we suggested, and two days later, we noticed organic traffic had increased dramatically. It rose from an average of 1,400 sessions per day to an average of 2,600 per day.
The story doesn’t end there, folks. It turns out this client was using a custom website platform used by many other online retailers – meaning that other websites built by the developer were running on the same platform. Thus, when the developer started serving HTTP 404 status codes for any deleted pages on their platform, other businesses using that platform started reporting a sharp rise in their organic traffic. I can only assume that the web developer took all the credit for this, despite the month-long battle we had convincing them that Soft 404s were worth resolving in the first place!
Soft 404s: The Importance of Technical SEO
Technical SEO is something many marketers are only vaguely familiar with. Indeed, even for SEO practitioners, it’s often an area that tends to fall into the hands of web developers, which can lead to huge missed opportunities in terms of improving organic search visibility. The technological functions of a website are what I’d consider the building blocks of SEO, and, as we’ve seen in the example above, are especially important to address for enterprise-level ecommerce websites.
- Whenever the 404 (Not Found) error message is displayed on a page, the server should return a HTTP 404 standard response code
- The content of the page (the ‘page not found’ message) is entirely unrelated to the HTTP response returned by the server
- A ‘Soft 404’ error occurs when a non-existent page (a page that has been deleted/removed) either displays a ‘page not found’ message to anyone trying to access it without returning a HTTP 404 status code, OR redirects users to an irrelevant page (such as the website’s homepage)
- The number of Soft 404s reported needs to be compared with the total number of indexable pages on a site – if this ratio of Soft 404s/indexable pages is high, it can negatively impact a website’s performance in organic search by wasting valuable crawl budget
- Resolving Soft 404 issues can dramatically improve crawl efficiency and ensure search engines only spend time crawling the pages you want them to