Duplicate content is one of the most destructive issues for your SEO.
It means that different pages, with different URLs, serve the same content, which search engines hate.
Duplicate content is not always due to an external domain that replicates your content. It can be internal, within the same website.
Fortunately, there are ways to prevent that.
What are canonical URLs?
Canonical URLs are not the same thing as canonical tags. The tag is an excellent means to declare a canonical URL, but it's only a means, not the canonical URL itself.
The most recommended way to declare a canonical URL is to add the following link tag in the
<head> section of your pages:
<link rel="canonical" href="https://www.mysupersite.com/" />
If your page is not an HTML document, it's possible to use an HTTP header instead, but we won't cover that part here.
By doing so, you are telling Google and other search engines that the content of your page was originally published at this specific URL.
The idea is to add the tag on each page with a unique value. Be careful with query (GET) parameters: the canonical value must stay unique, so URLs that differ only by their parameters should usually declare the same canonical.
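For example, on a hypothetical /shoes page, two URLs that differ only by a query parameter can both declare the same canonical:

```html
<!-- Served on both /shoes?color=red and /shoes?color=blue -->
<link rel="canonical" href="https://www.mysupersite.com/shoes" />
```

This way, search engines consolidate the ranking signals of both variants onto a single URL.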
3 common causes of internal duplicates
A common mistake is to use a global variable or a helper function that returns the current URL without stripping its query string, like the following:
<link rel="canonical" href="https://www.mysupersite.com/?fail=supercalifragilisticexpialidocious" />
It leads to risky situations. As a result, I strongly recommend stripping all parameters from that tag value. Most of the time, the language has built-in functions to do that. For example, in PHP, you might use:
// Keep everything before the first "?"
$canonical = strtok($url, '?');
Another common mistake is the HTTPS vs. HTTP issue. If you don't force HTTPS with a redirect rule, and your canonical tag is missing or doesn't distinguish the two, you'll end up with duplicate content.
Last but not least, be careful with the WWW vs. non-WWW versions of your domain. You have to choose one and redirect the other. WWW is a subdomain, even if it doesn't look like one.
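Both redirects can be handled at the server level. As a sketch, assuming an Apache server with mod_rewrite enabled (adapt it for Nginx or your host), a .htaccess file could force HTTPS and the non-WWW host:

```apache
RewriteEngine On

# Force HTTPS with a permanent (301) redirect
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# Redirect the www subdomain to the bare domain
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [L,R=301]
```

With server-level redirects in place, only one version of each URL is reachable, and the canonical tag becomes a safety net rather than your only line of defense.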
Near-duplicate content
Plagiarism is a concern. Many websites publish slightly modified versions of already existing content.
Of course, it’s a terrible practice, but they’re doing that on purpose. While it’s relatively easy to detect plagiarism (e.g., with software), near-duplicate content is harder to find.
It can be the same content in a different order, or with only a few words changed. There are many variations, and if you don't pay attention or have no protection (like a canonical tag), the copier can even get better SEO than you if their site sends stronger signals.
Duplicate content is not evil in all cases
It's not rare to cross-post your content on cool platforms such as dev.to, and in this case, the platform often lets you customize the canonical URL to point to your own website, where the content was originally published.
It’s not bad for your SEO. It’s quite the contrary. Do not hesitate to use this feature and check the page source. You’ll see a link to your website.
But a canonical tag is not the ultimate shield
You can use Google Search Console to monitor crawl errors and other warning signals.
You can also use a dedicated solution such as Botify to detect internal duplicate content.
In addition to those services, there are various ways to reduce duplicate content on your websites, such as the noindex meta tag or 301 redirects.
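For instance, a page you never want indexed (a hypothetical internal search results page, say) can carry a noindex directive in its <head>:

```html
<!-- Tells compliant crawlers not to index this page -->
<meta name="robots" content="noindex" />
```

Unlike a canonical tag, which consolidates signals onto one URL, noindex removes the page from search results entirely, so use it only for pages that shouldn't rank at all.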
Trailing slashes or not?
It does not matter as long as it’s the same pattern everywhere. Ensure that if you use trailing slashes, there are 301 redirects for URLs with no trailing slashes.
In this case, make sure that all canonical URLs have a trailing slash too.
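As a sketch, again assuming Apache with mod_rewrite, URLs missing the trailing slash could be 301-redirected to the slashed version:

```apache
RewriteEngine On

# 301-redirect URLs without a trailing slash (skip real files)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
```

The condition on REQUEST_FILENAME avoids breaking direct links to assets such as images or stylesheets.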
I don’t see any reason why you would not have canonical URLs. You need them.
I hope this short post will help you avoid harmful mistakes.