SEO is important for every website, whether it’s an online store, social media hub, or a place to do business. As websites get created and landing pages fill with content, some of that content can get duplicated along the way, confusing search engines as they work to crawl and index pages for better search rankings.
But what happens when those duplications happen, and what does that mean for your website as a whole? In this article, we’ll go over the fix for these duplications, a process known as canonicalization, with tips on what you can do to ensure your website maintains a positive user and search engine experience.
Put simply, canonicalization is the process of declaring that one page or URL is the “primary” version among several duplicate or near-duplicate pages. For instance, let’s say you have an eCommerce website dedicated to marketing your fashion brand. You’ve created pages for all the colors of a top-selling dress, but each of those pages contains roughly the same content and/or product description. The only difference is the color you’re presenting.
While this might seem like the logical thing to do, try to imagine how many pages your website could have if every item has two, three, or ten separate pages for colors. Now try to imagine a search crawler having to determine which of those pages is the most important for indexing. Sounds a bit much, right? Well, it feels the same for search crawlers, meaning they can get confused as they work to figure out where to focus their attention. To them, it looks as if you’re padding your site with duplicate content. As a result, they may crawl your site less, miss some of your more unique content, rank your content lower, or choose the wrong “original” URL for that content altogether. In short, it’s an SEO nightmare.
Canonicalization is important because it provides an SEO-friendly answer to duplicate content. It tells Google and other search engines that while you have multiple pages for similar content, you’d like them to focus on one primary page or URL for crawling and indexing purposes. In other words, it says “These are all important pages, but this one is the most important.”
Going back to the example from before, let’s pretend a search crawler comes across the main category page for your store’s top-selling dress, plus variations of that page for the dress’s different color and size options.
Likewise, let’s pretend this is the URL of that dress’s main category page:
Now, let’s filter for the color green, keeping note of the parameter added:
If we filter for a size 10, we can see another parameter added as well:
Likewise, if a product category has multiple pages, it’s ideal to have the canonical be the main category page to prevent duplicate content. That URL could look something like this:
To a human, these URLs all represent a single page. To a search crawler, however, each of these URLs represents a unique “page.” Even in this limited example, it’s easy to understand how a search crawler could get confused, decide to stop crawling, or pick the wrong URL as the “primary.” After all, the content is only marginally different. It also makes tracking metrics for a single product or topic especially difficult to consolidate, since they end up split across several URLs.
This same kind of duplication happens with other URL types, from search parameters and session IDs to www variants, https variants, and more. It’s in these exact kinds of scenarios, where a search crawler could pick the wrong “original” URL, that you want to employ canonicalization, aka a canonical tag.
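To make the idea concrete, here is a minimal Python sketch of how several URL variants can collapse onto one preferred URL. It is an illustration only: the parameter names (color, size, sessionid) and the /dresses/top-seller/ path are hypothetical, and the choice of the https and www form as the preferred one is an assumption based on the example store URL used in this article.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that only filter or track; a real store would list its own.
NON_CANONICAL_PARAMS = {"color", "size", "sessionid"}

def canonical_url(url):
    """Collapse common URL variants onto one preferred form (https + www)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    # Drop the parameters listed above; anything else is left in place.
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in NON_CANONICAL_PARAMS]
    return urlunsplit(("https", host, parts.path, urlencode(kept), ""))

variants = [
    "http://yourfashionstore.com/dresses/top-seller/?color=green",
    "https://www.yourfashionstore.com/dresses/top-seller/?color=green&size=10",
    "https://www.yourfashionstore.com/dresses/top-seller/?sessionid=abc123",
]
for variant in variants:
    print(variant, "->", canonical_url(variant))
```

All three variants map to the same address, which is the URL you would then declare as the canonical on each of those pages.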
In a nutshell, canonical tags are the snippets of HTML code that define the main URL among duplicate, near-duplicate, or similar web pages. They put the process we’ve discussed above into practice, and you can recognize them by this attribute: rel="canonical"
Placed within the <head> section of a page, they use simple and consistent syntax, making them an easy-to-use solution to problems associated with duplicate content. As an added bonus, they work especially well for syndicated content across multiple domains, as they help to consolidate page ranking to your preferred URL. This means that similar or duplicate content won’t have to compete for traffic or rankings in search engines.
HTML is the most obvious way to implement canonical tags, and it’s also the simplest. All you need to do is add the following code to the <head> section of any duplicate page. Here’s the code factored in for our fashion store example: <link rel="canonical" href="https://www.yourfashionstore.com/canonical-page/" />
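Once the tag is in place, it can help to confirm what a page actually declares. The following is a small sketch, using only Python’s standard library, that collects every canonical link found in a piece of HTML. The sample page markup is made up for illustration; only the canonical URL comes from the example above.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])

def find_canonicals(html):
    """Return the list of canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals

# Made-up sample page for illustration.
page = """<html><head>
<title>Top-Selling Dress in Green</title>
<link rel="canonical" href="https://www.yourfashionstore.com/canonical-page/" />
</head><body>...</body></html>"""

print(find_canonicals(page))
```

A page should normally declare exactly one canonical, so a result with zero or several entries is worth a closer look.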
You can also set canonicals in the HTTP headers a web server sends along with a page. Documents like PDFs don’t contain a <head> section at all, so for those, HTTP headers are the way to implement canonicals.
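In a header, the canonical is expressed as a Link header with rel="canonical". Here is a minimal sketch of what that could look like in a tiny Python WSGI application. The lookbook URL and the placeholder PDF bytes are hypothetical, and a real site would more likely configure this in its web server or CMS.

```python
def canonical_link_header(canonical_url):
    """Build the (name, value) pair for a rel="canonical" HTTP response header."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')

# Hypothetical: the HTML page that a PDF copy of the lookbook should point to.
LOOKBOOK_CANONICAL = "https://www.yourfashionstore.com/lookbook/"

def app(environ, start_response):
    """Tiny WSGI app that serves a PDF and names its canonical URL in a header."""
    headers = [
        ("Content-Type", "application/pdf"),
        canonical_link_header(LOOKBOOK_CANONICAL),
    ]
    start_response("200 OK", headers)
    return [b"%PDF-placeholder"]  # stand-in bytes, not a real PDF

name, value = canonical_link_header(LOOKBOOK_CANONICAL)
print(f"{name}: {value}")
```

The printed line is the header as it would appear in the response.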
Google has made it clear that when it comes to sitemaps, only canonical URLs should be listed. In other words, because sitemaps are a useful way to tell Google what pages you deem to be the most important on your site, it’s a simple way of defining canonicals for larger websites.
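As a rough illustration, a sitemap that lists only canonical URLs could be generated along these lines. This is a sketch using Python’s standard library; the two page URLs are hypothetical, and most sites would rely on their CMS or an SEO plugin to produce the file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(canonical_urls):
    """Return sitemap XML that lists each canonical URL exactly once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # dict.fromkeys removes repeats while keeping the original order.
    for url in dict.fromkeys(canonical_urls):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical canonical pages; filtered color and size variants are left out.
sitemap_xml = build_sitemap([
    "https://www.yourfashionstore.com/dresses/",
    "https://www.yourfashionstore.com/dresses/top-seller/",
    "https://www.yourfashionstore.com/dresses/top-seller/",
])
print(sitemap_xml)
```

Leaving the filtered and duplicate URLs out of the list is the point: the sitemap then reinforces the same choice your canonical tags make.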
Internal Links: Internal links also play a role in canonicalization, acting as signals when you link from one page of your site to another, so it helps to link to the canonical URL rather than to one of its duplicates. Likewise, Google has a preference for HTTPS URLs over HTTP and tends to prefer cleaner, more readable URLs as well.
As with any website and SEO strategy, there are some ground rules when it comes to applying canonical tags to your pages. These are as follows:
While they might seem complicated at first, canonical tags are a valuable and easy-to-implement part of SEO for your website. That being said, should you have any further questions about how they work or how to use them, we’d love to help! Our web development experts work with a variety of clients daily, bringing search-informed data structure and intelligent UX to websites across multiple industries.
Contact VELOX Media to learn more about how we can help improve website performance for your business today.