I've been reading about canonical tags but can't find a definitive explanation as to whether the file extension should be included in the canonical tag.
I have three files in the root folder, and Google Search Console tells me that not all pages have been indexed. Google says:
Duplicate without user-selected canonical.
So how do I tell Google's crawler that index.html is the master version?
In the examples I've seen, there has been no mention of the filename, just folder names.
In my example I use a fictitious site:
https://portfolio-website.example/index.html
Should the canonical tag in the head of index.html be:
<link rel="canonical" href="https://portfolio-website.example/index.html" />
Is this the pattern to use for every .html file?
index.html should never be part of your URLs, so it should be omitted from your canonical URL. index.html is meant to be a hidden default document that the server returns when the directory itself is requested. Users are never supposed to know that you have it, because it is ugly and unnecessary to put in the URL. Since / and /index.html return the same content, search engines can treat them as duplicates unless you pick one.
That means that when you choose the canonical for your home page, it should be:
<link rel="canonical" href="https://portfolio-website.example/">
When you link to your home page, you should also omit the index.html. The easiest ways to link to your home page without it are <a href="/"> (a root-relative link, which works from within your site) or <a href="https://portfolio-website.example/"> (an absolute link).
index.html is the only HTML document that should be treated this way. If you have another page (like foo.html), the document name and extension would go in the canonical URL (and in links).
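Putting the two cases side by side, a minimal sketch of the head section of each file might look like the following. It assumes the fictitious portfolio-website.example domain from the question and the foo.html page mentioned above; the titles are placeholders, not part of the original site.

```html
<!-- index.html (home page): the canonical is the bare directory URL, no filename -->
<head>
  <title>Home</title> <!-- placeholder title (assumption) -->
  <link rel="canonical" href="https://portfolio-website.example/">
</head>

<!-- foo.html (any other page): the canonical keeps the filename and extension -->
<head>
  <title>Foo</title> <!-- placeholder title (assumption) -->
  <link rel="canonical" href="https://portfolio-website.example/foo.html">
</head>
```

These are two separate files shown together for comparison. Each page's canonical should point to the one URL you want indexed for that page, and your internal links should use that same URL.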