Duplicate content has long been a problem on the web. It is not just a problem for content producers, who create great content only to see places like Mahalo scrape it and reuse it on their own sites to generate income for themselves.
Sometimes it’s a problem for a normal company. At McClatchy newspapers, they used to publish the same story into multiple sections because it fit in both “local” and “business”. Bingo – duplicate content.
The rel=canonical tag allows a web page to tell the search engine spiders, “I’m not the real page for this content, I’m just a cousin, the real page is located at…”
In the newspaper example, the publisher would designate one of those story versions as the canonical version, and the other page would carry a link tag in its head pointing at it.
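To make that concrete, here is a minimal sketch of the tag the duplicate page would carry. The helper name `canonical_link_tag` and the example.com URLs are made up for illustration; they are not the newspaper's real addresses.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    # quote=True also escapes double quotes, so the URL is safe inside href="...".
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Hypothetical URLs: the same story published in two sections.
local_url = "http://www.example.com/local/story-123.html"
business_url = "http://www.example.com/business/story-123.html"

# The business copy declares the local copy as the canonical version.
print(f"Tag for the <head> of {business_url}:")
print(canonical_link_tag(local_url))
```

The canonical page itself needs no special markup in this sketch; only the duplicate points at it.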
As good as that was, it still left a big gap that people asked about the minute the original announcement was made. What about cross-domain tagging?
Some cross-host canonical tagging works. I can say that www.1918.com/foo.html is the canonical page even though backup.1918.com/foo.html has the exact same article. Intra-domain canonicalization works now. What doesn’t work is true cross-domain rel=canonical tagging.
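The distinction between same-host, cross-host and cross-domain can be sketched in a few lines. This is only a rough illustration: `naive_site` and `canonical_scope` are hypothetical helpers, and treating the last two host labels as "the domain" is a simplifying assumption that breaks on suffixes like .co.uk. It says nothing about how a search engine actually decides.

```python
from urllib.parse import urlsplit


def naive_site(host: str) -> str:
    """Return the last two labels of a host name.

    Assumption for illustration only: this is wrong for multi-part
    public suffixes such as .co.uk.
    """
    return ".".join(host.lower().split(".")[-2:])


def canonical_scope(page_url: str, canonical_url: str) -> str:
    """Classify a canonical reference as same-host, cross-host, or cross-domain."""
    page_host = urlsplit(page_url).hostname or ""
    canon_host = urlsplit(canonical_url).hostname or ""
    if page_host == canon_host:
        return "same-host"
    if naive_site(page_host) == naive_site(canon_host):
        return "cross-host"  # different subdomains of one domain
    return "cross-domain"


# The backup copy pointing at the www copy stays within one domain.
print(canonical_scope("http://backup.1918.com/foo.html",
                      "http://www.1918.com/foo.html"))
# A copy on an unrelated (made-up) domain is the cross-domain case.
print(canonical_scope("http://www.example.org/foo.html",
                      "http://www.1918.com/foo.html"))
```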
That may be changing soon. In his April 13th Webmaster Help video, Matt Cutts says “…we are very far along the path of being able to look at rel=canonical and apply it across domains.”
If you syndicate content, make sure you include the canonical tag pointing back at the original source.
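If you want to check that a syndicated copy really does point back at the original, something like the following sketch would do it using only Python's standard library. The class `CanonicalFinder`, the function `find_canonical`, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values; an empty attribute is None.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def find_canonical(html_text: str) -> Optional[str]:
    """Return the canonical URL declared in html_text, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Made-up syndicated page that points back at the original source.
syndicated_page = """
<html><head>
<title>Syndicated story</title>
<link rel="canonical" href="http://www.example.com/original-story.html">
</head><body>...</body></html>
"""
print(find_canonical(syndicated_page))
```

A page without the tag yields `None`, which is the case to chase up with whoever is republishing your content.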
Photo by: Sam