Duplicate Content SEO: How to Find It, Fix It, and Stop Losing Rankings

Duplicate content is any substantial block of content that appears at more than one URL — either within your own site or across the web. Google doesn't penalise duplicate content directly, but it does have to choose one canonical version to index and rank. If it picks the wrong one, or splits its signals across multiple versions, your rankings suffer.

The most common sources of duplicate content

HTTP vs. HTTPS versions both accessible (e.g. http://example.com and https://example.com returning the same page)
WWW vs. non-WWW variants not redirected to a single canonical version
URL parameters creating duplicate pages (/products?sort=asc, /products?sort=desc, /products all showing the same content)
Pagination duplicates — page 1 content leaking onto /page/2 via shared introductory sections
Printer-friendly or AMP versions without proper canonical tags
Scraped content: third parties copying your pages (you're the victim, but the copy can still compete with your original for the same queries)
Category/tag archive pages that duplicate post content with no unique value
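
One way to see how these variants pile up is to reduce every crawled URL to a comparison key and group on it. The sketch below is a minimal illustration, not a feature of any crawler: `variantKey`, `groupVariants`, and the `IGNORED_PARAMS` list are hypothetical names and values chosen for this example, and it assumes the listed parameters never change what the page shows.

```javascript
// Sketch: collapse protocol, www, and parameter variants to one comparison key.
// IGNORED_PARAMS is an assumed list; replace it with the parameters your site uses.
const IGNORED_PARAMS = new Set(['sort', 'sessionid', 'utm_source', 'utm_medium', 'utm_campaign']);

function variantKey(rawUrl) {
  const url = new URL(rawUrl);
  // Dropping the protocol and a leading "www." treats those variants as one page.
  const host = url.hostname.replace(/^www\./, '');
  // Keep only parameters assumed to change the content, in a stable order.
  const kept = [...url.searchParams.entries()]
    .filter(([name]) => !IGNORED_PARAMS.has(name.toLowerCase()))
    .sort(([a], [b]) => a.localeCompare(b));
  const query = new URLSearchParams(kept).toString();
  return host + url.pathname + (query ? '?' + query : '');
}

function groupVariants(urls) {
  const groups = new Map();
  for (const u of urls) {
    const key = variantKey(u);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(u);
  }
  // Only keys reachable at more than one URL are duplication candidates.
  return [...groups.entries()].filter(([, members]) => members.length > 1);
}
```

Feeding it a crawl export (a plain array of URL strings) returns only the keys that are reachable at more than one address, which is a reasonable starting list for the fixes further down.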

How to audit for duplicate content

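Run a full site crawl with Screaming Frog (free up to 500 URLs) or Sitebulb. Check for duplicate page titles, duplicate H1s, and duplicate meta descriptions first; these are the fastest proxy signals, and in Screaming Frog they are the 'Duplicate' filters on the Page Titles, H1, and Meta Description tabs. Then review the duplicate content report (the Content tab in Screaming Frog, where near-duplicate detection may need to be enabled in the crawl configuration), which clusters URLs with identical or near-identical body content.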
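
If you want to sanity-check a suspicious pair of pages outside the crawler, body-text overlap can be approximated with word shingles and Jaccard similarity. This is a generic technique sketched under assumptions, not the algorithm Screaming Frog or Sitebulb uses; the function names and the shingle size of 5 are choices made for this example.

```javascript
// Sketch: score body-text overlap with word shingles and Jaccard similarity.
// The shingle size (5 words) is an assumed default to tune, not a crawler setting.
function shingles(text, size = 5) {
  const words = text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
  const set = new Set();
  for (let i = 0; i + size <= words.length; i++) {
    set.add(words.slice(i, i + size).join(' '));
  }
  return set;
}

function jaccard(a, b) {
  let shared = 0;
  for (const item of a) if (b.has(item)) shared++;
  const union = a.size + b.size - shared;
  // Texts shorter than one shingle produce empty sets and cannot be compared.
  return union === 0 ? 0 : shared / union;
}

function similarity(textA, textB, size = 5) {
  return jaccard(shingles(textA, size), shingles(textB, size));
}
```

A score of 1 means the extracted text is identical and 0 means no shared shingles. Where you draw the near-duplicate line (0.9, for instance) is a judgment call to calibrate against pages you already know are duplicates.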

✦ Insight

For parameter-driven duplication on large e-commerce or SaaS sites, check GSC → Indexing → Pages and filter for 'Duplicate without user-selected canonical' and 'Duplicate, Google chose different canonical than user'. The first status shows where Google found duplicates and you declared no canonical at all; the second shows where you declared one and Google overrode it.

Fix 1: 301 redirects for structural duplicates

For HTTP/HTTPS and WWW/non-WWW duplicates, a 301 redirect is the correct fix — not a canonical tag. A canonical is a hint; a redirect is a command. Implement 301 redirects at the server or CDN level to funnel all variants to a single preferred URL. This is also required for any legacy URL migrations.

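# nginx: redirect HTTP (www and non-www) to https://www.example.com
server {
  listen 80;
  server_name example.com www.example.com;
  return 301 https://www.example.com$request_uri;
}
# HTTPS non-www needs its own block (plus your ssl_certificate directives)
server {
  listen 443 ssl;
  server_name example.com;
  return 301 https://www.example.com$request_uri;
}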

// Next.js: redirects in next.config.js
// Note: permanent: true sends a 308, which Google treats like a 301
module.exports = {
  async redirects() {
    return [
      { source: '/:path*', has: [{ type: 'host', value: 'example.com' }],
        destination: 'https://www.example.com/:path*', permanent: true },
    ]
  },
}
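
Once the redirects are live, it is worth confirming that every variant reaches the preferred URL in a single hop rather than through a chain. The sketch below assumes you already have the observed redirects as a `Map` of source URL to target URL (for example, from a crawl export); that shape and the names `traceRedirects` and `findChains` are hypothetical.

```javascript
// Sketch: walk recorded redirects (source URL -> target URL) and flag chains or loops.
// The Map shape is an assumption; build it from whatever your crawler exports.
function traceRedirects(start, redirectMap) {
  const hops = [start];
  const seen = new Set([start]);
  let current = start;
  while (redirectMap.has(current)) {
    current = redirectMap.get(current);
    hops.push(current);
    // Revisiting a URL means the redirects loop and never resolve.
    if (seen.has(current)) return { hops, loop: true };
    seen.add(current);
  }
  return { hops, loop: false };
}

function findChains(startUrls, redirectMap) {
  return startUrls
    .map((start) => ({ start, ...traceRedirects(start, redirectMap) }))
    // Two entries in hops is a single redirect; anything longer is a chain.
    .filter((result) => result.loop || result.hops.length > 2);
}
```

Anything `findChains` returns is a candidate for pointing the first URL straight at the final destination.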

Fix 2: Canonical tags for parameter-driven duplicates

For URL parameters that generate near-duplicate pages (sort, filter, session IDs), add a canonical tag on every parameterised page pointing back to the clean base URL, and a self-referencing canonical on the clean URL itself. This asks Google to consolidate all ranking signals on the canonical version.

<!-- On /products?sort=price&color=red, canonical points to clean URL -->
<link rel="canonical" href="https://www.example.com/products" />

<!-- On /products itself, canonical is self-referencing -->
<link rel="canonical" href="https://www.example.com/products" />
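
In a templated site the tag is usually generated rather than hand-written. The following is a minimal sketch, assuming every query parameter on the page is a sort, filter, or session parameter that can be dropped, and that `preferredOrigin` is the single origin your redirects already resolve to; `canonicalHref` and `canonicalTag` are hypothetical helper names.

```javascript
// Sketch: derive the canonical tag for any variant of a URL.
// Assumes all query parameters are non-content parameters that are safe to drop.
function canonicalHref(rawUrl, preferredOrigin) {
  // Relative paths are resolved against the preferred origin.
  const url = new URL(rawUrl, preferredOrigin);
  // Keep only the path; the query string and fragment are discarded.
  return preferredOrigin + url.pathname;
}

function canonicalTag(rawUrl, preferredOrigin) {
  const href = canonicalHref(rawUrl, preferredOrigin)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;');
  return `<link rel="canonical" href="${href}" />`;
}
```

Because the clean URL maps to itself, the same helper yields the self-referencing tag on the base page. If some parameters do change the content, dropping everything is the wrong rule and the helper would need a list of parameters to keep.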

Fix 3: Consolidate thin or near-duplicate pages

If you have multiple pages on very similar topics (e.g. 'Best CRM for startups', 'Best CRM for small business', 'Best CRM for SaaS companies') that each rank poorly, consider merging them into one comprehensive page and 301-redirecting the others. A single authoritative page collects the links and relevance signals that three mediocre pages would otherwise split between them.

⚠️ Warning

Don't noindex your way out of duplicate content problems at scale. Noindexed pages still get crawled and still consume crawl budget, and Google may eventually stop following their links, so you can't rely on them to pass link equity. For pages you genuinely want to remove from Google's consideration, consolidate and redirect rather than noindex.

💡 Tip

Chapter 1 of SEOdisaster includes a canonical tag crisis scenario — a platform migration that created thousands of duplicate URLs overnight. Work through the triage in the game to build the pattern recognition you need for real audits.