When you operate an SEO audit API, you end up with a strange vantage point. You don't see the full web — you see whatever sites people are bothering to check. That sample is biased: agencies running their client roster, developers wiring up CI checks, builders auditing their own pages. It is not a random crawl of the internet.

But it is a fairly honest cross-section of the sites that someone, somewhere, cared enough about to audit. And after roughly 8,000 audits, the patterns are remarkably consistent.

Here's what we're seeing.

Most sites land in the same narrow band

The single most surprising finding wasn't about a specific issue. It was about how little spread there is in scores.

  • Average score: ~73 / 100
  • Median: ~73
  • ~90% of all audits land between 70 and 79

Roughly 8% score 80 or higher. Almost nothing scores below 60. The web isn't bimodal between "great" and "terrible" sites — it's a giant middle. Most sites are doing the basics, missing the same handful of intermediate things, and getting a soft B as a result.

That tight clustering matters strategically: if you're operating in this range, the work to move from a 73 to an 85 is mostly the same five fixes applied to almost every site we see.

The five issues that show up almost everywhere

When we tally the top-five priority issues flagged on each audit, five items dominate — each appearing on roughly 9 out of 10 audits we've run:

  • No structured data (schema.org): ~94% of audits
  • No sitemap.xml: ~93% of audits
  • Missing meta description: ~92% of audits
  • No canonical tag: ~92% of audits
  • Thin content on the audited page: ~88% of audits

These aren't exotic problems. They're not algorithm-update edge cases. They're the items every SEO checklist has had on it for a decade. And yet they're nearly universal.

A few notes on what's going on under each:

  • Structured data isn't optional anymore. Google's AI Overviews, ChatGPT search citations, and Perplexity's source picker all lean on schema to disambiguate what a page is about. A site without Article, Organization, or Product schema is asking AI search to guess. (A minimal markup sketch follows this list.)
  • Sitemap.xml is the cheapest crawl-budget signal you can give a search engine, and most sites still don't have one, or have one that isn't referenced from robots.txt.
  • Meta description isn't a ranking factor — but it's a click-through factor, which is a downstream ranking factor. Letting Google auto-generate it from page content is leaving CTR on the table.
  • Canonical tags are the single most reliable way to prevent self-inflicted duplicate-content problems. Most sites still don't set them.
  • Thin content here usually means the audited URL itself was light on text — landing pages, footer-linked utility pages, "about" stubs. Worth deciding whether those pages should be indexed at all.
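For concreteness, here is a rough sketch in Python of what closing the first four gaps can look like in a page's head. Everything in it (the URL, title, author, and schema fields) is a placeholder, and most CMSes and frameworks expose their own hooks for each of these tags.

```python
import json

# Placeholder page details; every value here is an assumption for illustration.
page = {
    "url": "https://example.com/posts/seo-audit-findings",
    "title": "What 8,000 SEO Audits Taught Us",
    "description": "Aggregate findings from roughly 8,000 automated SEO audits.",
    "published": "2026-01-15",
    "author": "Jane Doe",
}

# 1. Structured data: a minimal schema.org Article object, emitted as JSON-LD.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": page["title"],
    "description": page["description"],
    "datePublished": page["published"],
    "author": {"@type": "Person", "name": page["author"]},
    "mainEntityOfPage": page["url"],
}

head_tags = "\n".join([
    # 2. Meta description: written by a person, not auto-generated from body text.
    f'<meta name="description" content="{page["description"]}">',
    # 3. Canonical: one self-referencing canonical per indexable page.
    f'<link rel="canonical" href="{page["url"]}">',
    # 1. (continued) The JSON-LD block goes in a <script> tag in the head.
    '<script type="application/ld+json">\n'
    + json.dumps(article_schema, indent=2)
    + "\n</script>",
])

print(head_tags)

# 4. Sitemap: however you generate sitemap.xml, reference it from robots.txt
# so crawlers don't have to guess where it lives:
#
#   Sitemap: https://example.com/sitemap.xml
```

None of this is framework-specific; the point is that all four items fit comfortably in a few lines of template code.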

Performance is not where most sites are losing points

This one cut against my expectations. The performance category — request timing, response size, a basic look at render readiness — averages around 97 / 100 across all audits.

The "your site is slow" narrative is mostly outdated for the modern web. Cloudflare, Vercel, Netlify, and standard CDN setups have made fast-by-default the norm. The sites failing on performance are now the exception, not the rule.

Where performance problems do still show up, the failure mode is almost always the same: a heavy WordPress install with a page-builder theme stacking 30+ render-blocking scripts. The fix isn't tuning; it's a different stack.

The social / AI surface area is broken almost everywhere

At the other end of the scale, the social score averages around 41 / 100. This is the metadata that controls how a page renders when shared in Slack, LinkedIn, iMessage, X, or Facebook, and increasingly how AI search engines build their preview cards.

The pattern: sites have some Open Graph tags (often whatever the CMS auto-generates), but they're missing og:image, twitter:card, structured author data, and consistent canonical URLs. The result is that when a link gets shared, which is increasingly the dominant traffic source for content sites, the unfurl looks like a 2012 Reddit post.
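As a reference point, here is a minimal sketch (again in Python, with placeholder values) of the tag set a complete unfurl needs. The og:image is assumed to be an absolute URL at roughly 1200x630, the size most large preview cards expect.

```python
# Placeholder values; og:image should be an absolute URL, and og:url should
# match the page's canonical URL.
page = {
    "title": "What 8,000 SEO Audits Taught Us",
    "description": "Aggregate findings from roughly 8,000 automated SEO audits.",
    "url": "https://example.com/posts/seo-audit-findings",
    "image": "https://example.com/og/seo-audit-findings.png",
}

# Open Graph tags use property=..., Twitter card tags use name=...
social_tags = [
    ("property", "og:title", page["title"]),
    ("property", "og:description", page["description"]),
    ("property", "og:url", page["url"]),       # keep in sync with the canonical
    ("property", "og:image", page["image"]),   # the tag most often missing
    ("property", "og:type", "article"),
    ("name", "twitter:card", "summary_large_image"),
    ("name", "twitter:title", page["title"]),
    ("name", "twitter:image", page["image"]),
]

for attr, key, value in social_tags:
    print(f'<meta {attr}="{key}" content="{value}">')
```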

Fixing this metadata is the highest-ROI item in the whole list, because the unfurl is the part of your page a would-be visitor sees before they ever click.

What the numbers don't tell you

A few honest caveats, because the data has limits:

  • The sample skews toward sites someone chose to audit. Sites run through an audit API are disproportionately professional, SaaS, and agency clients; genuinely abandoned or amateur sites are underrepresented.
  • Repeated audits of the same page count separately. If a developer wires up an audit into CI, that one URL gets audited dozens of times. The aggregate is "audits run," not "unique sites."
  • Top-5 priorities are a per-audit ranking. An issue showing up in 92% of audits' top five means it was present and severe enough to outrank everything else that often; it is not a clean prevalence figure across unique sites.

So treat the numbers as directional rather than demographic. They tell you what shows up over and over when professionals look at the sites they care about. That's still useful.

The actionable read

If you're prioritizing SEO work in 2026, the data suggests a clear order:

  1. Ship structured data first. This is your AI search insurance policy.
  2. Fix social/OG metadata. Highest visibility per hour of effort.
  3. Audit canonical and sitemap setup. Cheapest "crawl hygiene" wins available.
  4. Stop worrying about base performance unless you're on WordPress with a heavy theme. You probably already pass.
  5. Decide which thin pages should be indexed and noindex the rest, rather than trying to bulk them up (a quick sketch follows this list).
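For the last item, a rough sketch of the decision in code. The word-count threshold and the page bodies are made up for illustration; the real judgment call is editorial, not mechanical.

```python
import re

THIN_WORD_COUNT = 150  # arbitrary cutoff, purely for illustration

# Hypothetical pages; in practice you'd pull these from your own crawl or CMS.
pages = {
    "/about-stub": "We are a company. Contact us.",
    "/blog/long-post": "word " * 800,
}

def word_count(text: str) -> int:
    # Crude tokenization; a real check would strip markup and boilerplate first.
    return len(re.findall(r"\w+", text))

for path, body in pages.items():
    if word_count(body) < THIN_WORD_COUNT:
        # Keep the page for users, but ask search engines to drop it from the index.
        print(f'{path}: add <meta name="robots" content="noindex, follow">')
    else:
        print(f"{path}: leave indexable ({word_count(body)} words)")
```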

None of these are new. The takeaway from 8,000 audits isn't what to do — it's how few sites have done it.

If you want to see where your site sits in this distribution, run an audit — it takes about 12 seconds.