LLMs are now writing a meaningful share of the marketing pages on the open web. A lot of them are bad — not in ways a human editor would catch on first read, but in ways an SEO audit would catch in two seconds. Missing meta descriptions. Bloated title tags. Headings out of order. The same five mistakes, on every page, at scale.

This post sets up a quality gate that sits between your LLM and your CMS: the model generates a page, the gate audits it, and the page is published only if it scores above a threshold. All it takes is ~50 lines of Python and a working SEO audit API key.

Why "ship it and see if it ranks" doesn't work for AI pages

If you generate one page a week, you can hand-edit it. If you generate fifty, you can't. The whole premise of AI content pipelines is volume — and volume breaks human review.

So the review needs to happen in code. The question is what to review against. Engagement metrics are too far downstream (you'd wait weeks for the signal). Manual rubric scoring is too slow (it defeats the whole purpose). Lighthouse is built mainly around performance, and its SEO category covers only the basics, not SEO depth. The SEO audit response is the right shape: structured, immediate, scriptable.

The pipeline shape

Five stages, in order:

  1. LLM generates the page (text + frontmatter)
  2. Static-site generator renders it to HTML in a staging dir
  3. Staging dir gets pushed to a preview URL (Vercel/Cloudflare/your-CDN)
  4. SEO Score API audits the preview URL
  5. If score ≥ threshold, the publish job promotes the page to production. If not, it goes to a review queue.

The audit step is the gate. Everything before it is content generation; everything after is publishing infrastructure. The gate is the seam that turns "AI-written" into "production-ready."
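The five stages can be sketched as one function with the stages injected as hooks. Every hook name here (generate, build, deploy_preview, audit, promote, enqueue) is hypothetical, standing in for your own LLM call, SSG build, preview deploy, SEO Score API call, and publish script; the default threshold of 85 follows the threshold advice later in this post.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stages:
    # Hypothetical hooks: wire each one to your own tooling.
    generate: Callable[[str], str]        # stage 1: slug -> markdown
    build: Callable[[str, str], None]     # stage 2: (slug, markdown) -> HTML in staging dir
    deploy_preview: Callable[[str], str]  # stage 3: slug -> preview URL
    audit: Callable[[str], dict]          # stage 4: preview URL -> result with "score"
    promote: Callable[[str], None]        # stage 5 (pass): publish to production
    enqueue: Callable[[str, dict], None]  # stage 5 (fail): send to review queue


def run_pipeline(slug: str, stages: Stages, threshold: int = 85) -> bool:
    """Run one page through generate -> build -> preview -> audit -> gate."""
    markdown = stages.generate(slug)
    stages.build(slug, markdown)
    url = stages.deploy_preview(slug)
    result = stages.audit(url)  # the gate: nothing is promoted without this
    if result["score"] >= threshold:
        stages.promote(slug)
        return True
    stages.enqueue(slug, result)
    return False
```

Injecting the stages keeps the gate logic testable with fakes and makes the seam explicit: promote is only reachable through the audit branch.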

The code

```python
# publish_or_queue.py
import json
import os
import subprocess
import sys
from pathlib import Path

from seoscoreapi import audit

THRESHOLD = 85  # tight gate for AI-generated pages (see the threshold notes below)
KEY = os.environ["SEO_KEY"]
PREVIEW_BASE = "https://staging.your-site.com"

def gate(slug: str) -> bool:
    """Audit the preview build of `slug`, then publish it or queue it for review."""
    url = f"{PREVIEW_BASE}/{slug}"
    result = audit(url, api_key=KEY)
    score = result["score"]
    print(f"  {slug}: {score} ({result['grade']})")

    if score < THRESHOLD:
        # Drop a JSON file the review queue picks up
        review_dir = Path("review-queue")
        review_dir.mkdir(exist_ok=True)
        (review_dir / f"{slug}.json").write_text(json.dumps({
            "slug": slug,
            "score": score,
            "priorities": result.get("priorities", []),
            "url": url,
        }, indent=2))
        return False

    # Promote to production; check=True raises if publish.sh fails
    subprocess.run(["./publish.sh", slug], check=True)
    return True

if __name__ == "__main__":
    slugs = sys.argv[1:]
    passed = sum(1 for s in slugs if gate(s))
    print(f"\n{passed}/{len(slugs)} pages passed the gate")
```

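Hook this in after the step that pushes your staging build to the preview URL, since the gate audits the live preview rather than the local files. For a Hugo or Astro pipeline, that's a single python publish_or_queue.py $(ls new-pages/) call in the build script, assuming each entry in new-pages/ is named after its slug.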
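If new pages are stored as Markdown files, $(ls new-pages/) passes names like my-post.md straight through, and the extension ends up in the preview URL. A small helper can strip it first. The new-pages/ layout and the .md/.mdx suffixes are assumptions about your repo, and slugs_from_dir is a hypothetical helper, not part of the script above.

```python
from pathlib import Path


def slugs_from_dir(pages_dir: Path, suffixes: tuple[str, ...] = (".md", ".mdx")) -> list[str]:
    """Map files like new-pages/ten-seo-tips.md to slugs like "ten-seo-tips".

    Assumes (hypothetically) that each new page is one file named after its slug.
    Subdirectories and files with other suffixes are ignored.
    """
    return sorted(
        p.stem
        for p in pages_dir.iterdir()
        if p.is_file() and p.suffix in suffixes
    )
```

You could then loop gate() over slugs_from_dir(Path("new-pages")) instead of reading sys.argv[1:].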

Picking the threshold for AI content

Threshold strategy is different for AI-generated content than for hand-written content. With human-written pages, the score usually lands within ±2 points of your baseline. With AI content, the scores tend to be bimodal: a tight cluster of "the model did fine" pages around 85+, and a long tail of "the model omitted critical metadata" pages in the 50s.

Set the threshold tight (85+) for AI content. The cost of a false negative (one good page goes to review) is much lower than the cost of a false positive (one bad AI page ships and the model learns nothing).
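If you already have audit scores for past AI-generated pages, one rough way to place the threshold is in the gap between those two clusters. This is a heuristic sketch, not a feature of the SEO Score API: it sorts historical scores, finds the widest gap between neighbours, and puts the threshold at the bottom of the upper cluster, never below an 85 floor.

```python
def threshold_from_history(scores: list[float], floor: float = 85.0) -> float:
    """Suggest a gate threshold from past audit scores (heuristic sketch).

    Assumes the scores are bimodal: the widest gap between neighbouring
    sorted scores separates the "model did fine" cluster from the
    "missing metadata" tail.
    """
    if len(scores) < 2:
        return floor
    ordered = sorted(scores)
    # Pair each gap with the score just above it; max() picks the widest gap
    # (ties go to the higher score).
    _gap, upper = max((b - a, b) for a, b in zip(ordered, ordered[1:]))
    # Never let the data pull the gate below the floor.
    return max(floor, upper)
```

With a hypothetical history of 52, 55, 58, 86, 88, 90, 91 this suggests 86. When the clusters aren't clearly separated, the floor wins.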

What the gate actually catches

In production at a couple of content-heavy customers, the gate catches roughly these failure modes, in order of frequency:

  1. Missing meta description. The model wrote the body but didn't fill the description field in the frontmatter. ~40% of failures.
  2. Title length out of range. Either too short (11 chars: "10 SEO Tips") or too long (98 chars, the entire H1). ~25%.
  3. Duplicate H1. The model put a <h1> in the body even though the template already renders one from frontmatter. ~15%.
  4. Missing OpenGraph image. Pages generated without an og:image field, especially common when the model doesn't have a hero image generator wired in. ~10%.
  5. Heading hierarchy broken. H1 → H3 → H2 → H4. Looks fine to a reader, but gives crawlers a broken document outline. ~10%.

None of these are subjective. All are caught by the audit on first run. All are fixable by the model on retry if you feed the priorities array back into the prompt.
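Because all five failure modes are mechanical, a cheap local pre-check can flag them before you spend an API call. The sketch below uses only Python's standard-library html.parser on the rendered HTML. It is not how the SEO Score API scores pages, and the 30 to 60 character title range is an assumed rule of thumb rather than a value taken from the audit; treat it as a fast first filter in front of the audit, not a replacement for it.

```python
from html.parser import HTMLParser


class SeoPrecheck(HTMLParser):
    """Collects just enough of a rendered page to spot the five failure modes."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.og_image = None
        self.headings: list[int] = []  # heading levels in document order
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # valueless attributes map to None
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            if (a.get("name") or "").lower() == "description":
                self.meta_description = a.get("content") or ""
            if (a.get("property") or "").lower() == "og:image":
                self.og_image = a.get("content") or ""
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.headings.append(int(tag[1]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def precheck(html: str, title_range: tuple[int, int] = (30, 60)) -> list[str]:
    """Return human-readable issues; an empty list means nothing was flagged."""
    p = SeoPrecheck()
    p.feed(html)
    p.close()
    issues = []
    if not (p.meta_description or "").strip():
        issues.append("missing meta description")
    n = len(p.title.strip())
    if not title_range[0] <= n <= title_range[1]:  # assumed range, tune to taste
        issues.append(f"title length {n} outside {title_range}")
    if p.headings.count(1) > 1:  # e.g. template H1 plus a body H1
        issues.append("duplicate h1")
    if not (p.og_image or "").strip():
        issues.append("missing og:image")
    for prev, cur in zip(p.headings, p.headings[1:]):
        if cur > prev + 1:  # skipped a level, e.g. H1 -> H3
            issues.append(f"heading jumps h{prev} -> h{cur}")
            break
    return issues
```

The returned strings could also be folded into the retry prompt alongside the audit's priorities.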

The retry loop

The most useful pattern: failed pages don't go straight to a human; they go back to the model with the audit findings:

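```python
# Inside gate(): instead of queuing right away, send the findings back to the model.
# Assumes claude = anthropic.Anthropic() was created at module level.
if score < THRESHOLD:
    fix_prompt = f"""
    Your page scored {score}. Here are the issues to fix:
    {json.dumps(result.get('priorities', []), indent=2)}

    Rewrite the page addressing those issues. Return the full markdown.
    """
    # The messages must also carry the original page markdown,
    # otherwise the model has nothing to rewrite.
    fixed = claude.messages.create(model="claude-opus-4-7", ...)
    # Re-build, re-audit, re-gate.
```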

Two retry attempts close ~70% of the initial failures in practice. The 30% that still fail after two retries are the ones that genuinely need human review — which is exactly the population a review queue should hold.
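The retry loop with a two-attempt cap can be sketched as below. rebuild_and_audit (re-render, redeploy the preview, call the audit) and rewrite (the messages.create call carrying the original markdown plus the priorities) are hypothetical hooks you would supply; the default threshold of 85 mirrors the advice earlier in the post.

```python
from typing import Callable


def gate_with_retries(
    slug: str,
    markdown: str,
    rebuild_and_audit: Callable[[str, str], dict],  # (slug, markdown) -> audit result
    rewrite: Callable[[str, dict], str],            # (markdown, result) -> fixed markdown
    threshold: int = 85,
    max_retries: int = 2,
) -> tuple[bool, dict, int]:
    """Audit a page and, on failure, let the model fix it up to max_retries times.

    Returns (passed, last_result, retries_used). passed == False means the
    page belongs in the human review queue.
    """
    result = rebuild_and_audit(slug, markdown)
    retries = 0
    while result["score"] < threshold and retries < max_retries:
        markdown = rewrite(markdown, result)  # feed the findings back to the model
        result = rebuild_and_audit(slug, markdown)
        retries += 1
    return result["score"] >= threshold, result, retries
```

Capping the loop matters: without max_retries, a page the model can't fix would burn API calls indefinitely instead of landing in the review queue.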

Why this matters for AI content credibility

The big risk in AI-generated content isn't that any individual page is bad — it's that the average is bad, and Google's helpful-content systems read averages, not individuals. One low-quality page on a site is invisible. Five hundred low-quality pages on a site is a sitewide signal.

The SEO gate is what prevents that average from drifting. It doesn't make the content good — that's the model's job, and your prompts' job. It just makes sure nothing demonstrably broken ever ships. That floor is the difference between "AI content as a real channel" and "AI content as a Google penalty in waiting."

If you're running a content pipeline that generates more than ~5 pages a week, the gate is worth the afternoon to set up. Try the SEO audit API free on the same shape of page you're already publishing and see what the audit catches.