We Were Scoring SaaS Sites With an E-Commerce Rubric. Here's What We Fixed.
A founder's honest account of how a user's feedback revealed a fundamental flaw in our free diagnostic — and the engineering work we did to fix it.
A few weeks ago, a SaaS founder ran their product site through our free diagnostic and got an F.
They emailed me. Politely, but bluntly: the checks that dragged their score down were things like "Product schema on product pages," "Returns policy link," and "Offer schema completeness." Their site sells software subscriptions. It doesn't have product pages. It doesn't have a returns policy. It doesn't list Offer schema — because it's not a store.
They were right. We had built an e-commerce rubric and applied it to every site. A SaaS company, a blog, a portfolio — they all got evaluated against the same checklist designed for an online store. That's not honest scoring. An F that means "you're not a store" isn't useful to anyone.
This post explains what was broken, what we rebuilt, and why the new approach is a more honest measurement of agent readiness for any kind of site.
What Was Broken
Our original diagnostic ran about 20 checks against every URL. Most of them were store-specific: product schema, offer completeness, GTIN/MPN identifiers, returns and shipping policies, ACP well-known endpoints for checkout. All legitimate signals — for stores.
The problem is that we counted all of them toward the denominator for every site. A SaaS homepage that had robots.txt, an llms.txt, and clean semantic HTML would still score a D because it "failed" eight store checks it could never meaningfully pass.
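To make the denominator problem concrete, here's a toy illustration. The numbers and grade cutoffs are hypothetical, not our actual rubric weights:

```python
# Illustrative only: how a one-size-fits-all denominator punishes non-stores.
# Grade cutoffs and check counts are hypothetical.

def grade(passed: int, total: int) -> str:
    pct = passed / total * 100
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if pct >= cutoff:
            return letter
    return "F"

# A SaaS site passing 10 of the 12 checks that actually apply to it...
applicable_passed, applicable_total = 10, 12
# ...but also scored against 8 store-only checks it can never pass:
store_only_failed = 8

print(grade(applicable_passed, applicable_total))                      # honest score: B
print(grade(applicable_passed, applicable_total + store_only_failed))  # old score: F
```

Same site, same signals: scored against what applies, it's a B; scored against the store checklist, it's an F.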
There was a second problem hiding underneath: some checks were counting zero points in practice — ghost checks we'd listed in our spec but never wired up to actual evaluation logic. They always showed as failed, dragging scores down without any legitimate signal.
The honest summary: we were measuring "how store-like is this site" and calling it "how agent-ready is this site." Those are different questions.
The Fix: A Universal Base Rubric + Vertical-Specific Extensions
We redesigned the scoring from scratch. The new architecture has two layers:
- Base rubric (17 checks): Signals that any site — store, SaaS, blog, portfolio — can meaningfully pass or fail. robots.txt with AI agent directives, llms.txt, SameAs organization identity, semantic HTML structure, schema honesty, sitemap coverage, privacy and terms pages, HTTPS, page speed, structured data (Organization, WebSite, BreadcrumbList), and a few others. These are the agent-readiness fundamentals.
- Vertical rubrics (9–17 additional checks each): A store adds product schema, offer completeness, GTIN identifiers, policy pages, UCP/ACP endpoints, and checkout capability checks. A SaaS site adds SoftwareApplication schema, MCP endpoint discovery, OpenAPI spec, pricing page presence, and API documentation. A blog adds Article schema, author identity, publication freshness, and editorial structure.
Only checks from the selected rubric count toward the score denominator. A SaaS site isn't penalized for missing a returns policy. A blog isn't penalized for lacking product schema. Every site is scored against the checks that actually apply to it.
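The mechanics can be sketched as set composition: base checks plus the detected vertical's checks, and nothing else, form the denominator. Check names below are illustrative stand-ins, not our real check identifiers:

```python
# Minimal sketch of the two-layer rubric. Check names are illustrative.

BASE_RUBRIC = {"robots_txt", "llms_txt", "https", "sitemap", "semantic_html"}

VERTICAL_RUBRICS = {
    "store": {"product_schema", "offer_completeness", "returns_policy"},
    "saas": {"softwareapplication_schema", "pricing_page", "openapi_spec"},
    "blog": {"article_schema", "author_identity", "publication_freshness"},
}

def applicable_checks(vertical: str) -> set[str]:
    # Only base checks plus the selected vertical's checks
    # count toward the score denominator.
    return BASE_RUBRIC | VERTICAL_RUBRICS.get(vertical, set())

def score(results: dict[str, bool], vertical: str) -> float:
    checks = applicable_checks(vertical)
    passed = sum(results.get(check, False) for check in checks)
    return passed / len(checks)
```

With this shape, "returns_policy" simply never appears in a SaaS site's denominator, so there's nothing to fail.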
How We Detect Site Type
The scoring rubric is selected automatically by a positive-signal detector that runs during the crawl. We probe specific routes, look for platform-specific HTML patterns, check well-known endpoint availability, and inspect sitemap structure.
A few examples of signals that push toward "store": a /cart or /checkout route that returns 200, Product schema on sampled pages, Shopify or WooCommerce platform fingerprints, offer or price references in the HTML. Signals that push toward "SaaS": a /pricing route, SoftwareApplication schema, /docs or /api routes, an OpenAPI or MCP well-known endpoint.
We assign a confidence level (high/medium/low) based on how many strong signals we found. When confidence is low — or when the user knows we got it wrong — there's a correction form directly on the results page. Submitting it re-scores with the correct vertical and logs the correction so we can improve the detector over time.
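A hedged sketch of the vote-and-threshold idea, assuming a crawl has already reduced each site to a set of named signals. The signal names and thresholds here are illustrative; the real detector works from live probes:

```python
# Sketch of the positive-signal detector. Signal names and confidence
# thresholds are illustrative assumptions, not the production values.

from collections import Counter

STORE_SIGNALS = {"cart_route_200", "product_schema", "shopify_fingerprint"}
SAAS_SIGNALS = {"pricing_route_200", "softwareapplication_schema", "openapi_endpoint"}

def detect(signals: set[str]) -> tuple[str, str]:
    votes = Counter()
    votes["store"] = len(signals & STORE_SIGNALS)
    votes["saas"] = len(signals & SAAS_SIGNALS)
    vertical, strength = votes.most_common(1)[0]

    # More independent strong signals means higher confidence.
    if strength >= 3:
        confidence = "high"
    elif strength == 2:
        confidence = "medium"
    else:
        confidence = "low"

    # With no positive signals at all, fall back to a generic rubric.
    return (vertical if strength else "generic"), confidence
```

Low confidence is exactly the case where the correction form matters most: the user's label overrides the guess and feeds the training corpus.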
We also replaced the old platform fingerprinter with one that uses scoped token matching instead of substring search. The old approach would fire on any page that contained the word "shopify" in any context — including a competitor comparison page that mentioned Shopify. The new one looks for structural tokens specific to Shopify's HTML output: class names, script src patterns, and meta generator tags.
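The difference between the two approaches is easy to show. The patterns below are simplified stand-ins for real Shopify fingerprints, but they capture the structural idea: match markup, not prose.

```python
# Naive substring matching vs. scoped token matching.
# Patterns are simplified illustrations of real platform fingerprints.

import re

def naive_fingerprint(html: str) -> bool:
    # Old behavior: fires on ANY mention of the word, including prose.
    return "shopify" in html.lower()

# New behavior: only match structural tokens, e.g. script srcs on
# Shopify's CDN or a Shopify meta generator tag.
SCOPED_PATTERNS = [
    re.compile(r'<script[^>]+src="[^"]*cdn\.shopify\.com[^"]*"', re.I),
    re.compile(r'<meta[^>]+name="generator"[^>]+content="[^"]*Shopify[^"]*"', re.I),
]

def scoped_fingerprint(html: str) -> bool:
    return any(p.search(html) for p in SCOPED_PATTERNS)

competitor_page = "<p>How we compare to Shopify</p>"
real_store = '<script src="https://cdn.shopify.com/s/assets/theme.js"></script>'

print(naive_fingerprint(competitor_page))   # True (false positive)
print(scoped_fingerprint(competitor_page))  # False
print(scoped_fingerprint(real_store))       # True
```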
Ghost Checks, Deleted
While we were in there, we removed every check that existed in our spec but wasn't wired to real evaluation logic. These were checks that always returned "failed" regardless of what the page contained. They inflated the denominator without adding signal.
We also removed the per-check "Pro fix" badges from the results page. Every failing check had a lock icon and a prompt to upgrade. It felt aggressive and it obscured the actionable information — the actual fix guidance — behind a paywall prompt.
The new results page shows the fix guidance directly. The CTAs at the bottom are consolidated into one section that reflects what's actually relevant: the Shopify app for Shopify stores, a strategy call for anyone who wants one, and the newsletter for everyone else.
What's the Same
The fundamental question the diagnostic answers hasn't changed: can AI agents find, understand, and transact with your site? We're still checking the same core signals — robots.txt, llms.txt, structured data, sitemap coverage, HTTPS, page speed, semantic structure.
If you're a Shopify or WooCommerce store, your results look similar to before, minus the ghost checks. You may see your score shift slightly, and mostly upward: removing the always-failing ghost checks shrinks the denominator without removing any passing checks.
If you're a SaaS product, a blog, or anything that isn't a store: your score is now meaningful. You'll see the checks that actually apply to your site type, the ones you've passed, the ones that need work, and a "not scored for this site type" section showing the store checks we explicitly excluded.
What's Next
The detector will get better over time. We're logging every correction users submit and building a labelled corpus of sites. A few things on the roadmap:
- A marketplace vertical (Etsy sellers, Amazon storefronts) with its own rubric
- A local business vertical (NAP consistency, Google Business Profile signals, local schema)
- Vertical-aware benchmark scoring — "how do you compare to other SaaS sites" — once we have enough labelled data
If you run the diagnostic and the detected site type is wrong, please use the correction form. Those signals directly improve detection for everyone.
And if you spot something else that looks wrong — a check that doesn't make sense for your site, a score that seems off — reply to any of our emails or reach out directly. The user who prompted this rewrite is the reason this diagnostic is more honest today.
Try the Updated Diagnostic
Run the new universal rubric against your site — store, SaaS, or blog — and get an honest agent-readiness score.