Not All Agent-Readiness Scores Measure the Same Thing

Tom Cranstoun · The Machine Experience Authority · CogNovaMX Ltd


A documentation site owner ran two agent-readiness checks in the same week. Cloudflare’s isitagentready.com gave her site 33 out of 100. Fern’s Agent Score - powered by the Agent-Friendly Documentation Spec - gave it 100 out of 100. Nothing had changed between the two tests.

She was not the victim of a fluke or a bug. Both scores were, in their own terms, accurate. What they measured were two different things, neither of which was “is your site actually useful to AI agents”.

This post explains why the divergence happened, what it means for anyone trying to make their site more agent-readable, and what a different kind of audit looks like.

The structural problem with vendor-led scoring

Both tools were built by companies with strong commercial interests in a particular version of the agent-web future.

Cloudflare built isitagentready to assess whether a site implements the discovery infrastructure that Cloudflare’s tooling reads - .well-known/mcp-server-card.json, .well-known/api-catalog.json, and similar endpoints. If your site implements Cloudflare’s preferred protocols, it scores well. If it does not, it does not. The tool is genuinely useful for understanding how a Cloudflare-centric agent pipeline sees your infrastructure. It is less useful for understanding how most agents see your content.
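As a rough illustration, the endpoint names above can be turned into probe URLs with nothing more than the standard library. This is a sketch of the shape of such a check, not Cloudflare's actual implementation:

```python
from urllib.parse import urljoin

# Discovery endpoints named in this post; an isitagentready-style check
# fetches each one and looks for a valid response at the site root.
DISCOVERY_ENDPOINTS = [
    "/.well-known/mcp-server-card.json",
    "/.well-known/api-catalog.json",
]

def probe_urls(base_url: str) -> list[str]:
    """Absolute URLs a discovery-infrastructure check would fetch for a site."""
    return [urljoin(base_url, path) for path in DISCOVERY_ENDPOINTS]

print(probe_urls("https://example.com/docs/"))
```

Because each path starts with "/", the probe always targets the site root, regardless of where on the site the base URL points.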

Fern built the Agent-Friendly Documentation Spec, and then built afdocs to measure compliance with it. The spec is thoughtfully designed for developer documentation. A site that follows it scores 100. A site on a different platform, or one that follows different conventions that serve agents equally well, may score much lower - not because it is worse for agents, but because it does not implement the specific choices Fern made when writing the spec.

The documentation site that scored both 33 and 100 was not broken. Its owner had no Cloudflare infrastructure and no Fern stack. The content was well-structured and the served HTML was clean. Neither score captured that.

What server logs actually show

Credit where it is due: the data point at the centre of this section comes from Dachary Carey’s analysis of agent-readiness scoring tools, published in April 2026. Carey checked server logs for requests to the .well-known endpoints that isitagentready measures. Despite the site receiving substantial agent traffic, not a single request to those endpoints came from coding agents.

The protocols are real. The endpoints exist. Agents just do not read them at scale yet. Acting on an isitagentready score by rushing to implement Cloudflare-specific infrastructure is, right now, optimising for a tool’s preferences rather than for agent behaviour.
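Carey’s check is easy to reproduce against your own access logs. A minimal sketch, using invented Common Log Format lines and invented user-agent strings purely for illustration:

```python
import re

# Invented access-log lines for illustration; substitute your own log file.
SAMPLE_LOG = """\
203.0.113.7 - - [02/Apr/2026:10:01:12 +0000] "GET /docs/getting-started HTTP/1.1" 200 5120 "-" "Claude-User/1.0"
203.0.113.7 - - [02/Apr/2026:10:01:13 +0000] "GET /llms.txt HTTP/1.1" 200 812 "-" "Claude-User/1.0"
198.51.100.4 - - [02/Apr/2026:10:02:44 +0000] "GET /docs/api HTTP/1.1" 200 7330 "-" "GPTBot/1.1"
"""

# Matches any GET whose request path sits under /.well-known/.
WELL_KNOWN = re.compile(r'"GET (/\.well-known/\S*)')

def count_well_known(log_text: str) -> int:
    """Number of logged requests to /.well-known/ endpoints."""
    return sum(1 for line in log_text.splitlines() if WELL_KNOWN.search(line))

print(count_well_known(SAMPLE_LOG))  # agent traffic, but no .well-known hits
```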

This may change. Standards that begin as vendor proposals can be ratified and achieve broad adoption - that is how much of the web was built. But “may become standard” is not the same as “is standard”. Investing in vendor-specific infrastructure before the foundational layers are solid is working in the wrong order.

What matters more than vendor scores

Before any .well-known endpoint matters, agents need to be able to read your site. That requires:

  • Served HTML that contains your content without requiring JavaScript execution. Server-side agents - those behind ChatGPT, Claude, Perplexity - fetch raw HTML and parse it. If your content loads via JavaScript, they see nothing.
  • Semantic structure that tells agents what each part of the page is. <main>, <article>, <nav>, proper heading hierarchy. Without these, agents extract text from a flat document, guessing at what matters.
  • Schema.org JSON-LD that makes entity relationships explicit. A Product with an Offer containing an @id and a priceCurrency tells an agent everything it needs to know about a transaction. A price in a <span> with a CSS class tells it nothing reliable.
  • llms.txt that gives agents a curated map of what the site contains and what access policy applies. Not because all agents read it today, but because those that do get far more efficient access to your content.
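The JSON-LD point is concrete enough to demonstrate. The sketch below extracts the Offer from an invented product page the way a server-side agent might: by parsing the served HTML with the standard library, no JavaScript execution required. The page content and URLs are made up for illustration:

```python
import json
from html.parser import HTMLParser

# Invented product page for illustration.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "Product",
 "name": "Example Widget",
 "offers": {"@type": "Offer",
            "@id": "https://example.com/widget#offer",
            "price": "19.99",
            "priceCurrency": "GBP"}}
</script>
</head><body><span class="price">19.99</span></body></html>"""

class JsonLdExtractor(HTMLParser):
    """Collects every <script type="application/ld+json"> block as parsed JSON."""

    def __init__(self):
        super().__init__()
        self._buffer = None   # accumulates text while inside a JSON-LD script
        self.blocks = []      # parsed JSON-LD objects found in the page

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._buffer = None

extractor = JsonLdExtractor()
extractor.feed(SAMPLE_HTML)
offer = extractor.blocks[0]["offers"]
print(offer["price"], offer["priceCurrency"])  # explicit, no CSS guesswork
```

The `<span class="price">` in the same page carries the same number, but nothing in it tells a parser that the value is a price, or in which currency; the JSON-LD states both explicitly.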

These are the layers that reach the widest range of agents, on the widest range of platforms, without requiring anything specific to any vendor’s infrastructure.
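For the llms.txt layer, a minimal file is short. The sketch below follows the llms.txt convention of an H1 title, a one-line summary blockquote, and annotated link sections; every name and URL is invented for illustration:

```text
# Example Widgets

> Documentation for the Example Widgets API and storefront.

## Docs

- [Getting started](https://example.com/docs/getting-started.md): install and first request
- [API reference](https://example.com/docs/api.md): endpoints, parameters, errors

## Optional

- [Changelog](https://example.com/changelog.md): release history
```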

How MX-Audit differs

The Web Audit Suite takes a different approach to scoring.

Platform-agnostic. The audit runs against any site - Shopify, WordPress, static HTML, AEM, a custom stack. It measures what the site does, not whether it implements any particular platform’s conventions. A Shopify store and a hand-coded static site are scored on the same criteria.

Human reviewer in the loop. Automated tools can measure what is present and absent. They cannot verify whether a finding is genuine or an artefact of the audit conditions. They cannot read a page and assess whether the content is well-structured for agent comprehension, or recognise a platform-specific pattern that looks like a gap but is actually correct. Every MX-Audit report includes a consultant’s review of the automated findings - a second pass that separates what the data shows from what it means.

Learns from accumulated experience. Each audit adds to a body of pattern knowledge. When a finding appears repeatedly across sites on the same platform, the audit learns to frame it as a platform characteristic rather than a site-specific gap. When a new agent behaviour emerges - a new way agents parse served HTML, a new discovery path that actually receives traffic - it enters the scoring model. Vendor-specific tools optimise for static compliance. Audits that accumulate findings can adapt.

Vendor-protocol signals are collected, not scored. The audit probes for .well-known/agent-card.json, .well-known/ai-plugin.json, .well-known/mcp-server-card.json, and the other vendor-promoted endpoints. It records what is present. It does not score a site down for not implementing a protocol that is still maturing outside its creator’s ecosystem. These signals appear in the report as informational notes, not findings. The site owner can see what the broader ecosystem is watching for, without being penalised for not yet implementing it.

Business output, not just a score. The report includes ROI prioritisation - which improvements will reach the most agents for the least effort - engagement options for different levels of investment, and business context. A score of 67 is only useful if you know what it would take to reach 80, which finding to fix first, and what agent capability each improvement unlocks.

A comparison

| Dimension | isitagentready (Cloudflare) | afdocs (Fern) | MX-Audit |
| --- | --- | --- | --- |
| Protocol basis | Cloudflare-authored | Fern-authored | IETF / W3C / Schema.org / community |
| Platform-agnostic | No | No | Yes |
| Measures real content accessibility | Partial | Partial | Yes |
| Human reviewer in the loop | No | No | Yes |
| Learns from accumulated audits | No | No | Yes |
| Business recommendations and ROI | No | No | Yes |
| Vendor-specific protocol signals | Scored | Scored | Collected, not scored |

What to do with third-party scores

They are not useless. isitagentready gives you a clear picture of how a Cloudflare-centric agent pipeline sees your infrastructure. afdocs is an excellent guide if you are building developer documentation on a Fern-compatible stack. Run both if you are curious about your ecosystem exposure.

But neither should drive investment decisions on its own. A score of 33 from a tool that measures Cloudflare-specific infrastructure is not a mandate to build Cloudflare-specific infrastructure. A score of 100 from a spec-compliance tool is not a guarantee that agents can successfully use your site.

The question to ask is simpler: can an agent fetch my served HTML, parse its structure, find what it needs in Schema.org markup, and discover more of my site through a well-formed llms.txt and robots.txt? Those answers do not require a vendor-specific score. They require an audit.


The Web Audit Suite is the tool behind MX-Audit reports. It measures the metadata stack - semantic HTML, discovery files, Schema.org coverage, WCAG patterns, and MX governance - and surfaces what agents can and cannot access. Get in touch if you would like an audit of your site.

Want to talk through where your organisation sits on the agent-readiness curve? Get in touch or explore the books.