The Padlock Attests the Pipe, Not the Page

There is a small padlock in every browser's address bar, and the web has spent twenty years misreading it. The padlock means that the connection between the device and the server is encrypted. It does not mean that the page is true, official, or safe. A fraudster's site can carry exactly the same padlock as a bank, because the padlock was never a statement about the content. It was a statement about the plumbing.

That gap, between what the protocols verify and what a reader actually wants to know, has shaped the web for its entire life. The TLS layer verifies that bytes arrived unaltered from a server. It does not, from the padlock alone, tell anyone who wrote them, when, on what authority, or whether they have been changed since publication. Human readers have always filled that gap with inference: how a page looks, whether the brand is familiar, the confidence of the prose. Inference dressed as knowledge.

For human readers, trust by inference has always been a quiet risk. For machine readers it is the central failure mode.

Why machines suffer worse

When a person reads a page, they bring a lifetime of context to the guess; they notice a clumsy logo, an address that does not resolve, a tone that feels off. The inference is unreliable but not naive.

A machine summarising the same page brings far less context. It reads the text and produces an answer, and it has no native way to tell a primary source from a confident copy, or an official record from a forum post that paraphrased it three years ago. When it cannot tell, it does what these systems do: it fills the gap with something plausible. The output is what the field calls hallucination. The deeper cause is structural. The machine was asked to know two things the content never told it, where the content came from and whether to believe it, so it inferred, and sometimes it inferred wrongly.

No amount of prompt engineering lets a model verify a claim that arrives carrying no evidence. The fix has to travel with the content.

Two problems, one shape

The agentic web is talking, in 2026, about two problems that look distinct and are actually the same shape.

The first is the discovery problem. A machine cannot use a document it cannot find. Self-describing content helps, but without indexing and identifiers the help stops at the file boundary. An LLM that does not know the surgery's website exists cannot answer the question "what time does it open on Saturday?".

The second is the trust problem. A machine cannot rely on a document whose origin it cannot check. Even if the LLM finds three plausible Saturday hours, none of them carry evidence the LLM can verify. So it averages, or it picks the most confident-sounding source, or it invents one that fits the pattern of an opening-hours line. Each of those failure modes has been documented in production.

Both problems are gaps in the same place. The discovery problem says the machine cannot find the content. The trust problem says the machine cannot evaluate the content. Both are answered by attaching a checkable claim about the document to the document itself, indexing those claims so the machine can find them, and giving the machine an open way to verify each claim on its own side.

That is the architectural job. A registry does it, in a particular shape.

What reader-side verification actually looks like

The temptation, when an industry hits a trust gap, is to nominate a referee. Pick an organisation, give it the power to bless content, and let everyone else trust the organisation. This is how content moderation, certification schemes, and most professional registers work. It is also how every centralised trust regime fails: the referee gets captured, gets compromised, or gets bored.

The web's TLS infrastructure, by way of comparison, does not work this way. Browsers do not trust a certificate authority because the authority is nice. They trust a certificate because the browser can verify its chain independently, using open libraries and algorithms anyone can audit, back to a root key the browser already holds. If a certificate was forged or altered along the way, the maths would not check out, and the client would say so. Misissuance by an authority itself is a separate failure, and it is surfaced by public Certificate Transparency logs, not by the signature check. The trust is in the verification, not in the vouching.

A modern provenance layer works on the same principle. Three things travel alongside any content. The first is a claim about origin, naming who published it, when, and under what identity, with the identity resolvable independently (the W3C's DID Core is the standards-track answer; the practical equivalent is a public key bound to a domain the reader can look up). The second is a signature over the claim, plus a fingerprint of the bytes, using well-trodden standards (RFC 9421 HTTP Message Signatures for the HTTP carrier, a signature over JSON canonicalised per RFC 8785 for the static-file case), so any later alteration shows. The third is a way for the reader to fetch the publisher's identity material at verification time, plus optionally a transparency log (in the style of RFC 6962, Certificate Transparency) that records every claim the registry has ever served, so a tampered or back-dated claim becomes visible to anyone who looks.
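To make the three travelling parts concrete, here is a minimal sketch in Python. Everything named in it is an assumption made for illustration: the `Attestation` fields, the `did:web:surgery.example` identity, and the key URL are hypothetical, the signature is left as a placeholder, and `canonical_json` only approximates RFC 8785 (sorted keys, no spare whitespace) rather than implementing it.

```python
import hashlib
import json
from dataclasses import dataclass


def canonical_json(value: dict) -> bytes:
    """Approximation of RFC 8785: sorted keys, no insignificant whitespace.

    A real implementation also has to follow the RFC's rules for number
    formatting and key ordering by UTF-16 code units.
    """
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def fingerprint(content: bytes) -> str:
    """SHA-256 fingerprint of the published bytes, as lowercase hex."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class Attestation:
    claim: dict          # who published, when, under what identity
    content_sha256: str  # fingerprint of the bytes the claim is about
    signature: str       # signature over signed_payload(); scheme not shown here
    key_url: str         # where a reader fetches the publisher's identity material

    def signed_payload(self) -> bytes:
        """The exact bytes a signature would cover: claim plus fingerprint."""
        return canonical_json(
            {"claim": self.claim, "content_sha256": self.content_sha256}
        )


def content_matches(att: Attestation, content: bytes) -> bool:
    """True if the bytes in hand are the bytes the attestation describes."""
    return fingerprint(content) == att.content_sha256


# Hypothetical example data.
page = b"Saturday: 09:00-12:30"
att = Attestation(
    claim={"publisher": "did:web:surgery.example", "published": "2026-01-10T09:00:00Z"},
    content_sha256=fingerprint(page),
    signature="<placeholder>",
    key_url="https://surgery.example/.well-known/did.json",
)

print(content_matches(att, page))
print(content_matches(att, b"Saturday: 09:00-17:00"))
```

The fingerprint check on its own only shows that the bytes match the claim. It is the signature over `signed_payload()`, checked against key material the reader fetched independently, that ties the claim to a publisher.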

None of this is novel cryptography. It is the assembly of standards that already exist into a claim that travels with the content rather than being locked inside one company's database.

The registry's job in this picture is narrower than people assume. It holds and serves signed claims. It indexes them so an agent can find them. It records what it has served so the record itself is auditable. It does not arbitrate truth. The reader's client checks the signature itself; the registry could lie, and the maths would say so. That is the inversion that makes the architecture worth doing rather than just another badge.
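One way a registry's record of what it has served can be made auditable is a Merkle tree over its log, the construction RFC 6962 uses for certificates. The sketch below computes the tree root in that style (a `0x00` prefix for leaves, `0x01` for interior nodes). The log entries are hypothetical, and a working log would also need signed tree heads and inclusion and consistency proofs, none of which is shown.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    """Leaf hash in the RFC 6962 style: SHA-256 over 0x00 || entry."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Interior node hash in the RFC 6962 style: SHA-256 over 0x01 || left || right."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(entries: list) -> bytes:
    """Root hash of an ordered list of log entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly less than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


# Hypothetical log of served claims.
log = [b"claim-1", b"claim-2", b"claim-3"]
published_root = merkle_root(log)

# Rewriting an old entry changes the root, so anyone holding the
# previously published root can see that history was altered.
rewritten = [b"claim-1", b"claim-2 (edited)", b"claim-3"]
print(merkle_root(rewritten) != published_root)
```

The point of the design is that the registry publishes a root it cannot later contradict quietly. An auditor needs only the earlier root to notice that the record has changed, and does not need to trust the registry.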

What the publisher does

If you publish content on the web, none of this lands by accident. The first thing the content needs is an explicit, machine-readable identity: a stable URL, structured metadata describing what the content is and what it is about, an author or organisation it is attributable to. This is the Machine Experience layer. Schema.org, Dublin Core, and the standards your sector already uses cover the vocabulary thoroughly; the gap in published content today is discipline about emitting it, not the words to use.
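As an illustration of what emitting that identity can look like, the sketch below builds a Schema.org description of the surgery and wraps it as JSON-LD for embedding in a page. The clinic, its URL, and its hours are invented for the example; the types and properties (`MedicalClinic`, `openingHoursSpecification`, `dayOfWeek`, `opens`, `closes`) are existing Schema.org vocabulary.

```python
import json

# Hypothetical publisher, URL, and hours, used only for illustration.
metadata = {
    "@context": "https://schema.org",
    "@type": "MedicalClinic",
    "@id": "https://surgery.example/#clinic",
    "name": "Example Surgery",
    "url": "https://surgery.example/",
    "openingHoursSpecification": [
        {
            "@type": "OpeningHoursSpecification",
            "dayOfWeek": "https://schema.org/Saturday",
            "opens": "09:00",
            "closes": "12:30",
        }
    ],
}


def to_json_ld_script(doc: dict) -> str:
    """Wrap metadata in the script element HTML pages use to carry JSON-LD."""
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_json_ld_script(metadata))
```

A machine reading the page no longer has to guess the Saturday hours from prose; they are stated as data, attached to a stable identifier.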

The metadata then needs signing. A signing step takes the structured metadata, canonicalises it, hashes it, signs it with the publisher's private key, and binds it to the publisher's resolvable identity. The verification material (the public key, the DID document, whatever the chosen scheme uses) is hosted somewhere stable a reader can fetch independently. The signature is small. The discipline is putting the signing step into the publishing pipeline so it happens by default, not as a one-off.
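The signing step can be sketched as a small function in the publishing pipeline: canonicalise, hash, sign, bind to an identity. One loud caveat: the sketch uses HMAC-SHA256 from Python's standard library purely as a stand-in for the signature, so that it runs without extra dependencies. HMAC is a shared-secret scheme; the architecture described here needs an asymmetric scheme (Ed25519 would be one choice) so that readers hold only the public key. The function names, the key, and the identity string are all hypothetical.

```python
import hashlib
import hmac
import json


def canonicalise(metadata: dict) -> bytes:
    """Simplified canonical form: sorted keys, compact separators (not full RFC 8785)."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def _tag(identity: str, digest: str, key: bytes) -> str:
    # STAND-IN: HMAC replaces a real asymmetric signature for illustration only.
    return hmac.new(key, f"{identity}\n{digest}".encode("utf-8"), hashlib.sha256).hexdigest()


def sign_metadata(metadata: dict, key: bytes, identity: str) -> dict:
    """Canonicalise, hash, sign, and bind the result to the publisher's identity."""
    digest = hashlib.sha256(canonicalise(metadata)).hexdigest()
    return {
        "identity": identity,
        "metadata_sha256": digest,
        "signature": _tag(identity, digest, key),
    }


def verify_metadata(metadata: dict, attestation: dict, key: bytes) -> bool:
    """Recompute the hash and the signature from scratch and compare both."""
    digest = hashlib.sha256(canonicalise(metadata)).hexdigest()
    expected = _tag(attestation["identity"], digest, key)
    return digest == attestation["metadata_sha256"] and hmac.compare_digest(
        expected, attestation["signature"]
    )


# Hypothetical key and identity.
key = b"demo-key-not-for-production"
metadata = {"name": "Example Surgery", "saturday": "09:00-12:30"}
attestation = sign_metadata(metadata, key, "did:web:surgery.example")

print(verify_metadata(metadata, attestation, key))
print(verify_metadata({**metadata, "saturday": "09:00-17:00"}, attestation, key))
```

Because the metadata is canonicalised before hashing, the same facts serialised with keys in a different order still verify, while any change to a value does not.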

Last, the attestation has to be discoverable, which is where the registry comes in. The publisher registers the attestation; the registry indexes it; an agent that finds the original content can resolve the attestation, fetch it, and verify it. The registry's role here is logistical, not authoritative. If the registry vanished tomorrow, attestations cached elsewhere would still verify, because the cryptographic chain does not depend on the registry being honest.
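The claim that the registry is logistical, not authoritative, can be sketched directly. In the example below the `Registry` class is a hypothetical in-memory index, and the signature is again an HMAC stand-in used only so the code runs on the standard library; a real reader would verify with the publisher's public key. The reader's check never asks where the attestation came from.

```python
import hashlib
import hmac
from typing import Optional

PUBLISHER_KEY = b"demo-key"  # STAND-IN for the publisher's real key pair


def sign(fp: str) -> str:
    # STAND-IN: HMAC in place of an asymmetric signature, for illustration only.
    return hmac.new(PUBLISHER_KEY, fp.encode("utf-8"), hashlib.sha256).hexdigest()


def reader_verifies(content: bytes, att: Optional[dict]) -> bool:
    """Reader-side check: recompute the fingerprint, then check the signature."""
    if att is None:
        return False
    fp = hashlib.sha256(content).hexdigest()
    return fp == att["content_sha256"] and hmac.compare_digest(sign(fp), att["signature"])


class Registry:
    """Hypothetical registry: an index from content URL to attestation."""

    def __init__(self) -> None:
        self._by_url = {}

    def register(self, url: str, att: dict) -> None:
        self._by_url[url] = att

    def resolve(self, url: str) -> Optional[dict]:
        return self._by_url.get(url)


url = "https://surgery.example/hours"  # hypothetical
page = b"Saturday: 09:00-12:30"
fp = hashlib.sha256(page).hexdigest()
attestation = {"content_sha256": fp, "signature": sign(fp)}

registry = Registry()
registry.register(url, attestation)
cached_copy = dict(attestation)  # a copy held somewhere other than the registry

# A registry that lies: it serves altered content with a matching
# fingerprint but can only reuse the old signature.
forged_page = b"Saturday: closed"
forged = {
    "content_sha256": hashlib.sha256(forged_page).hexdigest(),
    "signature": attestation["signature"],
}

print(reader_verifies(page, registry.resolve(url)))
print(reader_verifies(page, cached_copy))
print(reader_verifies(forged_page, forged))
```

The cached copy verifies exactly as the registry's copy does, and the forged record fails, because the only thing the reader relies on is the check it runs itself.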

Each step pays its own way. Identity alone makes the content findable and parseable. Identity plus signing makes the content verifiable for anyone who already has it. Adding the registry closes the loop with discovery: agents that have never encountered the publisher before can find the content, find the attestation, and decide whether to believe it.

What a reader or agent sees

For human readers, almost nothing visible changes. The cryptography stays out of sight, the way TLS does. What surfaces is a small, honest distinction the reader learns to weight: this fact is attested, this fact is inferred. Both can appear on the same screen. The reader now knows which is which.

For AI agents, the value compounds. An agent that can verify provenance can be built to prefer attested facts over inferred ones, and to say so. Asked for a surgery's opening hours, it returns the hours that arrived signed by the surgery last week, marked as attested, rather than averaging three contradictory mentions and presenting the average with false confidence. Two things follow. Hallucination drops, because the agent is leaning on facts that carry evidence instead of inventing the missing context. The agent's answer also becomes auditable, because it can show which sources it relied on and which it discarded, with the verification to back the choice. In the use cases where accuracy is not optional (healthcare, finance, legal, technical documentation), that is the difference between a tool you can stand behind and one you have to apologise for.
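A minimal sketch of that preference, under stated assumptions: the `Candidate` type, its field names, and the selection policy are hypothetical, `verified` is taken to be the result of a reader-side signature check done elsewhere, and timestamps are assumed to be ISO 8601 strings in UTC so that they sort correctly as text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    value: str
    source: str
    verified: bool       # True only if a reader-side signature check passed
    signed_at: str = ""  # ISO 8601 UTC timestamp from the claim, if any


def choose_answer(candidates: list) -> dict:
    """Prefer the most recently signed attested fact; otherwise label the answer inferred."""
    if not candidates:
        return {"value": None, "status": "unknown", "relied_on": None, "discarded": []}
    attested = [c for c in candidates if c.verified]
    if attested:
        best = max(attested, key=lambda c: c.signed_at)
        status = "attested"
    else:
        # No evidence available: return the first mention, but say so.
        best = candidates[0]
        status = "inferred"
    return {
        "value": best.value,
        "status": status,
        "relied_on": best.source,
        "discarded": [c.source for c in candidates if c is not best],
    }


# Hypothetical mentions of the surgery's Saturday hours.
mentions = [
    Candidate("09:00-13:00", "forum post", verified=False),
    Candidate("09:00-12:30", "surgery.example", verified=True, signed_at="2026-01-10T09:00:00Z"),
    Candidate("10:00-12:00", "directory listing", verified=False),
]

answer = choose_answer(mentions)
print(answer["value"], answer["status"], answer["relied_on"])
```

The returned record carries the audit trail along with the answer: which source was relied on, which were set aside, and whether the value was attested or merely inferred.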

Honest limits

This is not a panacea. Three things in particular it does not do, and saying them out loud beats letting a buyer assume otherwise.

Lying still works. A publisher can sign content that says something untrue, and the signature will verify. What changes is that the lie now travels with an identity and a fingerprint. The publisher cannot retract it without leaving a record. A reader who is being misled has an audit trail that names the source.

Compliance does not come with the architecture either. The EU AI Act, the European Accessibility Act, the UK ICO code, the NIST AI Risk Management Framework, and the regulatory regimes around them remain a legal duty of the organisation. What the technical work does is narrower and useful: it makes the documentation those duties already require structured, tamper-evident, and verifiable on request. Sold as that, it is honest. Sold as ethics in a box, it collapses on the first inquiry, and deserves to.

A determined adversary still gets in. Nation-states compromise endpoints, steal private keys, and coerce publishers. What the architecture does is raise the cost of subtle, undetectable interference. Signed content with a fingerprint cannot be quietly edited after the fact. A stolen key produces signatures that verify cryptographically, but claims that bypass the transparency log, or contradict what it already records, show up when the log is audited. The adversary still gets there; the operator and the regulator now see it happen.

The byline for the machine age

For five hundred years, publishers built the accountability layer for the print era one habit at a time. The byline, so a reader knew who wrote it. The masthead and printer's mark, so a reader knew who set the type. Libel law, so there was a name to point at. None of it stopped lying. All of it made lying legible.

The agentic web is at the same moment with the same answer. Provenance does not tell the press what to generate. It lets the page say who set the type, when, and on what authority, in a form a machine can actually read and check. The padlock kept attesting the pipe. The content can now attest the page.

A great deal of the work that follows is structural, dull, and load-bearing: the metadata fields, the signing chain, the registry that holds the claims, the transparency log that audits the registry, the client libraries that do the verification on the reader's side. None of it is glamorous. All of it is the difference between an agentic web that knows what it knows and one that confidently invents the rest.

Tom Cranstoun is the Machine Experience Authority and founder of the MX community. He consults on MX strategy through CogNovaMX Ltd.