
Provenance You Can See: Why Content Should Carry Its Own Evidence

Tom Cranstoun · The Machine Experience Authority · CogNovaMX Ltd


There is a small padlock in your browser's address bar, and most people have spent twenty years misreading it. They take it to mean the page is safe, or true, or official. It means none of those things. It means the connection between your device and the server is encrypted, and that the server holds a valid certificate for the domain you asked for. It attests the pipe, not the page. A fraudster's site can carry exactly the same padlock as your bank, because the padlock was never a statement about the content. It was a statement about the plumbing.

That gap - between what we can verify and what we actually want to know - has shaped the web for its whole life. We can verify that bytes arrived unaltered from a server. We cannot, from the padlock alone, verify who wrote them, when, on what authority, or whether they have been changed since. So we guess. We judge a page by how it looks, by whether the brand is familiar, by the confidence of the prose. Inference, dressed as knowledge.

For human readers that has always been a quiet risk. For AI agents it is now an expensive one.

Trust by inference does not survive contact with machines

When a person reads a page, they bring a lifetime of context to the guess. They notice a clumsy logo, an address that does not resolve, a tone that feels off. The inference is unreliable, but it is not naive.

A machine summarising that same page brings far less. It reads the text and produces an answer, and it has no native way to tell a primary source from a confident copy, an official record from a forum post that paraphrased it three years ago. When it cannot tell, it does what these systems do: it fills the gap with something plausible. We call the result a hallucination, but the deeper cause is structural. The machine was asked to know something the content never told it - where it came from and whether to believe it - so it inferred, and sometimes it inferred wrongly.

You cannot prompt your way out of this. No amount of cleverness lets a model verify a claim that arrives carrying no evidence. The fix has to travel with the content.

What "provenance you can see" actually means

The idea is simple to state. A piece of content should be able to carry a checkable claim about itself: who published it, when, under what identity, and a fingerprint of the exact bytes so that any later alteration shows. Not a logo. Not a brand promise. A signed statement that a machine can test, and whose result a reader can be shown.

Concretely, that means three things travelling alongside the content:

  • An identity. Who is making the claim, expressed as an identifier that can be resolved and checked rather than simply asserted. Decentralised identifiers (DIDs) are the open standard built for exactly this.
  • A signature. A cryptographic signature over a canonical form of the content, so that any change - one digit in an opening time, one altered figure - breaks the signature and is detectable.
  • A timestamp that cannot be quietly backdated. Recording the claim in an append-only transparency log, the same mechanism that underpins Certificate Transparency, means "this existed on this date" is something others can audit, not something the publisher can rewrite later.
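The fingerprint half of the signature bullet fits in a few lines of Python. This is a minimal sketch, assuming a claim is a flat JSON object of string fields. It uses sorted keys and compact separators as a simplified stand-in for full JSON canonicalisation (RFC 8785 also pins down number formatting and orders keys by UTF-16 code units). The field names and the `did:web` identifier are hypothetical. A real publisher would then sign these canonical bytes with its private key; that step needs a public-key library and is left out here.

```python
import hashlib
import json


def canonical_bytes(claim: dict) -> bytes:
    # Simplified canonical form: sorted keys, no insignificant whitespace, UTF-8.
    # Assumes string-valued fields, so RFC 8785's number rules never come into play.
    return json.dumps(
        claim, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def fingerprint(claim: dict) -> str:
    # SHA-256 over the canonical bytes; in practice any edit to any field changes it.
    return hashlib.sha256(canonical_bytes(claim)).hexdigest()


# Hypothetical claim: field names and the DID are illustrative only.
original = {
    "publisher": "did:web:surgery.example",
    "published": "2025-01-06T09:00:00Z",
    "opening_hours": "Mon-Fri 08:00-18:30",
}
# Same content with the keys in a different order.
reordered = dict(reversed(list(original.items())))
# One digit changed in the closing time.
tampered = dict(original, opening_hours="Mon-Fri 08:00-18:00")

print(fingerprint(original) == fingerprint(reordered))  # True
print(fingerprint(original) == fingerprint(tampered))   # False
```

Canonicalising first is what lets a verifier rebuild exactly the bytes the publisher signed, even if the JSON was re-serialised in transit with its keys in a different order.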

None of this is novel cryptography. It is the assembly of standards that already exist - DIDs, HTTP Message Signatures (RFC 9421), JSON canonicalisation (RFC 8785), Merkle-tree transparency logs (RFC 6962) - into a claim that travels with the file instead of being locked inside one company's database.
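For the transparency-log layer, the Merkle tree hash from RFC 6962 is small enough to sketch with the standard library. This follows the definitions in section 2.1 of that RFC: leaves are hashed with a 0x00 prefix, interior nodes with 0x01, and a tree of n entries splits at the largest power of two below n. The log entries here are placeholder byte strings; in practice each would be a signed claim or its fingerprint, and a production log would store interior hashes rather than recompute them on every call.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # Leaves get a 0x00 prefix...
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # ...interior nodes get 0x01, so a leaf can never pass for a node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def largest_power_of_two_below(n: int) -> int:
    # Returns k with k < n <= 2k, for n > 1.
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def tree_hash(entries: list) -> bytes:
    """Merkle Tree Hash MTH(D[n]) as defined in RFC 6962."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = largest_power_of_two_below(n)
    return node_hash(tree_hash(entries[:k]), tree_hash(entries[k:]))


def audit_path(m: int, entries: list) -> list:
    """PATH(m, D[n]): sibling hashes from leaf m up towards the root."""
    n = len(entries)
    if n <= 1:
        return []
    k = largest_power_of_two_below(n)
    if m < k:
        return audit_path(m, entries[:k]) + [tree_hash(entries[k:])]
    return audit_path(m - k, entries[k:]) + [tree_hash(entries[:k])]


log = [b"claim-a", b"claim-b", b"claim-c"]  # placeholder entries
root = tree_hash(log)
path = audit_path(2, log)  # siblings needed to prove entry 2 is in the tree
print(root.hex())
print(len(path))  # 1
```

Because the root commits to every entry in order, a log operator that quietly edits or backdates an old entry produces a different root, and anyone who recorded the earlier root can see the change.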

The part that matters: you do not have to trust the messenger

Here is the property that makes this worth doing rather than just another badge. Verification happens on the reader's side.

A registry can hold these signed claims, index them, and serve them, but it does not get to be the arbiter of truth. It produces an attestation record; it does not ask you to take its word for anything. The agent, or the browser, or the auditor checks the signature itself, against the publisher's resolvable identity, using open libraries anyone can run. If the registry lied or was compromised, the maths would not check out, and the client would say so.
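The signature check needs a public-key library, but the other half of the reader-side check, confirming that a claim really sits in the transparency log, needs only a hash function. Below is a minimal sketch of the inclusion-proof verification algorithm in RFC 9162, section 2.1.3.2 (the successor to RFC 6962). It assumes the registry's attestation record supplies a leaf index, tree size, audit path and root hash; that record layout is hypothetical, since its fields are not specified here. Against a root the client trusts, a registry cannot make an altered claim pass, which is why the root itself should come from a signed tree head the client can cross-check with other observers, as Certificate Transparency does, rather than from the registry's word alone.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(entry: bytes, leaf_index: int, tree_size: int,
                     path: list, root: bytes) -> bool:
    """Recompute the root from the entry and its audit path; never trust the sender."""
    if leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf_hash(entry)
    for p in path:
        if sn == 0:
            return False  # proof is longer than a tree of this size allows
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling sits on the left
            # A last node with no right sibling climbs until it is a right child.
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling sits on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


# Hypothetical three-entry log, hashed by hand to keep this block standalone.
a, b, c = b"claim-a", b"claim-b", b"claim-c"
root = node_hash(node_hash(leaf_hash(a), leaf_hash(b)), leaf_hash(c))
proof = [node_hash(leaf_hash(a), leaf_hash(b))]  # audit path for index 2

print(verify_inclusion(c, 2, 3, proof, root))           # True
print(verify_inclusion(b"claim-x", 2, 3, proof, root))  # False
```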

This is the inversion that the padlock got right and that content trust has been missing. You do not trust a site's certificate because the site says it is genuine. You trust it because your browser can verify the chain independently, back to a root it already holds, and would scream if it could not. Provenance you can see works the same way: the trust is in the verification, not in the vouching.

What it changes for a human reader

Not much that is visible, which is the point. The cryptography stays out of sight, the way TLS does. What surfaces is a small, honest distinction the reader learns to weight: this fact is attested - it came signed from a named source and the signature checks out - and this fact is inferred, assembled by a machine from things it read. Both can appear on the same screen. The reader now knows which is which.

That does not make the inferred fact wrong; it makes its status legible. An earlier generation learned to glance for the padlock before typing a card number. What changes is that the same instinct now points at the content rather than the connection.

What it changes for an AI agent

This is where the value compounds. An agent that can verify provenance can be built to prefer attested facts over inferred ones, and to say so. Asked for a surgery's opening hours, it can return the hours that arrived signed by the surgery last week, and mark them as attested, rather than averaging three contradictory mentions it found and presenting the average with false confidence.

Two things follow. Hallucination drops wherever attested facts are available, because the agent leans on facts that carry evidence instead of inventing the missing context. The agent's answer also becomes auditable: it can show which sources it relied on and which it discarded, with the verification to back the choice. For anyone deploying agents where accuracy is not optional - healthcare, finance, legal, technical documentation - that is the difference between a tool you can stand behind and one you have to apologise for.
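That preference rule can be sketched in a few lines. The types and field names here are hypothetical. The sketch assumes `status` is set to "attested" only after the client's own signature check has passed. It prefers the most recently signed attested fact, and where only contradictory inferred mentions exist, it returns no value rather than blending them. The `relied_on` and `discarded` lists are what make the answer auditable.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Candidate:
    value: str
    source: str
    status: str        # "attested" only if the client verified the signature itself
    when: datetime     # signing time for attested facts, read time for inferred ones


@dataclass
class Answer:
    value: Optional[str]
    status: str        # "attested", "inferred" or "unresolved"
    relied_on: List[Candidate] = field(default_factory=list)
    discarded: List[Candidate] = field(default_factory=list)


def choose(candidates: List[Candidate]) -> Answer:
    attested = [c for c in candidates if c.status == "attested"]
    if attested:
        # Prefer the most recently signed attested fact; keep the rest on record.
        best = max(attested, key=lambda c: c.when)
        discarded = [c for c in candidates if c is not best]
        return Answer(best.value, "attested", [best], discarded)
    inferred = [c for c in candidates if c.status == "inferred"]
    values = {c.value for c in inferred}
    if len(values) == 1:
        # Unattested but consistent: usable, and labelled as inferred.
        return Answer(inferred[0].value, "inferred", inferred, [])
    # No evidence and no agreement: report that instead of averaging.
    return Answer(None, "unresolved", [], inferred)


# Hypothetical mentions of a surgery's opening hours.
mentions = [
    Candidate("08:00-18:30", "did:web:surgery.example", "attested", datetime(2025, 1, 6)),
    Candidate("08:30-18:00", "forum post", "inferred", datetime(2022, 3, 1)),
    Candidate("09:00-17:00", "directory listing", "inferred", datetime(2024, 5, 2)),
]
answer = choose(mentions)
print(answer.value, answer.status, len(answer.discarded))  # 08:00-18:30 attested 2
```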

The two pillars

It helps to separate two jobs that are easy to blur. Making content machine-readable is one job: structure, semantics, explicit metadata, so a machine understands what the content is. Making content machine-trustworthy is a different job: signed, attested, verifiable, so a machine can tell whether to believe it.

MX does the first. It is the metadata that records a file's provenance, context, and intended use, and travels with the file. Reginald does the second: it is the infrastructure that holds and serves the signed claims, and produces the attestation records a client verifies for itself. Readable without trustworthy is a tidy lie. Trustworthy without readable is a locked box. You want both, and they are deliberately built as separate layers on an open standard, so that the readability is everyone's and the verification is no one's monopoly.

The honest limits

I will not oversell this, because the failure mode of trust technology is to promise more than maths can deliver.

Visible provenance does not make lying impossible. A publisher can sign a false statement, and the signature will verify perfectly - it proves who said it and that it has not changed, not that it is true. What it does is make lying legible and attributable. The claim is bound to an identity and a timestamp that cannot be quietly retracted. You can no longer say a thing, profit, and later pretend you never said it.

None of this grants compliance with any regulation, either. Provenance and attestation are an evidence vehicle for the governance regimes now arriving - the EU AI Act, the European Accessibility Act, emerging digital-records law - not a compliance grant. They make the documentation an organisation already owes structured, tamper-evident, and verifiable on request. The legal duty stays with the organisation. Sold honestly as "queryable, verifiable evidence", this is genuinely useful. Sold as "compliance in a box", it collapses on the first inquiry.

How to start, without waiting for the world to catch up

You do not need the full machinery to begin. The first move is to stop hiding what you already know about your own content. Publish, in a form a machine can read, who authored a page, when, and under what authority - the explicit metadata that good MX practice already asks for. That alone moves a fact from inferred to stated.
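MX's own metadata format is not specified in this article, so this sketch uses one widely supported option and publishes authorship and dates as schema.org JSON-LD. The vocabulary terms (`author`, `publisher`, `datePublished`, `dateModified`) are real schema.org properties; the function name and example values are hypothetical. This states provenance; it does not yet sign it.

```python
import json


def provenance_jsonld(headline: str, author: str, publisher: str,
                      published: str, modified: str) -> str:
    # schema.org vocabulary; dates are ISO 8601 strings supplied by the caller.
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "Organization", "name": publisher},
        "datePublished": published,
        "dateModified": modified,
    }
    # Escape "<" so a field containing "</script>" cannot close the tag early;
    # the escaped form is still valid JSON and decodes back to "<".
    body = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(provenance_jsonld(
    "Surgery opening hours",  # hypothetical example values
    "Dr A. Example",
    "Example Surgery",
    "2025-01-06",
    "2025-01-06",
))
```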

The signing and the transparency log are the next layer, and they matter most where a wrong answer is costly: prices, clinical information, official records, anything an agent will quote back to someone who acts on it. Start there. Sign the content that, if misquoted, would cost you a customer or a court case.

The web spent thirty years teaching machines to guess about content and then acting surprised when they guessed wrong. The work now is to give content the evidence to stop the guessing - evidence the reader can see and the agent can check. The padlock told us about the pipe for long enough. It is time the page could speak for itself.


Tom Cranstoun is the Machine Experience Authority and founder of the MX community. He consults on MX strategy through CogNovaMX Ltd.