The web is just the start: what AI agents actually need from your file-data
Google's developer platform published a guide to AI agent UX. The article, at web.dev/articles/ai-agent-site-ux, asks developers to think about how AI agents experience their sites. Reduce friction. Use clear headings. Avoid ambiguous navigation. Make content semantically predictable.
The advice is sound. And the fact that Google's developer platform is publishing it signals something worth noting: AI agent readiness is now a mainstream concern, not a niche view held by accessibility engineers or structured-data specialists. This is the direction the web is moving, and Google is telling developers to move with it.
The file-data problem
The guide focuses on websites. That makes sense: Google indexes websites. But the file-data AI agents read extends well beyond the browser:
- Contracts and policy PDFs
- Product specifications and technical handbooks
- Regulatory filings and compliance reports
- Internal knowledge bases and intranet pages
- Training videos, podcast briefings, recorded calls (audio and video streams)
- Diagrams, photographs, dashboard screenshots, infographics (image files)
- Datasets, schema files, API responses (structured-data feeds)
Every one of these is now being consumed by AI agents. Every one of them carries the same problems the web.dev guide describes (ambiguity, implicit structure, missing provenance), and none of them sits on a web page that a developer can adjust for UX.
The challenge is a file-data problem rather than a web problem; it applies to anything you publish that a machine might read in isolation, away from the surrounding context that gave it meaning.
What machines need from any file-data
When a machine reads a file, any file (a video, a podcast, a PDF, an image, a web page), it needs to answer ten questions, not four:
- What is this thing: its identity, category, and role?
- What is inside it: its structure, sections, and fields?
- What state is it in: draft, live, deprecated, complete, or partial?
- Who created it, and who stands behind it?
- How did it come to be: was it written by a human or generated by an agent?
- What is the reader allowed to do with it?
- What should happen next: which workflow transition is valid from here?
- What other files or standards does it depend on?
- What does a correct output look like, if one is expected?
- What is the safe thing to do when something is unclear?
Most file-data answers none of these today. An agent reading a contract, a product specification, a podcast transcript, or a video manifest has to infer all of it. That inference is expensive in compute terms. It introduces error. And it makes provenance impossible to verify, which matters as AI-generated content multiplies and regulators begin to require proof of origin.
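One way to picture the ten questions is as a record with one slot per question, where every empty slot is something the agent has to infer. The sketch below is illustrative only: the class name `FileSelfDescription` and its field names are hypothetical labels for the questions above, not part of any published COG schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FileSelfDescription:
    """Hypothetical record: one optional slot per question in the list above."""
    identity: Optional[str] = None         # what is this thing?
    structure: Optional[str] = None        # what is inside it?
    state: Optional[str] = None            # draft, live, deprecated, ...
    author: Optional[str] = None           # who created it and stands behind it?
    origin: Optional[str] = None           # human-written or agent-generated?
    permissions: Optional[str] = None      # what may the reader do with it?
    next_step: Optional[str] = None        # which workflow transition is valid?
    dependencies: Optional[str] = None     # other files or standards it relies on
    expected_output: Optional[str] = None  # what a correct output looks like
    failure_mode: Optional[str] = None     # the safe action when something is unclear


def must_infer(desc: FileSelfDescription) -> List[str]:
    """Return the names of the questions the file leaves unanswered."""
    return [f.name for f in fields(desc) if getattr(desc, f.name) is None]


# A file that only tells the reader what it is leaves nine questions open.
bare_pdf = FileSelfDescription(identity="supplier contract")
print(must_infer(bare_pdf))
```

Every name returned by `must_infer` marks a point where an agent would have to guess rather than read.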
COGs: what file-data says about itself
This is the gap that COGs address. COG stands for Community Owned Governance System. A COG is a small set of declarations any file makes about itself, carried in plain text, in the file header (or the file's sidecar where the format does not have a header of its own), before the content begins. It answers the ten questions directly, so no machine has to infer them.
The core declarations:
- Identity: what this is, who wrote it, who stands behind it, and what version it is.
- State: whether the file is draft, live, or deprecated, so an agent does not treat a provisional draft as a signed contract.
- Provenance: whether it was human-directed or agent-generated, and the full authorship chain.
- Conformance: which standards it promises to follow.
- Permissions and failure mode: what actions are allowed, what requires human approval, and what the safe default is when something is unclear.
A file carrying a COG does not require inference. It requires execution. The meaning is explicit. A machine can verify the provenance, check the conformance claims, and act, without guessing, without re-reading.
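As a rough illustration of "execution, not inference", an agent holding a file's declarations could decide what to do with a few lookups. This is a minimal sketch under stated assumptions: the dictionary keys (`state`, `permissions`, `allowed`, `needs_approval`, `failure_mode`), the returned strings, and the rule that any non-live file falls back to the safe default are all hypothetical choices made for the example, not the COG specification.

```python
def decide(cog: dict, action: str) -> str:
    """Return 'proceed', 'ask_human', or the file's declared safe default.

    All keys read here are hypothetical names chosen for this sketch.
    """
    # If the file declares no failure mode, this sketch assumes "stop".
    safe_default = cog.get("failure_mode", "stop")

    # Assumption: only files declared "live" may be acted on at all.
    if cog.get("state") != "live":
        return safe_default

    permissions = cog.get("permissions", {})
    if action in permissions.get("allowed", []):
        return "proceed"
    if action in permissions.get("needs_approval", []):
        return "ask_human"

    # Anything the file does not mention is treated as unclear.
    return safe_default


draft_contract = {
    "state": "draft",
    "permissions": {"allowed": ["summarize"], "needs_approval": ["sign"]},
    "failure_mode": "stop",
}
live_contract = dict(draft_contract, state="live")

print(decide(draft_contract, "summarize"))
print(decide(live_contract, "summarize"))
print(decide(live_contract, "sign"))
```

The draft is never treated as a signed contract, and an action the file does not mention resolves to the declared failure mode instead of a guess.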
COGs declare provenance; they cannot verify it independently. A file's frontmatter (or sidecar) says who published it and when; nothing in the file itself can prove that claim to an external reader. That is where Reginald fits: the public registry where files are signed and registered, so any agent can verify that this is what the owner published, unaltered since publication, and whether it was produced by a human, an AI, or an automated system. Agents reading attested files hallucinate less because they have verified facts to cite rather than gaps to fill by inference. Machine Experience (MX) makes content machine-readable. Reginald makes it machine-trustworthy. MX is the DNA a file carries when it leaves any pool, so a video extracted from a course library, a PDF lifted into a training corpus, or a podcast transcript pulled into a RAG retriever each remains interpretable in the new context.
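The "unaltered since publication" part of that check can be pictured as a digest comparison. The sketch below is not Reginald's API, which this article does not describe: `registry_record` is a hypothetical in-memory stand-in for whatever a registry lookup would return, and a real registry would also verify the owner's signature over the record, which is left out here.

```python
import hashlib
import hmac


def sha256_hex(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of the file's bytes."""
    return hashlib.sha256(content).hexdigest()


def is_unaltered(content: bytes, registered_digest: str) -> bool:
    """True if the bytes hash to the digest recorded at publication."""
    return hmac.compare_digest(sha256_hex(content), registered_digest)


# Hypothetical stand-in for a record fetched from a registry.
published = b"Policy v1: refunds within 30 days.\n"
registry_record = {"digest": sha256_hex(published), "produced_by": "human"}

print(is_unaltered(published, registry_record["digest"]))
print(is_unaltered(published + b"edited", registry_record["digest"]))
```

A matching digest shows only that the bytes are the ones that were registered. Trusting who registered them depends on the signature step this sketch omits.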
COGs are not a new format. They sit inside existing file formats: Markdown, HTML, PDF, YAML, XMP for media, sidecars for binaries. They travel with the file. They require no new runtime, no proprietary tooling, no installation.
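To make "header or sidecar" concrete, here is a small sketch of how a reader might locate a file's declarations using only the standard library. It rests on several assumptions that are not taken from the COG standard: text files carry flat `key: value` lines between two `---` fences, binaries have a JSON sidecar named by appending `.cog.json` to the file name, and both function names are invented for the example. Real frontmatter is usually YAML, which this deliberately simplified parser does not fully handle.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def parse_front_matter(text: str) -> Optional[Dict[str, str]]:
    """Parse flat 'key: value' lines between two '---' fences at the top of a file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no header at all
    cog: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return cog  # closing fence reached
        key, sep, value = line.partition(":")
        if sep:
            cog[key.strip()] = value.strip()
    return None  # opening fence without a closing one


def load_cog(path: Path) -> Optional[Dict[str, str]]:
    """Read the header for text formats; otherwise look for a sidecar."""
    if path.suffix in {".md", ".txt"}:
        return parse_front_matter(path.read_text(encoding="utf-8"))
    # Hypothetical naming convention: video.mp4 -> video.mp4.cog.json
    sidecar = path.with_name(path.name + ".cog.json")
    if sidecar.exists():
        return json.loads(sidecar.read_text(encoding="utf-8"))
    return None


sample = "---\nstate: draft\nauthor: Jo\n---\nBody text starts here.\n"
print(parse_front_matter(sample))
```

Either way, the declarations live next to the content they describe, so they move with the file when it is copied out of its original system.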
Beyond the web
The web.dev guide is a useful prompt for any web team. But the content most enterprises rely on sits mostly off the web: inside content management systems, intranets, document management systems, regulatory archives, manufacturing databases, video libraries, podcast feeds, image asset banks, and dataset stores.
Machine Experience (MX) extends the discipline the web.dev guide describes to all of those surfaces. The question it asks is the same one Google is asking about web pages: can any machine that reads this understand what it means, who made it, and what it is allowed to do?
For most enterprise file-data today, the answer is no.
What this means for your content
If you are responsible for content or web experience, the web.dev guide is worth reading. After you have read it, ask a harder question: what happens when an AI agent reads your file-data, not your web pages? The training video your customer success team recorded last quarter. The PDF datasheet your sales team emails out. The podcast episodes your CEO records. The diagrams in your engineering wiki. The dataset your finance team publishes. Each of those is a file an agent will eventually read in isolation, away from the surrounding context that gave it meaning.
Your contracts, your product documentation, your policy files, your service specifications, your training videos, your podcast briefings, your image libraries: do they declare their own identity? Do they carry provenance? Do they specify what an agent is allowed to do with them?
If not, agents will guess. Sometimes they will guess correctly. Often they will not.
COGs are the infrastructure for file-data that does not leave machines to guess. They are governed openly at tg.community as a community standard: no single vendor, no licensing, no proprietary runtime.
The web.dev guide describes what good looks like on a web page. COGs describe what good looks like in any file-data (a video, a podcast, a PDF, an image, a web page), anywhere it travels. The web is just the start.