Google Shipped a Knowledge Format, and Left the Hard Half Open

Google has published the Open Knowledge Format, and I've been waiting two years for someone with this much reach to say it out loud.

OKF represents knowledge as a directory of markdown files with YAML frontmatter. Concepts link to each other with ordinary markdown links, so the directory reads as a graph. There's no compression scheme, no runtime, no SDK you're required to adopt. A small set of reserved frontmatter fields defines how a producer's knowledge can be read by a consumer's agent; everything beyond those fields is left to the producer. The stated principles are that it should be minimally opinionated, that producers and consumers stay independent of each other, and that the answer to sharing knowledge between agents is a format, not another platform.
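To make the "directory reads as a graph" point concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not OKF tooling: the file names and the `title` field are hypothetical, the files are held in memory rather than read from disk, and the frontmatter reader handles only flat `key: value` lines, so a real YAML parser would be needed for anything richer.

```python
import re

# Matches inline markdown links such as [label](target.md); images are excluded.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")


def split_frontmatter(text):
    """Split a document into (frontmatter dict, body).

    Only flat `key: value` lines are handled; a real YAML parser is needed
    for nested or quoted values.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # no closing delimiter, treat as plain markdown
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def build_graph(files):
    """Map each file name to its frontmatter and the local files it links to."""
    graph = {}
    for name, text in files.items():
        meta, body = split_frontmatter(text)
        # Keep only local targets; absolute URLs point outside the directory.
        links = [t for t in LINK_RE.findall(body) if "://" not in t]
        graph[name] = {"meta": meta, "links": links}
    return graph


# Hypothetical two-file directory, held in memory for the example.
files = {
    "cog.md": "---\ntitle: Cog\n---\nA cog is described by a [concept](concept.md).\n",
    "concept.md": "---\ntitle: Concept\n---\nSee the [spec](https://example.com/spec).\n",
}
graph = build_graph(files)
```

Nothing here needs a runtime or an SDK: a regular expression and a string split are enough to recover the nodes and edges, which is the point of keeping the format this thin.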

That is the Machine Experience (MX) argument, almost line for line. Built on markdown, YAML, and HTML. Nothing proprietary. The same file readable by a human and a machine, with no translation layer between them. And the framing I keep coming back to, that the answer is a format and not yet another service, is now being made by about the largest source imaginable. When the company that runs one of the world's biggest data platforms decides the right structure for machine-readable knowledge is markdown with frontmatter in a linked directory, the argument is settled. I'm glad to have the company.

So I want to be clear before anything else: this is good news, and I'm not going to pretend otherwise to manufacture a disagreement.

What it confirms

OKF confirms the foundation. Machines need context that travels with the file, in a carrier they already use, that a person can still read. It confirms that you don't solve this by standing up another platform and asking everyone to push their knowledge into it. You solve it with a convention thin enough that producers and consumers can meet without anyone owning the ground between them.

A cog, the unit MX is built on, is a markdown file with YAML frontmatter that tells a machine what it is, how it relates to other files, and how to use it. An OKF concept is a markdown file with YAML frontmatter that tells an agent the same kind of thing. These are the same structure. A cog can be expressed as an OKF concept, because OKF asks for almost nothing below its interoperability surface and leaves the rest to the producer. So I am not building a rival format. MX rides on top of this one. Where OKF covers the ground, MX defers to it and points, the same way MX already defers to Schema.org, Dublin Core and the rest. Never reinvent what an existing standard already does well.

Why this one gets used where the others got abandoned

There is a second reason OKF matters, and it is the one I think about most: it explains why this format gets maintained where every wiki before it got abandoned.

Andrej Karpathy made the point that the real value of an "LLM wiki" is that the wiki finally stays current. Traditional wikis fail for one boring, human reason. Notion, Obsidian, the team Confluence nobody has opened since the last reorganisation: somebody has to keep them up to date, keeping them up to date is tedious, and people quietly stop. The knowledge rots, and it rots not because the format was wrong but because the bookkeeping never got done.

An agent does not get bored. It does not forget to update the cross-reference when a fact three documents away changes, and it will touch a dozen files in a single pass to keep them consistent. That is the actual unlock. The format was never the hard part; the upkeep was, and OKF is thin enough that an agent can do the upkeep on its own. It is the first time the maintenance problem has had an answer that does not depend on human diligence holding out forever.

I find this persuasive because I have watched it work. Give an agent a structure thin and legible enough to maintain, with scripts checking what the prose claims, and it will keep the thing current in a way no team ever sustained by hand. A format an agent can keep current is a format that survives contact with a real team.
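As one example of a script checking what the prose claims, here is a minimal sketch of a dangling-link check, the kind of bookkeeping an agent can run after every pass. The link map and file names are hypothetical, and the sketch assumes link targets are plain file names with no `#fragment` or subdirectory handling.

```python
def find_dangling_links(links_by_file):
    """Return sorted (source, target) pairs whose target file does not exist."""
    known = set(links_by_file)
    return sorted(
        (source, target)
        for source, targets in links_by_file.items()
        for target in targets
        if target not in known
    )


# Hypothetical link map: file name -> local files it links to.
links_by_file = {
    "index.md": ["cog.md", "trust.md"],
    "cog.md": ["index.md"],
}
dangling = find_dangling_links(links_by_file)
```

Here `dangling` holds the single pair `("index.md", "trust.md")`, because `trust.md` is linked but never defined. A check like this is cheap enough to run on every change, which is what makes unattended upkeep plausible.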

And this is where it loops straight back to the half OKF leaves open. The more autonomously an agent maintains your knowledge, the more you need to know whether to trust what it wrote. An agent touching a dozen files unattended is exactly the moment you want a signed, dated record of who changed what and whether it still holds. Autonomy without attestation is just faster drift. The maintenance argument is real, and it makes the trust argument more urgent, not less.

The two things it leaves out

Here's where the floor ends.

The first gap is trust. OKF tells an agent what to read. It says nothing about whether the agent should believe what it read. There's no attestation in it, no provenance, no signature. A machine can ingest a perfectly well-formed OKF directory and still have no way to know who published it, when, whether it has been changed since, or whether it can be relied on at all. Reading is not the same as trusting, and I have written before about why that difference is the whole game. OKF is the read. It's not the trust.

That trust layer is exactly what MX and REGINALD add. A cog can carry a publisher's signed attestation; REGINALD records it as a dated, checkable entry any machine can verify for itself. Structured data describes a page, but it never validates it, and the validation has to sit underneath. OKF makes the describing work free, which is welcome, and throws the whole of the value onto the validating side it leaves untouched.
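The shape of a dated, checkable record can be sketched in a few lines of Python. This is an illustration only and not the MX or REGINALD format: the record fields, the `attest` and `verify` names, and the demo key are all hypothetical. It also swaps in an HMAC, which relies on a shared secret, as a stand-in for a real publisher signature; a record that any machine can verify for itself would need public-key signatures instead.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def attest(content: bytes, publisher: str, key: bytes, when: datetime) -> dict:
    """Build a dated record binding a publisher to the exact bytes of a file."""
    digest = hashlib.sha256(content).hexdigest()
    dated = when.astimezone(timezone.utc).isoformat()
    message = f"{publisher}|{dated}|{digest}".encode()
    # HMAC is a shared-secret stand-in for a real public-key signature.
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"publisher": publisher, "dated": dated, "sha256": digest, "signature": signature}


def verify(content: bytes, record: dict, key: bytes) -> bool:
    """Check the content is unchanged and the record was made with the key."""
    if hashlib.sha256(content).hexdigest() != record["sha256"]:
        return False  # the file has changed since it was attested
    message = f"{record['publisher']}|{record['dated']}|{record['sha256']}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


# Hypothetical publisher, key and file content, for illustration only.
key = b"demo-key-not-for-production"
content = b"---\ntitle: Cog\n---\nA cog is a markdown file.\n"
record = attest(content, "example.org", key, datetime(2025, 1, 1, tzinfo=timezone.utc))

unchanged_ok = verify(content, record, key)
tampered_ok = verify(content + b"edited", record, key)
```

With these inputs `unchanged_ok` is `True` and `tampered_ok` is `False`: the record answers who published the file, when, and whether it has changed since, which are the questions a well-formed directory cannot answer on its own.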

The second gap is governance, and it is the sharper one. OKF cannot answer a question about itself: who governs this format when the publisher is also one of the largest vendors in the market? Today it is Google-published and Google-hosted. A format in one company's hands can be extended, deprecated or quietly steered in that company's direction, and nothing in the standard prevents it. This is precisely the question The Gathering exists to answer. A community-led, independent standards body on the W3C model, with no single vendor able to capture it, is the thing a vendor-published format structurally cannot be. I have always said standards should be community-led rather than vendor-driven. OKF is the clearest example yet of why.

Where this leaves us

OKF is the foundation, and a good one. Get your content into this structure. Then carry the two things the format leaves out: signals a machine can trust because they are attested rather than assumed, under a standard governed by a body no single vendor owns.

The format is now free. The trust and the governance are the work. That has always been the half that matters.


Tom Cranstoun founded the MX community and wrote the MX book series. He consults on MX strategy through Digital Domain Technologies Ltd.