Schema.org keeps growing. The provenance layer does not exist yet.

Google is expanding the types of structured data it uses to generate rich results in Search. New schema types support loyalty programs, detailed merchant listings, and interactive event formats. At the same time, Google has deprecated seven types, including FAQ and Q&A, that publishers had been using to chase search rankings rather than to describe genuine structure.

Both movements are significant; the deprecations tell the more interesting story.

What is happening

Schema.org is the formal vocabulary that lets publishers describe their content in terms machines can read. A product has a price and a rating. An event has a date and a location. A person has a name and an affiliation. By adding this structured vocabulary to a web page, a publisher gives search engines, AI agents, and other software the raw material to understand what the page is about without having to infer it from prose.
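
As a minimal sketch of what that looks like in practice, the Python below builds a JSON-LD Product block of the kind a publisher embeds in a page. The product name, price, and rating values are invented for illustration, and to_json_ld_script is a hypothetical helper; the type and property names (Product, Offer, AggregateRating) come from the Schema.org vocabulary.

```python
import json

# Illustrative values only; the property names come from the Schema.org vocabulary.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Kettle",
    "offers": {
        "@type": "Offer",
        "price": "49.00",
        "priceCurrency": "GBP",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "212",
    },
}


def to_json_ld_script(data: dict) -> str:
    """Wrap a JSON-LD object in the script tag a page would embed."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(to_json_ld_script(product))
```

A reader of this block learns the price and the rating without parsing any prose, which is the whole value of the vocabulary.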

Google has used Schema.org markup to generate rich results in Search for more than a decade: the star ratings under a product listing, the event date in a search card, the recipe time in a featured snippet. The direction of travel is consistently towards more types, more properties, and more interactive presentations generated from structured data.

Google and Microsoft have both publicly confirmed that they use Schema.org markup to inform generative AI features. ChatGPT has confirmed that structured data influences which products appear in its results. The machine-readable web isn't a coming development; it's the infrastructure already in use.

Why the deprecations matter more than the additions

When Google deprecated FAQ, Q&A, and Practice Problem rich results, the stated reason was low quality and widespread misuse. Publishers were adding FAQ schema to pages that contained no genuine question-and-answer content. The markup was being used to claim formatting space in search results, not to describe anything real.

This isn't a technical failure but a structural one. Schema.org has no mechanism to distinguish a publisher who genuinely runs a Q&A service from one who added FAQ markup to a landing page for ranking purposes. Both produce identical markup, and the vocabulary has no concept of assertion, authority, or verification.
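
The point about identical markup can be made concrete with a small sketch. The two publishers below are hypothetical, as is the faq_markup helper; the FAQPage, Question, and Answer names come from the Schema.org vocabulary.

```python
import json


def faq_markup(question: str, answer: str) -> str:
    """Serialise a one-question FAQPage block as JSON-LD text."""
    block = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }
    return json.dumps(block, sort_keys=True)


# Hypothetical publishers: one runs a real help centre, the other pasted
# the same block onto a landing page to claim space in the results.
genuine = faq_markup("How do I return an item?", "Use the returns portal within 30 days.")
gamed = faq_markup("How do I return an item?", "Use the returns portal within 30 days.")

# Nothing in the markup records who asserted it or why.
print(genuine == gamed)  # True
```

A consumer holding only the markup has nothing to compare: intent and authority live outside the block.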

Google's response was to withdraw the rich result entirely for most publishers. That's a blunt instrument, and it won't be the last time it's needed.

What Schema.org cannot tell you

Schema.org tells a machine what something is: a product, an event, a person, a review. It describes properties and relationships, and it does this well.

What Schema.org can't tell you is who made the assertion and whether you should believe it.

A Product schema block on a merchant page says the price is £49. Schema.org has no field that answers the obvious follow-up questions. Was this price published by the merchant who owns the product? Was it injected by a third-party script? Has it been modified since publication? Is this a genuine merchant page or a lookalike designed to deceive?

None of these questions have answers in the Schema.org vocabulary. The vocabulary assumes that the relationship between the markup and the publisher is trustworthy by default. That assumption held when most publishers were businesses with reputations to protect. It's under strain now.

The lifecycle gap Schema.org leaves open

There's a second gap besides provenance. Schema.org describes content structure; it doesn't describe the document as a living artefact. A Schema.org Article carries datePublished and inherits optional expires and maintainer properties from CreativeWork, but it carries no successor pointer to the version that replaces it. Outside CreativeWork even those properties disappear. A BreadcrumbList carries position; it carries no expiry. A Product carries price; it carries no canonical URI pointing at the durable record of that product.

Once a machine has resolved the citation, fetched the page, and read the markup, Schema.org is silent or inconsistent on six signals an arriving agent needs:

  • When the page was published: CreativeWork types have datePublished; types outside CreativeWork, such as Product and Event, don't.
  • When it expires: CreativeWork has expires and Offer has validThrough, but no single property applies across types.
  • Who authored it: present on some types, absent on others, never with a stable cross-type field.
  • Who maintains it now: distinct from the author; CreativeWork has maintainer, but types outside CreativeWork have no field for it.
  • What its canonical URI is: HTML carries <link rel="canonical">; Schema.org has no equivalent for the document inside its own vocabulary.
  • Whether it has been superseded: no successor pointer for content; supersededBy exists only for terms in the vocabulary itself.

MX carries every one of those as a first-class field on every cog: created, expires, originator (alias author), stewardship.steward (alias maintainer), canonicalUri, and status, plus supersedes / supersededBy / replacedBy for supersession. These are the signals a machine needs to act on the page it just landed on, and Schema.org by itself doesn't carry them consistently. Provenance answers "can we believe it?"; lifecycle answers "can we act on it now?". Both gaps point at the same conclusion.
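
To show how an arriving agent might use those fields, here is a minimal sketch. The LifecycleRecord class, the can_act_on function, and the status values are assumptions made for illustration and are not MX's actual schema or API; only the field names mirror the ones listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LifecycleRecord:
    """Hypothetical container for the lifecycle fields named above."""
    created: date
    expires: Optional[date]
    originator: str            # alias: author
    steward: str               # stewardship.steward, alias: maintainer
    canonical_uri: str         # canonicalUri
    status: str                # assumed values: "active", "superseded"
    superseded_by: Optional[str] = None


def can_act_on(record: LifecycleRecord, today: date) -> tuple[bool, str]:
    """Return (ok, reason): is the page current enough to act on?"""
    if record.status == "superseded" or record.superseded_by is not None:
        return False, f"superseded by {record.superseded_by}"
    if record.expires is not None and today > record.expires:
        return False, f"expired on {record.expires.isoformat()}"
    return True, "current"


page = LifecycleRecord(
    created=date(2025, 1, 10),
    expires=date(2025, 12, 31),
    originator="Example Author",
    steward="Example Docs Team",
    canonical_uri="https://example.com/guides/returns",
    status="active",
)
print(can_act_on(page, date(2026, 2, 1)))  # (False, 'expired on 2025-12-31')
```

The check needs no inference from prose: every input is a typed field, so the agent either acts or follows the successor pointer.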

Why this matters when AI reads your markup

When an AI agent retrieves structured data to answer a question or populate a result, it makes a judgment about the content based on the markup. If the markup is wrong, injected, or fabricated, the AI answer reflects that. The error then propagates downstream to every system that consumes that answer.

The volume of structured data in use is growing, precisely because AI systems have found it useful. The more useful structured data becomes as an input to AI, the more incentive exists to manipulate it. FAQ schema was deprecated because enough publishers gamed it. The same dynamic will appear in every high-value schema type as AI systems place increasing weight on structured data.

Structured data can be gamed; the question is whether there's a layer that lets a machine verify that a piece of it was published by the entity it claims to represent, at the time it claims, and hasn't been altered since.

The layer that is missing

MX and REGINALD are designed to be that layer.

MX makes content machine-readable: structured, typed, and described in terms machines can interpret without inference. Schema.org does the same, and MX extends it to the full content lifecycle, from first publication through every subsequent change.

REGINALD adds provenance. A content object signed through REGINALD holds a verifiable record of who published it, when, with what identity, and whether it has been modified. A machine reading a REGINALD-signed piece of markup can verify the assertion, not just read it.

The verification path inside REGINALD is also deterministic by design. The chain that proves a signed piece of markup is what its publisher published, unaltered, runs on a fixed set of cryptographic steps. There's no language model in the path, and no agent loop deciding what to attest. Two readers checking the same signed markup reach the same yes-or-no verdict, on separate machines, on different days, every time. That property is what makes the attestation worth something to a regulator, an auditor, or an AI agent acting on the result. A trust layer whose answers shift when a model is upgraded is a recommendation engine, not a registry.
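
The shape of a deterministic check can be sketched with the Python standard library. This is an illustration of the property and not REGINALD's actual scheme, which this article doesn't specify. Two assumptions are swapped in: HMAC-SHA256 with a shared demo key stands in for the asymmetric signature a public registry would need, and sorted-key JSON stands in for a proper canonicalisation scheme such as RFC 8785. The function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(markup: dict) -> bytes:
    """Serialise with sorted keys and fixed separators so equal content yields equal bytes."""
    text = json.dumps(markup, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def attest(markup: dict, key: bytes) -> str:
    """Produce a hex tag over the canonical form (HMAC-SHA256 as a stand-in for a signature)."""
    return hmac.new(key, canonical_bytes(markup), hashlib.sha256).hexdigest()


def verify(markup: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time; the result is a plain yes or no."""
    return hmac.compare_digest(attest(markup, key), tag)


key = b"demo-key-not-for-production"  # assumption: shared secret for the demo only
published = {
    "@type": "Product",
    "name": "Example Kettle",
    "offers": {"price": "49.00", "priceCurrency": "GBP"},
}
tag = attest(published, key)

# Same content with keys in a different order still verifies.
reordered = {
    "offers": {"priceCurrency": "GBP", "price": "49.00"},
    "name": "Example Kettle",
    "@type": "Product",
}
print(verify(reordered, tag, key))  # True

# Any change to the content fails verification.
tampered = {**published, "offers": {"price": "4.90", "priceCurrency": "GBP"}}
print(verify(tampered, tag, key))  # False
```

Every step is a pure function of the content and the key, so two readers running it on separate machines reach the same verdict.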

Schema.org and REGINALD aren't in competition. Schema.org describes what content is. REGINALD attests that the description is genuine. The combination is what the machine-readable web needs as AI systems rely on structured data for decisions that carry real consequences: which product to recommend, which source to cite, which answer to trust.

The expansion of Schema.org is welcome, and each new type gives machines more to read, but the provenance gap grows with it. Without attestation, the growth in machine-readable content means more of it can't be verified. Schema.org wasn't designed to solve that problem; it needs a layer beneath it that can.