Schema.org keeps growing. The provenance layer does not exist yet.

8 May 2026 · Tom Cranstoun · 4 min read

Google is expanding the types of structured data it uses to generate rich results in Search. New schema types support loyalty programs, detailed merchant listings, and interactive event formats. At the same time, Google deprecated seven types, including FAQ and Q&A, that publishers had been using to chase search rankings rather than to describe genuine structure.

Both movements are significant; the deprecations tell the more interesting story.

What is happening

Schema.org markup is the formal vocabulary that lets publishers describe their content in terms machines can read. A product has a price and a rating. An event has a date and a location. A person has a name and an affiliation. By adding this structured vocabulary to a web page, a publisher gives search engines, AI agents, and other software the raw material to understand what a page is about without having to infer it from prose.
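As a minimal sketch of what that vocabulary looks like in practice, the snippet below builds a Schema.org Product object and wraps it in the script tag commonly used to embed JSON-LD in a page. The product name, price, and rating values are invented for illustration, and the helper `to_json_ld_script` is a hypothetical name, not part of any library.

```python
import json

# Hypothetical product, used only for illustration.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Kettle",
    "offers": {
        "@type": "Offer",
        "price": "49.00",
        "priceCurrency": "GBP",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "212",
    },
}


def to_json_ld_script(data: dict) -> str:
    """Wrap a Schema.org object in the script tag used to embed JSON-LD."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_json_ld_script(product))
```

Every field describes the product. None of them describes who is making the claim.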

Google has used Schema.org markup to generate rich results in Search for more than a decade: the star ratings under a product listing, the event date in a search card, the recipe time in a featured snippet. The direction of travel is consistently towards more types, more properties, and more interactive presentations generated from structured data.

Google and Microsoft have both confirmed publicly that they use Schema.org markup to inform generative AI features. ChatGPT has confirmed that structured data influences which products appear in its results. The machine-readable web is not a coming development; it is infrastructure already in use.

Why the deprecations matter more than the additions

When Google deprecated FAQ, Q&A, and Practice Problem rich results, the stated reason was low quality and widespread misuse. Publishers were adding FAQ schema to pages that contained no genuine question-and-answer content. The markup was being used to claim formatting space in search results, not to describe anything real.

This is not a technical failure but a structural one. Schema.org has no mechanism to distinguish a publisher who genuinely runs a Q&A service from one who added FAQ markup to a landing page for ranking purposes. Both produce identical markup, and the vocabulary has no concept of assertion, authority, or verification.

Google's response was to withdraw the rich result entirely for most publishers. That is a blunt instrument, and this will not be the last time Google reaches for it.

What Schema.org cannot tell you

Schema.org tells a machine what something is: a product, an event, a person, a review. It describes properties and relationships, and it does this well.

What Schema.org cannot tell you is who made the assertion, when, and whether you should believe it.

A Product schema block on a merchant page says the price is £49. Schema.org has no field that answers the obvious follow-up questions. Was this price published by the merchant who owns the product? Was it injected by a third-party script? Has it been modified since publication? Is the page a genuine merchant page or a lookalike designed to deceive?

None of these questions have answers in the Schema.org vocabulary. The vocabulary assumes that the relationship between the markup and the publisher is trustworthy by default. That assumption held when most publishers were organizations with reputations to protect. It is under strain now.
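The gap is easy to see from the consumer's side. The sketch below is an assumption about how a naive reader might behave, not a description of any real crawler or agent: it collects every JSON-LD block from a hypothetical page in which one Product block came from the merchant and a second was injected. The page content and the `JsonLdCollector` class are invented for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect every JSON-LD block on a page, as a naive consumer might."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.blocks: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


# Hypothetical page: the first block is the merchant's, the second was injected.
PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Kettle",
 "offers": {"@type": "Offer", "price": "49.00", "priceCurrency": "GBP"}}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Kettle",
 "offers": {"@type": "Offer", "price": "19.00", "priceCurrency": "GBP"}}
</script>
</head><body></body></html>
"""

collector = JsonLdCollector()
collector.feed(PAGE)
prices = [block["offers"]["price"] for block in collector.blocks]
print(prices)  # ['49.00', '19.00']
```

Both blocks are well-formed, and both carry the same set of fields. Nothing in either one tells the reader which price the merchant actually published.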

Why this matters when AI reads your markup

When an AI agent retrieves structured data to answer a question or populate a result, it makes a judgment about the content based on the markup. If the markup is wrong, injected, or fabricated, the AI's answer is wrong too. The error propagates downstream to every system that consumes it.

The volume of structured data in use is growing, precisely because AI systems have found it useful. The more useful structured data becomes as an input to AI, the more incentive exists to manipulate it. FAQ schema was deprecated because enough publishers gamed it. The same dynamic will appear in every high-value schema type as AI systems place increasing weight on structured data.

Structured data can be gamed. The open question is whether a layer exists that lets a machine verify that a piece of structured data was published by the entity it claims to represent, at the time it claims, and has not been altered since.
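To make those three checks concrete, here is a generic sketch of attestation over a piece of markup. It is not the mechanism of any particular product, and every name in it (`attest`, `verify`, the key, the publisher identifier) is hypothetical. It uses a shared-secret HMAC only so that it runs on the Python standard library; a real provenance layer would more plausibly use public-key signatures, so that anyone can verify without holding the signing key.

```python
import hashlib
import hmac
import json


def canonical(data: dict) -> bytes:
    """Serialise deterministically so the same content always yields the same bytes."""
    return json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")


def attest(markup: dict, publisher: str, published_at: str, key: bytes) -> dict:
    """Bind markup to a publisher and a timestamp with a keyed digest."""
    envelope = {"markup": markup, "publisher": publisher, "published_at": published_at}
    signature = hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()
    return {**envelope, "signature": signature}


def verify(attested: dict, key: bytes) -> bool:
    """Recompute the digest over everything except the signature and compare."""
    envelope = {k: v for k, v in attested.items() if k != "signature"}
    expected = hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attested.get("signature", ""))


# Hypothetical key and publisher, for illustration only.
KEY = b"demo-key-not-for-production"
offer = {"@type": "Offer", "price": "49.00", "priceCurrency": "GBP"}
signed = attest(offer, "merchant.example", "2026-05-08T09:00:00Z", KEY)

# Changing the price after signing invalidates the attestation.
tampered = {**signed, "markup": {**offer, "price": "19.00"}}
print(verify(signed, KEY), verify(tampered, KEY))  # True False
```

Because the publisher and timestamp sit inside the signed envelope, altering either one fails verification in the same way as altering the price.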

The layer that is missing

MX and REGINALD are designed to be that layer.

MX makes content machine-readable: structured, labeled, typed, and described in terms that machines can interpret without inference. This is what Schema.org does, and MX extends it to the full content lifecycle, from first publication through every subsequent change.

REGINALD adds provenance. A content object signed through REGINALD carries a verifiable record of who published it, when, with what identity, and whether it has been modified. A machine reading a REGINALD-signed piece of markup can verify the assertion, not just read it.

Schema.org and REGINALD are not in competition. Schema.org describes what content is. REGINALD attests that the description is genuine. The combination is what the machine-readable web needs as AI systems rely on structured data for decisions that carry real consequences: which product to recommend, which source to cite, which answer to trust.

The expansion of Schema.org is welcome, and each new type is more surface area for machines to read, but the provenance gap grows with it. Without attestation, more machine-readable content means more machine-readable content that cannot be verified. Schema.org was not designed to solve that problem; it needs a layer beneath it that can.