The provenance gap, and why Google keeps closing it the hard way

13 May 2026 · Tom Cranstoun · 6 min read

SEO, GEO and AEO are now standard items on every content team's checklist. Search engine optimization aims at Google's ranking systems, generative engine optimization aims at AI answer engines, and answer engine optimization aims at the citation slots inside those answers. Each discipline has its own playbook, its own tooling, and its own vendors selling volume.

The problem is not the disciplines. The problem is treating them as ingredients to add to a page without first asking whether the page is worth ranking, quoting, or citing. Decorating a page with rich snippets, FAQ schema, and answer-ready bullets is not the same as creating a high-quality, fact-based resource. The markup describes the page; it does not validate it.

What happens when content quality is the afterthought

A pattern has become hard to ignore across recent industry analysis of content built with AI at volume.

The shape is consistent enough across industries to be the rule rather than the exception. A site starts publishing AI-assisted pages at volume, page count rises sharply, traffic follows for a few months, and then a recalibration arrives that takes most of the gain back, often dropping the baseline below where it started. The collapses are severe enough, and frequent enough, that many of the sites featured in vendor case studies have since deleted or redirected the very pages those case studies held up as wins. The pages doing the damage tend to be templated: products compared in pairs across an entire category, definition pages stamped out across a glossary, ranked lists where the publisher tops its own list, location pages for places the business does not actually serve. None of it is hidden tradecraft. Most of it is what current SEO, GEO and AEO advice recommends doing.

At the same time, ranking systems have been withdrawing presentation rewards from the structured-data types publishers used most aggressively to chase rankings. FAQ and Q&A markup are now deprecated as rich results. Schema.org keeps expanding on one side; Google keeps withdrawing rewards on the other. The contradiction is the point. FAQ markup was deprecated because enough publishers gamed it. The same dynamic will work through every high-value schema type as the weight placed on structured data grows.

The lesson is hard to avoid. Google can tell when a site is gaming the system, and it will not reward the pattern for long. Whatever short-term lift the playbook produces is followed by a recalibration that erases the gain.

The provenance gap

Underneath all three disciplines sits a problem that no amount of optimization will fix on its own. Schema markup can tell a machine what something is: a product, a person, a price, a review. It cannot show that machine who actually made the assertion, when, or whether to trust it; even an author or date field is one more unverified claim by the publisher. As AI agents place greater weight on structured data in decisions that carry real consequences, the incentive to manipulate it grows in lockstep. That is the provenance gap, and it is widening as fast as the volume of structured data grows.
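The gap is easy to see in a few lines of code. The sketch below is illustrative only: the vocabulary (Product, AggregateRating, ratingValue, reviewCount) is schema.org's, but the values, the function names, and the rules in is_well_formed are assumptions made up for this example, not any search engine's actual validator.

```python
import json

# Keys a typical markup validator might insist on (an assumption for this sketch).
REQUIRED = {"@type", "name", "aggregateRating"}


def product_markup(rating: str, reviews: str) -> dict:
    """Build schema.org-style Product markup. All values are invented."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": "Example Kettle",
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": reviews,
        },
    }


def is_well_formed(doc: dict) -> bool:
    """Format check only: required keys exist and the rating is a number from 0 to 5."""
    if not REQUIRED.issubset(doc):
        return False
    try:
        rating = float(doc["aggregateRating"]["ratingValue"])
    except (KeyError, TypeError, ValueError):
        return False
    return 0.0 <= rating <= 5.0


# Suppose the first publisher really holds these reviews and the second made them up.
honest = product_markup("3.1", "12")
inflated = product_markup("4.9", "2140")

for label, doc in (("honest", honest), ("inflated", inflated)):
    print(label, is_well_formed(doc), json.dumps(doc["aggregateRating"]))
```

Both documents pass. Nothing in the markup records who asserted the rating, when, or on what evidence, so a format check has nothing to tell them apart with.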

Machine Experience, or MX, is an emerging standard designed to be the layer beneath structured data that closes this gap. The idea is straightforward: a web page should be a portable, self-describing document that carries its own metadata about origin, intent, and authorship. Format compliance and editorial honesty are different problems, and MX treats them as different problems. The standard rewards what a page actually says rather than how a page is dressed, and it makes both legible to the machines now reading alongside the humans.

What MX actually rewards

MX uses a readiness model with levels that build on each other. At the lowest level, a page is simply discoverable: a machine can find it through sitemaps and clean HTML. The next level up is what MX calls Citation readiness, and this is where the work begins. A page reaches Citation readiness when the facts on it are something the publisher actually holds, facts a machine could quote because they are real, specific, and traceable to the source. Levels above that introduce comparison, registration in public indexes, and third-party audit.

The point of the ladder is that a page cannot reach Citation readiness unless there is something real behind it. The format does not invent facts. It makes them legible when they exist. That means the work of climbing the ladder is the same work good publishers have always done: getting facts right, knowing your subject, writing things only you can write.
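One way to picture the ladder is as an ordered check where a rung only counts if every rung below it holds. The sketch below is a reading of the two levels described above, not the MX specification: the class names, the NONE level, the Page fields, and the pass conditions are all hypothetical, and the higher levels are left out because this article does not define them.

```python
from dataclasses import dataclass
from enum import IntEnum


class Readiness(IntEnum):
    NONE = 0          # hypothetical floor: a machine cannot reliably find the page
    DISCOVERABLE = 1  # findable through sitemaps and clean HTML
    CITATION = 2      # carries facts the publisher actually holds
    # Higher levels (comparison, public indexes, third-party audit) are omitted:
    # the article names the themes but not the rules.


@dataclass
class Page:
    in_sitemap: bool
    clean_html: bool
    held_facts: int  # count of facts that are real, specific, and traceable to the source


def readiness(page: Page) -> Readiness:
    """Return the highest rung reached without skipping a lower one."""
    checks = [
        (Readiness.DISCOVERABLE, page.in_sitemap and page.clean_html),
        (Readiness.CITATION, page.held_facts > 0),
    ]
    level = Readiness.NONE
    for rung, passed in checks:
        if not passed:
            break  # levels build on each other, so stop at the first failure
        level = rung
    return level


templated = Page(in_sitemap=True, clean_html=True, held_facts=0)
sourced = Page(in_sitemap=True, clean_html=True, held_facts=3)
hidden = Page(in_sitemap=False, clean_html=True, held_facts=3)

print(readiness(templated).name)  # DISCOVERABLE
print(readiness(sourced).name)    # CITATION
print(readiness(hidden).name)     # NONE
```

The third case is the design choice worth noticing: held facts do not help a page a machine cannot find, and clean markup does not lift a page with no facts behind it.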

There is a useful design principle underneath all of this. Interfaces optimized for machines tend to improve human and accessibility outcomes too. The flip side also holds. Interfaces optimized for appearing machine-ready, with nothing underneath, fail for both audiences. A glossary page stamped out from a template does not help a human, because the same answer already sits in the first ten results. It does not help a machine either, because the machine has no way of checking where the claim came from.

MX predicts the failure pattern structurally rather than as a moral observation. Sites running these templated approaches are discoverable but nothing more. They cannot reach Citation readiness because there is no fact-level clarity behind the page. The facts were generated to fill a slot. The pages cite nothing because there is nothing to cite. When ranking systems accumulate enough signal that the pages are interchangeable across publishers, the ranking evaporates.

A diagnostic question

One test does most of the work. Could a competitor publish a near-identical version of this page tomorrow using the same prompt? If yes, the page fails: it exists for the index rather than for either reader, human or machine. Pages that pass this test carry something specific. Pages that fail it carry nothing distinctive enough to be worth pointing at.
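The question can be roughly approximated in code by comparing a page against a version someone else could have produced. The sketch below uses Jaccard similarity over three-word shingles, a common near-duplicate measure swapped in here as a proxy; it is not part of MX, the sample sentences are invented, and any cut-off for "near-identical" would be a judgment call.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping runs of `size` lowercase words."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0  # two texts too short to compare count as sharing nothing
    return len(sa & sb) / len(sa | sb)


# Invented examples: two templated definitions and one first-hand sentence.
ours = "a content delivery network caches files on servers close to the visitor"
theirs = "a content delivery network caches files on servers close to the user"
specific = "we measured cache hit ratio on our own forty sites last spring and publish the logs"

print(round(jaccard(ours, theirs), 3))   # 0.818
print(jaccard(specific, theirs))         # 0.0
```

A high score against a competitor's page, or against fresh output from the same prompt, suggests the page is interchangeable. A low score proves nothing about quality on its own; it only shows the wording is not shared.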

The bottom line

The packaging keeps changing: AI-first SEO, GEO programmes, AEO for citation slots. The pattern stays the same. Sites that come through each ranking cycle best are the ones that put quality, originality, and topical focus ahead of volume. Decorating a page with rich snippets is not the same as creating a high-quality, fact-based resource, and ranking systems are getting better at telling the two apart.

MX makes the alternative concrete. A readiness model that rewards fact-level clarity. A structured-content layer that carries provenance. A standard that lets machines verify what they are reading rather than guess. The work is not different in kind from what good publishers have always done. It is the same work, made legible to a wider audience: the machines now reading the web on behalf of the humans who used to do it themselves.

Where this is written down, and where it is debated

If the argument lands and you want to take it further, two places carry the rest of it. The MX book series is the long-form specification: MX: The Handbook for the framework and the day-to-day patterns, MX: The Protocols for the cog format and the agent-facing contracts, and MX: The Appendices for the field dictionary and recipes. The books are the place where the structural argument is written down once, in the form a serious team can adopt without having to reverse-engineer it from blog posts.

The Gathering is where the standard is debated, refined, and kept honest. It is the open community that owns the cog specification, reviews proposed extensions, and stops the format from drifting into any one vendor's interest. tg.community is the door.