Why AI cites Reddit more than your site
A figure has been doing the rounds: someone analysed millions of AI citations across the major answer engines, and Reddit came out on top by a wide margin. It is the kind of stat people share with a shrug, because everyone already half knew it. ChatGPT cites Reddit more than it cites anyone else.
It is worth pausing on, because the conclusion most people draw from it is the wrong one.
The finding is real, and it is everywhere
The exact number depends on whose study you read, and they do not agree on the decimal. A Semrush analysis of citations across five engines put Reddit around 40% of all references, ahead of Wikipedia and YouTube. The 5WPR AI Platform Citation Source Index found Wikipedia and Reddit together driving over a quarter of ChatGPT citations in the US, with the Wall Street Journal, the New York Times and Bloomberg absent from the top twenty entirely. Search Engine Land reported the same shape from a different dataset: Reddit, YouTube and LinkedIn at the top.
The studies disagree on the precise share and they were run on different samples in different months. What they agree on is the direction, and the direction is unambiguous. Reddit is the most-cited source on the agent web, and the masthead journalism brands are not.
When several independent studies, run on different data, point the same way, the direction is the finding. The decimal is noise.
Why it happens, and why it is not a compliment
The instinct is to read "most-cited" as "most trusted". It is not. An answer engine does not cite Reddit because Reddit is the most authoritative thing on the internet. It cites Reddit because Reddit is the most legible thing on the internet, at scale.
Think about what a Reddit thread actually is from a machine's point of view. A clear question at the top, stated in plain language. A set of answers underneath, ranked by other humans who voted them up or down. Each answer attributed to an author, stamped with a date, wrapped in consistent markup that looks identical across millions of pages. A score that acts as a confidence signal. No overlay, no hero carousel, no menu that never reaches the accessibility tree. The machine fetches the page and the structure hands it the answer.
Now think about a typical brand site answering the same question. The answer is there, but it is buried three scrolls down a marketing page, phrased as a benefit rather than a fact, with no date, no named author, no structure that says "this paragraph is the answer to that question". The machine fetches the page and has to guess. Faced with a guess against your page and a clean parse against a Reddit thread, it takes the thread.
Reddit did not win on authority. It won on structure. And structure is the one thing on that list you can actually build.
Citation volume is not authority, and that is the second half of the story
Here is the part the celebratory headlines skip. The same property that makes Reddit so citable, an enormous volume of structured, dated, attributed human text, says nothing at all about whether any given thread is correct. A confidently upvoted wrong answer is exactly as parseable as a right one. The machine cannot tell the difference from structure alone, so a high citation count is a measure of legibility, not truth.
This is the risk hiding underneath the stat. As answer engines lean harder on the most legible sources, they inherit those sources' errors, and they launder them through the authority of a clean citation. The reader sees "according to a discussion on Reddit" and reads it as settled. It is not settled. It is merely well-formed.
So there are two separate problems on the table, and they have two separate fixes:
- Being read at all. If your page is not legible to a machine, you are not in the running. Reddit is winning this by accident; you can win it on purpose.
- Being worth citing. Legibility gets you quoted. It does not make the quote true, and it does not prove you are the one who said it. That is a different signal, and a machine has to be able to check it separately.
Making your content citable on purpose
The first problem is the one MX was built for. Machine Experience is the practice of making a page legible to a machine the way UX makes it usable for a human. Where Reddit gets structure for free from its template, MX lets any publisher declare the same things explicitly: this is the question, this is the answer, this is who wrote it, this is when, this is the authority it rests on, this is what a machine is allowed to do with it.
It is not surface markup pasted over a thin body. It is the underlying meaning expressed in a form a machine can verify rather than infer. A page that carries it stops being a guess. The answer engine reads who wrote it, when, on what basis, and whether the claim inside is something the publisher actually holds, without scraping it out of prose and hoping.
That is how you compete with a Reddit thread. Not by being louder, but by being at least as legible, on a page you actually control, with your name attached.
Closing the trust gap Reddit cannot
The second problem, citation volume is not authority, is where the trust layer comes in, and it is the part Reddit structurally cannot solve for you.
Two signals do the work, and they answer different questions. Attestation answers "who published this claim, and when", with a signature a machine can verify rather than take on faith. It does not prove anyone authoritative agrees with you; it proves you said it, on the record, on a date. Corroboration answers the rest: a load-bearing claim points to an authoritative external source that independently states the same thing, so a machine can check the claim against something other than your own assertion of it.
Put those together and you get the thing a Reddit citation can never be: content that is readable, attributable, and corroborated. An answer engine can quote it and show its work. That combination is what turns a citation from "well-formed" into "worth trusting", and it is exactly the gap the most-cited-source stat leaves wide open.
What I would take from the stat
Reddit being the most-cited source on the agent web is not a story about Reddit. It is a demonstration, run across millions of citations, of what machines reward: structure, attribution, dates, and a clean parse, available in volume. Reddit stumbled into all four. The lesson is not to post your answers on Reddit. The lesson is that the things Reddit gets for free are things you can give your own pages on purpose, and that legibility alone is only half the job.
Get read by being legible. Get trusted by being attributable and corroborated. The publishers who do both will be the ones cited in the answers that actually hold up, and they will own the page it is cited from.
If you want to know what an answer engine sees when it reads your site today, that is an audit I can run.