Block the Machine, It Walks Around You

Earlier this week I asked my reading agent to catch me up on the news about building an "agentic enterprise." It went to fetch the lead article and came back with a 403 and a spinning "Just a moment." An article about machines that act, shut in a machine's face.

It did what a machine does next. It walked around the wall. That took less time than the wall did to build. When it got in, though, the article failed it a second time, in a way the wall had nothing to do with. Those two failures are the story.

This post is about why the wall doesn't work, the many ways past it, why getting read is only half of what an agent needs, and the cheaper thing to do instead. None of this is hacking. Every access method here is a public, documented way the web already hands content to machines. If a method felt like breaking in, I left it out. The whole point is that the content was meant to be read.

You Can't Block Machines, Only Choose Which Ones

A bot wall doesn't keep machines out. It sorts them. The polite machines identify themselves and obey the rules you publish, so they're the first ones it turns away. The determined machines render the page, rotate their addresses, and look like a browser, so they walk straight through. The wall keeps out the crawler that would have ranked you, the archive that would have preserved you, and the screen reader that would have read you aloud to a blind visitor. The scraper you were worried about is already inside.

How a Machine Gets In Anyway

When one door closes, the web has a dozen more. Here's the set my agent worked through, roughly in the order a machine tries them.

Ask again, the way a browser asks

  • A request with a normal User-Agent. Many "blocks" are just a default deny on empty or odd headers.
  • A real browser, or a headless one, the same engines developers test with. It runs the page's own JavaScript, including the challenge, and hands back the rendered text. This isn't an exploit, just the page loading the way it was built to load.
  • An agent that lives inside a real browser. Not a script dressed up as Chrome but an agent running in Chrome, on your machine, with your logged-in session. It renders the page, answers the challenge set for its own browser, and reads exactly the bytes you would. The wall has nothing to turn away, because the human's browser is right there around it. This is where agents are heading: not arriving from outside at all, but already running in the browser you opened. SLICC is one of these, a browser-native agent that runs inside your own session and acts on the page from there.
  • A render-to-text service. Give it a URL, get back clean text. That's how my agent finally got me the article.

Read the copy the site made easy on purpose

  • The reader view, the stripped-down text your browser already builds on request.
  • The print stylesheet. The printable version is often plain and wide open.
  • The mobile or AMP version, a second lighter copy of the same page.
  • The site's own API. The page you see is usually a thin shell over a clean data endpoint that hands the content out as structured text.

Read the copy the site already published for machines

  • The raw HTML under the JavaScript. The text a search engine needs is usually sitting right there in the markup.
  • Structured data: the headline, author, date, and body, each one tagged, so no machine has to guess.
  • The RSS or Atom feed. Many feeds carry the full text and never touch the wall.
  • The sitemap and robots.txt, where the site lists its own pages and tells crawlers what's fair game.
  • oEmbed, the endpoint that exists so other sites can embed yours.
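
To make the "published for machines" routes concrete, here is a minimal sketch of two of them using only the Python standard library: pulling structured data out of raw HTML, and checking what robots.txt says is fair game. The sample page, the sample robots.txt, the agent name "ExampleAgent", and the example.com URLs are all made up for illustration. A polite agent would run the robots check before fetching anything.

```python
import json
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class JsonLdExtractor(HTMLParser):
    """Collect every <script type="application/ld+json"> block in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in more than one chunk, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than guess


def extract_jsonld(html):
    """Return the parsed JSON-LD objects found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items


def may_fetch(robots_txt, user_agent, url):
    """Ask the site's own robots.txt whether this agent may fetch this URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


# Hypothetical inputs, standing in for a real page and a real robots.txt.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Example headline",
 "author": {"@type": "Person", "name": "A. Writer"},
 "datePublished": "2024-01-01"}
</script></head><body><p>Body text.</p></body></html>"""

SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    if may_fetch(SAMPLE_ROBOTS, "ExampleAgent", "https://example.com/articles/1"):
        for item in extract_jsonld(SAMPLE_HTML):
            print(item["headline"], item["author"]["name"], item["datePublished"])
```

Neither step renders anything or answers a challenge. Both read what the site chose to publish.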

Read the copy someone else is keeping

  • The Wayback Machine and other web archives. If the page lived for a day, a copy may outlive the wall.
  • Search-engine caches and result snippets. The crawler already read it; the snippet is the proof.
  • The syndicated versions: the same piece on the author's own site, a newsletter, a professional network, a mirror. Writers want reach, and reach means copies.
  • An assistant that already crawled it. Large models had often read much of the open web well before you put the wall up.

I stopped counting at a dozen. The wall stopped none of them. It only decided which route my agent took.

A Bad Trade

Set what the wall costs against what it buys.

It blocks your allies. A search crawler is a machine. An archive is a machine. A screen reader is a machine reading your page out loud to someone who can't see it. A link preview is a machine. A citation engine that would have named you as the source is a machine. The wall turns all of them away at once.

It taxes your humans. Every "prove you are human" box is friction a real person pays, on your site, to reach the thing you published to be read.

It doesn't stop the abuse you were worried about. Scraping at industrial scale routes around blanket walls. The wall raises the price of entry, which selects for the best-funded scraper and clears out everyone smaller and more honest.

It doesn't even work. See the list above.

Getting In Was Only Half the Problem

The second failure had nothing to do with the wall. Once my agent was through the wall and reading the article back to me, the numbers had no ground under them. A 62% efficiency gain in four months. Several million saved a year. Precise figures, unnamed companies, no source, no link.

That's worse than it sounds, because of what's reading now. A person skims an unsourced statistic and quietly discounts it. An agent does the opposite. It reads a confident number, finds nothing telling it not to, and passes it on as fact to the next person who asks. Plausible and unverified is exactly the fuel a hallucination runs on.

An agent needs to do two things with your page: read it, and check it. The wall denied the first. The missing sources denied the second. The article cleared neither bar. The agentic enterprise it described couldn't have used it.

Cooperate, on Both

The move that works is the opposite of the wall, and it answers both bars at once. Stop trying to stop the machine from reading, and tell it instead how to read you well and what of you it can trust.

Make it readable:

  • Put the content in the markup. Don't hide the text behind a script a visitor has to be trusted to run.
  • Publish structured data, so the machine reads facts instead of guessing at them.
  • Publish a clean text endpoint and an llms.txt, so an agent has a way in that doesn't fight it.
  • Publish a real sitemap and feed.
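
On the publishing side, two of these items can be sketched in a few lines of Python. The first function emits a schema.org Article as a JSON-LD script tag. The second writes a small llms.txt. llms.txt is a community proposal, not a ratified standard, and the layout below (a title, a one-line summary, then a list of links) is only my reading of it. The function names, field choices, and example.com URLs are illustrative assumptions.

```python
import json


def article_jsonld(headline, author, date_published, url):
    """Render a schema.org Article as a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "url": url,
    }
    # Escape "<" so text inside the JSON can never close the script element.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


def llms_txt(site_name, summary, pages):
    """Render a minimal llms.txt: title, summary, and a list of clean copies.

    `pages` is a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Pages", ""]
    for title, url, note in pages:
        lines.append(f"- [{title}]({url}): {note}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(article_jsonld("Example headline", "A. Writer", "2024-01-01",
                         "https://example.com/articles/1"))
    print(llms_txt("Example Site", "Short summary of the site.",
                   [("Example headline", "https://example.com/articles/1.md",
                     "plain-text copy of the article")]))
```

Both outputs are static text, so they can be generated at build time and served without running anything on the reader's side.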

Make it trustworthy:

  • Carry metadata that travels with the file: what it is, who made it, where it lives, and how it may be reused.
  • Let your claims carry their source. A figure like "62% in four months" can travel with a signature and a citation the agent checks before repeating it, so it quotes an attested fact instead of a number lifted from a paragraph.
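
What "a claim that carries its source" could look like, as a rough sketch: the claim and its citation are serialized together, signed, and checked by the agent before it repeats the figure. This is not Reginald's format, which this post doesn't specify. It uses an HMAC with a shared key only because that ships in Python's standard library. A real publishing system would more likely use public-key signatures, so that verifiers never hold the signing secret. The key, the field names, and the source URL are placeholders.

```python
import hashlib
import hmac
import json


def canonical(claim):
    """Serialize a claim deterministically so signer and verifier agree on the bytes."""
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")


def attest(claim, key):
    """Attach a signature computed over the claim and its cited source."""
    signature = hmac.new(key, canonical(claim), hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify(attested, key):
    """Return True only if the claim is byte-for-byte what was signed."""
    expected = hmac.new(key, canonical(attested["claim"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attested["signature"])


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # placeholder secret
    attested = attest(
        {
            "statement": "62% efficiency gain in four months",
            "source": "https://example.com/report",  # hypothetical citation
        },
        key,
    )
    # An agent would repeat the figure only when this check passes.
    print(verify(attested, key))
```

The signature covers the statement and the source together, so changing either one invalidates it. A figure can't be quietly detached from its citation and still verify.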

The first half is the Machine Experience: metadata that rides with the content and answers the machine's questions before it has to ask. The second half is what Reginald adds on top: a way for a claim to carry proof, so an agent cites what's verified, not what merely sounds confident. Readable, and trustworthy. A page built this way helps the humans too. It ranks better, it previews better, and it reads better in every assistive tool, because all of those are machines too.

Block Abuse, Not Machines

Cooperation isn't the same as leaving the place wide open. There's real abuse: credential stuffing, scraping that runs up your bill, traffic that knocks the site over. Block that. Rate-limit the expensive paths. Watch for the actions, not the species. Charge for bulk access, and publish the API that makes it cheap and accountable.
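
As a minimal sketch of "rate-limit the expensive paths" and "watch for the actions, not the species": a token bucket per client, where each request spends tokens according to what it costs you, and the User-Agent never enters into it. The path names, costs, and limits are invented for illustration. The `now` parameter exists only so the behavior can be checked without waiting on a real clock.

```python
import time

# Hypothetical costs: expensive actions drain the bucket faster than a page read.
PATH_COSTS = {"/search": 5.0, "/export": 10.0}
DEFAULT_COST = 1.0


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, cost=1.0, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class Limiter:
    """One bucket per client, charged by what the client does, not what it is."""

    def __init__(self, capacity=10, refill_per_second=1.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets = {}

    def allow(self, client_id, path, now=None):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now=now)
            self.buckets[client_id] = bucket
        return bucket.allow(PATH_COSTS.get(path, DEFAULT_COST), now=now)


if __name__ == "__main__":
    limiter = Limiter()
    for path in ["/article", "/export", "/export"]:
        print(path, limiter.allow("client-1", path))
```

A crawler reading articles at a reasonable pace never notices the limit. A client hammering the export path runs dry quickly, whether it calls itself a bot or a browser.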

That's a scalpel. A blanket "no machines" wall is a club, and it lands on the wrong heads: the crawler, the archive, the screen reader, and the person who just wanted to read.

The Machine's Coming in Either Way

The machine's an audience now. It's the agent, the crawler, the validator, the archive, and the assistive tool, and it isn't going away because you put up a spinner. It wants two things from you: to read you, and to trust you.

Meet it with a clean copy it can read and a claim it can check, and you get something back: a ranking, a citation, a preview, an accurate answer with your name on it. Make it climb the wall and guess at your numbers instead, and you get nothing, while it reads you and repeats you anyway.

It reads you either way. Hand it a copy worth trusting.