Why your AI agent gives you a different answer every time

You ask the AI to do the same thing on Monday and on Wednesday. You get two different results. Not wildly different - just different enough that you can’t quite trust it. The headings have moved. A field is missing. The tone has shifted. You wonder if you phrased it wrong.

You didn’t. The problem is that you wrote an instruction, and the AI read it as a suggestion.

The thing nobody tells you about AI agents

When you write a document that tells an AI what to do, a brief, a checklist, a process guide, the AI does not execute it. It interprets it. Every time. The model reads your words, forms an impression of what you probably meant, and acts on that impression. The impression is shaped by everything else in its context that day: the conversation so far, the examples it happens to recall, the ambient pressure of recent prompts.

This is why your agent feels like a clever new starter who keeps forgetting the house style. It isn’t forgetting. It is re-deciding, from scratch, on every run.

For creative work, that’s fine. You want a model to bring judgement to a draft. For anything you need to be the same on Monday and Wednesday, a published page, a billing record, a compliance check, interpretation is the enemy.

The category error

Most teams trying to make AI reliable are reaching for the wrong tool. They write longer prompts. They add more rules. They paste the brand guidelines into the system message. They are, in effect, trying to make a human document so detailed that even a machine cannot misread it.

It will still misread it. Prose is not a contract. Prose is an invitation to interpret. You cannot fix interpretation by adding more words to interpret.

And here is the part that catches most people out: this is true even when you give the AI a script. You might think a script is unambiguous: it’s code, it’s literal, it does what it says. But the AI doesn’t run the script. It reads the script. It looks at the source, decides what it thinks the script would do if it ran, and then describes that. The execution is imagined. The output is the model’s best guess at what the code means, not what the code does.

The same thing happens with checklists, runbooks, and process documents. The AI reads the steps and produces a plausible-sounding account of having performed them. Sometimes it really did perform them. Sometimes it skipped one and didn’t notice. Sometimes it invented a step that wasn’t there. You have no way to tell from the output, because the output is prose about the work, not the work itself.

The category error is treating the artefact as if its audience were singular. A document that says “validate the frontmatter, then publish” is written in a register designed for human judgement. A script that says validate() && publish() is written for an interpreter that actually runs it. Hand either of them to an LLM and you get the same thing: a confident narration of what it thinks happened.

From instruction to contract

The shift that makes AI agents reliable is small to describe and large in consequence. You stop writing instructions and start writing contracts. And, this is the part that matters, you stop asking the AI to be the thing that enforces them.

An instruction says do this thing. A contract says here is what done looks like, and here is how to check. An instruction lives in sentences. A contract lives in fields. An instruction can be paraphrased; a contract either matches or it doesn’t.

But a contract is only worth the paper it’s written on if something other than the AI is checking it. If you ask the AI both to do the work and to confirm it did the work, you are back where you started: a confident narration with nothing underneath. The check has to run somewhere the AI cannot talk its way around: a validator, a schema, a typed function, a test that passes or fails on its own terms. The AI proposes; something else disposes.

In practical terms, this means the parts of your document that need to run the same way every time stop being prose and start being structured data, and the steps that need to actually happen stop being scripts the AI reads and start being functions a runtime calls. A field called status with allowed values draft, ready, published is a contract a validator can enforce. A sentence that says “make sure the status is set correctly” is an instruction the AI will paraphrase. The first cannot drift. The second always will.
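
To see the difference, here is a minimal sketch in TypeScript. Everything in it is illustrative, the field names, the allowed values, the hypothetical publish step; the point is only that the check is code the model never touches.

```typescript
// The allowed values are declared once, as data, not described in prose.
const ALLOWED_STATUSES = new Set(["draft", "ready", "published"]);

// Returns a list of contract violations; an empty list means the record
// matches. The validator never reads the agent's narration of its work,
// only the record itself.
function validate(record: Record<string, unknown>): string[] {
  const errors: string[] = [];
  if (typeof record.title !== "string" || record.title.trim() === "") {
    errors.push("title: required, non-empty string");
  }
  if (typeof record.status !== "string" || !ALLOWED_STATUSES.has(record.status)) {
    errors.push(`status: must be one of ${[...ALLOWED_STATUSES].join(", ")}`);
  }
  return errors;
}

// The agent proposes; the validator disposes.
const proposed = { title: "Pricing update", status: "redy" }; // note the typo
const errors = validate(proposed);
if (errors.length === 0) {
  // accept the work, e.g. publish(proposed) -- publish() is hypothetical here
} else {
  // reject, and hand the exact failures back, not a nudge in prose
}
```

Notice that acceptance turns on the return value of a function, not on anything the model says about its own output. That is the whole trick.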

This is what Machine Experience calls separating the audiences. The same document can carry narrative for humans and structure for machines, but the machine-actionable parts are not buried in the prose, and they are not executed by the model that reads them. They are declared explicitly, validated independently, and only then accepted as done.
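
As a sketch, assuming a publishing workflow that keeps its structure in frontmatter (the field names here are illustrative, not a standard), a single artefact can carry both registers:

```
---
status: ready            # machine-checked: must be draft, ready, or published
reviewed_by: j.smith     # machine-checked: must be a known reviewer id
---

The new pricing page leads with the annual discount, because that is
the question most customers ask first. Keep the tone plain and direct.
```

A validator reads only the block at the top. A human, or a model drafting prose, reads the rest. Neither audience is asked to parse the other's register.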

What this looks like in your business

You don’t need to learn YAML to act on this. You need to ask one question of every AI workflow you currently rely on: which parts of this need to be the same every time, and which parts benefit from fresh judgement?

The parts that need to be the same, the schema of an invoice, the required fields on a customer record, the allowed values in a dropdown, the steps in a regulated process, should not be written as instructions to an AI. They should be expressed as structure the AI is required to satisfy, with a check that runs before the work is accepted. That check is cheap. It is the difference between an agent that mostly works and an agent you can put your name to.
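
That check does not need to be elaborate. A minimal sketch of the acceptance gate, again in TypeScript, assuming an agent function and a validator shaped like the one above; the shapes and the retry limit are assumptions, not a prescribed pattern:

```typescript
// A sketch of the gate that runs before work is accepted. The agent and
// the validator are both passed in; acceptance depends only on validate().
type Draft = Record<string, unknown>;

async function runWithContract(
  task: string,
  agent: (task: string, feedback?: string[]) => Promise<Draft>,
  validate: (draft: Draft) => string[], // empty array means the contract is met
  maxAttempts = 3
): Promise<Draft> {
  let feedback: string[] | undefined;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const draft = await agent(task, feedback); // the AI proposes
    const errors = validate(draft);            // something else disposes
    if (errors.length === 0) return draft;     // accepted because it matched
    feedback = errors;                         // the exact failures go back
  }
  throw new Error(`No draft satisfied the contract in ${maxAttempts} attempts; escalate to a human.`);
}
```

The important property is that the work is accepted because it matched, not because it sounded done.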

The parts that benefit from judgement, the wording of a customer reply, the framing of a recommendation, the tone of an internal note, are exactly where AI earns its place. Leave those in prose. Let the model interpret. That’s the work it’s good at.

The mistake is mixing the two and hoping for the best. The fix is to decide, deliberately, which is which.

Where this is going

The agencies and platforms that are quietly winning with AI right now are not the ones with the cleverest prompts. They are the ones who have understood that an AI agent is a runtime, and a runtime needs contracts. They are building artefacts that carry their own rules, what’s required, what’s allowed, what counts as done, so that any agent picking up the work has nothing to guess at.

The next twelve months will sort businesses into two groups. One will keep adding words to prompts and wondering why the output drifts. The other will start treating their machine-readable artefacts as first-class assets, designed for the audience that actually consumes them.

If you take one thing from this: the question is not how do I prompt the AI better. The question is what am I handing the AI, and is it a contract or a suggestion?