Foundations

Posts on the principles that underpin machine-readable content: language, marks, and the details machines need to read without guessing.

Before the metadata layer and before the governance layer, content has to be readable. These posts examine the ground-level questions: what happens when a mark is stripped, when a machine meets a word whose meaning lives in its accents, when translation carries cultural assumptions the source language never declared.

The posts are short and specific. Each argues one point from a concrete case. The intended reader is anyone building content infrastructure who wants to understand what the machine actually needs.

Posts

Strip the Marks, Lose the Word

English speakers drop accents out of habit, and in English it rarely matters. In Vietnamese it deletes the word. When marks are stripped before a machine reads the text, no translation can recover them, and any provenance system that signs the wrong bytes breaks on exactly the languages that need it most.

3 June 2026
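
A minimal sketch of the claim in "Strip the Marks, Lose the Word", in Python with only the standard library. It is an illustration under stated assumptions, not the post's own code: the helper names strip_marks and digest are hypothetical, and SHA-256 stands in for whatever a real provenance system would sign. It shows two things: six distinct Vietnamese words collapse into one string once combining marks are dropped, and the same visible word yields different bytes (and so a different hash) depending on its Unicode normalization form.

```python
import hashlib
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop every combining mark (the lossy step)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def digest(text: str) -> str:
    """Hash the UTF-8 bytes of the text; a stand-in for a provenance signature."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Six different Vietnamese words that differ only in their marks.
# Normalizing to NFC makes the example independent of how this file was saved.
WORDS = [unicodedata.normalize("NFC", w) for w in ["ma", "má", "mà", "mả", "mã", "mạ"]]

# After stripping, all six are the same string, and nothing downstream
# can tell which word was meant.
collapsed = {strip_marks(w) for w in WORDS}
print(collapsed)

# The same visible word in two normalization forms is two different byte strings.
composed = unicodedata.normalize("NFC", "mả")
decomposed = unicodedata.normalize("NFD", "mả")
print(composed == decomposed)                  # the strings differ code point by code point
print(digest(composed) == digest(decomposed))  # so their hashes differ too
```

The design point is that stripping is one-way while normalization is reversible: converting the decomposed form back to NFC restores the composed word, but no step restores a mark that was deleted. A system that hashes or signs text would need to fix one normalization form before signing, and do it before any mark-stripping step touches the text.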

Orange With Pump: A Field Guide to Machine Translation Going Sideways

A British juice brand, sold in Germany, photographed through a phone's translate camera, promises Orange with Pump. Nobody ordered a pump. The joke is instructive: a machine is grounded in a culture, and it is not yours. It reads German through an English-trained memory, and its guessing runs out exactly where your cultural grounding would have started.

8 June 2026