How Agents Discover Metadata: Carrier-Neutrality, Content Negotiation, and the Optional Cog

21 May 2026 · Tom Cranstoun · 8 min read

A software developer setting out to build a machine-experience pipeline often starts with a search for file extensions. They see the files in a Machine Experience (MX) repository, notice suffixes like .cog or .cog.md, and conclude that the pipeline requires these naming patterns. They assume that for their content to be recognized as a COG (the core structured data unit of the MX standard), they must rename all their files to use these markers.

This is a misunderstanding. A file ending in .js with a structured metadata block is a COG. A plain Markdown document without any special infix is a COG, provided it carries the structured metadata that machines require. The .cog suffix is a signpost for human developers. It helps people scanning a directory quickly identify which files are designed for machine consumption. For the automated pipelines, parser libraries, and artificial intelligence agents that process this content, the file extension is secondary. They rely on structural signatures, not file suffixes.

Understanding how this works requires looking at the core architecture of the Machine Experience standard. It depends on carrier neutrality, precise mapping from source frontmatter to output templates, and edge-level content negotiation.

The concept of carrier neutrality

The first principle to grasp is that MX is carrier-neutral. The standard is designed to ensure that structured metadata travels with the content, regardless of the container file format. A piece of technical documentation might start as a Markdown file on a developer's laptop, get built into an HTML page served on a website, and eventually be compiled into a PDF document sent to a client. In a traditional workflow, the metadata is stripped or fragmented at each transition. In an MX-compliant workflow, the metadata is preserved, adapted, and re-embedded in each new carrier.

This design prevents the metadata from being separated from the prose. The metadata payload is the message, and the carrier is merely the envelope. Changing the envelope does not destroy the content's structured identity. Whether that envelope is Markdown, HTML, PDF, or a JSON payload, the machine reader must be able to extract the same essential facts about authorship, versioning, audience, and licensing.

By remaining carrier-neutral, the MX standard avoids tying its capabilities to any single technology stack or file format. It works in Git repositories, web servers, content delivery networks, and document storage systems alike. The discovery mechanisms are designed to match this flexibility.

The optional .cog marker: humans vs machines

Why do we use the .cog extension if the standard is carrier-neutral? Suffixes like .cog.md and .cog are human signposts. They are markers designed to help a human editor navigate a mixed codebase. When an engineer opens a repository containing thousands of files, seeing .cog.md tells them immediately that this file carries formal metadata and is subject to the strict validation rules of the MX pipeline.

To a machine agent, however, the extension carries little weight. An AI agent crawling a website or a parser validating a local directory does not trust the file extension to tell it what a file is. The agent looks for the structural signature. In a Markdown carrier, this signature is a YAML frontmatter block that begins and ends with a line of triple hyphens (---). In an HTML carrier, the signature is a specific meta tag in the document head: <meta name="mx:cog" content="...">.

If that tag or frontmatter block is present and carries the required namespace declarations, the file is a COG. The edge router and the parser treat it as one, even if the filename is simply article.html or helper.js. The name is for the human; the structure is for the machine.
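As a rough illustration of detection by structure rather than by name, the sketch below checks for the two signatures described above using only the Python standard library. The function names (has_frontmatter, has_cog_meta, looks_like_cog) are hypothetical and not part of any MX tooling. The sketch does not validate the required namespace declarations, and it does not cover other carriers such as .js files.

```python
from html.parser import HTMLParser


class _CogMetaFinder(HTMLParser):
    """Records whether a <meta name="mx:cog"> tag appears in the markup."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "meta" and dict(attrs).get("name") == "mx:cog":
            self.found = True


def has_frontmatter(text: str) -> bool:
    """True if the text opens with a block delimited by '---' lines."""
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    # A closing delimiter must exist somewhere after the opening one.
    return any(line.strip() == "---" for line in lines[1:])


def has_cog_meta(markup: str) -> bool:
    """True if the markup contains the mx:cog meta tag."""
    finder = _CogMetaFinder()
    finder.feed(markup)
    return finder.found


def looks_like_cog(text: str) -> bool:
    """Decide from content alone; the filename is never consulted."""
    return has_frontmatter(text) or has_cog_meta(text)
```

Note that looks_like_cog takes only the file's text. A caller could pass the contents of article.html or notes.md without the function ever seeing the name.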

How YAML metadata transfers to HTML

When a source document with YAML frontmatter is built into HTML for publication, the pipeline must transfer the metadata without losing structure. The CogNovaMX pipeline performs this transfer in three parallel channels, each designed for a different level of machine consumption.

The verbatim comment block. The most direct channel is the re-embedding of the original YAML block. Right after the magic-header meta tag in the HTML head, the builder places the entire YAML frontmatter block verbatim inside an HTML comment. This preserves the exact spacing, keys, and values of the source document across the build boundary. A machine reader that is optimized for YAML can extract this comment, parse the YAML payload, and obtain the full metadata graph without having to reconstruct it from scattered HTML tags.

The flat meta tags. For simple crawlers, including search engine crawlers, that do not parse YAML comments, the pipeline projects the key fields into individual HTML meta tags. Keys in the mx namespace are mapped to tags like <meta name="mx:status" content="published"> and <meta name="mx:contentType" content="blog-post">. This supports fast, flat queries by agents that only read meta tags.

The Schema.org JSON-LD graph. For semantic web search engines and large language model parsers, the pipeline generates a structured JSON-LD block in the document head. This block defines a clean BlogPosting or CreativeWork node within a larger graph. It links the post's author (a Person node) to their specific profile page and attributes the publisher (an Organization node) using stable identifiers. The JSON-LD graph also carries the AI disclosure property (digitalSourceType) to make the content's origin clear to automated readers.
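The three channels can be pictured with a short sketch. It is not the CogNovaMX builder. The function name build_head and its parameters are hypothetical, the magic-header tag is assumed to be the mx:cog tag, and the caller is assumed to supply the marker value, the already-parsed mx fields, and a ready-made JSON-LD dictionary.

```python
import html
import json


def build_head(cog_value: str, raw_yaml: str, mx_fields: dict, jsonld: dict) -> str:
    """Return <head> markup carrying the same metadata in three channels."""
    if "-->" in raw_yaml:
        # A verbatim copy cannot be escaped, so refuse input that would
        # terminate the HTML comment early.
        raise ValueError("frontmatter contains '-->' and cannot be embedded verbatim")

    # Assumed magic-header tag; its content value is supplied by the caller.
    lines = [f'<meta name="mx:cog" content="{html.escape(cog_value, quote=True)}">']

    # Channel 1: the untouched YAML block inside an HTML comment.
    lines.append(f"<!--\n{raw_yaml}\n-->")

    # Channel 2: one flat meta tag per field in the mx namespace.
    for key, value in mx_fields.items():
        lines.append(
            f'<meta name="mx:{html.escape(key, quote=True)}" '
            f'content="{html.escape(str(value), quote=True)}">'
        )

    # Channel 3: the JSON-LD graph. "</" is escaped so the payload
    # cannot close the surrounding script element.
    payload = json.dumps(jsonld, indent=2).replace("</", "<\\/")
    lines.append(f'<script type="application/ld+json">\n{payload}\n</script>')
    return "\n".join(lines)
```

Calling build_head with mx_fields of {"status": "published", "contentType": "blog-post"} produces the two flat tags quoted above. The guard on "-->" reflects a design tension: because the comment channel is verbatim, the builder cannot escape the YAML and can only reject input that would break the comment.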

Content negotiation and agent discovery

How do agents discover these COG files when browsing the web? The answer lies in HTTP content negotiation. The edge workers that serve the mx.allabout.network site are configured to pay close attention to the Accept request header sent by the client.

When a human visitor clicks a link in their web browser, the browser requests the page with an Accept header that prioritizes text/html. The edge worker detects this, routes the request to the compiled HTML carrier, and serves the fully styled page. The human sees a clean, readable layout with typography, images, and navigation chrome.

When an AI agent or a developer's CLI tool requests the exact same URL, it can send an Accept header that specifies text/markdown or application/yaml. The edge worker detects this preference, bypasses the HTML rendering, and serves the raw Markdown source file with the YAML frontmatter intact. The agent gets the clean, lightweight, highly structured source directly, without any of the navigation links, headers, or footers that clutter traditional web scraping.

This approach lets a single canonical URL serve both audiences. There is no need to maintain separate developer APIs or parallel subdomains. The resource URL is identical; the server adjusts the format based on what the client asks for.
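The negotiation step can be sketched in a few lines. This is not the code running on the edge workers. It is a simplified reading of HTTP Accept handling, and the names OFFERED, parse_accept, and choose_carrier are hypothetical. The sketch assumes the server offers three media types and falls back to HTML when the header is missing or nothing matches.

```python
from typing import List, Optional, Tuple

# Media types the hypothetical worker can serve, in server preference order.
OFFERED = ("text/html", "text/markdown", "application/yaml")


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def _specificity(media_range: str, offered: str) -> int:
    """3 for an exact match, 2 for type/*, 1 for */*, 0 for no match."""
    if media_range == offered:
        return 3
    if media_range == offered.split("/")[0] + "/*":
        return 2
    if media_range == "*/*":
        return 1
    return 0


def choose_carrier(accept: Optional[str]) -> str:
    """Pick the offered media type the client prefers most."""
    if not accept or not accept.strip():
        return OFFERED[0]
    ranges = parse_accept(accept)
    best, best_q = OFFERED[0], 0.0
    for offered in OFFERED:
        matches = [(_specificity(m, offered), q) for m, q in ranges]
        matches = [pair for pair in matches if pair[0] > 0]
        if not matches:
            continue
        q = max(matches)[1]  # the most specific matching range decides q
        if q > best_q:  # ties keep the earlier (server-preferred) type
            best, best_q = offered, q
    return best
```

A typical browser header such as text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 resolves to text/html, while a client sending only text/markdown gets text/markdown. A cache sitting in front of such a worker would normally also need a Vary: Accept response header so the variants are not mixed up; whether the site's workers set it is not stated here.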

Sitemaps, llms.txt, and structured entry points

Content negotiation works well if the agent already knows the URL. To help agents discover the URLs in the first place, the site maintains structured entry points.

The standard XML sitemap (sitemap.xml) provides a complete index of all public URLs on the domain. This helps general web crawlers find new and updated pages quickly. Alongside this, the site hosts an llms.txt file at the root of the domain. The llms.txt file is a Markdown-formatted directory designed specifically for AI models and developer agents. It lists the main sections, provides a short summary for each URL, and directs agents to the most relevant resources without requiring them to crawl the entire site navigation.

Every time a new post is published, the site's build scripts automatically regenerate these discovery surfaces. The sitemap is refreshed to include the new canonical URL, the blog index is updated with a new link card, and the llms.txt file is rebuilt in the same run so that it lists the new page. As a result, a newly published COG becomes discoverable by automated agents within seconds of publication.
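A regeneration step of this kind might look like the sketch below. It is not the site's actual build script. The Page record and the two builder functions are hypothetical, and the llms.txt layout (a title, then one heading per section with a linked, summarized entry per page) is an assumption based on the description above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical record for one published page."""
    url: str
    title: str
    summary: str
    section: str


def build_sitemap(pages: List[Page]) -> str:
    """Return sitemap.xml content with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page.url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_llms_txt(site_title: str, pages: List[Page]) -> str:
    """Return a Markdown directory grouped by section."""
    sections: Dict[str, List[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)

    lines = [f"# {site_title}", ""]
    for section, members in sections.items():
        lines += [f"## {section}", ""]
        for page in members:
            lines.append(f"- [{page.title}]({page.url}): {page.summary}")
        lines.append("")
    return "\n".join(lines)
```

Both functions take the same list of pages, so running them together in one build keeps the sitemap and llms.txt in step with each other.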

Guiding agents in the body: data attributes

Once an agent has discovered the URL and retrieved the HTML version, the MX standard provides guidance within the document body. A typical web page is filled with sidebars, related posts, newsletter sign-up forms, and site footers. An agent trying to extract the core argument of a post often has to use heuristics to separate the article from the surrounding chrome.

To solve this, the HTML body carries structural data attributes. The core article element is marked with data-agent-visible="true" and data-article-type="blog-post". This provides an unambiguous signal to the parsing engine. The agent does not have to guess which div contains the actual text; it can locate the element with the data-agent-visible attribute and extract the content directly.
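On the agent side, the extraction could be as simple as the following sketch, which uses only the Python standard library. The class and function names are hypothetical, and the sketch assumes well-formed markup in which every non-void element has an explicit closing tag.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple

# Elements that never have a closing tag and so never change nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class AgentVisibleExtractor(HTMLParser):
    """Collects text inside the element marked data-agent-visible="true"."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than 0 while inside the marked element
        self.chunks = []
        self.article_type = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.depth:
            if tag not in VOID:
                self.depth += 1
        elif attributes.get("data-agent-visible") == "true":
            self.depth = 1
            self.article_type = attributes.get("data-article-type")

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> do not change depth

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_article(markup: str) -> Tuple[Optional[str], str]:
    """Return (article type, text) for the agent-visible element."""
    parser = AgentVisibleExtractor()
    parser.feed(markup)
    return parser.article_type, " ".join(parser.chunks)
```

Text in navigation, sidebars, and footers is skipped because it sits outside the marked element, so no layout heuristics are involved.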

These techniques combine to build a web that is as easy for machines to navigate and understand as it is for humans. The .cog suffix is an excellent tool for human team coordination, but the machine-readable web relies on a deeper, more robust foundation of structured, carrier-neutral data.