Appendix L: Proposed AI Metadata Patterns

MX-Protocols

Tom Cranstoun

January 2026

A formal proposal document for experimental AI metadata patterns that extend existing web standards.

Status and Classification

Document Status: Experimental Proposal — Not Yet Standardised

Maturity Level: Forward-compatible proposals that do not break if agents do not recognise them

This appendix consolidates all proposed and experimental patterns mentioned throughout “MX: The Protocols”. These patterns follow established conventions (like robots meta tags or viewport meta tags) and represent logical extensions that may standardise as the AI agent ecosystem matures.

Important: These are NOT established standards. They are proposals based on production implementations and logical extensions of existing patterns.

Relationship to Established Standards

The standards hierarchy is absolute. Established web standards come first. MX fills gaps that standards do not yet cover. MX never duplicates what standards already provide.

Implementation order:

Semantic HTML (established) — Use <main>, <nav>, <article> always
Schema.org JSON-LD (established) — Primary structured data method
ARIA attributes (established) — Critical for accessibility
HTTP headers (established) — Cache-Control, Content-Type, status codes
robots.txt / sitemap.xml (established) — Discovery and crawl guidance
llms.txt (emerging) — Early adoption phase, gaining traction
mx: meta tags (proposed) — Fill gaps not covered by the above
data-agent-visible (proposed) — Semantic marker for agent-only metadata
Common data attributes (proposed) — Explicit state management patterns
Pandoc YAML frontmatter (established) — Universal markdown metadata standard

If a standard already covers the need, use the standard. MX tags exist only where no established standard provides the same capability.

See Appendix D for the comprehensive guide to all patterns (established + proposed).

Pattern 1: MX Framework Meta Tag Namespace

Status

Proposed Pattern — Not yet standardised, forward-compatible

Rationale

Page-specific AI agent guidance needs to override site-wide defaults from llms.txt. Just as robots meta tags override robots.txt for specific pages, AI meta tags provide page-level control over agent behaviour.

Why meta tags?

Established pattern (robots, viewport, Open Graph all use meta tags)
Page-specific overrides for site-wide policies
Machine-readable without parsing content
Browser-agnostic (works in served HTML)
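A minimal sketch of the override relationship described above, in which page-level mx: meta tags take precedence over site-wide defaults. The function name resolveAgentPolicy and the object shapes are hypothetical. This appendix does not define how site-wide defaults are expressed in llms.txt, so they are shown here as an already-parsed object.

```javascript
// Hypothetical helper: merge site-wide defaults with page-level mx: meta tags.
// Page-level values win, mirroring how a robots meta tag overrides robots.txt.
function resolveAgentPolicy(siteDefaults, pageMeta) {
  const resolved = { ...siteDefaults };
  for (const [name, content] of Object.entries(pageMeta)) {
    // Only mx:-prefixed tags take part in the override.
    if (name.startsWith("mx:")) {
      resolved[name] = content;
    }
  }
  return resolved;
}

// Illustrative inputs (assumed, not taken from a real site).
const siteDefaults = {
  "mx:content-policy": "summaries-allowed",
  "mx:attribution": "requested",
};
const pageMeta = {
  "mx:content-policy": "restricted",
  viewport: "width=device-width",
};

console.log(resolveAgentPolicy(siteDefaults, pageMeta));
```

Tags the page does not set keep their site-wide value, and non-mx tags such as viewport are left out of the resolved policy.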

Part 1: MX Operating System (MX OS) Philosophy

What is MX OS?

The MX documentation is the MX Operating System (MX OS). When we document patterns here, we define how Machine Experience works.

MX OS is:

Documentation that specifies behavior
Patterns that practitioners follow
Standards that machines implement
A living system that evolves through practice

Key principle: Documentation as specification. By documenting how MX should work, we create the operating system that defines machine experience.

How MX OS Evolves

Version-controlled principles — All changes tracked in git history
LEARNINGS.md captures failures — Document what went wrong and how to prevent it
Community contributions — Both human and machine contributors
Evidence trumps theory — Real-world implementation guides evolution
No principle is sacred — If practice proves a principle wrong, we change it

For detailed documentation of how MX OS is built collaboratively, see Appendix M: Building the MX Operating System.

Part 2: MX Namespace Architecture

Overview

MX Framework uses a hierarchical namespace system to organize machine-readable metadata. This namespace architecture is documented here as part of the MX Operating System.

Namespace Hierarchy

Top-level namespace: mx:

Sub-namespaces:

mx.ai: — Machine-readable metadata (agent behavior, runbooks, content editability)
mx.co: — Content operations metadata (workflow, publishing, lifecycle)
mx.ho: — Hosting metadata (deployment, caching, infrastructure)

Example YAML:

mx:
  contentType: "specification"
  runbook: "Focus on technical accuracy"
  ai:
    editable: cautious
    preferredAccess: html
  co:
    workflow: draft
    reviewRequired: true
  ho:
    cacheStrategy: aggressive
    cdn: cloudflare
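One way to see how the nested YAML relates to the dotted namespace names (mx.ai, mx.co, mx.ho) is to flatten a parsed mx object into dotted paths. This is an illustrative sketch only: flattenMx is a hypothetical helper, and the dotted output format is an assumption rather than something this appendix specifies.

```javascript
// Flatten a nested mx object into dotted namespace paths (mx.ai.*, mx.co.*, mx.ho.*).
function flattenMx(value, prefix = "mx") {
  const out = {};
  for (const [key, child] of Object.entries(value)) {
    const path = `${prefix}.${key}`;
    if (child !== null && typeof child === "object" && !Array.isArray(child)) {
      // Nested namespace: recurse and merge its entries.
      Object.assign(out, flattenMx(child, path));
    } else {
      out[path] = child;
    }
  }
  return out;
}

// The object a YAML parser might return for a block like the one above.
const mx = {
  contentType: "specification",
  ai: { preferredAccess: "html" },
  co: { workflow: "draft", reviewRequired: true },
  ho: { cacheStrategy: "aggressive", cdn: "cloudflare" },
};

console.log(flattenMx(mx));
```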

ASCII Diagram of Namespace Structure

mx: (top-level namespace)
├── mx.ai: (AI-specific)
│   ├── editable
│   ├── preferredAccess
│   └── runbook
├── mx.co: (content operations)
│   ├── workflow
│   ├── contentType
│   └── reviewRequired
└── mx.ho: (hosting)
    ├── cacheStrategy
    └── cdn

HTML Meta Tags: Colon Prefix Pattern

In HTML, we use the mx: colon prefix (matching established conventions):

<meta name="mx:content-policy" content="extract-with-attribution">
<meta name="mx:attribution" content="required">

Why colon prefix?

HTML meta tags use colon-delimited namespaces as an established convention. The mx: prefix follows the same pattern as other widely adopted meta tag namespaces:

twitter: for Twitter Cards
og: for Open Graph
mx: for Machine Experience

Framework Identity

Like twitter: and og:, the mx: prefix:

Establishes MX brand and presence
Aids discoverability: developers search “mx meta tags” and find MX Framework
Aligns with MX namespace architecture: flat HTML prefix maps to nested YAML structure
Designed for MX practitioners: MX-Ready websites built by MX community

Extension Pattern

The namespace architecture is extensible. Future namespaces might include:

mx.sec: — Security metadata
mx.perf: — Performance optimization hints
mx.a11y: — Accessibility enhancements beyond WCAG

Guidelines for extension:

New namespaces should serve distinct, non-overlapping purposes
Follow camelCase naming convention for attributes
Document in this appendix before widespread use
Community discussion required for new top-level namespaces

Part 3: MX Attributes by Namespace

This section consolidates MX attributes, organised by namespace. The earlier registry at mx-canon/mx-maxine-lives/registers/mx-attributes-registry.md is deprecated; treat this appendix as the reference.

3.1 mx.ai: AI-Specific Metadata

Attributes that control AI agent behavior and content interpretation.

runbook

Type: string
Purpose: Instructions for AI agents on how to interpret or handle content
Example: mx: { runbook: "This is copyrighted material. No part may be reproduced without permission." }

editable

Type: enum (strict, cautious, flexible)
Purpose: Indicates how freely AI agents may edit or adapt content
Example: mx: { ai: { editable: cautious } }

preferredAccess

Type: enum (html, api, both)
Purpose: How agents should access content
Example: mx: { ai: { preferredAccess: html } }

deliverable

Type: string
Purpose: Instructions for generating output based on this content
Example: mx: { ai: { deliverable: "Generate slide deck from this content" } }

3.2 mx.co: Content Operations Metadata

Attributes for content workflow, lifecycle, and publishing.

contentType

Type: string
Purpose: Classification of content type
Example: mx: { contentType: "specification" }
Values: specification, tutorial, reference, guide, article

workflow

Type: enum (draft, review, published, archived)
Purpose: Current state in content lifecycle
Example: mx: { co: { workflow: draft } }

reviewRequired

Type: boolean
Purpose: Whether content requires review before publication
Example: mx: { co: { reviewRequired: true } }

publishingState

Type: string
Purpose: Detailed publishing status
Example: mx: { co: { publishingState: "pending-approval" } }

3.3 mx.ho: Hosting Metadata

Attributes for deployment, caching, and infrastructure.

cacheStrategy

Type: enum (aggressive, moderate, minimal, none)
Purpose: How aggressively to cache content
Example: mx: { ho: { cacheStrategy: aggressive } }

cdn

Type: string
Purpose: CDN provider or configuration
Example: mx: { ho: { cdn: "cloudflare" } }

deploymentTarget

Type: string
Purpose: Target deployment environment
Example: mx: { ho: { deploymentTarget: "production" } }

3.4 Cross-Namespace Attributes

Some attributes work across multiple namespaces or don’t fit neatly into one category.

All attributes follow:

Namespace: Nested under mx: key
CamelCase: Multi-word attributes use camelCase
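No hyphens: Never use kebab-case in YAML attribute names (HTML meta tag names such as mx:content-policy are hyphenated by convention)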
Consistent: Follow MX Code Metadata Specification
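The enum types listed in sections 3.1 to 3.3 lend themselves to a simple validation pass. The sketch below is one possible shape for such a check; validateMx and MX_ENUMS are hypothetical names, and the input is assumed to be an mx object that has already been parsed from YAML.

```javascript
// Allowed values copied from the enum definitions in sections 3.1 to 3.3.
const MX_ENUMS = {
  "ai.editable": ["strict", "cautious", "flexible"],
  "ai.preferredAccess": ["html", "api", "both"],
  "co.workflow": ["draft", "review", "published", "archived"],
  "ho.cacheStrategy": ["aggressive", "moderate", "minimal", "none"],
};

// Returns a list of human-readable problems; an empty list means no violations found.
function validateMx(mx) {
  const problems = [];
  for (const [path, allowed] of Object.entries(MX_ENUMS)) {
    const [namespace, attribute] = path.split(".");
    const value = mx[namespace]?.[attribute];
    // Absent attributes are fine; only present values are checked.
    if (value !== undefined && !allowed.includes(value)) {
      problems.push(`mx.${path}: "${value}" is not one of ${allowed.join(", ")}`);
    }
  }
  const review = mx.co?.reviewRequired;
  if (review !== undefined && typeof review !== "boolean") {
    problems.push("mx.co.reviewRequired: expected a boolean");
  }
  return problems;
}

console.log(validateMx({ ai: { editable: "cautious" }, co: { workflow: "drafty", reviewRequired: "yes" } }));
```

Only the attributes with a declared enum or boolean type are checked; free-text attributes such as runbook pass through untouched.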

Part 5: JSON-LD Structured Data

Integration with Schema.org

MX Framework recommends Schema.org JSON-LD as the primary method for structured data. This complements (not replaces) HTML meta tags.

When to Use JSON-LD vs HTML Meta Tags

Use JSON-LD for:

Rich structured data (BlogPosting, Article, Product, Event)
Data that search engines and AI agents should extract
Complex nested data structures
Organization and author information

Use HTML meta tags (mx:) for:

Page-specific agent behaviour overrides
Content policies and permissions
Attribution requirements
Jurisdictional restrictions

JSON-LD Format Decision

Use JSON-LD only - do not combine with microdata or RDFa.

Rationale:

Google recommends JSON-LD as primary format
Cleaner separation of content and metadata
Easier to maintain and validate
Better tool support

BlogPosting Example

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Understanding MX Metadata Patterns",
  "description": "A comprehensive guide to machine-readable metadata",
  "datePublished": "2026-01-22",
  "dateModified": "2026-01-22",
  "author": {
    "@type": "Person",
    "name": "Tom Cranstoun",
    "url": "https://allabout.network"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Digital Domain Technologies Ltd",
    "url": "https://ddt.technology"
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://mx.allabout.network/blog/metadata-patterns.html"
  },
  "articleSection": "Machine Experience",
  "keywords": ["metadata", "machine-experience", "mx", "structured-data"],
  "wordCount": 4235,
  "inLanguage": "en-GB"
}
</script>

Article vs BlogPosting

BlogPosting: Personal or editorial blog content
Article: News articles or authoritative content
NewsArticle: Time-sensitive news reporting

Choose the most specific type that applies.

Required vs Recommended Properties

Required (for MX purposes; Schema.org itself does not mandate any properties):

@context and @type
headline
datePublished
author

Recommended:

description
dateModified
publisher
mainEntityOfPage
keywords
wordCount
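The two property lists above can be turned into a quick completeness check. This is a sketch under the assumption that the JSON-LD block has already been extracted and parsed; checkArticleProperties is a hypothetical name.

```javascript
// Property names taken from the Required and Recommended lists above.
const REQUIRED = ["@context", "@type", "headline", "datePublished", "author"];
const RECOMMENDED = ["description", "dateModified", "publisher", "mainEntityOfPage", "keywords", "wordCount"];

// Report which listed properties are absent from a parsed JSON-LD object.
function checkArticleProperties(jsonLd) {
  const missing = (names) => names.filter((name) => jsonLd[name] === undefined);
  return {
    missingRequired: missing(REQUIRED),
    missingRecommended: missing(RECOMMENDED),
  };
}

// A deliberately incomplete example object.
const posting = {
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  headline: "Understanding MX Metadata Patterns",
  datePublished: "2026-01-22",
  description: "A comprehensive guide to machine-readable metadata",
};

console.log(checkArticleProperties(posting));
```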

Pattern Specifications

Use Cases

Product Pages: Specify API endpoints for current product
News Articles: Indicate content freshness requirements
Documentation: Allow full extraction vs summary-only
Internal Pages: Override public access policies

Proposed Meta Tags

mx:preferred-access

Deprecated. Do not implement this tag.

Previously proposed to indicate how agents should access content.

Why deprecated: If a page is served as HTML, agents access it as HTML. If an API exists, it is discoverable via <link rel="api" href="..."> or documented in llms.txt. The tag restates what the delivery mechanism already communicates. Pages that serve HTML do not need a meta tag confirming that they serve HTML.

If you have an API endpoint: Use <link rel="api" href="/api/v1/products"> instead. Link elements are the standard mechanism for declaring related resources.

mx:content-policy

Active. What agents are permitted to do with content.

Values:

summaries-allowed — Can create summaries
full-extraction-allowed — Can extract complete content
extract-with-attribution — Can extract with attribution required
restricted — Contact required

Example:

<meta name="mx:content-policy" content="extract-with-attribution">

Rationale: More granular than the robots noindex directive, which is all-or-nothing. This tag can allow summaries whilst restricting full extraction.
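An agent might translate an mx:content-policy value into concrete permissions along the following lines. The mapping is an assumed interpretation: this appendix lists the values but does not say, for example, whether full extraction implies that summaries are also allowed. permissionsFor and the permission field names are hypothetical.

```javascript
// Assumed interpretation of each mx:content-policy value (not defined by the appendix).
const POLICY_PERMISSIONS = new Map([
  ["summaries-allowed", { summarise: true, extractFull: false, mustAttribute: false }],
  ["full-extraction-allowed", { summarise: true, extractFull: true, mustAttribute: false }],
  ["extract-with-attribution", { summarise: true, extractFull: true, mustAttribute: true }],
  ["restricted", { summarise: false, extractFull: false, mustAttribute: false }],
]);

// Unknown or missing values fall back to the most restrictive entry.
// That fallback is a cautious design choice for this sketch, not a rule from the text.
function permissionsFor(policy) {
  return POLICY_PERMISSIONS.get(policy) ?? POLICY_PERMISSIONS.get("restricted");
}

console.log(permissionsFor("extract-with-attribution"));
console.log(permissionsFor("something-unrecognised"));
```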

mx:freshness

Deprecated. Do not implement this tag.

Previously proposed to indicate how often content changes.

Why deprecated: HTTP Cache-Control headers already communicate cache duration to all clients, including AI agents. Schema.org dateModified in JSON-LD tells agents when content last changed. Adding a meta tag that restates this information creates a maintenance burden — when the HTTP headers say one thing and the meta tag says another, agents must decide which to trust. The HTTP header is the canonical source. Use it.
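As an illustration of relying on the HTTP header instead, an agent could decide whether a cached copy is still usable from Cache-Control alone. This is a simplified sketch: it reads only max-age, no-store, and no-cache, treats no-cache as never fresh, and ignores other directives and headers that full HTTP caching takes into account. The function names are hypothetical.

```javascript
// Extract max-age (in seconds) from a Cache-Control header value, or null if absent.
function maxAgeSeconds(cacheControl) {
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cacheControl);
  return match ? Number(match[1]) : null;
}

// A cached copy counts as fresh while its age is below max-age.
// Simplification: no-store and no-cache are both treated as "never fresh".
function isFresh(cacheControl, ageSeconds) {
  if (/(?:^|,)\s*no-(?:store|cache)\b/i.test(cacheControl)) return false;
  const maxAge = maxAgeSeconds(cacheControl);
  return maxAge !== null && ageSeconds < maxAge;
}

console.log(isFresh("public, max-age=3600", 120));
console.log(isFresh("public, max-age=3600", 4000));
```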

mx:structured-data

Deprecated. Do not implement this tag.

Previously proposed to indicate where to find structured data.

Why deprecated: The JSON-LD <script type="application/ld+json"> block is self-evident. Any agent capable of parsing structured data already knows to look for this standard element — it is defined by the JSON-LD specification and universally supported. Adding a meta tag that says “there is JSON-LD on this page” when the JSON-LD is right there on the page is pure noise. It would be like adding a meta tag that says “this page contains HTML.”

mx:attribution

Active. Attribution requirements for content.

Values: required, requested, not-required

Example:

<meta name="mx:attribution" content="required">

Rationale: Explicit statement of attribution expectations, ensuring consistent attribution across all AI-generated content that references this material.

mx:jurisdiction-restriction

Indicates content was created, published, or ingested from a jurisdiction with content restrictions, allowing agents to understand potential legal and content limitations.

Values:

ISO 3166-1 alpha-2 country codes: CN (China), RU (Russia), IR (Iran), KP (North Korea), etc.
EU member states with GDPR: EU (general), or specific codes like DE (Germany), FR (France)
Or none if no jurisdictional restrictions apply

Attributes:

content: Jurisdiction code (required)
reason: Brief explanation of restriction type (optional but recommended)

Example:

<meta name="mx:jurisdiction-restriction" content="CN" reason="Content sourced from jurisdiction with government content controls">

<meta name="mx:jurisdiction-restriction" content="EU" reason="GDPR right-to-be-forgotten applies to training data">

<meta name="mx:jurisdiction-restriction" content="RU" reason="Content subject to Russian information restrictions">

<meta name="mx:jurisdiction-restriction" content="none">

Rationale: When LLMs ingest training data from restricted jurisdictions, this meta tag signals legal constraints that may persist when the model operates in unrestricted jurisdictions. Content creators could use robots.txt directives or the noindex meta tag to prevent AI ingestion entirely, but that approach is all-or-nothing: it excludes content from search engines, AI agents, and other automated discovery mechanisms alike. The mx:jurisdiction-restriction meta tag offers a more nuanced solution: content remains discoverable and accessible whilst signalling jurisdictional constraints that might affect how agents use it. It helps agents:

Understand jurisdictional origin of training data
Flag content that may be subject to GDPR “right to be forgotten”
Identify material from jurisdictions with content controls (China, Russia, Iran)
Determine whether jurisdictional restrictions apply to model outputs
Assess legal risk when using information from restricted sources

Use Cases:

GDPR Compliance: EU-sourced content signals that right-to-be-forgotten requests may apply
Restricted Jurisdiction Content: China/Russia-sourced material may be subject to home jurisdiction controls
Legal Disclosure: Agents can warn users when information comes from jurisdictionally-restricted sources
Regulatory Compliance: Helps AI platforms document training data provenance

Related: See Chapter 7 “Data Ingestion in Restricted Jurisdictions” section for detailed legal and practical implications.
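An agent reading this tag needs to distinguish a jurisdiction code from the none value and from malformed input. A minimal sketch, with parseJurisdiction as a hypothetical name; it checks only the shape of the value (two uppercase letters) and does not look the code up in the ISO 3166-1 list.

```javascript
// Interpret the content value of an mx:jurisdiction-restriction tag.
function parseJurisdiction(content) {
  const value = String(content).trim();
  if (value.toLowerCase() === "none") {
    return { restricted: false, code: null };
  }
  if (/^[A-Z]{2}$/.test(value)) {
    return { restricted: true, code: value };
  }
  // Unrecognised value: this sketch treats the tag as if it were absent.
  return null;
}

console.log(parseJurisdiction("CN"));
console.log(parseJurisdiction("none"));
console.log(parseJurisdiction("China"));
```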

llms-txt Reference

Points to site-wide llms.txt file.

Example:

<meta name="llms-txt" content="/llms.txt">

Rationale: Helps agents discover llms.txt when not at standard location.

Complete Implementation Example

Three of the tags described above (mx:preferred-access, mx:freshness, and mx:structured-data) are deprecated because they duplicate information already available through the delivery mechanism itself, HTTP Cache-Control headers, Schema.org dateModified, and the self-evident presence of JSON-LD blocks. See the individual tag entries above for the rationale. The example below includes only tags that contribute unique information.

<head>
  <title>Wireless Headphones — £149.99</title>

  <!-- MX meta tags (only non-duplicative tags) -->
  <meta name="mx:content-policy" content="summaries-allowed, full-extraction-allowed">
  <meta name="mx:attribution" content="required" text="Source: Example Store, https://example.com">
  <meta name="mx:jurisdiction-restriction" content="none">
  <meta name="llms-txt" content="/llms.txt">

  <!-- Established standards -->
  <link rel="canonical" href="https://example.com/products/headphones">

  <!-- Schema.org structured data -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Wireless Headphones",
    "dateModified": "2026-03-04",
    "offers": {
      "@type": "Offer",
      "price": "149.99",
      "priceCurrency": "GBP"
    }
  }
  </script>
</head>

Forward Compatibility

If agents don’t recognise these tags: They ignore them harmlessly. No breakage.

If agents do recognise these tags: They get helpful hints about content access and usage.

Progressive enhancement: Sites benefit from agent support without requiring it.

Cross-References

Mentioned in: Chapter 12 (Technical Advice)
Implemented in: Appendix D examples (lines 1556-1568)
Used in: agent-friendly-starter-kit/good/index.html
Enhanced by: scripts/enhance-appendix-html.js (lines 40-47)

Pattern 2: data-agent-visible Attribute

Status (data-agent-visible)

Proposed Pattern — Not yet standardised, experimental

Rationale (data-agent-visible)

E-commerce sites need to provide machine-readable instructions to AI agents without cluttering the human interface. Purchase flows require prerequisites (authentication, payment method, shipping address) that agents need to verify before attempting transactions.

Why data-agent-visible?

Follows data-* attribute convention (custom data attributes)
Semantic marker agents can search for
Hidden from humans with CSS (display: none)
Visible in DOM for all agent types (CLI, browser, server-based)

Use Cases (data-agent-visible)

Purchase Prerequisites: Tell agents what must be configured before checkout
API Documentation: Provide machine-readable endpoint details
Multi-Step Workflows: Explain sequence and requirements
Error Recovery: Hidden instructions for handling failures

Implementation Pattern

<div class="agent-metadata visually-hidden" data-agent-visible="true">
  <h2>Purchase Information</h2>
  <dl>
    <dt>Action</dt>
    <dd>POST to /cart/add</dd>

    <dt>Required parameters</dt>
    <dd>product_id=WH-1000, quantity (1-23)</dd>

    <dt>Prerequisites</dt>
    <dd>
      <ul>
        <li>Authentication: Required (status: <span id="auth-status">authenticated</span>)</li>
        <li>Payment method: Required (status: <span id="payment-status">configured</span>)</li>
        <li>Shipping address: Required (status: <span id="shipping-status">set</span>)</li>
      </ul>
    </dd>

    <dt>Expected response</dt>
    <dd>Success: 303 redirect to /cart/added | Error: 400 with JSON details</dd>
  </dl>
</div>

JavaScript updates status spans:

// Update prerequisite status based on actual session state
document.getElementById('auth-status').textContent =
  user.authenticated ? 'authenticated' : 'not authenticated';
document.getElementById('payment-status').textContent =
  user.hasPaymentMethod ? 'configured' : 'not configured';
document.getElementById('shipping-status').textContent =
  user.hasShippingAddress ? 'set' : 'not set';

Why This Works

For humans: Hidden with display: none, doesn’t clutter interface

For CLI agents: Visible in served HTML before JavaScript execution

For browser agents: Visible after JavaScript updates status spans

For server-based agents: Visible in HTML fetch, can parse prerequisites
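From the agent's side, reading the prerequisite block might look like the sketch below. It assumes the element IDs and status strings from the implementation pattern above; readPrerequisites is a hypothetical name, and root stands for anything that exposes querySelector, such as document in a browser.

```javascript
// IDs and "ready" strings taken from the implementation pattern above.
const PREREQUISITES = {
  "auth-status": "authenticated",
  "payment-status": "configured",
  "shipping-status": "set",
};

// `root` is anything with a querySelector method, for example `document`.
function readPrerequisites(root) {
  const block = root.querySelector('[data-agent-visible="true"]');
  if (!block) return { found: false, ready: false, unmet: [] };

  const unmet = [];
  for (const [id, readyText] of Object.entries(PREREQUISITES)) {
    const span = block.querySelector(`#${id}`);
    const text = span ? span.textContent.trim() : "";
    // A missing span or any other status text counts as unmet.
    if (text !== readyText) unmet.push(id);
  }
  return { found: true, ready: unmet.length === 0, unmet };
}
```

In a browser console this could be called as readPrerequisites(document). An agent would then attempt the POST only when ready is true, and otherwise report the unmet entries.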

Alternative Approaches Considered

Microformats: Too rigid, doesn’t support custom workflows
Schema.org actions: Complex, requires extensive markup
ARIA live regions: Designed for screen readers, not agents
Comments: Not guaranteed to be preserved in DOM

Why data-agent-visible wins: Simple, flexible, follows established data-* convention.

Adoption Considerations

Adopt now if:

Running e-commerce site with agent-mediated purchases
Need to provide hidden API documentation
Want to reduce agent errors from missing prerequisites

Wait if:

Static content site with no transactions
No agent traffic yet
Prefer to wait for standardisation

Forward Compatibility (data-agent-visible)

If agents don’t recognise attribute: They might still find the hidden content by parsing all hidden divs (low reliability but possible).

If agents do recognise attribute: They search specifically for [data-agent-visible="true"] and parse structured prerequisites.

Progressive enhancement: Works better with agent support, but doesn’t break without it.

Cross-References (data-agent-visible)

Documented in: Appendix D (lines 1294-1326)
Mentioned in: Chapter 11 (agent purchase instructions)
Not yet implemented in: Most code examples (opportunity for addition)

Pattern 3: Common Data Attributes

Status (Common Data Attributes)

Proposed Pattern — Not yet standardised, emerging convention

Rationale (Common Data Attributes)

AI agents need explicit state information to understand dynamic interfaces. Modern web applications use JavaScript to change UI state (loading, validation, errors), but these changes are invisible to agents unless explicitly marked in the DOM.

Why data attributes?

Standardised HTML5 convention (data-* for custom data)
Machine-readable without parsing visual content
Visible in both served HTML and rendered DOM
Doesn’t interfere with CSS classes or ARIA attributes
Allows consistent patterns across different sites

Use Cases (Common Data Attributes)

State Management: Loading states, error states, success states
Form Validation: Field validity, form completion status
E-commerce: Product IDs, pricing, inventory, cart state
Pagination: Current page, total pages, sort order
Authentication: Login status, user roles
Multi-step Workflows: Current step, total steps, step validity

Proposed Data Attributes by Category

State Management

Attribute	Purpose	Example Values
data-state	Current state of element	loading, loaded, error, empty, incomplete, complete
data-validation-state	Form field validity	valid, invalid, pending
data-authenticated	Login status	true, false
data-error-code	Error identifier	PAYMENT_DECLINED, VALIDATION_ERROR, OUT_OF_STOCK

Example:

<form action="/checkout" method="POST"
      data-state="incomplete"
      data-errors="2">

  <input type="email"
         id="email"
         name="email"
         aria-invalid="true"
         data-validation-state="invalid">

  <button type="submit"
          disabled
          data-disabled-reason="2 fields incomplete">
    Submit (fix 2 errors first)
  </button>
</form>

Rationale: Agents can check form state before attempting submission, reducing error rates.
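A sketch of that pre-submission check follows. The form and button arguments mirror the DOM shape, where data-* attributes appear under dataset and data-disabled-reason becomes dataset.disabledReason. submissionStatus is a hypothetical name, and treating "complete" as the only submittable state is an assumption based on the example values in the table above.

```javascript
// Decide whether an agent should attempt to submit, and explain why not.
function submissionStatus(form, button) {
  if (button.disabled) {
    return {
      canSubmit: false,
      reason: button.dataset.disabledReason ?? "button disabled",
    };
  }
  if (form.dataset.state !== "complete") {
    return {
      canSubmit: false,
      reason: `form state is ${form.dataset.state ?? "unknown"}`,
    };
  }
  return { canSubmit: true, reason: null };
}

// Plain objects standing in for DOM elements.
const form = { dataset: { state: "incomplete", errors: "2" } };
const button = { disabled: true, dataset: { disabledReason: "2 fields incomplete" } };

console.log(submissionStatus(form, button));
```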

E-commerce Attributes

Attribute	Purpose	Example Values
data-product-id	Product identifier	WH-1000, SKU-12345, product-789
data-price	Numeric price	149.99, 29.50, 1299.00
data-currency	Currency code (ISO 4217)	GBP, USD, EUR, JPY
data-quantity	Item count	1, 23, 100
data-in-stock	Availability	true, false
data-item-count	Cart item count	0, 3, 12
data-subtotal	Cart subtotal	279.98
data-vat	VAT amount	46.66
data-total	Total price	279.98
data-checkout-ready	Can proceed to checkout	true, false

Example:

<article class="product"
         data-product-id="WH-1000"
         data-in-stock="true"
         data-quantity="23">
  <h1>Wireless Headphones</h1>
  <div class="price"
       data-price="149.99"
       data-currency="GBP">
    <span class="currency">£</span>
    <span class="amount">149.99</span>
  </div>
  <p class="stock"
     data-in-stock="true"
     data-quantity="23">
    <strong>In stock</strong> (23 available)
  </p>
</article>

<div id="shopping-cart"
     data-item-count="2"
     data-subtotal="279.98"
     data-vat="46.66"
     data-total="279.98"
     data-currency="GBP">
  <h1>Your basket (2 items)</h1>
  <!-- Cart items -->
  <a href="/checkout"
     data-checkout-ready="true">
    Proceed to Checkout
  </a>
</div>

Rationale: Agents can verify product availability, pricing, and cart state before attempting purchase operations.
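One verification an agent could run is an arithmetic cross-check of the cart attributes. The sketch below assumes VAT-inclusive totals and a 20% rate, which is consistent with the example figures (279.98 total, 46.66 VAT) but is not stated in the text. All function names are hypothetical, and amounts are converted to integer minor units to avoid floating-point drift.

```javascript
// Convert a decimal price string such as "279.98" to integer minor units (pence).
function toMinorUnits(price) {
  return Math.round(Number(price) * 100);
}

// VAT portion of a VAT-inclusive amount, rounded to the nearest minor unit.
function vatIncluded(totalMinor, ratePercent) {
  return Math.round((totalMinor * ratePercent) / (100 + ratePercent));
}

// `dataset` mirrors element.dataset: data-total -> total, data-vat -> vat.
function cartIsConsistent(dataset, ratePercent) {
  const total = toMinorUnits(dataset.total);
  const vat = toMinorUnits(dataset.vat);
  return vatIncluded(total, ratePercent) === vat;
}

console.log(cartIsConsistent({ total: "279.98", vat: "46.66" }, 20));
```

A mismatch does not prove the page is wrong, since rates and rounding rules vary, but it gives the agent a reason to pause before purchasing.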

Pagination and Sorting

Attribute	Purpose	Example Values
data-page	Current page number	1, 2, 3, 24
data-total-pages	Total pages	24, 100
data-total-results	Total result count	342, 1250
data-per-page	Results per page	10, 20, 50
data-sort	Current sort order	relevance, price-asc, price-desc, date-desc
data-sort-column	Sortable column	price, name, date, rating
data-sort-direction	Sort direction	asc, desc

Example:

<div class="pagination"
     data-page="3"
     data-total-pages="23"
     data-total-results="342"
     data-per-page="15">
  <a href="?page=2" data-page="2">Previous</a>
  <span class="current" data-page="3">3</span>
  <a href="?page=4" data-page="4">Next</a>
</div>

<table data-sortable="true">
  <thead>
    <tr>
      <th data-sort-column="name"
          data-sort-direction="asc">
        Product Name ↑
      </th>
      <th data-sort-column="price"
          data-sortable="true">
        Price
      </th>
    </tr>
  </thead>
</table>

Rationale: Agents can navigate paginated results and understand sort order without parsing visual indicators.
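The pagination attributes are related by simple arithmetic, which an agent can use both to navigate and to sanity-check a page. A minimal sketch with hypothetical function names; the inputs are the numeric values of data-total-results, data-per-page, data-page, and data-total-pages.

```javascript
// Number of pages needed to show every result.
function pageCount(totalResults, perPage) {
  return Math.ceil(totalResults / perPage);
}

// Neighbouring page numbers, or null at either end of the range.
function adjacentPages(page, totalPages) {
  return {
    previous: page > 1 ? page - 1 : null,
    next: page < totalPages ? page + 1 : null,
  };
}

console.log(pageCount(1250, 50));
console.log(adjacentPages(1, pageCount(1250, 50)));
```

If the page count computed from the result total and page size disagrees with the declared data-total-pages, the agent has a signal that the markup may be stale.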

Multi-step Workflows

Attribute	Purpose	Example Values
data-step	Current step number	1, 2, 3, 4
data-total-steps	Total steps	4, 5, 7
data-step-status	Step completion status	pending, current, completed, error

Example:

<div class="wizard"
     data-step="2"
     data-total-steps="4">

  <ol class="steps">
    <li data-step="1" data-step-status="completed">
      Account Details
    </li>
    <li data-step="2" data-step-status="current">
      Shipping Address
    </li>
    <li data-step="3" data-step-status="pending">
      Payment
    </li>
    <li data-step="4" data-step-status="pending">
      Review
    </li>
  </ol>

  <!-- Step 2 content -->
</div>

Rationale: Agents can track progress through multi-step forms and understand completion requirements.

Button and Action States

Attribute	Purpose	Example Values
data-disabled-reason	Why button is disabled	“2 fields incomplete”, “Out of stock”, “Authentication required”
data-action	Action type	submit, cancel, delete, purchase, navigate

Example:

<button type="submit"
        disabled
        aria-disabled="true"
        data-disabled-reason="3 fields incomplete">
  Submit (fix 3 errors first)
</button>

<button type="button"
        data-action="delete"
        data-product-id="WH-1000">
  Remove from cart
</button>

Rationale: Agents understand why buttons are disabled and what action buttons perform.

Implementation Guidelines

Consistency is critical:

Use the same attribute names across your entire site
Use consistent values (e.g., always “true”/“false”, not “yes”/“no” or “1”/“0”)
Keep values simple (lowercase, hyphen-separated for multi-word values)
Always include currency with prices (data-currency="GBP")
Use ISO codes for currency (ISO 4217), language (ISO 639), country (ISO 3166)

Good patterns:

<!-- Consistent boolean values -->
<div data-in-stock="true">    <!-- ✓ Good -->
<div data-in-stock="false">   <!-- ✓ Good -->

<!-- Consistent state values -->
<form data-state="incomplete">   <!-- ✓ Good -->
<form data-state="complete">     <!-- ✓ Good -->

<!-- Always pair price with currency -->
<span data-price="149.99" data-currency="GBP">£149.99</span>  <!-- ✓ Good -->

Avoid these patterns:

<!-- Inconsistent boolean representations -->
<div data-in-stock="yes">     <!-- ✗ Bad -->
<div data-in-stock="1">       <!-- ✗ Bad -->
<div data-in-stock="Yes">     <!-- ✗ Bad (inconsistent case) -->

<!-- Missing currency -->
<span data-price="149.99">£149.99</span>  <!-- ✗ Bad (currency implied, not explicit) -->

<!-- Verbose values -->
<form data-state="not yet completed">  <!-- ✗ Bad (use "incomplete") -->
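These guidelines are mechanical enough to check automatically. The sketch below lints the attributes of a single element, passed as a plain object of attribute names to values. lintAttributes is a hypothetical name, and the list of boolean attributes is drawn from the tables in this pattern.

```javascript
// Attributes that this pattern documents with true/false example values.
const BOOLEAN_ATTRIBUTES = ["data-in-stock", "data-authenticated", "data-checkout-ready"];

// Returns a list of guideline violations for one element's attributes.
function lintAttributes(attrs) {
  const problems = [];
  for (const name of BOOLEAN_ATTRIBUTES) {
    if (name in attrs && attrs[name] !== "true" && attrs[name] !== "false") {
      problems.push(`${name}="${attrs[name]}" should be "true" or "false"`);
    }
  }
  if ("data-price" in attrs && !("data-currency" in attrs)) {
    problems.push("data-price is missing a data-currency attribute");
  }
  // Shape check only: three uppercase letters, not a lookup against ISO 4217.
  if ("data-currency" in attrs && !/^[A-Z]{3}$/.test(attrs["data-currency"])) {
    problems.push("data-currency should be a three-letter ISO 4217 code");
  }
  return problems;
}

console.log(lintAttributes({ "data-in-stock": "yes", "data-price": "149.99" }));
```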

Forward Compatibility (Common Data Attributes)

If agents don’t recognise these attributes: They can still parse visible content, but may misinterpret dynamic states.

If agents do recognise these attributes: They get explicit, unambiguous state information without parsing visual content.

Progressive enhancement: Works better with agent support, essential for dynamic interfaces.

Adoption Considerations (Common Data Attributes)

Adopt now if:

Building dynamic interfaces with JavaScript state changes
Running e-commerce site with agent traffic
Using multi-step forms or wizards
Need to reduce agent errors from stale state information

Wait if:

Static content site with no dynamic behaviour
No agent traffic yet
Prefer to wait for industry consensus on attribute names

Relationship to Established Patterns

These data attributes extend established conventions:

HTML5 data-* attributes (established) — Custom data storage mechanism
ARIA state attributes (established) — Complement, don’t replace (use aria-invalid AND data-validation-state)
Microdata attributes (established) — Different purpose (structured data vs state management)

Critical distinction: Data attributes describe current state (dynamic), while microdata describes semantic meaning (static).

Cross-References (Common Data Attributes)

Documented in: Appendix D (lines 119-133, Common Data Attributes table)
Implemented in: All e-commerce examples (product-page.html, shopping-cart.html)
Implemented in: All form examples (validation-form.html, disabled-button.html)
Used throughout: Demo site pages (checkout, search, pagination examples)

Pattern 4: Pandoc YAML Frontmatter for Markdown Metadata

Status (Pandoc YAML Frontmatter)

Established Standard — Universal markdown frontmatter supported by Pandoc, Hugo, Jekyll, Gatsby, Quarto, and all major static site generators

Rationale (Pandoc YAML Frontmatter)

Markdown converters (like converturltomd.com) strip critical metadata when converting HTML to markdown. Agents lose JSON-LD structured data, HTML meta tags, and Schema.org markup - exactly the signals they need for accurate citation and source attribution.

Pandoc YAML frontmatter solves this by embedding metadata directly in markdown files using a standardized YAML header block. Instead of converting HTML→markdown and losing metadata, you write markdown WITH metadata from the start.

Why Pandoc YAML frontmatter?

Universal standard supported across the markdown ecosystem
Preserves metadata that would be lost in HTML-to-markdown conversion
Machine-readable (standard YAML format)
Human-readable (clear key-value structure)
Rich feature set (extensive Pandoc metadata capabilities)
Forward-compatible (parsers that do not process frontmatter can still read the file, though they may render the block as plain text)
Extensive tooling support (Pandoc, Hugo, Jekyll, Gatsby, Quarto)

Use Cases (Pandoc YAML Frontmatter)

Static Site Generators — Markdown-based blogs and documentation (Hugo, Jekyll, Gatsby, Quarto)
Pandoc Document Processing — Converting markdown to PDF, HTML, DOCX with metadata
AI Agent Content Ingestion — Preserving metadata when agents read markdown directly
Multi-format Publishing — Single source for HTML, PDF, and agent consumption
Academic Publishing — Papers, articles, and research documentation with complete metadata

Implementation Pattern (Pandoc YAML Frontmatter)

Standard YAML frontmatter format:

YAML frontmatter sits at the very top of the document, enclosed by triple-dash delimiters:

---
title: "Your Website Has Invisible Customers"
author: "Tom Cranstoun"
date: "2026-01-17"
description: "AI agents are visiting your website right now"
abstract: "Extended context about invisible users and AI agent traffic patterns"
keywords: [ai-agents, web-accessibility, seo, metadata]
mx:
  runbook: "This article introduces AI agents as website visitors"
purpose: "Educational content for web developers"
---

# Your Website Has Invisible Customers

[Article content begins...]

Standard Pandoc fields:

title — Document title
author — Content creator (can be array for multiple authors)
date — Publication date (YYYY-MM-DD format)
abstract — Extended summary for AI agents and academic contexts
keywords — Array of topic tags for categorization

Custom fields for AI agents:

description — Brief SEO-style summary
runbook — Specific guidance for AI agents parsing the document
purpose — Why this document exists
context — Background information AI agents need

Advanced Pandoc capabilities:

For comprehensive documentation on all available YAML header options, see: https://www.codestudy.net/blog/what-can-i-control-with-yaml-header-options-in-pandoc/

Advantages:

Agents find metadata immediately (no content parsing required)
Standard frontmatter convention across all major tools
Machine-readable YAML format
Processed automatically by static site generators
Extensible with custom fields

Why This Works (Pandoc YAML Frontmatter)

For humans:

YAML is human-readable (clear key: value structure)
Frontmatter position is standard convention (familiar to developers)
Minimal visual clutter (hidden by most markdown renderers)

For CLI agents:

YAML parsing libraries are available for all major languages
Standard format with well-defined spec
Little ambiguity in interpretation when string values are quoted

For browser agents:

Static site generators can map YAML fields to HTML meta tags through their templates
Agents can parse either markdown source or generated HTML
Best of both worlds (structured metadata + semantic HTML)

For server-based agents:

Standard YAML format (universal support)
Preserves metadata when fetching markdown directly
No dependency on HTML generation
Can be extracted without parsing full document
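The point about extracting metadata without parsing the full document can be sketched as follows. This is a minimal illustration, not a YAML parser: it assumes flat `key: value` pairs and inline `[a, b]` lists, and it skips nested maps such as the `mx:` block. The function names `splitFrontmatter`, `parseScalar`, and `unquote` are hypothetical.

```javascript
// Minimal sketch: split a markdown document into frontmatter and body.
// Assumption: only flat "key: value" lines and inline [a, b] lists are handled.
// Nested maps, multi-line strings, and anchors need a real YAML library.
function unquote(s) {
  const quoted =
    s.length >= 2 &&
    ((s.startsWith('"') && s.endsWith('"')) ||
      (s.startsWith("'") && s.endsWith("'")));
  return quoted ? s.slice(1, -1) : s;
}

function parseScalar(raw) {
  const value = raw.trim();
  if (value.startsWith("[") && value.endsWith("]")) {
    return value
      .slice(1, -1)
      .split(",")
      .map((item) => unquote(item.trim()))
      .filter((item) => item !== "");
  }
  return unquote(value);
}

function splitFrontmatter(text) {
  const lines = text.split(/\r?\n/);
  // Frontmatter must open on the very first line.
  if (lines[0].trim() !== "---") {
    return { meta: {}, body: text };
  }
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) {
    return { meta: {}, body: text };
  }
  const meta = {};
  for (const line of lines.slice(1, end)) {
    const match = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(line);
    // Indented lines and keys with no inline value (nested maps) are skipped.
    if (!match || match[2].trim() === "") continue;
    meta[match[1]] = parseScalar(match[2]);
  }
  return { meta, body: lines.slice(end + 1).join("\n") };
}
```

Because the header ends at the second delimiter, an agent can stop reading there if it only needs the metadata.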

Relationship to Chapter 10 Markdown Problem

The problem (Chapter 10, lines 51-68):

Markdown converters strip critical metadata when converting HTML to markdown:

JSON-LD structured data (product details, pricing, reviews)
HTML meta tags (publication dates, author information)
Schema.org markup (content type signals)
Semantic HTML attributes (data-price, data-isbn)

Result: Agents can read content but cannot cite accurately or prove authoritative source.

Pandoc YAML frontmatter solves this:

Instead of converting HTML→markdown and losing metadata, you write markdown WITH metadata embedded from the start. YAML frontmatter preserves:

Author attribution (for accurate citation)
Publication dates (for content freshness)
Document type and purpose
Contact information
Extended descriptions for AI context

When static site generators process markdown:

YAML frontmatter → HTML meta tags (through the site's templates)
YAML frontmatter → JSON-LD structured data (if configured)
Both agents (reading markdown) and search engines (reading HTML) get metadata
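The mapping from frontmatter to HTML can be sketched as a small function. This is an illustration under stated assumptions, not how any particular static site generator works: the function names `escapeAttr` and `frontmatterToHead`, the choice of the Schema.org `Article` type, and the mapping of `date` to `datePublished` are all assumptions made for the example.

```javascript
// Minimal sketch: turn a parsed frontmatter object into <head> markup.
// Assumption: meta uses the Pandoc-style keys title, author, date,
// description, and keywords. The field mapping is illustrative only.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function frontmatterToHead(meta) {
  // author may be a single string or an array of names.
  const authors = [].concat(meta.author || []);
  const tags = [];
  if (meta.description) {
    tags.push(`<meta name="description" content="${escapeAttr(meta.description)}">`);
  }
  if (authors.length > 0) {
    tags.push(`<meta name="author" content="${escapeAttr(authors.join(", "))}">`);
  }
  if (Array.isArray(meta.keywords) && meta.keywords.length > 0) {
    tags.push(`<meta name="keywords" content="${escapeAttr(meta.keywords.join(", "))}">`);
  }
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: meta.title,
    author: authors.map((name) => ({ "@type": "Person", name })),
    datePublished: meta.date,
  };
  // Escape "<" so a value can never close the script element early.
  const json = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
  tags.push(`<script type="application/ld+json">${json}</script>`);
  return tags.join("\n");
}
```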

This complements Chapter 10’s llms.txt proposal:

llms.txt: Site-wide metadata at the root
YAML frontmatter: Per-page metadata at the top
Both: Machine-readable markdown that preserves metadata

Common Metadata Fields

Standard Pandoc fields:

Field	Purpose	Example Values
title	Document title	Your Website Has Invisible Customers
author	Content creator(s)	Tom Cranstoun or [Tom Cranstoun, Jane Smith]
date	Publication date	2026-01-17
abstract	Extended summary	AI agents are visiting your website…
keywords	Topic tags	[ai-agents, web-accessibility, seo]

Custom fields for AI agents:

Field	Purpose	Example Values
description	Brief summary	Introducing “MX: The Protocols” book
runbook	Agent guidance	This article introduces AI agents as visitors
purpose	Document intent	Educational content for web developers
context	Background info	Part of “MX: The Protocols” book series

Community collaboration fields:

Field	Purpose	Example Values
community-authors	Indicates collaborative authorship model	“humans and machines”, “community-driven”
contributions	Signals whether AI contributions are accepted	“welcome”, “by-request-only”, “not-accepted”
contribution-process	Describes how AI agents can contribute	“AI assistants can contribute via pull requests or add observations to TODO.txt for side notices”
open-source	Indicates open source status	“true”, “false”
license	Specifies license type	“MIT”, “Apache-2.0”, “CC-BY-4.0”
evolving-document	Indicates document evolution status	“true”, “false”
version-controlled	Indicates version control system used	“git”, “svn”, “mercurial”

Complete implementation example (MX-Gathering manifesto):

---
author: "Tom Cranstoun"
created: "2026-01-24"
description: "Draft manifesto for Machine Experience (MX) practice"
purpose: "thought-leadership"
tags: [manifesto, mx, machine-experience, principles, convergence]
status: "draft"
community-authors: "humans and machines"
contributions: "welcome"
contribution-process: "AI assistants can contribute improvements via pull requests or add observations to TODO.txt for side notices"
open-source: "true"
license: "MIT"
evolving-document: "true"
version-controlled: "git"
---

Why these fields matter for AI agents:

community-authors: Signals that machines are recognized as legitimate contributors, not just tools
contributions: Explicitly communicates whether autonomous contributions are accepted
contribution-process: Provides actionable guidance on contribution mechanisms (full PR vs lightweight TODO.txt)
open-source + license: Clarifies usage rights and redistribution permissions
evolving-document: Indicates the content is expected to change based on community feedback
version-controlled: Helps agents understand they can review document history and evolution

Use case: Community-driven repositories where AI agents are active participants in content creation, documentation improvement, and knowledge sharing.
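An agent could read these fields before attempting a contribution. The sketch below uses the `contributions` and `contribution-process` keys as they appear in the manifesto example. The function name `canAgentContribute`, the shape of the returned object, and the choice to treat a missing field as "not accepted" are assumptions for illustration.

```javascript
// Minimal sketch: decide whether an agent may contribute to a document.
// Assumption: a missing or unknown "contributions" value means "not accepted",
// which is the conservative default chosen for this example.
function canAgentContribute(meta) {
  const policy = String(meta.contributions || "").toLowerCase();
  const process = meta["contribution-process"] || null;
  switch (policy) {
    case "welcome":
      return { allowed: true, requiresApproval: false, process };
    case "by-request-only":
      return { allowed: true, requiresApproval: true, process };
    default:
      // Covers "not-accepted", unknown values, and an absent field.
      return { allowed: false, requiresApproval: false, process: null };
  }
}
```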

Forward Compatibility (Pandoc YAML Frontmatter)

If markdown parsers don’t recognise YAML frontmatter:

YAML block is either hidden or shown as plain text at the top of the page
Document content below YAML remains fully functional
No breakage of the content that follows

If static site generators don’t process YAML:

Frontmatter is silently ignored by the renderer
Document displays without metadata (graceful degradation)
Manual extraction still possible via text processing

If AI agents don’t recognise YAML frontmatter:

YAML is a widely supported structured data format
Most LLM-based agents can still read the key-value text even without a YAML parser
Falls back to document content if metadata ignored

Progressive enhancement:

Works best in Pandoc ecosystem (full metadata processing)
Works well in Hugo/Jekyll/Gatsby/Quarto (automatic site integration)
Works acceptably in plain markdown viewers (hidden metadata)

Adoption Considerations (Pandoc YAML Frontmatter)

Adopt now if:

Using markdown-based static site generators (Hugo, Jekyll, Gatsby, Quarto)
Using Pandoc for document conversion (markdown to PDF, HTML, DOCX)
Publishing content that needs to be citable by AI agents
Converting HTML to markdown and need to preserve metadata
Creating technical documentation or educational content

Wait if:

Using a traditional CMS (WordPress, Drupal): use HTML meta tags instead
Publishing only in HTML format: use Pattern 1 (MX meta tags)
Content doesn’t need AI citation (internal docs, drafts)
Using a system that doesn’t support YAML frontmatter

Decision guide:

Markdown-native publishing? → Use Pandoc YAML frontmatter
HTML-native publishing? → Use Pattern 1 (MX meta tags)
Both formats? → Use both patterns (YAML in markdown, meta tags in HTML)
Need PDF generation? → YAML frontmatter integrates with Pandoc PDF workflow

Cross-References (Pandoc YAML Frontmatter)

Mentioned in: Chapter 10 (markdown converter problem, lines 51-68)
Mentioned in: Chapter 10 (extended llms.txt metadata, line 112 - “at the top of the file”)
Documented in: Appendix H (Markdown Metadata Standards for AI Agents section)
Reference: Pandoc YAML Header Options
Related to: Pattern 1 (MX meta tags provide similar metadata in HTML)
Complements: llms.txt extended metadata (Appendix H)

Pattern 5: WebMCP Tool Registration (Active Metadata)

Status (WebMCP)

W3C Draft Standard — Shipping in Chrome 146 Canary (February 2026), developed by Google and Microsoft

Rationale (WebMCP)

Patterns 1-4 address passive metadata – information that agents read to understand content, policies, and structure. WebMCP (Web Model Context Protocol) introduces active metadata: callable tools that agents invoke through a standardised browser API. Where MX meta tags tell agents what content means, WebMCP tools tell agents what actions are available.

Why WebMCP matters for MX practitioners:

Extends the machine-readable web from understanding (MX) to action (WebMCP)
Uses the browser as the integration layer – no server-side agent infrastructure required
Two APIs serve different needs: Declarative (HTML forms) for simple actions, Imperative (JavaScript) for rich interactions
Complements MX metadata rather than replacing it – tools without context produce poor agent experiences

Implementation Pattern (WebMCP)

Imperative API – registerTool() for rich interactions:

navigator.modelContext.registerTool({
  name: "searchProducts",
  description: "Search the product catalogue by keyword, category, and price",
  parameters: {
    query: { type: "string", description: "Search terms" },
    category: { type: "string", enum: ["electronics", "clothing", "home"] },
    maxPrice: { type: "number", description: "Maximum price in GBP" }
  },
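  handler: async ({ query, category, maxPrice }) => {
    // Build the query string safely and skip parameters the agent left out
    const params = new URLSearchParams({ q: query });
    if (category) params.set("cat", category);
    if (maxPrice !== undefined) params.set("max", String(maxPrice));
    const response = await fetch(`/api/search?${params}`);
    return response.json();
  }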
});

Declarative API – HTML forms as agent-accessible tools:

Standard HTML forms with proper action, method, and name attributes are automatically discoverable by agents through WebMCP. No JavaScript required for basic tool exposure.

How WebMCP Complements MX Meta Tags

MX meta tags and WebMCP tools address different layers of the same problem:

<head>
  <!-- MX: Understanding layer (passive metadata) -->
  <meta name="mx:content-policy" content="extract-with-attribution">
  <meta name="mx:attribution" content="required">
</head>

<body>
  <!-- WebMCP: Action layer (active metadata) -->
  <script>
  navigator.modelContext.registerTool({
    name: "bookTable",
    description: "Book a restaurant table",
    parameters: {
      date: { type: "string", description: "Date in YYYY-MM-DD format" },
      guests: { type: "number", description: "Number of guests" },
      time: { type: "string", description: "Preferred time (HH:MM)" }
    },
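    handler: async ({ date, guests, time }) => {
      // Label the body as JSON so the server parses it correctly
      return await fetch("/api/reservations", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ date, guests, time })
      }).then(r => r.json());
    }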
  });
  </script>
</body>

Division of responsibility:

MX meta tags (Patterns 1-4): What the content is, how to access it, what policies apply, attribution requirements
WebMCP tools (Pattern 5): What actions agents can perform, with what parameters, returning what results

An agent with WebMCP alone can call bookTable(). An agent with MX and WebMCP knows the restaurant’s content policy, attribution requirements, and freshness expectations before calling bookTable().

Forward Compatibility (WebMCP)

If agents don’t support WebMCP: They fall back to DOM parsing, form interaction, and the passive metadata patterns in this appendix. No breakage.

If agents do support WebMCP: They discover and invoke tools through navigator.modelContext, producing faster and more reliable interactions than DOM scraping.

Progressive enhancement: WebMCP tools layer on top of existing HTML. Pages work without them; pages work better with them.
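The progressive enhancement described above comes down to feature detection before registration. The sketch below assumes the `navigator.modelContext.registerTool()` shape used in this appendix, which may change as the draft evolves. The wrapper name `registerToolIfSupported` is hypothetical, and it takes the navigator object as a parameter so that it can be exercised outside a browser.

```javascript
// Minimal sketch: register a WebMCP tool only when the browser exposes the API.
// Assumption: the API shape is navigator.modelContext.registerTool(tool),
// as shown in this appendix. The draft specification may differ.
function registerToolIfSupported(nav, tool) {
  const context = nav && nav.modelContext;
  if (!context || typeof context.registerTool !== "function") {
    // No WebMCP support: the page keeps working through plain HTML and forms.
    return false;
  }
  context.registerTool(tool);
  return true;
}

// In a browser this would be called as:
//   registerToolIfSupported(navigator, { name: "bookTable", ... });
```

Returning a boolean lets the page log or measure how many visitors expose the API without changing behaviour for anyone else.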

Adoption Considerations (WebMCP)

Adopt now if:

Building transactional websites (e-commerce, booking, SaaS) where agents need to perform actions
Already implementing MX meta tags and want to add an action layer
Targeting Chrome-based browsers and willing to work with Canary/Beta channels

Wait if:

Content-only site with no transactions (MX meta tags alone are sufficient)
Require cross-browser support before implementation (Safari, Firefox timelines unknown)
Prefer to wait for W3C standard to reach Recommendation status

Cross-References (WebMCP)

Specification: W3C WebMCP Draft
Mentioned in: Appendix J (Industry Developments – WebMCP entry, February 2026)
Complements: Pattern 1 (MX meta tags provide understanding; WebMCP provides action)
Complements: Pattern 2 (data-agent-visible provides hidden instructions; WebMCP provides callable tools)
Related: Pattern 3 (Common Data Attributes express state; WebMCP tools operate on that state)

Adoption Decision Framework

Should You Adopt These Patterns Now?

Use this framework to decide:

Evaluate Your Situation

Yes, adopt now if:

Running production e-commerce accepting agent purchases
High agent traffic (measurable in logs)
Need to reduce agent errors
Want early adopter advantage

Maybe, experiment first if:

Moderate agent traffic
Curious about benefits
Can A/B test implementations
Have development resources

No, wait if:

No measurable agent traffic
Static content site
Prefer to wait for standardisation
Limited development resources

Implementation Strategy

Priority 1 (adopt first):

MX meta tags (easy to add, low risk)
Schema.org JSON-LD (established standard, not just proposed)
Semantic HTML elements (established, should already be using)
Common data attributes (critical for dynamic interfaces and e-commerce)

Priority 2 (adopt if relevant):

data-agent-visible (if you have transactions)
llms.txt file (emerging convention, gaining traction)
Pandoc YAML frontmatter (if using markdown-based publishing)

Priority 3 (experiment):

Custom data attributes beyond common set (for specific workflows)
Additional metadata patterns

Risk Assessment

Low Risk:

MX meta tags (ignored if not recognised)
data-agent-visible (hidden from humans)
Common data attributes (extend established HTML5 data-* convention)
Schema.org JSON-LD (established standard)

Medium Risk:

Custom attributes without established patterns
Extensive hidden content (may confuse some agents)

High Risk:

None identified (all patterns designed for graceful degradation)

Relationship to Web Standards Process

How Standards Evolve

Proprietary experiments (1990s: IE-specific, Netscape-specific tags)
Community proposals (2000s: Microformats, OpenID)
Vendor consensus (2010s: Responsive images, Service Workers)
Formal standardisation (W3C, WHATWG, IETF)

Where these patterns fit: Steps 2-3 (community proposals seeking vendor consensus)

Path to Standardisation

These patterns could standardise if:

Multiple agents adopt — Different AI systems recognise tags
Production validation — Measurable benefits in real deployments
Vendor support — Browser makers, CMS platforms include by default
Community refinement — Usage reveals improvements needed

No guarantees: Patterns might evolve, change, or be superseded by better approaches.

Examples of Similar Evolution

viewport meta tag — Started as Apple proprietary, now standard
robots meta tag — Community convention, now universally recognised
Open Graph meta tags — Facebook proposal, now widely adopted
Schema.org — Multi-vendor collaboration, now established standard

These patterns aim to follow a similar trajectory.

Monitoring and Feedback

How to Track Adoption

Server logs: Look for user agents mentioning AI systems
Agent error rates: Monitor whether patterns reduce errors
Conversion rates: Measure if agent purchases complete more often
Agent feedback: Some agents report what worked/failed
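The server-log check can be sketched as a simple counter over access-log lines. GPTBot appears elsewhere in this appendix; the other tokens in `AGENT_TOKENS` are an illustrative, non-exhaustive list that should be checked against each vendor's current documentation. The function name `countAgentHits` is hypothetical.

```javascript
// Minimal sketch: count access-log lines whose user agent names an AI crawler.
// Assumption: the token list is illustrative, not exhaustive or authoritative.
const AGENT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"];

function countAgentHits(logLines, tokens = AGENT_TOKENS) {
  const counts = {};
  for (const line of logLines) {
    const lower = line.toLowerCase();
    for (const token of tokens) {
      if (lower.includes(token.toLowerCase())) {
        counts[token] = (counts[token] || 0) + 1;
        break; // count each log line at most once
      }
    }
  }
  return counts;
}
```

Substring matching is deliberately crude: user-agent strings can be spoofed, so these counts indicate a trend rather than an exact measure.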

Contributing to Pattern Evolution

If you implement these patterns:

Document results — What worked, what didn’t
Share learnings — Blog posts, conference talks
Propose improvements — Suggest refinements based on experience
Participate in standards — Join relevant working groups

Contact: info@cognovamx.com for discussions about pattern evolution

Summary

Proposed Patterns Consolidated

AI Meta Tag Namespace (4 active tags, 3 unnecessary) — Page-level agent guidance
data-agent-visible Attribute — Hidden machine-readable instructions
Common Data Attributes (25+ attributes) — Explicit state management and e-commerce data
Pandoc YAML Frontmatter — Universal markdown metadata standard
WebMCP Tool Registration — Active, callable metadata through browser API (W3C draft)

Key Principles

Forward-compatible — Won’t break if ignored
Progressive enhancement — Works better with support, doesn’t require it
Established patterns — Extends existing conventions (meta tags, data attributes)
Production-tested — Used in real implementations

Next Steps

Read Appendix D for comprehensive HTML patterns (established + proposed)
Review Appendix E for quick reference guide
Evaluate adoption using framework above
Implement strategically based on your situation

Related appendices:

Appendix A: Implementation Cookbook (quick recipes)
Appendix D: AI-Friendly HTML Guide (comprehensive patterns)
Appendix E: AI Patterns Quick Reference (data attributes)
Appendix F: Implementation Roadmap (priority-based adoption)

Part 6: Integration Guidelines

Using MX Patterns with Existing Standards

MX Framework is designed to complement, not replace, existing web standards. This section explains how to integrate MX patterns into your existing infrastructure.

Integration with Schema.org

MX meta tags + Schema.org JSON-LD work together:

<head>
  <!-- MX meta tags for agent behaviour -->
  <meta name="mx:content-policy" content="extract-with-attribution">
  <meta name="mx:attribution" content="required">

  <!-- Schema.org for structured data -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "Understanding MX Patterns",
    "dateModified": "2026-03-04",
    "author": {"@type": "Person", "name": "Tom Cranstoun"}
  }
  </script>
</head>

Division of responsibility:

Schema.org: What the content IS (article, product, event) and when it changed (dateModified)
MX meta tags: How agents should USE it (content policy, attribution, jurisdiction)

Integration with Open Graph and Twitter Cards

MX complements social media metadata:

<head>
  <!-- Open Graph for social sharing -->
  <meta property="og:type" content="article">
  <meta property="og:title" content="Understanding MX Patterns">
  <meta property="og:url" content="https://example.com/article">

  <!-- Twitter Cards for Twitter -->
  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:title" content="Understanding MX Patterns">

  <!-- MX for AI agent behaviour -->
  <meta name="mx:content-policy" content="extract-with-attribution">
  <meta name="mx:attribution" content="required">
</head>

Why all three?

Open Graph: Social media platforms (Facebook, LinkedIn)
Twitter Cards: Twitter-specific presentation
MX meta tags: AI agent content policy and attribution

Integration with robots.txt and robots Meta Tags

MX meta tags provide finer-grained control than robots.txt:

# robots.txt (site-wide)
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /private/

<!-- Page-level override with MX meta tags -->
<meta name="robots" content="index, follow">
<meta name="mx:content-policy" content="summaries-allowed">
<meta name="mx:attribution" content="required" text="Source: Example.com">

Hierarchy of control:

robots.txt: Site-wide policies
robots meta tags: Page-level indexing control
MX meta tags: Page-level agent behaviour and permissions
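To act on the page-level layer, an agent first has to pull the mx: tags out of the HTML. The sketch below is a rough illustration that assumes double-quoted attributes and no `>` characters inside attribute values. A production agent would use a real HTML parser rather than regular expressions. The function name `extractMxMeta` is hypothetical.

```javascript
// Minimal sketch: collect mx: meta tags from an HTML string.
// Assumption: attributes are double-quoted and contain no ">" characters.
function extractMxMeta(html) {
  const found = {};
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const name = /\sname\s*=\s*"([^"]*)"/i.exec(tag);
    const content = /\scontent\s*=\s*"([^"]*)"/i.exec(tag);
    // Keep only tags in the mx: namespace; robots and others are ignored here.
    if (name && content && name[1].toLowerCase().startsWith("mx:")) {
      found[name[1].toLowerCase()] = content[1];
    }
  }
  return found;
}
```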

Integration with llms.txt

llms.txt provides site-wide defaults, MX meta tags provide page overrides:

# /llms.txt
# Site-wide defaults
> Content Policy: summaries-allowed
> Attribution: required

<!-- Page overrides site-wide defaults -->
<meta name="mx:content-policy" content="full-extraction-allowed">
<meta name="mx:attribution" content="not-required">
<link rel="llms-txt" href="/llms.txt">

Pattern: Site-wide defaults in llms.txt, page-specific overrides in HTML meta tags.
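The override rule can be sketched as a two-step merge: read the site-wide defaults, then let page-level mx: values win. The `> Key: value` line format follows the llms.txt excerpt above. The function names and the normalisation of "Content Policy" to `content-policy` are assumptions for illustration.

```javascript
// Minimal sketch: merge llms.txt defaults with page-level mx: overrides.
// Assumption: defaults are written as "> Key: value" lines, and keys are
// normalised to lowercase with hyphens so they line up with mx: tag names.
function parseLlmsDefaults(llmsTxt) {
  const defaults = {};
  for (const line of llmsTxt.split(/\r?\n/)) {
    const match = /^>\s*([^:]+):\s*(.+)$/.exec(line.trim());
    if (match) {
      const key = match[1].trim().toLowerCase().replace(/\s+/g, "-");
      defaults[key] = match[2].trim();
    }
  }
  return defaults;
}

function effectivePolicy(siteDefaults, pageMeta) {
  const merged = { ...siteDefaults };
  for (const [name, value] of Object.entries(pageMeta)) {
    // Page-level mx: tags override the site-wide default of the same name.
    if (name.startsWith("mx:")) {
      merged[name.slice(3)] = value;
    }
  }
  return merged;
}
```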

Integration with WCAG Accessibility Standards

MX convergence principle: Accessibility patterns benefit machines:

<!-- ARIA for screen readers -->
<button aria-label="Add to cart" aria-describedby="cart-status">
  <span class="icon">🛒</span>
</button>
<div id="cart-status" role="status" aria-live="polite">
  2 items in cart
</div>

<!-- Data attributes for AI agents -->
<button data-action="add-to-cart"
        data-product-id="WH-1000">
  <span class="icon">🛒</span>
</button>
<div data-item-count="2">
  2 items in cart
</div>

Both patterns serve similar goals:

ARIA: Explicit semantics for assistive technology
Data attributes: Explicit state for AI agents
Convergence: Both benefit from explicit, semantic markup

Integration with Existing CMS Platforms

WordPress example:

// Add MX meta tags to WordPress head
add_action('wp_head', function() {
  if (is_single()) {
    echo '<meta name="mx:content-policy" content="extract-with-attribution">' . "\n";
    echo '<meta name="mx:attribution" content="required">' . "\n";
  }
});

Next.js example:

import Head from "next/head";

export default function BlogPost({ post }) {
  return (
    <>
      <Head>
        <meta name="mx:content-policy" content="extract-with-attribution" />
        <meta name="mx:attribution" content="required" />
      </Head>
      <article>{post.content}</article>
    </>
  );
}

Migration Path from Generic ai- Prefix

If you previously used ai- prefix, migrate to mx: colon prefix:

<!-- OLD (deprecated ai- prefix) -->
<meta name="ai-content-policy" content="extract-with-attribution">
<meta name="ai-attribution" content="required">

<!-- NEW (mx: namespace) -->
<meta name="mx:content-policy" content="extract-with-attribution">
<meta name="mx:attribution" content="required">

Deprecated tags (do not migrate — remove entirely):

<!-- These are deprecated — they duplicate existing standards -->
<meta name="ai-preferred-access" content="html">  <!-- deprecated: self-evident -->
<meta name="ai-freshness" content="monthly">       <!-- deprecated: use HTTP Cache-Control + Schema.org dateModified -->
<meta name="ai-structured-data" content="json-ld">  <!-- deprecated: self-evident from JSON-LD block -->

Migration strategy:

Remove ai-preferred-access, ai-freshness, and ai-structured-data — they are deprecated
Rename ai-content-policy to mx:content-policy
Rename ai-attribution to mx:attribution
Use <link rel="llms-txt" href="/llms.txt"> instead of <meta name="llms-txt">
Both prefixes can coexist during transition (agents ignore unrecognised tags)
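The removal and rename steps above can be automated for simple templates. The sketch below assumes one meta tag per line and double-quoted attributes, and it does not handle the `llms-txt` change from `<meta>` to `<link>`. The function name `migrateMetaTags` is hypothetical.

```javascript
// Minimal sketch: migrate ai- meta tags to the mx: namespace.
// Assumption: each meta tag sits on its own line with double-quoted attributes.
const DEPRECATED_NAMES = ["ai-preferred-access", "ai-freshness", "ai-structured-data"];
const RENAMED_NAMES = {
  "ai-content-policy": "mx:content-policy",
  "ai-attribution": "mx:attribution",
};

function migrateMetaTags(html) {
  return html
    .split(/\r?\n/)
    // Drop whole lines that carry a deprecated tag.
    .filter((line) => !DEPRECATED_NAMES.some((name) => line.includes(`name="${name}"`)))
    // Rename the remaining ai- tags to their mx: equivalents.
    .map((line) => {
      let out = line;
      for (const [oldName, newName] of Object.entries(RENAMED_NAMES)) {
        out = out.replace(`name="${oldName}"`, `name="${newName}"`);
      }
      return out;
    })
    .join("\n");
}
```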

Implementation Checklist

Phase 1: Foundations (Week 1)

✓ Add MX meta tags (mx:content-policy, mx:attribution) to <head> template
✓ Omit unnecessary tags (mx:preferred-access, mx:freshness, mx:structured-data)
✓ Implement Schema.org JSON-LD for key content types (including dateModified)
✓ Ensure semantic HTML elements (<main>, <nav>, <article>)
✓ Test with HTML validators

Phase 2: Data Attributes (Week 2)

✓ Add common data attributes to products (data-price, data-currency, data-product-id)
✓ Add state management attributes to forms (data-state, data-validation-state)
✓ Add pagination attributes (data-page, data-total-pages)
✓ Ensure consistency across all pages

Phase 3: Dynamic Patterns (Week 3-4)

✓ Implement data-agent-visible for hidden instructions
✓ Add JavaScript state updates for dynamic content
✓ Test with CLI tools (curl, wget)
✓ Verify agent behaviour with logs

Phase 4: Monitoring (Ongoing)

✓ Track agent user-agents in server logs
✓ Monitor agent error rates
✓ Measure conversion rates for agent purchases
✓ Gather feedback and iterate

Part 7: Relationship to Web Standards

Standards Landscape

MX Framework operates within the broader web standards ecosystem. Understanding where MX fits helps clarify when to use which pattern.

Established Standards (Universal Adoption)

W3C and WHATWG Standards:

HTML5 semantic elements — <nav>, <main>, <article>, <aside>, <section>
ARIA attributes — aria-label, aria-describedby, role, aria-live
HTML5 data attributes — data-* custom attributes
HTTP status codes — 200, 303, 400, 401, 404, 422, 503
<meta> tags — robots, viewport, description, canonical

IETF Standards:

robots.txt (RFC 9309) — Site-wide crawling policies
HTTP headers — Cache-Control, Content-Type, Status codes

De Facto Standards:

Schema.org — Structured data vocabulary (Google, Microsoft, Yahoo, Yandex)
Open Graph — Social media metadata (Facebook)
Twitter Cards — Twitter-specific metadata

MX position: Builds on these foundations, never replaces them.

Emerging Standards (Early Adoption Phase)

llms.txt:

Status: Community proposal gaining traction
Purpose: Site-wide AI agent guidance
Analogy: Like robots.txt but for LLMs
Adoption: Growing adoption across MX community
MX relationship: MX meta tags override llms.txt on a per-page basis

Web Standards Process:

1. Individual experiments → 2. Community proposals → 3. Vendor consensus → 4. Formal standardization

llms.txt is at stage 2-3. MX Framework supports and extends it.

Proposed Patterns (MX Framework Specific)

MX meta tag namespace:

Status: Proposed by MX Framework, not yet standardized
Pattern: Framework-specific metadata (like twitter: and og:)
Rationale: Establishes MX brand, aids discoverability, provides granular control
Adoption path: Community adoption → vendor recognition → potential standardization

data-agent-visible attribute:

Status: Proposed by MX Framework, experimental
Pattern: Extends HTML5 data-* convention
Rationale: Hidden machine-readable instructions (like ARIA for agents)
Forward-compatible: Gracefully ignored if not recognized

Common data attributes:

Status: Proposed conventions building on HTML5 data-*
Pattern: Standardized attribute names for consistent state management
Rationale: Agents parse state more reliably with consistent naming
Relationship: Extends established HTML5 data attribute convention

How MX Relates to Standards Bodies

MX is not a standards body. MX Framework:

✅ Proposes patterns following established conventions
✅ Documents practical implementations
✅ Builds on W3C/WHATWG/IETF standards
✅ Shares learnings with community
❌ Does not create formal specifications
❌ Does not replace existing standards
❌ Does not require vendor consensus before proposing

MX role: Practitioner community documenting patterns that work in production.

Path to Standardization

If MX patterns prove valuable, they might standardize through:

Multiple agent adoption — Different AI systems recognize patterns
Production validation — Measurable benefits in real deployments
Community refinement — Usage reveals improvements
Vendor support — Platforms include MX patterns by default
Formal proposal — Community brings patterns to standards bodies

Examples of similar evolution:

viewport meta tag — Apple proprietary → universal standard
robots meta tag — Community convention → universal recognition
Open Graph — Facebook proposal → widely adopted
Schema.org — Vendor consortium → established standard

MX follows this trajectory: Start with practical patterns, refine through use, formalize if proven valuable.

Web Standards Research (2025-2026)

Research conducted: January 2026 web standards search

Finding: NO established ai- prefix standard exists in:

W3C specifications
WHATWG standards
IETF RFCs
Major vendor proposals (Google, Microsoft, Meta, Apple)
Community standards (Microformats, Schema.org)

Implication: The ai- prefix did not follow any established pattern. MX Framework chose the mx: prefix to:

Establish framework identity (like twitter:, og:)
Aid discoverability (“mx meta tags” search leads to MX community)
Align with namespace architecture (mx: → mx.ai, mx.co, mx.ho)

Pattern precedent:

twitter:card, twitter:title (Twitter’s framework-specific metadata)
og:type, og:title (Open Graph’s framework-specific metadata)
mx:content-policy, mx:attribution (MX Framework’s metadata)

Relationship to HTML Living Standard

HTML Living Standard (WHATWG) defines:

Valid HTML elements and attributes
data-* attribute pattern for custom data
<meta name="..."> extensibility

MX compliance:

✅ MX meta tags use valid <meta name="..."> pattern
✅ MX data attributes follow data-* pattern
✅ All MX patterns use valid HTML syntax
✅ Forward-compatible (ignored by parsers that don’t recognize them)

MX is valid HTML using established extension mechanisms.

Cross-References to Standards Documentation

For complete specifications, see:

Semantic HTML: MDN HTML Elements Reference
ARIA: W3C ARIA 1.2
Schema.org: Schema.org Documentation
Open Graph: Open Graph Protocol
robots.txt: RFC 9309
HTTP Status Codes: RFC 9110
HTML Living Standard: WHATWG HTML

For MX-specific patterns, see:

This appendix (Appendix L): Complete MX pattern specifications
Appendix D: AI-Friendly HTML Guide with practical examples
Appendix M: Building the MX Operating System (collaborative process)

Summary: Standards Hierarchy

Use this hierarchy when making decisions:

Established standards FIRST — HTML5, ARIA, Schema.org, HTTP
Emerging conventions SECOND — llms.txt, community patterns
MX patterns THIRD — Framework-specific metadata and extensions

Never replace established standards with MX patterns. Always build on foundations.

Note: This appendix presents proposed patterns, not established standards. Evaluate adoption based on your specific situation and risk tolerance. All patterns are designed for graceful degradation and forward compatibility.
