Appendix B: Proven Lessons
MX-Protocols
January 2026
Appendix B: Proven Lessons
Real-world learnings from implementing AI agent compatibility patterns in production. Mistakes to avoid and solutions that work.
1. Progressive Enhancement Requires Discipline
The Problem: Developers added JavaScript enhancements that accidentally broke the served HTML baseline.
What Happened:
// Server renders initial price
<div id="price">£149.99</div>
// JavaScript "enhancement" breaks it
<script>
document.getElementById('price').innerHTML = 'Loading...';
fetch('/api/price').then(r => r.json()).then(data => {
document.getElementById('price').innerHTML = `£${data.price}`;
});
</script>Agents saw “Loading…” because JavaScript replaced the server-rendered content before fetching completed.
Solution: Never replace server-rendered content - only enhance it:
// Server renders baseline
<div id="price" data-static="true">£149.99</div>
// JavaScript enhances without replacing
<script>
const priceEl = document.getElementById('price');
if (!priceEl.dataset.static) return; // Safety check
fetch('/api/price').then(r => r.json()).then(data => {
// Update only if price changed
if (data.price !== priceEl.textContent.replace(/[£,]/g, '')) {
priceEl.textContent = `£${data.price}`;
priceEl.classList.add('price-updated');
}
});
</script>Lesson: JavaScript should enhance, never replace. If server HTML works, JavaScript must preserve that baseline.
2. Toast Notifications are Hard to Kill
The Problem: Design team loved toast notifications. Developers kept reintroducing them despite knowing they break agents.
What Happened: After removing toasts, a new feature added them back:
// Removed from form validation (good!)
// But reintroduced in shopping cart (bad!)
addToCart(item).then(() => {
showToast('Item added!'); // Breaks agents again
});Solution: Establish a component library policy:
// toast.js - Make breaking changes hard to reintroduce
export function showToast(message) {
console.warn('Toast notifications break AI agents. Use persistentAlert() instead.');
throw new Error('Toast notifications disabled. Use persistentAlert().');
}
export function persistentAlert(message, containerId = 'alerts') {
const container = document.getElementById(containerId);
const alert = document.createElement('div');
alert.className = 'persistent-alert';
alert.setAttribute('role', 'status');
alert.textContent = message;
container.appendChild(alert);
return alert;
}Lesson: Make anti-patterns hard to reintroduce. Deprecate problematic functions and provide better alternatives.
3. Pagination Removal Needs Backend Changes
The Problem: Removed pagination from frontend, but backend still limited results to 10 items per page.
Frontend change:
// Removed pagination UI
// Added "Show all" button
<button onclick="showAll()">Show all results</button>Backend still had:
app.get('/products', (req, res) => {
const limit = 10; // Hardcoded limit still here!
const products = db.query('SELECT * FROM products LIMIT ?', [limit]);
res.json(products);
});Agents requesting /products still got only 10
results.
Solution: Backend must support full retrieval:
app.get('/products', (req, res) => {
// Support both paginated and full retrieval
const limit = req.query.limit ? parseInt(req.query.limit) : null;
const offset = req.query.offset ? parseInt(req.query.offset) : 0;
let query = 'SELECT * FROM products';
let params = [];
if (limit) {
query += ' LIMIT ? OFFSET ?';
params = [limit, offset];
}
const products = db.query(query, params);
res.json({
products,
total: db.query('SELECT COUNT(*) FROM products')[0].count,
returned: products.length
});
});Lesson: Frontend and backend changes must be coordinated. Removing pagination UI doesn’t help if API still limits results.
4. Structured Data Gets Out of Sync
The Problem: HTML pricing and JSON-LD pricing showed different values after a price update.
What Happened:
<!-- HTML updated -->
<div class="price">£159.99</div>
<!-- JSON-LD forgotten -->
<script type="application/ld+json">
{
"@type": "Offer",
"price": "149.99" <!-- Old price! -->
}
</script>Agents reading JSON-LD got wrong price.
Solution: Generate JSON-LD from the same data source as HTML:
// Server-side template
function renderProduct(product) {
return `
<div class="price">${product.formattedPrice}</div>
<script type="application/ld+json">
${JSON.stringify({
"@context": "https://schema.org",
"@type": "Offer",
"price": product.price,
"priceCurrency": product.currency
})}
</script>
`;
}Lesson: Structured data and HTML must share the same data source. Don’t maintain prices in two places.
5. Form Validation State Attributes Forgotten
The Problem: Added
data-validation-state to some form fields but not others.
Inconsistent implementation confused both developers and agents.
What Happened:
<input id="email" data-validation-state="valid"> <!-- Has state -->
<input id="phone"> <!-- Missing state -->
<input id="postcode" data-validation-state="invalid"> <!-- Has state -->Agents couldn’t determine phone field status.
Solution: Validation framework ensures consistent state:
class FormValidator {
constructor(formId) {
this.form = document.getElementById(formId);
// Ensure ALL inputs have validation state
this.form.querySelectorAll('input, select, textarea').forEach(field => {
if (!field.hasAttribute('data-validation-state')) {
field.setAttribute('data-validation-state', 'empty');
}
});
}
updateField(field, state) {
field.setAttribute('data-validation-state', state);
field.setAttribute('aria-invalid', state === 'invalid');
}
}Lesson: Establish patterns that ensure consistency. If one field has state attributes, all fields should have them.
6. Hidden State in Checkout Flow
The Problem: Checkout steps tracked in JavaScript, not reflected in URL or DOM attributes.
What Happened:
// Step tracked only in JavaScript
let currentStep = 1;
function nextStep() {
currentStep++;
updateUI();
}Agents couldn’t tell which checkout step they were on. Refreshing the page lost progress.
Solution: Explicit state in URL and DOM:
// URL reflects state
function goToStep(step) {
window.history.pushState({step}, '', `/checkout?step=${step}`);
document.body.setAttribute('data-checkout-step', step);
updateUI(step);
}
// Restore state on page load
window.addEventListener('load', () => {
const params = new URLSearchParams(window.location.search);
const step = params.get('step') || '1';
goToStep(step);
});Lesson: State must be visible in URL and DOM. Hidden JavaScript state breaks agents and hurts users (no bookmark, no refresh).
7. Error Messages Need Unique IDs
The Problem: Multiple validation errors all had
id="error", breaking ARIA associations.
What Happened:
<input id="email" aria-describedby="error">
<div id="error">Invalid email</div>
<input id="phone" aria-describedby="error">
<div id="error">Invalid phone</div> <!-- Duplicate ID! -->Screen readers and agents couldn’t associate errors with fields.
Solution: Unique error IDs per field:
<input id="email" aria-describedby="email-error">
<div id="email-error">Invalid email format</div>
<input id="phone" aria-describedby="phone-error">
<div id="phone-error">Invalid phone format</div>Lesson: Every error message needs a unique ID. Use
pattern: {fieldId}-error.
8. Loading States Without Expected Duration
The Problem: Added data-state="loading"
but agents still didn’t know how long to wait.
First attempt:
<div data-state="loading">Loading...</div>Better, but agents timeout randomly - some after 5 seconds, some after 30 seconds.
Solution: Provide expected duration:
<div data-state="loading"
data-load-started="2025-01-04T10:30:00Z"
data-expected-duration="2000">
Loading product information (estimated 2 seconds)
</div>Agents can now make informed timeout decisions.
Lesson: Loading states should indicate expected duration. “Loading…” is insufficient - specify how long.
9. Inline Styles Bloat HTML for Agents
The Problem: Used inline styles for convenience during development but never refactored to external CSS. Agents that parse HTML but don’t execute CSS were downloading and processing hundreds of lines of unused styling code with every page request.
First attempt:
<div style="padding: 2rem; background: #f3f4f6; border-radius: 8px;">
<h2 style="font-size: 1.5rem; color: #1e40af; margin-bottom: 1rem;">
Product Details
</h2>
<button style="padding: 1rem 2rem; background: #3b82f6; color: white; border: none;"
onclick="addToCart()">
Add to Cart
</button>
</div>This HTML file was 45KB, with 22KB being inline styles that CLI agents and server-based agents never use. They still parse it, slowing down extraction.
Solution: Move all styling to external CSS file:
<head>
<link rel="stylesheet" href="styles.css">
<script src="cart.js" defer></script>
</head>
<body>
<div class="product-card">
<h2>Product Details</h2>
<button class="btn-primary" data-action="add-to-cart">Add to Cart</button>
</div>
</body>HTML file reduced to 23KB. Browsers cache the CSS file. Agents parse clean HTML without style noise.
Lesson: Separate presentation from content. Inline styles waste bandwidth for agents that don’t render them and make HTML harder to parse. External CSS benefits everyone - agents get faster parsing, browsers get caching, developers get maintainability.
10. Pre-Converting Pages to Markdown Stripped Metadata
The Problem: Built a “chatbot-friendly” site by converting all pages to markdown. In 2023, this seemed smart - simpler format, easier parsing, clean text. By 2025, agents struggled to cite us accurately. Prices were often wrong, publication dates got hallucinated, author attribution disappeared.
Investigation revealed the problem:
Our conversion pipeline stripped everything that agents needed for accurate discovery and citation:
- JSON-LD structured data (product details, pricing, reviews)
- HTML meta tags (publication dates, author information, canonical URLs)
- Schema.org markup (explicit content type signals)
- Semantic attributes (data-price, data-currency, data-formats)
We had optimised for the 2023 paradigm (chatbots that just answered questions) whilst the market moved to 2026 reality (agents that discover, evaluate, compare, and cite sources with accuracy).
What we saw:
Agents reading our markdown would cite “approximately £30” when price was exactly £24.99. They’d say “published recently” when we had explicit dates. They’d mention the product but not link to our site because they had no canonical URL.
Competitors with rich HTML and structured data got cited accurately. We got vague references or got skipped entirely.
Solution: Serve rich HTML with full metadata layers:
<article itemscope itemtype="https://schema.org/Book">
<h1 itemprop="name">MX: The Protocols</h1>
<div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
<meta itemprop="price" content="24.99">
<meta itemprop="priceCurrency" content="GBP">
<meta itemprop="availability" content="https://schema.org/InStock">
<span>£24.99</span>
</div>
<time itemprop="datePublished" datetime="2026-03-31">
Published: 31 March 2026
</time>
</article>Citation accuracy improved dramatically. Agents now reference exact prices, correct dates, proper author attribution. If a platform needs markdown, they can extract it from our rich HTML - we don’t pre-strip the signals they need.
Lesson: Don’t optimise for yesterday’s paradigm. Agents in 2026 need structured metadata for accurate citation, not stripped markdown that loses context. Provide rich source material and let platforms convert to simpler formats if needed. You can’t add structure back to stripped markdown, but you can always extract markdown from rich HTML.
11. robots.txt Missing Sitemap Declaration
The Problem: Created robots.txt but forgot sitemap declaration. Agents couldn’t discover content efficiently.
First version:
User-agent: *
Disallow: /admin/
Works, but agents have to crawl entire site to discover structure.
Better:
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Agents now discover 10,000 pages instantly instead of crawling slowly.
Lesson: robots.txt should always declare sitemap. Dramatically improves discoverability.
12. Schema.org Types Wrong for Content
The Problem: Used Article type for
product pages. Agents expected article content, got confused by
pricing.
Wrong:
{
"@type": "Article",
"name": "Wireless Headphones",
"price": "149.99" // Articles don't have prices!
}Right:
{
"@type": "Product",
"name": "Wireless Headphones",
"offers": {
"@type": "Offer",
"price": "149.99",
"priceCurrency": "GBP"
}
}Lesson: Use correct Schema.org types: Product for products, Article for articles, LocalBusiness for businesses. Type mismatch confuses agents.
13. Testing Only in Chrome
The Problem: Tested agent compatibility only in Chrome DevTools with JavaScript disabled.
What we missed: Server-rendered HTML had Chrome-specific CSS that broke in other contexts:
.price {
display: -webkit-box; /* Chrome-specific */
}Agents parsing without browser context saw broken layout references.
Solution: Test with actual curl/wget:
# Test served HTML as agents see it
curl https://example.com/product/123 | grep -i "price"
# Verify structured data
curl -s https://example.com/product/123 | \
grep -o '<script type="application/ld+json">.*</script>' | \
jq .Lesson: Test served HTML with actual HTTP clients (curl/wget), not just browser DevTools.
14. Incomplete Pricing Disclosure Persisted
The Problem: Changed “From £99” to “£149.99” on product pages, but forgot checkout summary, confirmation emails, and API responses.
Inconsistency:
- Product page: “£149.99” (fixed)
- Cart: “From £99” (not fixed)
- Email: “Starting at £99” (not fixed)
- API:
{"base_price": 99}(not fixed)
Solution: Search entire codebase for pricing patterns:
# Find all price references
grep -r "From £" .
grep -r "Starting at" .
grep -r "base_price" .
# Update systematicallyLesson: Pattern changes must be applied everywhere. Use grep to find all instances, update systematically.
15. API and Web UI Out of Sync
The Problem: Fixed web UI for agents, but API still returned paginated results and incomplete data.
Web UI:
- Shows all products on one page [YES]
- Complete pricing visible [YES]
- Structured data present [YES]
API:
- Returns 10 products per page [NO]
- Missing tax in price response [NO]
- No structured format [NO]
Agents using API got inferior experience to agents scraping web UI.
Solution: API should be first-class interface:
app.get('/api/products', (req, res) => {
res.json({
products: db.getAllProducts().map(p => ({
id: p.id,
name: p.name,
price: {
amount: p.price,
currency: 'GBP',
includes_vat: true,
formatted: `£${p.price}`
},
availability: {
in_stock: p.stock > 0,
quantity: p.stock
}
})),
meta: {
total: db.countProducts(),
schema: 'https://schema.org/Product'
}
});
});Lesson: API must match or exceed web UI quality. Don’t fix UI and leave API broken.
16. Forgetting Mobile Viewport
The Problem: Fixed desktop patterns but mobile site still paginated, hid content, showed incomplete prices.
Root cause: Separate mobile codebase or responsive breakpoints that reverted to problematic patterns.
Solution: Test agent patterns at mobile breakpoints:
// Test with mobile viewport
const { test } = require('@playwright/test');
test('mobile pricing complete', async ({ page }) => {
await page.setViewportSize({ width: 375, height: 667 });
await page.goto('/product/123');
const price = await page.textContent('.price');
expect(price).toContain('£149.99');
expect(price).not.toContain('From');
});Lesson: Agent-friendly patterns must work on mobile too. Test all breakpoints.
17. Authentication Required for Public Data
The Problem: Added authentication to API endpoints, breaking agent access to public product data.
Before (working):
app.get('/api/products/:id', (req, res) => {
res.json(getProduct(req.params.id));
});After (broke agents):
app.get('/api/products/:id', requireAuth, (req, res) => {
res.json(getProduct(req.params.id));
});Public product data now requires authentication. Agents can’t browse products.
Solution: Separate public and private data:
// Public product data - no auth required
app.get('/api/public/products/:id', (req, res) => {
res.json(getProduct(req.params.id));
});
// Private customer-specific data - auth required
app.get('/api/customer/products/:id', requireAuth, (req, res) => {
const product = getProduct(req.params.id);
const customerData = getCustomerProductData(req.user.id, product.id);
res.json({ ...product, customerData });
});Lesson: Public data should remain public. Don’t add authentication to publicly-browsable content.
18. Performance Optimization Lessons
Browser Pooling
Context: Early versions launched a new Puppeteer browser for every URL, adding 2-5 seconds per page.
Problem: 100-URL sites took 45+ minutes to analyze, making the tool impractical for large sites.
Solution: Maintain a pool of 3 reusable browsers, restart after 50 pages to prevent memory leaks.
Impact: 97% reduction in browser launches, 3-5x overall speedup.
Key insight: Resource pooling eliminates repetitive initialization overhead. The tradeoff is memory usage, but automatic restarts prevent leaks.
Adaptive Rate Limiting
Context: Fixed-rate limiting either overwhelmed servers (too fast) or wasted time (too slow).
Problem: No single rate works for all servers. Some handle 10 concurrent requests, others struggle with 2.
Solution: Monitor 429/503 responses, dynamically adjust concurrency with exponential backoff and gradual recovery.
Impact: Server-friendly analysis without manual rate tuning.
Key insight: Reactive systems adapt to actual conditions better than fixed configuration. Let the server tell you when to slow down.
Cache Staleness Detection
Context: Cached data could become outdated if source pages changed between analysis runs.
Problem: Stale cache served incorrect data, undermining report accuracy.
Solution: HTTP HEAD requests check Last-Modified headers, automatic invalidation when source is newer.
Impact: Data freshness guaranteed without full re-analysis.
Key insight: Validation metadata (Last-Modified, ETags) enables lightweight freshness checks. Conservative error handling (assume fresh on failure) prevents false positives.
19. Ethical Scraping Lessons
robots.txt Compliance
Context: The Web Audit Suite needed to respect website policies whilst providing useful analysis.
Problem: Some sites block automated tools via robots.txt, creating tension between functionality and ethics.
Solution: Phase 0 fetches robots.txt before crawling, with interactive prompts for blocked URLs and runtime force-scrape toggle.
Impact: Ethical scraping by default, with user control for legitimate use cases.
Key insight: Tools must respect website policies whilst enabling legitimate analysis. Interactive prompts give users agency without sacrificing ethics.
Book reference: Chapter 5 discusses content creator concerns about automated access.
robots.txt Quality Analysis
Context: Many robots.txt files provide minimal guidance for AI agents.
Problem: Website owners want to control agent access but don’t know what to include in robots.txt.
Solution: 100-point scoring system evaluates 6 criteria (AI-specific user agents, sitemap references, sensitive path protection, llms.txt references, comments, completeness).
Impact: Actionable feedback helps sites improve agent guidance.
Key insight: Educational tools that explain “why” drive adoption better than binary pass/fail judgments.
Book reference: Chapter 10 covers robots.txt best practices for AI agents.
Key Takeaways
- Progressive enhancement requires discipline: JavaScript must enhance, never replace server-rendered content
- Consistency is critical: If one field has state attributes, all fields must have them
- Frontend and backend must align: Removing pagination UI doesn’t help if API limits results
- Structured data must stay in sync: Generate JSON-LD from same source as HTML
- State must be visible: Hidden JavaScript state breaks agents and users
- Test comprehensively: Test served HTML with curl/wget, not just browser DevTools
- Pattern changes are global: Use grep to find all instances, update systematically
- API must be first-class: Don’t fix UI and leave API broken
- Mobile matters: Agent patterns must work at all breakpoints
- Public data stays public: Don’t add authentication to browsable content
Home Top