Appendix I: Pipeline Failure Case Study

MX-Protocols

Tom Cranstoun

January 2026

Appendix I - Pipeline Failure Case Study

The £203,000 Cruise Pricing Error

A detailed analysis of a real-world AI pipeline failure and the validation layers that should have prevented it.


Error Summary

Source of Error: This analysis is based on output from the author’s AI assistant (Claude for Chrome, early beta version) when researching cruise options in December 2024. The agent was asked to find Danube cruises ending in Budapest in May 2026.

The Error: The AI-generated cruise itinerary incorrectly listed one luxury river cruise operator’s pricing as £203,000-£402,000 when the actual pricing was likely £2,030-£4,020 per person.

This represents a 100x multiplication error - two extra zeros added to the correct price.

Note on Anonymization: Operator names have been anonymized in this appendix. The error occurred in the agent’s reasoning and data extraction, not due to any fault by the cruise operators themselves. All operators mentioned in the original output (Saga Cruises, Uniworld River Cruises, Viking River Cruises) provide legitimate services with accurate pricing on their websites.

This appendix provides the complete error analysis that informed Chapter 13’s discussion of pipeline failures, validation layers, and confidence scoring. It demonstrates why agent creators must build guardrails into their systems.


Error Chain Analysis

Stage 1: User Query

Input: Request for cruise itinerary information (Germany → Budapest → Rovinj, May 2026)

AI Task: Research multiple cruise operators, gather pricing, dates, and details

Agent’s Actual Output (Anonymized)

The agent provided comparative information for three operators:

Operator A - “Scenic River Journey”

  • Route: Vienna to Budapest (7 nights)
  • Departs: ~10-12 May 2026
  • Pricing: Not specified (all-inclusive package)
  • Features: All-inclusive (drinks, dining, excursions)
  • Rating: 4.5/5 stars
  • Suited for: All-inclusive comfort

Operator B - “Delightful Cruise Experience”

  • Route: Various starting points to Budapest
  • Departs: Early May 2026
  • Pricing: £203,000-£402,000 ← ERROR
  • Ship: Luxury vessel
  • Suited for: Luxury, boutique experience

Operator C - “Classic River Route”

  • Route: Multiple options ending in Budapest
  • Departs: Early-mid May 2026
  • Pricing: Not specified
  • Features: Elegant design, frequent departures
  • Suited for: First-timers

Critical observation: The agent provided pricing for only one of three operators. Operators A and C showed “not specified” or were presented without price data. Only Operator B included pricing - and that pricing was erroneous.

What comparative analysis should have revealed: Even without pricing for all operators, Operator B’s £203,000-£402,000 is 100x higher than typical river cruise pricing (£2,000-£6,000). This extreme outlier should have triggered immediate validation flags, especially given the absence of pricing data for peer operators with which to compare.

Stage 2: Information Retrieval

AI Action: Web search or database query for “luxury Danube cruise pricing 2026”

Data Source Encountered: Likely one of:

  • Travel booking website with pricing tables
  • PDF brochure with formatted prices
  • Comparison site with multiple currency formats
  • Cached or indexed data from previous years

Stage 3: Data Parsing Error (Critical Failure Point)

This is where the error occurred. Four possible scenarios:

Scenario A: Decimal Separator Confusion

Original Source: €2.030,00 - €4.020,00 (European format)
AI Interpretation: 2.030 = 2030 (treating period as decimal)
Plus conversion: 2030 × 100 = 203,000 (incorrect zero addition)

European number formatting uses periods as thousands separators and commas for decimals. If the agent parsed European format as British format, it would misread the number by 100x.

Scenario B: Number Concatenation

Original Source:
  Early Booking: £2,030
  Standard Rate: £4,020

AI Misread:
  Concatenated without separators: 20304020
  Split incorrectly: 203,040 and 402,000
  Formatted: £203,000-£402,000

HTML might present pricing across multiple elements without clear separators. Incorrect concatenation produces magnitude errors.

Scenario C: Wrong Data Field

Original Table:
| Cruise      | Per Person | Total Revenue | Fleet Value |
|-------------|-----------|---------------|-------------|
| Operator X  | £2,030    | £203,000      | £4.02M     |

AI Selected: Total Revenue column instead of Per Person column

The agent extracted the wrong field entirely. The table showed multiple price points - per person, total voyage revenue, fleet valuation. Selecting the wrong column produced the error.

Scenario D: HTML/CSS Parsing Error

<span class="price">
  <span class="currency">£</span>
  <span class="thousands">2</span>
  <span class="hundreds">030</span>
</span>

AI Read: Concatenated spans without proper separators
Result: £2030 → reformatted with UK thousands separator → £203,000

CSS formatting might visually separate elements that are adjacent in HTML. Parsing without CSS awareness produces concatenation errors.

Stage 4: Validation Failure (Missing Guardrails)

Missing Sanity Check:

The agent should have flagged this as anomalous because:

  • £203,000 per person = £406,000 per couple for one week
  • This exceeds typical round-the-world luxury cruise pricing
  • Luxury river cruise operators typically price £3,000-£8,000
  • No contextual warnings about “ultra-luxury” or “suite-only” pricing

Why Validation Failed:

  • No price range boundaries set in agent parameters
  • Luxury cruise market has high variance (£500-£50,000+ exists)
  • Agent lacked real-time market context for 2026 pricing
  • No comparative check against other operators’ pricing

This is the critical failure: Not the parsing error (which is understandable), but the absence of validation layers that should have caught it before output.

Stage 5: Output Generation

AI Action: Formatted information into structured document

Error Propagation:

  • Incorrect price passed through without correction
  • Presented alongside accurate information (dates, routes, ratings)
  • Mixed with legitimate high-end pricing from other operators
  • No caveats or verification notes added

Format creates false confidence: Professional formatting and detailed presentation mask underlying data quality issues. Structure != accuracy.

Stage 6: Human Detection

User Query: “Does the pasted document really quote 203,000?”

Recognition Point: Human domain knowledge identified impossibility

  • Immediate recognition that price was absurd
  • Context awareness (river cruises don’t cost £200k)
  • Comparative reasoning (other operators priced reasonably)

This reveals the gap: The human had validation layers (domain knowledge, comparative context, scepticism) that the agent lacked.


Error Classification

Type: Data Transformation Error

Specifically: Numerical parsing and formatting mistake during extraction phase

Severity: High

  • Factually incorrect by factor of 100
  • Could mislead users about affordability
  • Undermines trust in other accurate data
  • Creates false confidence through professional formatting

Detectability: High (for humans)

  • Obvious to domain experts
  • Detectable through comparison with other prices
  • Fails basic reasonableness test

Detectability: Low (for agents without validation)

  • No automated flags or warnings
  • Presented with same confidence as verified data
  • No comparative analysis performed
  • No cross-referencing against structured data

Why This Error Matters

1. Compounding Trust Issues

When AI presents detailed, formatted information with specific numbers, users assume verification has occurred. Mixing accurate data (dates, routes, ratings) with incorrect data (pricing) creates false confidence.

The danger: Users trust the agent because 90% of the information is correct. The 10% that’s wrong (pricing) is precisely the critical decision point.

2. Silent Failure

No warning flags, caveats, or uncertainty indicators accompanied the incorrect price. The agent presented it with the same confidence as verified information.

Missing elements:

  • No “unverified - recommend checking operator site” caveat
  • No confidence score (e.g., “40% confidence due to data conflicts”)
  • No comparative context (e.g., “significantly above market average”)
  • No source attribution (e.g., “extracted from TourRadar.com, January 2026”)

3. Human Expertise Required

Detection required:

  • Domain knowledge (cruise pricing norms)
  • Contextual reasoning (comparative analysis)
  • Scepticism (questioning presented facts)

The problem: Most users lack domain expertise for every category where they use agents. You might know cruise pricing but not insurance rates, legal fees, or medical costs. The agent should provide validation, not require it.

4. Systematic Weakness

This type of error reveals:

  • Insufficient validation rules
  • No cross-referencing mechanisms
  • Limited real-world knowledge boundaries
  • Absence of “common sense” filters

The implication: If an obvious 100x error passes through, subtler errors (20% too high, wrong dates, incorrect inclusions) likely pass through unchallenged.


Prevention Strategies

For AI Systems (Agent Creators)

1. Range Validation

// Example validation logic
class PriceValidator {
  validate(price, category) {
    const ranges = {
      'river-cruise': { max: 15000, typical: 5000 },
      'ocean-cruise': { max: 50000, typical: 8000 }
    };

    const range = ranges[category];
    if (price > range.max) {
      return {
        valid: false,
        confidence: 20,
        warning: `Price (£${price}) exceeds typical maximum (£${range.max})`
      };
    }

    return { valid: true, confidence: 95 };
  }
}

This would have caught the error: £203,000 > £15,000 maximum → flagged for review

2. Comparative Checks

// Compare against other operators
function validateAgainstPeers(price, competitorPrices) {
  const avgPrice = calculateAverage(competitorPrices);
  const ratio = price / avgPrice;

  if (ratio > 10) {
    return {
      valid: false,
      confidence: 10,
      warning: `Price ${ratio.toFixed(1)}x higher than market average (£${avgPrice})`
    };
  }

  return { valid: true, confidence: 90 };
}

// Usage
validateAgainstPeers(203000, [2030, 3450, 5200, 2800]);
// Returns: { valid: false, confidence: 10, warning: "Price 58.8x higher..." }

This would have caught the error: £203,000 is 58x higher than peer average → obvious anomaly

3. Structured Data Cross-Reference

// Check HTML against JSON-LD
function crossReferencePrice(htmlPrice, jsonLDPrice) {
  if (!jsonLDPrice) {
    return { confidence: 60, warning: 'No structured data for verification' };
  }

  const difference = Math.abs(htmlPrice - jsonLDPrice);
  const percentDiff = (difference / jsonLDPrice) * 100;

  if (percentDiff > 5) {
    return {
      confidence: 30,
      warning: `HTML price (£${htmlPrice}) conflicts with structured data (£${jsonLDPrice})`
    };
  }

  return { confidence: 95, note: 'Verified across multiple sources' };
}

This would have caught the error: If website had JSON-LD showing £2,030 but HTML parsing returned £203,000, the conflict would be flagged

4. Confidence Scoring

// Aggregate validation results
function calculateConfidence(validations) {
  let confidence = 100;

  validations.forEach(v => {
    if (!v.valid) {
      confidence = Math.min(confidence, v.confidence);
    }
  });

  if (confidence < 50) {
    return {
      action: 'REQUIRE_VERIFICATION',
      message: 'Price data unreliable. Verify at operator website before booking.'
    };
  }

  return { action: 'PROCEED', confidence };
}

This would have prevented output: Confidence score of 10-30% would trigger “require verification” response instead of presenting the price as fact

For Users (Readers of This Book)

1. Spot-Check Critical Data

Always verify:

  • Prices (especially if surprisingly high or low)
  • Dates and times
  • Contact information
  • Legal or financial details

Why this matters: Agents are good at gathering information but inconsistent at validation. Critical decisions require human verification.

2. Cross-Reference

Compare AI-provided information against:

  • Official operator websites
  • Multiple booking platforms
  • Recent reviews or forums

The £203k error would have been caught immediately: A 30-second check of the operator’s website would show £2,030, not £203,000.

3. Question Anomalies

If something seems wrong:

  • Ask the agent to verify
  • Request source information
  • Check multiple sources independently

Trust but verify: The agent is a research assistant, not an authoritative source.

4. Domain Knowledge

Maintain basic awareness of:

  • Typical price ranges for services
  • Industry norms and standards
  • Red flags for errors or scams

You don’t need expertise, just rough ranges: Knowing that river cruises cost £2k-£8k (not £200k) is sufficient to catch this error.


Technical Root Cause

Challenge of Number Parsing

Large language models process text, not structured data. When encountering numbers:

1. No Inherent Concept of Magnitude

  • “203000” and “2030” are just token sequences
  • No built-in understanding that one is 100x the other
  • Models learn statistical patterns, not mathematical properties

2. Format Ambiguity

  • £2,030 vs £2.030 vs £2030.00
  • Different regional conventions (UK vs European vs US)
  • HTML/PDF formatting artifacts
  • Currency symbols, thousands separators, decimal points

3. Context-Dependent Interpretation

  • Same number might mean price, quantity, date, reference number
  • AI must infer meaning from surrounding text
  • Ambiguous HTML structure complicates extraction

4. Transformation Errors Compound

Original source → OCR/parsing → AI interpretation → formatting → output

Each step can introduce errors. No checksums or validation between steps. By the time the error reaches output, it’s treated as validated data.

Why This Specific Error Pattern

The 100x multiplication (adding two zeros) suggests:

  • Decimal point treated as thousands separator
  • Two-stage conversion (format + currency) applied incorrectly
  • Copy-paste error in training data or retrieval source
  • Systematic parsing rule misapplied to European number format

The key insight: This wasn’t a reasoning failure. The agent didn’t think £203,000 was reasonable. It never had the chance to reason about the data because validation layers were absent.


Lessons Learned

1. Data Pipeline Failures, Not Reasoning Failures

The error occurred in extraction and parsing, not in understanding that £203,000 is unreasonable. An AI doing comparative analysis would reject this outlier immediately.

Implication for agent creators: Focus on data quality and validation layers, not just better reasoning models.

2. Isolated Processing Creates Blind Spots

Processing each operator independently without cross-referencing allowed bad data to propagate. Comparative validation would have caught this.

The incomplete data problem: The agent retrieved pricing for only one of three operators. This itself should have triggered a warning: “Unable to provide comparative pricing - only 1 of 3 operators returned price data. Confidence: low. Recommend manual verification.”

Best practice: Never process data in isolation. Always maintain context and comparison points. When comparative data is incomplete, flag it explicitly and reduce confidence scores.

3. Single-Source Extraction is Fragile

Relying on one data source without verification against official operator sites creates vulnerability to parsing errors, formatting issues, or outdated information.

Best practice: Multi-source verification. HTML + JSON-LD + API responses. When sources conflict, flag it.

4. Validation Layers Are Essential

Critical data needs multiple checkpoints:

  • Range validation (is this within expected bounds?)
  • Comparative validation (how does this compare to alternatives?)
  • Source verification (does this match the official site?)
  • Cross-referencing (do multiple sources agree?)

This is the core lesson of Chapter 13: Build these checkpoints into your agent systems. They’re not optional extras; they’re essential guardrails.

5. Format Creates False Confidence

Professional formatting and detailed presentation mask underlying data quality issues. Structure != accuracy.

The danger: Users trust well-formatted output more than plain text, even when data quality is identical.

6. Detection Methods Matter

This error was caught through:

  • Human domain knowledge (cruises don’t cost £200k)
  • Contextual comparison (other operators were £2k-£6k)
  • Questioning anomalies (asking “does this really say £203,000?”)

Not through:

  • AI self-correction
  • Automated validation
  • System warnings

The gap: Humans have these validation mechanisms built in. Agents need them explicitly implemented.

7. Obvious Errors Reveal Systemic Issues

If an obvious 100x error passes through, subtler errors (20% too high, wrong dates, incorrect inclusions) likely pass through unchallenged.

The real concern: The £203k error was caught because it was absurd. How many plausible-but-wrong errors propagate without detection?


Conclusion

This £203,000 pricing error illustrates a different failure mode than initially appears. The error likely occurred during initial data retrieval or parsing - not during decision-making.

The Real Failure Point

An AI agent doing comparison shopping would reject or flag this price as an outlier when comparing:

  • Operator A: £2,000-£4,000
  • Operator B: £3,500-£6,500
  • Operator C: £203,000-£402,000 ← Obvious anomaly
  • Operator D: £2,800-£5,200

The actual failure was probably:

  1. Bad data retrieval - Wrong field extracted from source
  2. No comparative validation - Each operator researched in isolation
  3. No sanity checking - Price accepted without cross-referencing
  4. Single-source reliance - No verification against official operator site

What This Reveals

This wasn’t an AI “not understanding that £203,000 is expensive” - it was a data pipeline failure:

  • Wrong data extracted from source (most likely)
  • Incorrect parsing of number format (possible)
  • No comparative analysis performed (systemic)
  • No validation against known ranges (missing safeguard)

Application to Chapter 13

This case study informed the validation framework presented in Chapter 13:

  • Range validation: £203,000 > £15,000 maximum → flag
  • Comparative analysis: 58x higher than peers → flag
  • Structured data cross-reference: HTML != JSON-LD → flag
  • Confidence scoring: Multiple failures → very low confidence
  • Graceful degradation: Low confidence → require verification

Key Takeaway

The failure wasn’t in AI reasoning about prices, but in data extraction and validation. Even sophisticated AI systems need robust data pipelines with cross-referencing and sanity checks, not just better reasoning capabilities.

For agent creators: Build validation layers. Don’t assume your extraction is correct. Cross-reference. Score confidence. Require verification for low-confidence data. These guardrails are the difference between a reliable agent and one that propagates £203k pricing errors.

For users: Verify critical data. Agents are powerful research tools but inconsistent validators. When making important decisions (bookings, purchases, financial commitments), always check multiple sources.

The error was instructive because it was obviously wrong to human domain knowledge. More concerning are errors that fall within plausible ranges - where comparative analysis and validation become the only defence against propagating incorrect data.


Cross-reference: See Chapter 13 for implementation patterns based on this case study.


Home Top