Google Hotel Center MCP

A documentation-grounded MCP server that validates Google Hotel Ads XML generation and reduces LLM-generated schema mistakes before they reach production. Rolled out as a permanent validation layer in the engineering team's workflow.

Role: Lead Engineer (Tool Author + Team Rollout)
Period: 2026

Overview

Google Hotel Ads XML generation is specification-heavy: small schema or semantic mistakes can reduce price accuracy and hurt ad exposure.

I built a documentation-grounded MCP server that gives LLM-assisted development workflows a hard validation layer. The server indexes official documentation, validates planned XML elements before generation, checks generated XML against known constraints, and blocks output that cannot be traced back to an indexed source.

The project combined tool development, XML quality investigation, and team rollout. The goal was not just to build a personal helper, but to make reliable XML generation the default workflow for the engineering team.

My Role

As the tool author and lead on this quality-improvement initiative, I:

  • Designed, built, and shipped the entire MCP server — 108 indexed element definitions, 14 message types, 6-stage validation pipeline, provenance system with SHA256 content hashing.
  • Rolled out the tool to the engineering team — integrated it into the team's Claude Code workflow so every XML-touching change passes through the same validation gate, regardless of which engineer authored it.
  • Authored team documentation and worked with engineers on adoption — the goal was not "I have a tool" but "the team has a tool".
  • Used the tool's audit mode and adjacent investigations to uncover and fix several XML feed quality issues across unavailable-room handling, capacity mapping, and market/currency eligibility.
  • Helped support a broader Price Accuracy improvement initiative as part of the team-wide rollout.

Tech Stack

Interface

MCP Protocol (Claude Code / IDE integration)

Backend

Python 3.10+, FastMCP (Model Context Protocol), Pydantic 2.0, xml.etree.ElementTree

Infrastructure

MCP Stdio Transport, Eager Initialization (zero I/O per request), SHA256 Provenance Hashing

Indexing

In-Memory Document Index (108 elements), Dual Case Index (exact + case-insensitive), XSD Schema Registry (lxml)

Architecture

The system operates as a stdio-based MCP server that exposes 7 tools to the LLM. The core principle is documentation-only truth: no rule exists unless it can be traced to an indexed official document.

Document Index Layer: At startup, the server eagerly loads element definitions, message types, and official XSD schemas into memory. A dual-case index provides exact-match lookups with a case-insensitive fallback. After startup, no request triggers a disk read or network call.

Validation Pipeline: XML generation follows a strict 6-stage flow:

  1. search_docs — find relevant documentation sections
  2. validate_xml_plan — verify planned elements exist before writing XML
  3. Generate XML (LLM step)
  4. precheck_generated_xml — syntax, parent-child, required attributes
  5. verify_cited_xml — every rule must have a provenance citation
  6. gate_generated_xml — final hard gate, blocks output if any check fails
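
The precheck stage (syntax, parent-child, required attributes) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the server's code: the ALLOWED_CHILDREN and REQUIRED_ATTRS tables, the placeholder element names in them, and the precheck function are all hypothetical stand-ins for rules that would come from the indexed documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule tables with placeholder element names.
# In a real server these would be derived from the documentation index.
ALLOWED_CHILDREN = {"Root": {"Item"}, "Item": {"Price"}, "Price": set()}
REQUIRED_ATTRS = {"Root": {"id"}, "Price": {"currency"}}


def precheck(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the precheck passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"syntax: {exc}"]

    problems: list[str] = []
    if root.tag not in ALLOWED_CHILDREN:
        problems.append(f"unknown element <{root.tag}>")
    for parent in root.iter():
        # Required-attribute check for this element.
        missing = REQUIRED_ATTRS.get(parent.tag, set()) - set(parent.attrib)
        for attr in sorted(missing):
            problems.append(f"<{parent.tag}> is missing required attribute '{attr}'")
        # Parent-child check for each direct child.
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> is not allowed under <{parent.tag}>")
    return problems
```

Returning a list of problems rather than raising on the first one lets the LLM fix several issues in a single retry.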

Cross-Element Constraint Engine: Validates semantic rules that span multiple elements — the kind of constraints XSD cannot express on its own.

Team Integration: Distributed via the team's shared MCP configuration so any engineer's Claude Code session has the validation loop available with zero per-developer setup.

[Figure: system architecture diagram]

Key Challenges

1. LLM Hallucination in Structured Output

LLMs confidently generate plausible-looking XML that contains invented elements or incorrect attribute values. Standard prompt engineering cannot reliably prevent this because the model has no grounding in the actual specification.
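
One way to catch invented element names is to check the planned element list against the index before any XML is written. The sketch below assumes a tiny stand-in index: KNOWN_ELEMENTS and validate_plan are hypothetical, and the element names are illustrative only. The closest-name suggestion uses difflib from the standard library and is an addition for illustration, not something the original server is known to do.

```python
from difflib import get_close_matches

# Illustrative stand-in for the documentation index (names are assumptions).
KNOWN_ELEMENTS = frozenset({"Result", "Baserate", "Tax"})


def validate_plan(planned: list[str], known: frozenset[str] = KNOWN_ELEMENTS) -> dict:
    """Reject a plan that names any element missing from the index."""
    unknown = {}
    for name in planned:
        if name not in known:
            # Suggest the closest documented names instead of failing blindly.
            unknown[name] = get_close_matches(name, sorted(known), n=2)
    return {"ok": not unknown, "unknown": unknown}
```

A plan such as ["Result", "BaseRates"] would come back with ok set to False and the misspelled name listed under unknown, so the model can correct the plan before generating anything.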

2. Cross-Element Semantic Constraints

Some validation rules span multiple elements. XSD schema validation catches structural errors but cannot enforce business logic like "if element A has value X, then element B must have value Y." These silent failures are the hardest to detect.

3. Documentation Drift

Google's Hotel Center documentation updates independently of our system. Rules that were correct last month may be outdated. The system needed a way to detect when its knowledge base was stale.

4. Quality Issues Above the XML Layer

Some accuracy issues do not come from malformed XML at all — they come from upstream pipeline decisions about which markets, currencies, or rooms to emit in the first place. These problems sit one level above XML validation and need a different investigation pattern.

5. Zero-Latency Requirement

The MCP server sits in the LLM's tool-calling loop. Every millisecond of I/O latency compounds across the 6-stage pipeline. Disk reads or network calls per validation request were unacceptable.

6. Team Adoption, Not Just Tool Existence

A correct tool nobody uses is worse than a wrong tool — it creates the illusion of safety. The rollout had to make the validation gate the path of least resistance for the rest of the team, not an extra step they could skip.

Solutions & Design Decisions

Provenance-Based Validation

Every rule in the system carries a provenance record: the source document URL, the section heading, and a SHA256 hash of the content at index time. When the LLM generates XML, verify_cited_xml checks that every referenced rule can be traced back to its source. If the citation chain breaks, the output is blocked. Trade-off: the LLM cannot use knowledge beyond what is indexed, but for spec-grounded generation that is the desired property.

Cross-Element Constraint Definitions

Semantic constraints are defined declaratively and checked during precheck_generated_xml. Constraints that XSD cannot express live as first-class rules in this layer rather than getting buried in code review.
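
A declarative constraint of the form "if element A has value X, then element B must have value Y" might be expressed like this. It is a sketch only: the Constraint dataclass, the CONSTRAINTS list, and check_constraints are hypothetical names, and A, B, X, Y are placeholders rather than real Hotel Ads elements or values.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """If `when_tag` has text `when_value`, `then_tag` must have text `then_value`."""
    when_tag: str
    when_value: str
    then_tag: str
    then_value: str


# Hypothetical rule mirroring "if A has value X, then B must have value Y".
CONSTRAINTS = [Constraint("A", "X", "B", "Y")]


def check_constraints(root: ET.Element, rules=CONSTRAINTS) -> list[str]:
    """Return one message per violated rule (descendants of root only)."""
    violations = []
    for rule in rules:
        trigger = root.find(f".//{rule.when_tag}")
        if trigger is None or (trigger.text or "").strip() != rule.when_value:
            continue  # rule does not apply to this document
        target = root.find(f".//{rule.then_tag}")
        actual = (target.text or "").strip() if target is not None else None
        if actual != rule.then_value:
            violations.append(
                f"{rule.when_tag}={rule.when_value} requires "
                f"{rule.then_tag}={rule.then_value}, found {actual!r}"
            )
    return violations
```

Keeping rules as data means a new constraint is a one-line addition that can carry its own provenance, instead of another branch in validation code.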

Eager Initialization with Dual-Case Index

All documentation, element definitions, and XSD schemas are loaded into memory at server startup. Lookups use an exact-match dictionary with a case-insensitive fallback dictionary. Result: no disk or network I/O after startup, and sub-millisecond lookups.
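
Building both dictionaries once at startup could look like the sketch below. The ElementDef fields and the build_indexes helper are assumptions for illustration; the real definitions are not shown in this case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementDef:
    name: str
    description: str  # hypothetical field


def build_indexes(
    defs: list[ElementDef],
) -> tuple[dict[str, ElementDef], dict[str, ElementDef]]:
    """Build the exact-match and case-insensitive dictionaries in one pass."""
    exact_index: dict[str, ElementDef] = {}
    ci_index: dict[str, ElementDef] = {}
    for d in defs:
        exact_index[d.name] = d
        # setdefault keeps the first definition if two names differ only by case.
        ci_index.setdefault(d.name.lower(), d)
    return exact_index, ci_index


# Run once at startup; request handlers only read these dictionaries afterwards.
EXACT, CI = build_indexes([ElementDef("Baserate", "illustrative entry")])
```

Both dictionaries point at the same ElementDef objects, so the second index costs only the extra keys.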

Content Hash Drift Detection

Each indexed document section stores a SHA256 hash. When the documentation source is re-indexed, changed hashes surface as warnings, alerting the team to review potentially stale rules.
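
The drift check reduces to comparing stored hashes with hashes of freshly fetched content. A minimal sketch, assuming sections are keyed by a string identifier; content_hash and detect_drift are hypothetical names, and reporting removed sections is an added assumption rather than documented behavior.

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA256 hex digest of a documentation section's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_drift(stored: dict[str, str], fresh_sections: dict[str, str]) -> list[str]:
    """Compare stored hashes (section -> hash) with freshly fetched section text."""
    warnings = []
    for section, old_hash in stored.items():
        if section not in fresh_sections:
            warnings.append(f"section removed: {section}")
        elif content_hash(fresh_sections[section]) != old_hash:
            warnings.append(f"content changed: {section}")
    return warnings
```

The function only warns; deciding whether a changed section invalidates a rule stays a human review step, as described above.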

Quality Investigation Beyond the Tool

The MCP catches problems at the XML layer. Some quality issues sit upstream of XML — pipeline-level decisions about what to emit at all. Those required separate investigation through the Price Accuracy dashboard and feed configuration. The lesson baked into the rollout: the tool is one of several quality controls, not the only one.

Team Rollout via Shared MCP Configuration

Distributing the server through the team's shared MCP config means a new engineer gets the validation gate automatically, with no install steps and no opt-in. Backed by team documentation and pair-programming sessions during the first week, adoption was effectively immediate rather than gradual.

Results & Impact

Team-Wide Quality Improvement

  • Rolled out the MCP server to the engineering team's Claude Code workflow
  • Indexed 108 XML element definitions and 14 message types
  • Built a 6-stage validation pipeline covering documentation search, planning, generation, precheck, citation verification, and final gating
  • Improved XML feed quality by catching schema and cross-element issues before production
  • Helped support a broader Price Accuracy improvement initiative

Investigation as a By-Product

The rollout also helped uncover multiple XML feed quality issues across unavailable-room handling, capacity mapping, and market/currency eligibility. Rather than treating the MCP server as a one-off debugging tool, I positioned it as a permanent validation layer in the team's Hotel Ads XML workflow — so the same class of issues is significantly less likely to recur.

Validation Coverage

  • 108 XML element definitions indexed
  • 14 message types covered
  • 6-stage pipeline with checks before and after the LLM generation step
  • Permanent presence in the team's IDE-level workflow

Learnings

Ground Truth Beats Prompt Engineering

No amount of system prompt tuning can match a hard validation gate backed by indexed documentation. When correctness matters, give the LLM tools that enforce constraints rather than instructions that suggest them.

Cross-Element Rules Are Where the Bugs Hide

The most impactful issues found by the tool were not typos or missing elements — they were semantic relationships between elements that only manifest in specific states. These constraints are invisible to schema validation and easy to miss in code review.

A Tool Is Not a Strategy

The MCP only catches problems at the XML layer. Some quality wins required investigation higher up the pipeline. The takeaway: when a metric is stuck, look at every layer of the system, not just the one you have a tool for.

Roll Out, Don't Just Build

The biggest multiplier on this project was not the tool itself — it was making the tool the path of least resistance for everyone else. Distributing through shared MCP config, writing onboarding docs, and pair-programming the first week's adoption did more for production XML quality than the validation pipeline did in isolation.

Eager Loading Is Worth the Startup Cost

For a tool that runs in a tight loop, paying the cost once at startup and having zero per-request I/O is the right trade-off. The server starts in under a second and every subsequent call is pure computation.

Deep Dive: The Validation Pipeline

Why This Matters

Google Hotel Ads uses Price Accuracy to rank hotel listings. A single invalid XML field, easy to miss in manual review, can silently degrade the accuracy score, reduce ad exposure, and cost real revenue. This tool gives the team a structural way to keep generated XML compliant. The rollout to the team and the broader investigations across the feed pipeline were equally important.

The 6-Stage Validation Flow

The pipeline enforces a strict sequence. Each stage must pass before the next begins. The LLM cannot skip stages or reorder them.

  1. search_docs — Retrieve relevant documentation sections for the task
  2. validate_xml_plan — Check that all planned element names exist in the index
  3. Generate XML — The LLM produces the XML (the only uncontrolled step)
  4. precheck_generated_xml — Parse, validate structure, check cross-element constraints
  5. verify_cited_xml — Confirm every rule has a traceable provenance citation
  6. gate_generated_xml — Hard gate: block output if any prior check failed
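
How the server enforces stage order is not described here, so the following is only one possible mechanism: a small per-task session object that rejects skipped or reordered stages and opens the gate only when every earlier stage has passed. PipelineSession, CHECKED_STAGES, and the "generate_xml" label for the LLM step are hypothetical.

```python
CHECKED_STAGES = [
    "search_docs",
    "validate_xml_plan",
    "generate_xml",  # hypothetical label for the LLM step
    "precheck_generated_xml",
    "verify_cited_xml",
]


class PipelineSession:
    """Tracks stage progress for one XML generation task."""

    def __init__(self) -> None:
        self._passed = 0  # number of stages passed so far, in order

    def record(self, stage: str, passed: bool) -> None:
        """Record a stage result; reject skipped or reordered stages."""
        done = self._passed >= len(CHECKED_STAGES)
        expected = None if done else CHECKED_STAGES[self._passed]
        if stage != expected:
            raise RuntimeError(f"expected {expected!r}, got {stage!r}")
        if passed:
            self._passed += 1  # a failed stage must be retried before moving on

    def gate_generated_xml(self) -> bool:
        """Final gate: open only if every earlier stage has passed."""
        return self._passed == len(CHECKED_STAGES)
```

Because the gate reads recorded state instead of trusting the caller, an LLM that jumps straight to the final tool call simply finds the gate closed.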

Technology Choices

  • Core: Python 3.10+, FastMCP, Pydantic 2.0
  • Validation: lxml (XSD), xml.etree.ElementTree, SHA256 Provenance
  • Indexing: Dual-Case Dictionary, Eager Init, 108 Element Definitions
  • Protocol: MCP Stdio Transport, 7 Tool Endpoints, 14 Message Types

Coverage at a Glance

  • Elements Indexed: 108 definitions, plus 14 message types
  • Pipeline Stages: 6 (doc search → planning → generation → precheck → citation → gate)
  • Team Reach: integrated into the engineering team's Claude Code workflow — not just one engineer's local setup

Provenance System Design

Every rule in the index carries three pieces of provenance metadata:

from dataclasses import dataclass

@dataclass
class ProvenanceRecord:
    source_url: str          # Official doc URL
    section_heading: str     # Exact section in the document
    content_hash: str        # SHA256 of the content at index time

When verify_cited_xml runs, it walks the generated XML tree and confirms that every element name, attribute, and parent-child relationship can be traced to a ProvenanceRecord. If any rule lacks a citation, the output is blocked at the gate stage.
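
The tree walk can be sketched as collecting every element, attribute, and parent-child fact from the XML and reporting those with no record. The tuple-based key scheme and the find_uncited function are assumptions made for illustration; the dataclass is repeated so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    source_url: str
    section_heading: str
    content_hash: str


def find_uncited(
    root: ET.Element, provenance: dict[tuple, ProvenanceRecord]
) -> list[tuple]:
    """Return every element, attribute, and parent-child fact with no record."""
    facts = [("element", root.tag)]
    for parent in root.iter():
        facts += [("attribute", parent.tag, a) for a in parent.attrib]
        for child in parent:
            facts += [("element", child.tag), ("child", parent.tag, child.tag)]
    # Deduplicate while preserving first-seen order.
    return [f for f in dict.fromkeys(facts) if f not in provenance]
```

An empty result means every fact in the document is backed by a record; anything else would be handed to the gate stage as a blocking failure.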

Design Decision: Why SHA256 Hashes?

Content hashes serve a dual purpose. First, they enable drift detection: if a re-index produces a different hash for the same section, the documentation has changed and rules need review. Second, they make provenance tamper-evident: a rule cannot be claimed as documented if its stored hash does not match the source content.

Dual-Case Index Strategy

The index maintains two dictionaries: one for exact-match lookups and one for case-insensitive fallback. This handles the common case where developers type baserate instead of Baserate without sacrificing precision for exact queries.

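from dataclasses import dataclass
from typing import Optional

@dataclass
class ElementDef:
    name: str  # Placeholder shape; the real definition is not shown here

exact_index: dict[str, ElementDef] = {}  # key: name as documented
ci_index: dict[str, ElementDef] = {}     # key: lowercased name

# Lookup with fallback
def lookup_element(name: str) -> Optional[ElementDef]:
    # Step 1: exact match (O(1) on average)
    if name in exact_index:
        return exact_index[name]
    # Step 2: case-insensitive fallback (O(1) on average)
    normalized = name.lower()
    if normalized in ci_index:
        return ci_index[normalized]
    return None  # Element does not exist in documentation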

Why This Project Matters

The MCP server is not a one-off rescue tool; it is a permanent part of the team's Hotel Ads XML workflow. Every future change to the generation logic, by any engineer, passes through the same provenance-backed rule engine, so the class of issues that previously required ad-hoc debugging is significantly less likely to recur. The ongoing value is structural: documentation drift surfaces as re-index warnings instead of going unnoticed, and the quality bar is now a team-wide standard rather than a one-engineer ritual.

Note: This case study describes the engineering approach and public-safe architectural decisions. Internal identifiers, business rules, proprietary implementation details, and sensitive operational data have been omitted or generalized.
