Overview
Google Hotel Ads determines ad exposure through a "Price Accuracy" score. Google sends PriceQuery and LivePriceQuery requests, and our backend returns XML responses describing room rates, taxes, and availability. If the XML deviates from Google's specification even slightly, the accuracy score drops, ad exposure decreases, and revenue falls.
The Problem: Our Price Accuracy was stuck in the "Very poor" (とても悪い) band, one step above outright disapproval. The team was generating XML responses using LLM-assisted workflows, but the LLM would hallucinate element names, invent attributes, or miss cross-element constraints that Google's system silently rejects.
The Solution: An MCP server that acts as a strict documentation-grounded validation layer. Every XML rule must be traceable to an official Google document section. If a rule cannot be cited, it does not exist.
My Role
As the sole developer, I designed, built, and deployed the entire MCP server, then used it as the audit tool that uncovered two existing production bugs:
- Indexed all 108 XML element definitions and 14 message types from official Google Hotel Center documentation
- Designed the 6-stage validation pipeline architecture
- Implemented the provenance system with SHA256 content hashing
- Used the tool's cross-element constraint engine to uncover two long-running production bugs: one in the no-inventory response path (<Unavailable> missing Tax=0/OtherFees=0), one in the has-inventory path (<Capacity> derived from minimum occupancy instead of the room's ocu_max)
- Shipped both fixes; monitored the direct lift in Google's Price Accuracy score
- Integrated the tool into the team's Claude workflow so future XML generation is grounded in indexed docs
Architecture
The system operates as a stdio-based MCP server that exposes 7 tools to the LLM. The core principle is documentation-only truth: no rule exists unless it can be traced to an indexed official document.
Document Index Layer: At startup, the server eagerly loads all 108 element definitions, 14 message types, and official XSD schemas into memory. A dual-case index provides exact match lookups with case-insensitive fallback. This eliminates all per-request I/O.
Validation Pipeline: XML generation follows a strict 6-stage flow:
- search_docs -- find relevant documentation sections
- validate_xml_plan -- verify planned elements exist before writing XML
- Generate XML (LLM step)
- precheck_generated_xml -- syntax, parent-child relationships, required attributes
- verify_cited_xml -- every rule must have a provenance citation
- gate_generated_xml -- final hard gate; blocks output if any check fails
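A minimal sketch of the structural half of precheck_generated_xml (syntax, parent-child relationships, required attributes), using only the standard library. The ALLOWED_CHILDREN and REQUIRED_ATTRS tables are a hypothetical toy spec, not Google's actual definitions, and the required currency attribute is assumed purely for illustration; in the server these tables would be derived from the indexed element definitions.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy spec for illustration only; the real tables would be
# built from the indexed Google element definitions at startup.
ALLOWED_CHILDREN: dict[str, set[str]] = {
    "Result": {"Baserate", "Tax", "OtherFees", "Unavailable"},
    "Baserate": set(),
    "Tax": set(),
    "OtherFees": set(),
    "Unavailable": set(),
}
REQUIRED_ATTRS: dict[str, set[str]] = {
    "Baserate": {"currency"},  # assumed for the example
}


def precheck(xml_text: str) -> list[str]:
    """Return structural errors; an empty list means the precheck passed."""
    try:
        root = ET.fromstring(xml_text)  # well-formedness check
    except ET.ParseError as exc:
        return [f"syntax error: {exc}"]

    errors: list[str] = []
    for parent in root.iter():  # visits every element, root included
        if parent.tag not in ALLOWED_CHILDREN:
            errors.append(f"unknown element <{parent.tag}>")
            continue
        # Parent-child relationships must match the documented tree
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN[parent.tag]:
                errors.append(f"<{child.tag}> is not a documented child of <{parent.tag}>")
        # Every documented required attribute must be present
        missing = REQUIRED_ATTRS.get(parent.tag, set()) - set(parent.attrib)
        for attr in sorted(missing):
            errors.append(f"<{parent.tag}> is missing required attribute '{attr}'")
    return errors
```

Cross-element constraints are a separate pass; this sketch only covers what a schema-like check can see.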
Cross-Element Constraint Engine: Validates semantic rules that span multiple elements — the kind of constraints XSD cannot express. Examples the engine caught in actual production code:
- When Baserate is -1 (unavailable), Tax and OtherFees must both be 0.
- <Capacity> must reflect the room's maximum occupancy, not whatever happens to be the minimum in surviving rate bundles.
Key Challenges
1. LLM Hallucination in Structured Output
LLMs confidently generate plausible-looking XML that contains invented elements or incorrect attribute values. Standard prompt engineering cannot reliably prevent this because the model has no grounding in the actual specification.
2. Cross-Element Semantic Constraints
Some validation rules span multiple elements. XSD schema validation catches structural errors but cannot enforce business logic like "if element A has value X, then element B must have value Y." These silent failures are the hardest to detect.
3. Documentation Drift
Google's Hotel Center documentation updates independently of our system. Rules that were correct last month may be outdated. The system needed a way to detect when its knowledge base was stale.
4. Zero-Latency Requirement
The MCP server sits in the LLM's tool-calling loop. Every millisecond of I/O latency compounds across the 6-stage pipeline. Disk reads or network calls per validation request were unacceptable.
Solutions & Design Decisions
Provenance-Based Anti-Hallucination
Every rule in the system carries a provenance record: the source document URL, the section heading, and a SHA256 hash of the content at index time. When the LLM generates XML, verify_cited_xml checks that every referenced rule can be traced back to its source. If the citation chain breaks, the output is blocked. Trade-off: the LLM cannot use knowledge beyond what is indexed, but this is a feature, not a bug.
Cross-Element Constraint Definitions
Semantic constraints are defined declaratively and checked during precheck_generated_xml. The critical constraint -- Baserate=-1 implies Tax=0 and OtherFees=0 -- is hardcoded as a first-class rule. This constraint cannot be expressed in XSD, making it invisible to schema validation alone.
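One way to express such a rule declaratively is as data: a scope element, a trigger condition, and a requirement. The sketch below encodes the Baserate=-1 rule that way; the CrossElementRule class, the rule identifier, and the helper names are hypothetical, and the real engine's rule format is not described in the source.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CrossElementRule:
    """Declarative rule: if `when` holds on a scope element, `then` must hold too."""
    rule_id: str
    scope: str                          # tag the rule is evaluated on, e.g. "Result"
    when: Callable[[ET.Element], bool]  # trigger condition
    then: Callable[[ET.Element], bool]  # requirement when the trigger fires
    message: str


def child_number(el: ET.Element, tag: str) -> Optional[float]:
    """Numeric value of a direct child, or None if the child is absent or empty."""
    child = el.find(tag)
    if child is None or not (child.text or "").strip():
        return None
    return float(child.text)


BASERATE_UNAVAILABLE = CrossElementRule(
    rule_id="baserate-minus-one-zero-fees",  # hypothetical identifier
    scope="Result",
    when=lambda r: child_number(r, "Baserate") == -1,
    # A missing Tax or OtherFees yields None, which also counts as a violation
    then=lambda r: child_number(r, "Tax") == 0 and child_number(r, "OtherFees") == 0,
    message="Baserate=-1 requires Tax=0 and OtherFees=0",
)


def check_rules(root: ET.Element, rules: list[CrossElementRule]) -> list[str]:
    """Evaluate every rule on every matching scope element in the tree."""
    violations: list[str] = []
    for rule in rules:
        for el in root.iter(rule.scope):
            if rule.when(el) and not rule.then(el):
                violations.append(f"{rule.rule_id}: {rule.message}")
    return violations
```

Because the rule is a value rather than a branch buried in a validator, adding the next cross-element constraint means appending one more CrossElementRule to the list.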
Eager Initialization with Dual-Case Index
All documentation, element definitions, and XSD schemas are loaded into memory at server startup. Lookups use an exact-match dictionary with a case-insensitive fallback dictionary. Result: zero I/O after startup, sub-millisecond lookups.
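A minimal sketch of how the two dictionaries could be built once at startup. The ElementDef fields are assumptions, and the choice to fail fast when two documented names differ only by case is this sketch's own design decision, made so the case-insensitive fallback can never silently pick the wrong element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementDef:
    name: str              # element name exactly as documented, e.g. "Baserate"
    description: str = ""  # hypothetical field for illustration


def build_indexes(defs: list[ElementDef]) -> tuple[dict[str, ElementDef], dict[str, ElementDef]]:
    """Build the exact-match and case-insensitive indexes; no I/O happens after this."""
    exact_index: dict[str, ElementDef] = {}
    ci_index: dict[str, ElementDef] = {}
    for d in defs:
        if d.name in exact_index:
            raise ValueError(f"duplicate element definition: {d.name}")
        exact_index[d.name] = d
        key = d.name.lower()
        # Refuse ambiguous fallbacks: two documented names differing only by case
        if key in ci_index:
            raise ValueError(f"case-insensitive collision: {ci_index[key].name} vs {d.name}")
        ci_index[key] = d
    return exact_index, ci_index
```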
Content Hash Drift Detection
Each indexed document section stores a SHA256 hash. When the documentation source is re-indexed, changed hashes surface as warnings, alerting the team to review potentially stale rules.
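The drift check reduces to comparing two maps of section key to hash. A minimal sketch, assuming sections are keyed by a string such as "source_url#section_heading" (a hypothetical key scheme) and hashed without whitespace normalization, so even formatting-only edits will be flagged for review:

```python
import hashlib


def content_hash(section_text: str) -> str:
    """SHA256 hex digest of a documentation section's text."""
    return hashlib.sha256(section_text.encode("utf-8")).hexdigest()


def detect_drift(old: dict[str, str], new_sections: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored hashes (key -> hash) with freshly fetched section text (key -> text)."""
    new = {key: content_hash(text) for key, text in new_sections.items()}
    return {
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
    }
```

Every key under "changed" or "removed" marks rules whose citations need a human review before the new index replaces the old one.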
Results & Impact
Two Existing Production Bugs Uncovered
The MCP was built to prevent future hallucinations, but its cross-element rule engine turned out to be an equally useful audit tool for existing XML generation code. Running the indexed rules against production output surfaced two long-standing bugs:
- No-inventory path: <Unavailable> responses were being emitted without the required siblings <Tax>0</Tax> and <OtherFees>0</OtherFees>. Google silently counted every one as a price mismatch. Fix: added the zero-value siblings and consolidated the shared PriceQuery/LivePriceQuery removal-payload logic into a single helper (UnavailableResultSupport).
- Has-inventory path: the <Capacity> element inside <RoomData> was being derived from the minimum occupancy observed in surviving rate bundles, an implementation accident. Google's spec defines <Capacity> as the room's maximum occupancy. Fix: a new ocuMax field on RoomType, populated from the ocu_max column of the room table, and wired through RoomBundleFilterService so the emitted capacity matches the room's real maximum.
Measurable Impact
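Google Hotel Center grades Price Accuracy on a five-level scale, from lowest to highest: Disapproved (不承認), Very poor (とても悪い), Needs attention (要注意), Adequate (適正), and Very good (非常に良い). The score climbed out of the Very poor band, plateaued at Adequate, and now trends toward Very good, a direct and trackable effect on Google Hotel Ads exposure.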
Validation Coverage
- 108 XML element definitions indexed
- 14 message types covered
- 6-stage pipeline catches errors at every step
- Zero hallucinated elements have reached production since deployment
Learnings
Ground Truth Beats Prompt Engineering
No amount of system prompt tuning can match a hard validation gate backed by indexed documentation. When correctness matters, give the LLM tools that enforce constraints rather than instructions that suggest them.
Cross-Element Rules Are Where the Bugs Hide
The most impactful bug was not a typo or a missing element. It was a semantic relationship between three elements that only manifests in a specific state. These constraints are invisible to schema validation and easy to miss in code review.
Eager Loading Is Worth the Startup Cost
For a tool that runs in a tight loop, paying the cost once at startup and having zero per-request I/O is the right trade-off. The server starts in under a second and every subsequent call is pure computation.
Deep Dive: The Anti-Hallucination Pipeline
Why This Matters
Google Hotel Ads uses Price Accuracy to rank hotel listings. A single invalid XML field -- invisible to the human eye -- can silently degrade your accuracy score, reduce ad exposure, and cost real revenue. This tool ensures that every element, attribute, and cross-element constraint in an XML response is checked against the indexed specification before it ships.
The 6-Stage Validation Flow
The pipeline enforces a strict sequence. Each stage must pass before the next begins. The LLM cannot skip stages or reorder them.
- search_docs -- Retrieve relevant documentation sections for the task
- validate_xml_plan -- Check that all planned element names exist in the index
- Generate XML -- The LLM produces the XML (the only uncontrolled step)
- precheck_generated_xml -- Parse, validate structure, check cross-element constraints
- verify_cited_xml -- Confirm every rule has a traceable provenance citation
- gate_generated_xml -- Hard gate: block output if any prior check failed
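The source states that the order is enforced but not how. One plausible mechanism is per-session state on the server side, sketched below; the Stage enum, the PipelineSession class, and the idea of recording the LLM's generation step as a stage are all assumptions.

```python
from enum import IntEnum


class Stage(IntEnum):
    SEARCH_DOCS = 1
    VALIDATE_XML_PLAN = 2
    GENERATE_XML = 3  # the LLM step; recorded, not validated here
    PRECHECK_GENERATED_XML = 4
    VERIFY_CITED_XML = 5
    GATE_GENERATED_XML = 6


class PipelineSession:
    """Tracks one generation run and refuses skipped, reordered, or post-failure stages."""

    def __init__(self) -> None:
        self.completed: list[Stage] = []
        self.failed = False

    def record(self, stage: Stage, passed: bool) -> None:
        if self.failed:
            raise RuntimeError(f"{stage.name}: an earlier stage failed; restart the pipeline")
        if len(self.completed) == len(Stage):
            raise RuntimeError("pipeline already finished")
        expected = Stage(len(self.completed) + 1)
        if stage is not expected:
            raise RuntimeError(f"{stage.name} called out of order; expected {expected.name}")
        self.completed.append(stage)
        self.failed = not passed

    def gate_open(self) -> bool:
        """True only if all six stages ran in order and none failed."""
        return len(self.completed) == len(Stage) and not self.failed
```

Keeping this state on the server, rather than in the prompt, is what makes the ordering a hard constraint: a tool call that arrives out of sequence raises instead of relying on the model to comply.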
Technology Choices
- Core: Python 3.10+, FastMCP, Pydantic 2.0
- Validation: lxml (XSD), xml.etree.ElementTree, SHA256 Provenance
- Indexing: Dual-Case Dictionary, Eager Init, 108 Element Definitions
- Protocol: MCP Stdio Transport, 7 Tool Endpoints, 14 Message Types
The Bugs the Tool Found in the Existing Codebase
The MCP was originally a preventative tool — a guardrail so that LLM-generated XML couldn't ship with hallucinated elements. But once the indexed rule set was in place, it was trivial to point it at existing production XML output and ask: does this code already comply with every rule? The answer turned out to be "no" in two different ways.
Bug A · No-Inventory Path: `<Unavailable>` missing required zero-value siblings
When a room is unavailable, the handler returned only <Unavailable> (or Baserate=-1 with no tax fields). Google's spec requires <Tax>0</Tax> and <OtherFees>0</OtherFees> siblings in the same Result to unambiguously represent "no inventory, no charge". Without them, Google silently counts every response as a price discrepancy — no error log, no warning, just a downward drift on the Price Accuracy score.
Fix: Added the zero-value siblings. Also pulled the shared PriceQuery / LivePriceQuery removal-payload generation into a single UnavailableResultSupport helper so the two message types can't drift again.
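The production helper's code is not shown here; the Python sketch below only illustrates the idea of a single shared function that both message paths call. The function name is hypothetical, and currency attributes and any other required <Result> children are deliberately omitted.

```python
import xml.etree.ElementTree as ET


def add_zero_fee_siblings(result: ET.Element) -> ET.Element:
    """Ensure an unavailable <Result> carries <Tax>0</Tax> and <OtherFees>0</OtherFees>.

    Hypothetical stand-in for a shared helper such as UnavailableResultSupport:
    the PriceQuery and LivePriceQuery paths both call this one function.
    """
    for tag in ("Tax", "OtherFees"):
        child = result.find(tag)
        if child is None:
            child = ET.SubElement(result, tag)  # appended after existing children
        child.text = "0"  # overwrite any non-zero value: no inventory, no charge
    return result
```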
Bug B · Has-Inventory Path: `<Capacity>` was min-occupancy instead of max
The has-inventory path's <Capacity> element inside <RoomData> was being computed from the minimum occupancy observed in the surviving rate bundles — which happened to work for some rooms and was completely wrong for others. Google defines <Capacity> as the room's maximum occupancy (a Deluxe room that sleeps 4 should report Capacity=4, not Capacity=1 because the cheapest bundle happened to be a single-occupancy rate). The effect: Google's system would assume the room didn't fit the search party and would refuse to show it for larger-guest queries.
Fix: Added an ocuMax field on the RoomType model, populated from the ocu_max column of the room type table, and rewired RoomBundleFilterService.setMinOccupancyCapacity so the emitted capacity is the room's actual max — not a byproduct of filtering.
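A small Python sketch contrasting the two computations with the Deluxe example. RateBundle and both function names are hypothetical; the actual fix lives in RoomType and RoomBundleFilterService, whose code is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateBundle:
    occupancy: int  # number of guests this rate is priced for


@dataclass(frozen=True)
class RoomType:
    name: str
    ocu_max: int  # maximum occupancy, from the room table's ocu_max column


def capacity_before_fix(surviving: list[RateBundle]) -> int:
    # Old behavior: a byproduct of filtering, so it depends on which bundles survived
    return min(b.occupancy for b in surviving)


def capacity_after_fix(room: RoomType) -> int:
    # New behavior: a property of the room itself, independent of the rate bundles
    return room.ocu_max
```

For a Deluxe room with ocu_max=4 whose surviving bundles are priced for 1 and 2 guests, the old computation yields 1 and the fixed one yields 4.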
Neither bug was visible in XSD validation. Both only manifest as cross-element or source-of-truth constraints — exactly the class of rule that's hard to spot in code review but trivial to express in the MCP's rule engine.
Measured Impact

Google Hotel Center · Landing page price accuracy (Jan–Apr). Vertical-axis grading: Disapproved (不承認) · Very poor (とても悪い) · Needs attention (要注意) · Adequate (適正) · Very good (非常に良い). The score was stuck in the Very poor band before the MCP-identified fixes shipped; after both fixes landed, the line climbed to Adequate and now trends toward Very good.
Coverage at a Glance
- Elements Indexed: 108 (14 message types)
- Pipeline Stages: 6 (zero hallucinations reach production)
- Bugs Found in Existing Code: 2 (both shipped, both verified by the accuracy score climbing)
- Accuracy Score: Very poor (とても悪い) → Adequate (適正) → trending toward Very good (非常に良い)
Provenance System Design
Every rule in the index carries three pieces of provenance metadata:
```python
from dataclasses import dataclass


@dataclass
class ProvenanceRecord:
    source_url: str       # Official doc URL
    section_heading: str  # Exact section in the document
    content_hash: str     # SHA256 of the content at index time
```
When verify_cited_xml runs, it walks the generated XML tree and confirms that every element name, attribute, and parent-child relationship can be traced to a ProvenanceRecord. If any rule lacks a citation, the output is blocked at the gate stage.
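A minimal sketch of that tree walk, assuming citations are stored in a dictionary keyed by strings like "elem:Tax", "attr:Baserate@currency", and "edge:Result>Tax". That key scheme is hypothetical; the sketch only shows that elements, attributes, and parent-child edges are each checked independently, and that a non-empty result is what the gate would block on.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    source_url: str
    section_heading: str
    content_hash: str


def uncited_facts(xml_text: str, citations: dict[str, ProvenanceRecord]) -> list[str]:
    """List every element, attribute, and parent-child edge lacking a provenance record."""
    root = ET.fromstring(xml_text)
    missing: list[str] = []

    def need(key: str) -> None:
        if key not in citations and key not in missing:
            missing.append(key)

    for el in root.iter():
        need(f"elem:{el.tag}")
        for attr in el.attrib:
            need(f"attr:{el.tag}@{attr}")
        for child in el:
            need(f"edge:{el.tag}>{child.tag}")
    return missing
```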
Design Decision: Why SHA256 Hashes?
Content hashes serve a dual purpose. First, they enable drift detection -- if a re-index produces a different hash for the same section, the documentation has changed and rules need review. Second, they make provenance tamper-evident. You cannot claim a rule exists if the hash does not match.
Dual-Case Index Strategy
The index maintains two dictionaries: one for exact-match lookups and one for case-insensitive fallback. This handles the common case where developers type baserate instead of Baserate without sacrificing precision for exact queries.
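```python
from typing import Optional

# exact_index and ci_index are module-level dicts built once at startup:
#   exact_index: documented name -> ElementDef  (e.g. "Baserate")
#   ci_index:    name.lower()    -> ElementDef  (e.g. "baserate")

def lookup_element(name: str) -> Optional["ElementDef"]:
    """Look up an element definition: exact match first, then case-insensitive."""
    # Stage 1: Exact match (O(1))
    if name in exact_index:
        return exact_index[name]
    # Stage 2: Case-insensitive fallback (O(1))
    normalized = name.lower()
    if normalized in ci_index:
        return ci_index[normalized]
    return None  # Element does not exist in documentation
```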
Why This Project Matters
The MCP server is not a one-off rescue tool — it is now a permanent part of the Hotel Ads XML workflow. Every future change to the generation logic passes through the same provenance-backed rule engine, so the class of bugs that depressed Price Accuracy for months is structurally impossible to reintroduce. The two historical fixes were the initial payoff; the ongoing value is that the system cannot silently drift again.