
Hotel Dynamic Pricing Platform

An automated hotel dynamic pricing platform that scrapes competitor prices, calculates optimal discounts, and publishes pricing rules to our internal booking system — replacing a manual workflow with a fully autonomous pipeline.

Role: Solo Developer (Full Ownership)
Period: 2024 - 2025

Overview

The Hotel Dynamic Pricing Platform is a monorepo system that automates competitive hotel pricing — a process that was previously done manually by operations staff. The system continuously scrapes competitor prices from Google Travel, generates pricing tasks, calculates optimal discount percentages, and publishes them to our internal Hotel Pricing Gateway — the central API that controls the prices shown to customers on our booking site.

Target Users: The hotel operations and revenue management team at my company.

Problem Solved: Operations staff manually compared competitor prices across thousands of properties and adjusted discounts one by one — a process that took hours and was always outdated by the time it finished. This system runs two daily pricing cycles, each consisting of one scrape run, one planner run, and two staggered publish runs, keeping pricing adjustments responsive throughout the day with no manual intervention.

My Role

As the sole developer, I owned the entire system end-to-end:

  • Designed and built the Python scraper service (FastAPI + Playwright)
  • Built the Java backend with Spring Boot for task planning and price publishing
  • Created the React admin UI for monitoring and manual overrides
  • Designed the database schema and batch processing pipeline
  • Authored the full Kubernetes/Kustomize manifest set — base + dev overlays, CronJob manifests for planner and publisher, Gateway API HTTPRoute and BackendPolicy patches
  • Extended the shared Terraform infra to provision the service end-to-end: a new GCP Service Account with Workload Identity binding to its K8s SA, Cloud SQL IAM user with ADMIN grant on the service's database, and Secret Manager entries for DB credentials
  • Wired the CI/CD flow through ArgoCD
  • Operated and maintained the system in production

Tech Stack

Frontend

React 19 · Vite · TypeScript · Tailwind CSS

Backend

Java 21 · Spring Boot 3.4 · Python 3.12 · FastAPI · Playwright · SSE (real-time log streaming)

Infrastructure

Docker · Kubernetes / GKE (Kustomize base + overlays) · ArgoCD (GitOps delivery) · CronJob scheduling · Terraform (GCP Service Account · Workload Identity · Cloud SQL IAM · Secret Manager) · Gateway API (HTTPRoute · BackendPolicy · HealthCheckPolicy)

Database / Messaging

MySQL 8 · Redis

Architecture

The system is a monorepo with three distinct services, orchestrated by Kubernetes CronJobs:

Scraper Service (Python/FastAPI/Playwright): Headless browser automation that fetches competitor prices from Google Travel via the internal batchexecute API. Uses Poisson jitter rate limiting to avoid detection and inline parsing to distribute database write load.

Backend Service (Java/Spring Boot): Two core batch jobs — the Planner Job generates pricing tasks using bulk INSERT…SELECT that completes the insert in seconds regardless of run size, and the Publisher Job compares prices, calculates discount percentages, and pushes results to the Hotel Pricing Gateway's batch endpoint.

Admin UI (React/Vite): Real-time monitoring dashboard with SSE log streaming across Kubernetes pods. Provides manual override controls for edge cases the automation cannot handle.

CronJob Orchestration: The system runs two daily pricing cycles. Each cycle executes Scrape once, Plan once, and two staggered Publish runs. Each stage can also be triggered independently for debugging.
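As a sketch of that schedule, a Kubernetes CronJob for the planner stage might look like the manifest below. This is hypothetical: the names, image, arguments, and run times are illustrative, not the actual Kustomize manifests.

```yaml
# Hypothetical planner CronJob sketch (names, image, and times are illustrative)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pricing-planner
spec:
  schedule: "30 2,14 * * *"    # two runs per day (illustrative times)
  concurrencyPolicy: Forbid    # never overlap a run with the previous one
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: planner
              image: registry.example.com/pricing-backend:latest
              args: ["--job=planner"]
```

`concurrencyPolicy: Forbid` matters here: a planner run that overruns its window must block the next one rather than race it.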

System architecture diagram

Key Challenges

1. Wasted Work on Dates Nobody Books

Discounting every scraped property × date × channel combination is wasteful. Most price points represent dates that no customer ever searches for — spending CPU, network, and partner rate limits on them produces no revenue. Worse, actively discounting cold dates lowers margin on inventory that would have sold anyway.

2. Expensive Self-Price Lookups

To compare against our own price, the naive approach is to proactively query our internal booking platform for every property × date. But those queries often hop through a wholesaler API — each call has real latency and cost, and fanning out lookups at the scale of our daily write volume (~1.5M rows) would overload the platform.

3. Publishing Performance at Scale

The initial implementation published discounts one property at a time via the gateway's per-property API. With hundreds of thousands of individual API calls required per cycle, this was unacceptably slow — a single cycle could not complete before the next one started.

4. Concurrent Write Contention

The scraper writes price data while the planner reads it. Traditional offset-based pagination broke under concurrent writes — rows were skipped or duplicated as new data shifted page boundaries.

5. Cross-Pod Log Visibility

In a Kubernetes environment with multiple pods, operations staff could not see real-time logs from the correct pod. They needed to know immediately if a scraping or publishing job was failing.

6. Idempotency Under Failure

Network failures mid-batch could leave the system in an inconsistent state — some properties published with new prices, others stuck on stale prices. Retries risked duplicate API calls to the Hotel Pricing Gateway.

Solutions & Design Decisions

Demand-Driven Price Snapshots (Reverse-Driven Architecture)

Instead of building a proactive pipeline that pulls our own prices for every combination, I added a lightweight recording hook on the existing Hotel Pricing Gateway. Whenever Google Travel queries us for a price (which it does naturally for every hotel a user views), the gateway logs the property, date, channel, and the price it returned into a cheapest_price snapshot table. The Publisher Job then reads from this table instead of actively querying our platform.

The insight: Google's inbound query traffic is the demand signal. If no user ever searches a particular date, Google never asks about it, no snapshot gets recorded, and we never waste effort discounting it. The system automatically focuses pricing attention on the dates that actually matter.

Benefits:

  • Zero extra API load on our booking platform or wholesaler integrations
  • Natural demand filtering — we only act on dates with observed search interest
  • Freshness is automatic — snapshots refresh whenever Google re-queries, so stale prices self-correct
  • No polling schedules or cache invalidation logic to maintain
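As a sketch of that recording hook, the upsert below keeps one row per property × date × channel, so the latest answered price always wins. sqlite3 stands in for MySQL, and the table and column names are illustrative, not the production schema.

```python
import sqlite3

# Minimal sketch of the gateway-side recording hook.
# sqlite3 stands in for MySQL; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cheapest_price (
        property_id  INTEGER,
        checkin_date TEXT,
        channel      TEXT,
        price        INTEGER,
        updated_at   TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (property_id, checkin_date, channel)
    )
""")

def record_snapshot(property_id: int, checkin_date: str, channel: str, price: int) -> None:
    """Called whenever Google Travel queries a price: upsert so the
    latest answered price always wins and stale rows self-correct."""
    conn.execute(
        """INSERT INTO cheapest_price (property_id, checkin_date, channel, price)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(property_id, checkin_date, channel)
           DO UPDATE SET price = excluded.price, updated_at = CURRENT_TIMESTAMP""",
        (property_id, checkin_date, channel, price),
    )

# Two inbound queries for the same slot leave exactly one row, with the newer price.
record_snapshot(42, "2025-06-01", "mobile", 120)
record_snapshot(42, "2025-06-01", "mobile", 110)
row = conn.execute("SELECT COUNT(*), MIN(price) FROM cheapest_price").fetchone()
print(row)  # (1, 110)
```

The composite primary key is what makes the hook idempotent: re-querying a slot refreshes it instead of growing the table.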

Batch API Integration (Batch Request Consolidation)

Replaced individual API calls with the Hotel Pricing Gateway's batch endpoint. Instead of hundreds of thousands of sequential HTTP calls, the system now groups properties into batch payloads (up to 500 items per call) — reducing API requests by roughly 500× and cutting the end-to-end publish cycle from hours to minutes.
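A minimal sketch of that consolidation, assuming only the 500-items-per-call limit stated above (the HTTP publish call itself is omitted):

```python
from typing import Iterator, List

MAX_BATCH = 500  # the gateway batch endpoint's per-call item limit

def batches(items: List[dict], size: int = MAX_BATCH) -> Iterator[List[dict]]:
    """Split per-property price updates into payloads of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Illustrative: 250,000 individual updates collapse into 500 HTTP calls.
updates = [{"property_id": i, "discount_pct": 10} for i in range(250_000)]
payloads = list(batches(updates))
print(len(payloads))      # 500
print(len(payloads[-1]))  # 500
```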

Two-Pointer Pagination Algorithm

Designed a cursor-based pagination approach using two pointers (last-seen ID + created timestamp) instead of OFFSET/LIMIT. This guarantees stable iteration even when the scraper is actively inserting rows into the same table. The algorithm never skips or double-processes records regardless of concurrent write volume.

SSE Log Streaming with Three-Layer Fallback

Built a real-time log pipeline: a custom Logback appender captures log events into a local in-memory buffer, publishes them to Redis Pub/Sub, and SseEmitter endpoints stream them to the browser. The three layers (local memory, Redis, SSE) provide graceful degradation — if Redis is down, local logs are still available; if a pod restarts, Redis retains recent history.

Redis + DB Dual-Layer Idempotency

Each batch operation generates a deterministic idempotency key. Redis provides fast deduplication for the common case (TTL-based), while the database serves as the durable fallback. This ensures exactly-once semantics for the gateway API calls even across pod restarts.

Chunk-Isolated Transactions

Each batch of 100 properties runs in its own transaction (REQUIRES_NEW propagation). If one chunk fails, it does not roll back the entire batch — only the affected 100 properties are retried. This dramatically reduces blast radius and improves recovery time.

Results & Impact

Performance

  • Publishing: ~500× fewer API requests via batch endpoint (up to 500 items per call); end-to-end publish cycle dropped from hours to minutes
  • Planner Job: processes 500K+ tasks per run in ~20 minutes; bulk INSERT…SELECT moves millions of rows
  • Steady daily cadence: 8 scheduled runs (scraper 2×, planner 2×, publisher 4×) with 100% success in the observed week

Operational Impact

  • Eliminated manual price comparison workflow entirely
  • Pricing adjustments now reflect competitor changes within hours, not days
  • Operations staff reassigned from manual pricing to higher-value work

Reliability

  • Zero data loss from mid-batch failures (dual-layer idempotency)
  • Chunk isolation limits blast radius to 100 properties per failure
  • SSE log streaming provides real-time visibility across all pods

Learnings

Batch APIs Change Everything

The single biggest performance win was not clever code — it was switching from individual to batch API calls. Before optimizing algorithms, always check whether the external service offers a bulk endpoint. Consolidating individual calls into batch requests dwarfed every other optimization combined.

Design for Concurrent Writes from Day One

Offset-based pagination is a trap when tables are under active write load. The two-pointer approach costs almost nothing to implement upfront but saves enormous debugging pain later. I now default to cursor-based pagination for any table with concurrent writers.

Chunk Isolation is Worth the Complexity

REQUIRES_NEW transactions per chunk add complexity, but the operational benefit is massive. Being able to tell an operator "only 100 properties need re-processing instead of the full 500K" transforms incident response from panic to routine.

Inline Parsing Distributes Load Better Than Batch Parsing

Initially, the scraper collected all raw HTML and parsed it in a separate batch step. Switching to inline parsing (parse immediately after each fetch) distributed the database write load more evenly across the scraping window, eliminating the spike that caused lock contention.
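A toy sketch of that difference, with stand-in fetch/parse/write functions rather than the real scraper code. Recording the order of operations shows writes clustered at the end in the batch shape, and alternating evenly with fetches in the inline shape:

```python
# Toy sketch contrasting batch parsing with inline parsing.
# fetch/parse/write_row are stand-ins, not the real scraper functions.
events = []

def fetch(url: str) -> str:
    events.append(("fetch", url))
    return f"<html>{url}</html>"

def parse(html: str) -> str:
    return html.removeprefix("<html>").removesuffix("</html>")

def write_row(url: str) -> None:
    events.append(("write", url))

def scrape_batch(urls):
    """Old shape: all fetches first, then a burst of DB writes at the end."""
    raw = [fetch(u) for u in urls]
    for html in raw:
        write_row(parse(html))

def scrape_inline(urls):
    """New shape: each fetch is followed immediately by its write."""
    for u in urls:
        write_row(parse(fetch(u)))

urls = ["a", "b", "c"]
scrape_batch(urls)
batch_order = [kind for kind, _ in events]
events.clear()
scrape_inline(urls)
inline_order = [kind for kind, _ in events]
print(batch_order)   # ['fetch', 'fetch', 'fetch', 'write', 'write', 'write']
print(inline_order)  # ['fetch', 'write', 'fetch', 'write', 'fetch', 'write']
```

The write burst at the end of the batch shape is exactly what caused the lock contention described above; interleaving spreads the same writes across the whole scraping window.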

Deep Dive: System Design Decisions

Key Insight: Use the Demand Signal You Already Have

The single most impactful architectural decision in this system wasn't about scale, batching, or concurrency — it was about where the "our price" baseline comes from.

A naive design would spin up a second pipeline that proactively queries our booking platform for every property × date combination. That pipeline would be expensive (wholesaler API hops), fragile (cache invalidation, schedule drift), and wasteful (most dates never get booked).

Instead, I realized Google Travel is already querying us for prices every time a user views a hotel. That query traffic is a real-time demand signal. By adding a small recording hook on the Hotel Pricing Gateway, every inbound query becomes a snapshot in the cheapest_price table — no extra load, no cold dates, no polling.

This flips the architecture from pull-based (we actively query our own prices) to push-based (demand triggers recording). It is the reason the pricing pipeline can run on modest resources despite covering tens of thousands of properties.

Why a Monorepo?

Python excels at browser automation (Playwright) and rapid prototyping for scraping logic. Java excels at reliable batch processing with strong typing and transaction management. React provides the monitoring interface. A monorepo keeps deployment coordination simple — one commit can update the scraper parsing logic and the planner's task generation in lockstep.

Running 7.7 GB of Live Data on a Micro Instance

The entire production database runs on a Cloud SQL Micro tier — 640 MB InnoDB buffer pool, the cheapest managed tier available. Despite the modest hardware, the system processes 1.5M UPSERTs per day and handles peak runs of 500K+ rows.

Every design decision in this project exists to make that possible:

  • Chunked transactions (100 properties per chunk) keep working sets well under buffer pool size
  • Two-pointer pagination streams rows instead of loading the full result set
  • Bulk INSERT…SELECT avoids shuttling millions of rows through the application layer
  • 48-hour TTL on fetch_raw prevents debug data from ballooning the DB
  • Demand-driven snapshots remove an entire subsystem that would otherwise exist
  • Batch endpoint publishing means ~500× fewer TCP connections per run

Running this workload on a larger instance would be the easy fix. Running it on Micro is what forced the architecture to stay lean.

Scale in Production

  • Total dataset: 7.7 GB — 3.47M competitor prices + 2.62M pricing tasks
  • Daily write volume: ~1.5M UPSERTs into hotel_prices, ~1M into mc_planner_tasks
  • Peak single run: 500,000+ tasks processed by a single planner invocation
  • Daily cadence: 2 scrape runs/day (~50 min per run), 2 planner runs/day (~20 min per run), 4 publish runs/day
  • Infrastructure: Cloud SQL MySQL 8.4 · Micro tier · 640 MB InnoDB buffer pool
  • Publish efficiency: ~500× fewer API requests via batch endpoint integration (up to 500 items per call)

Two-Pointer Pagination

The core challenge: the scraper is continuously inserting rows while the planner reads from the same table. Traditional OFFSET/LIMIT shifts row positions on every insert, causing the planner to skip or double-count records.

Cursor-Based Pagination (Two-Pointer Approach)
-- Instead of OFFSET/LIMIT which breaks under concurrent writes:
-- SELECT * FROM prices ORDER BY id LIMIT 1000 OFFSET 5000  -- UNRELIABLE

-- Two-pointer approach: stable regardless of concurrent inserts
SELECT * FROM prices
WHERE (created_at > :lastTimestamp)
   OR (created_at = :lastTimestamp AND id > :lastId)
ORDER BY created_at ASC, id ASC
LIMIT 1000

Why Two Pointers Instead of One?

Using only id > :lastId assumes rows become visible in strict ID order, which concurrent inserts do not guarantee: a transaction holding a smaller auto-increment ID can commit after the cursor has already moved past it. Using only created_at > :lastTimestamp fails when multiple rows share the same timestamp. The combination of both fields guarantees a unique, stable sort order — the timestamp provides coarse ordering, and the ID breaks ties.
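A runnable check of this behavior, with sqlite3 standing in for MySQL and an illustrative table shape: rows are inserted between pages to simulate the scraper writing while the planner paginates, and the cursor still visits every row exactly once.

```python
import sqlite3

# sqlite3 stands in for MySQL; the table shape is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INTEGER PRIMARY KEY, created_at INTEGER)")
# Many rows share a timestamp, so the id tie-breaker actually matters.
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [(i, i // 10) for i in range(1, 101)],
)

def next_page(last_ts: int, last_id: int, limit: int = 25):
    return conn.execute(
        """SELECT id, created_at FROM prices
           WHERE created_at > ? OR (created_at = ? AND id > ?)
           ORDER BY created_at ASC, id ASC
           LIMIT ?""",
        (last_ts, last_ts, last_id, limit),
    ).fetchall()

seen, last_ts, last_id, inserts_left = [], -1, -1, 3
while True:
    page = next_page(last_ts, last_id)
    if not page:
        break
    seen.extend(row[0] for row in page)
    last_id, last_ts = page[-1]
    if inserts_left:  # concurrent writer: new rows land with later timestamps
        conn.execute("INSERT INTO prices VALUES (NULL, 999)")
        inserts_left -= 1

print(len(seen))                    # 103 (all 100 original rows plus 3 inserted)
print(len(seen) == len(set(seen)))  # True: nothing skipped or read twice
```

With OFFSET/LIMIT, the same mid-iteration inserts would shift page boundaries and skip or duplicate rows; the cursor pair is immune because it anchors on values, not positions.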

Chunk-Isolated Transaction Pattern

REQUIRES_NEW Per Chunk (Spring)
// Outer method: iterates over chunks, continues on failure.
// Deliberately NOT @Transactional: each chunk manages its own transaction,
// so no DB connection is held open across the whole run.
public void publishAll(List<Property> properties) {
    Lists.partition(properties, CHUNK_SIZE).forEach(chunk -> {  // Guava
        try {
            chunkPublisher.publishChunk(chunk); // REQUIRES_NEW
        } catch (Exception e) {
            log.error("Chunk failed, {} properties affected", chunk.size(), e);
            // Other chunks continue — blast radius is limited
        }
    });
}

// Inner method, on a separate Spring bean (chunkPublisher): a self-invocation
// would bypass the transactional proxy, so REQUIRES_NEW only applies when
// called through another bean.
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void publishChunk(List<Property> chunk) {
    // If this fails, only this chunk rolls back
    batchApi.publish(chunk);
    markAsPublished(chunk);
}

SSE Log Streaming Architecture

Real-time log visibility across Kubernetes pods is essential for operations. The three-layer design ensures logs are always accessible, even during partial infrastructure failures.

K8s Pod Lifecycle Awareness

SSE connections are long-lived, but pods are ephemeral. When a pod scales down or restarts, all SSE connections to that pod drop. The Redis layer acts as the bridge — any pod can serve historical logs from Redis, and new connections automatically pick up the stream regardless of which pod generated the original log entries.
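A minimal sketch of that degradation on the read path, with a fake in-process Redis standing in for the real client (all class and method names are illustrative):

```python
from collections import deque

# FakeRedis stands in for the real Redis client; names are illustrative.
class FakeRedis:
    def __init__(self):
        self.log = []
        self.up = True

    def push(self, line: str) -> None:
        if not self.up:
            raise ConnectionError("redis down")
        self.log.append(line)

    def recent(self, n: int) -> list:
        if not self.up:
            raise ConnectionError("redis down")
        return self.log[-n:]

class LogStream:
    def __init__(self, redis: FakeRedis):
        self.redis = redis
        self.local = deque(maxlen=1000)   # layer 1: per-pod ring buffer

    def append(self, line: str) -> None:
        self.local.append(line)           # always keep a local copy
        try:
            self.redis.push(line)         # layer 2: share across pods
        except ConnectionError:
            pass                          # degrade silently; local copy survives

    def recent(self, n: int = 100) -> list:
        try:
            return self.redis.recent(n)   # preferred: cross-pod history
        except ConnectionError:
            return list(self.local)[-n:]  # fallback: this pod's own logs

redis = FakeRedis()
stream = LogStream(redis)
for i in range(3):
    stream.append(f"line {i}")
print(stream.recent(2))  # ['line 1', 'line 2'] (served from Redis)
redis.up = False
print(stream.recent(2))  # ['line 1', 'line 2'] (served from the local buffer)
```

The SSE layer (not shown) sits on top of `recent()` plus a live subscription; the key property is that a Redis outage degrades the view to single-pod logs instead of losing it entirely.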

Poisson Jitter Rate Limiting

Scraper Rate Limiting Strategy
import random
import math

def poisson_delay(avg_interval_sec: float) -> float:
    """
    Generate delay using Poisson process (exponential distribution).
    Unlike fixed delays, this produces natural-looking request patterns
    that are harder for anti-bot systems to detect.
    """
    return -math.log(1.0 - random.random()) * avg_interval_sec

Why Poisson Instead of Random Uniform?

Fixed intervals with random jitter (e.g., 5 s ± 2 s) still produce a detectable pattern — the mean is constant and the distribution is uniform. Poisson-distributed intervals mimic natural human browsing behavior: mostly short gaps with occasional longer pauses. This significantly reduces the likelihood of being flagged by rate limiting systems.
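A quick sanity check of that shape (the function is repeated so the snippet runs standalone): the sample mean converges to avg_interval_sec, while most individual gaps are shorter than the mean, i.e. the long-tail pattern that makes the traffic look organic. For reference, the standard library's random.expovariate(1.0 / avg_interval_sec) draws from the same distribution.

```python
import math
import random

def poisson_delay(avg_interval_sec: float) -> float:
    # Same inverse-CDF sampling as the scraper code above.
    return -math.log(1.0 - random.random()) * avg_interval_sec

random.seed(42)  # fixed seed so the check is reproducible
samples = [poisson_delay(5.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)
short_frac = sum(s < 5.0 for s in samples) / len(samples)

print(abs(mean - 5.0) < 0.1)  # True: mean is close to 5 s
print(short_frac > 0.6)       # True: most gaps are shorter than the mean
```

For an exponential distribution, about 63% of samples fall below the mean (1 − 1/e), which is why the traffic reads as "mostly short gaps with occasional long pauses".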

Dual-Layer Idempotency

Redis + DB Idempotency Pattern
public boolean tryAcquire(String idempotencyKey) {
    // Layer 1: Redis (fast path, TTL-based expiry). If the key is already
    // present, this is definitely a duplicate: reject without touching the DB.
    Boolean acquired = redis.opsForValue()
        .setIfAbsent(idempotencyKey, "1", Duration.ofHours(24));
    if (!Boolean.TRUE.equals(acquired)) {
        return false;  // duplicate, rejected on the fast path
    }

    // Layer 2: DB (durable). Redis may have lost the key to eviction or
    // failover, so the INSERT IGNORE row is the authoritative record.
    // This preserves exactly-once even across pod restarts.
    return idempotencyRepository.insertIgnore(idempotencyKey);
}

Why Two Layers?

Redis alone is not durable — keys can be lost during failover or memory pressure. The database alone is too slow for the hot path (hundreds of thousands of checks per cycle). The dual-layer approach gives us the speed of Redis for the 99.9% case and the durability of MySQL for the edge cases that matter most.

Live Demo

Below is a fully interactive mock of the admin UI. All data shown is fictional for demonstration purposes.

Dynamic Pricing Admin UI (Mock Data)
Navigate between the Dashboard, Crawler, Planner, Pricing Rules, and Settings tabs. All operations run against an in-memory mock; no backend is required.

What You Can Try

  • Dashboard — compare scraped competitor prices against our price snapshots across check-in dates
  • Crawler — browse Seeds, Google Travel Tokens, UA Profiles, Cookies, Proxies, and Fetch Configs
  • Planner — inspect generated pricing tasks with status filtering (QUEUED / SUCCESS / FAILED)
  • Pricing Rules — view discount code mappings and per-channel strategy extras
  • Settings — review app-level configuration keys