
Hotel Dynamic Pricing Platform

An automated hotel dynamic pricing platform that scrapes competitor prices, calculates optimal discounts, and publishes pricing rules to our internal booking system — replacing a manual workflow with a fully autonomous pipeline.

Role: Solo Developer (Full Ownership)
Period: 2024 - 2025

Overview

The Hotel Dynamic Pricing Platform is a monorepo system that automates competitive hotel pricing — a process that was previously done manually by operations staff. The system continuously scrapes competitor prices from Google Travel, generates pricing tasks, calculates optimal discount percentages, and publishes them to our internal Hotel Pricing Gateway — the central API that controls the prices shown to customers on our booking site.

Target Users: The hotel operations and revenue management team at my company.

Problem Solved: Operations staff manually compared competitor prices across thousands of properties and adjusted discounts one by one — a process that took hours and was always outdated by the time it finished. This system runs two daily pricing cycles, each consisting of one scrape run, one planner run, and two staggered publish runs, keeping pricing adjustments responsive throughout the day with no manual intervention.

My Role

As the sole developer, I owned the entire system end-to-end:

  • Designed and built the Python scraper service (FastAPI + Playwright)
  • Built the Java backend with Spring Boot for task planning and price publishing
  • Created the React admin UI for monitoring and manual overrides
  • Designed the database schema and batch processing pipeline
  • Authored the full Kubernetes/Kustomize manifest set — base + dev overlays, CronJob manifests for planner and publisher, Gateway API HTTPRoute and BackendPolicy patches
  • Extended the shared Terraform infra to provision the service end-to-end: a new GCP Service Account with Workload Identity binding to its K8s SA, Cloud SQL IAM user with ADMIN grant on the service's database, and Secret Manager entries for DB credentials
  • Wired the CI/CD flow through ArgoCD
  • Operated and maintained the system in production

Tech Stack

Frontend

React 19 · Vite · TypeScript · Tailwind CSS

Backend

Java 21 · Spring Boot 3.4 · Python 3.12 · FastAPI · Playwright · SSE (real-time log streaming)

Infrastructure

Docker · Kubernetes / GKE (Kustomize base + overlays) · ArgoCD (GitOps delivery) · CronJob scheduling · Terraform (GCP Service Account · Workload Identity · Cloud SQL IAM · Secret Manager) · Gateway API (HTTPRoute · BackendPolicy · HealthCheckPolicy)

Database / Messaging

MySQL 8 · Redis

Architecture

The system is a monorepo with three distinct services, orchestrated by Kubernetes CronJobs:

Scraper Service (Python/FastAPI/Playwright): Headless browser automation that fetches competitor prices from Google Travel via the internal batchexecute API. Uses Poisson jitter rate limiting to avoid detection and inline parsing to distribute database write load.

Backend Service (Java/Spring Boot): Two core batch jobs — the Planner Job generates pricing tasks using bulk INSERT…SELECT that completes the insert in seconds regardless of run size, and the Publisher Job compares prices, calculates discount percentages, and pushes results to the Hotel Pricing Gateway's batch endpoint.

Admin UI (React/Vite): Real-time monitoring dashboard with SSE log streaming across Kubernetes pods. Provides manual override controls for edge cases the automation cannot handle.

CronJob Orchestration: The system runs two daily pricing cycles. Each cycle executes Scrape once, Plan once, and two staggered Publish runs. Each stage can also be triggered independently for debugging.
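As a sketch of that schedule, a Kubernetes CronJob for the planner stage might look like the manifest below. This is hypothetical: the names, image, arguments, and run times are illustrative, not the actual Kustomize manifests.

```yaml
# Hypothetical planner CronJob sketch (names, image, and times are illustrative)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pricing-planner
spec:
  schedule: "30 2,14 * * *"    # two runs per day (illustrative times)
  concurrencyPolicy: Forbid    # never overlap a run with the previous one
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: planner
              image: registry.example.com/pricing-backend:latest
              args: ["--job=planner"]
```

`concurrencyPolicy: Forbid` matters here: a planner run that overruns its window must block the next one rather than race it.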

System architecture diagram

Key Challenges

1. Wasted Work on Dates Nobody Books

Discounting every scraped property × date × channel combination is wasteful. Most price points represent dates that no customer ever searches for — spending CPU, network, and partner rate limits on them produces no revenue. Worse, actively discounting cold dates lowers margin on inventory that would have sold anyway.

2. Expensive Self-Price Lookups

To compare against our own price, the naive approach is to proactively query our internal booking platform for every property × date. But those queries often hop through a wholesaler API — each call has real latency and cost, and fanning out lookups at the scale of our daily write volume (~1.5M rows) would overload the platform.

3. Publishing Performance at Scale

The initial implementation published discounts one property at a time via the gateway's per-property API. With hundreds of thousands of individual API calls required per cycle, this was unacceptably slow — a single cycle could not complete before the next one started.

4. Concurrent Write Contention

The scraper writes price data while the planner reads it. Traditional offset-based pagination broke under concurrent writes — rows were skipped or duplicated as new data shifted page boundaries.

5. Cross-Pod Log Visibility

In a Kubernetes environment with multiple pods, operations staff could not see real-time logs from the correct pod. They needed to know immediately if a scraping or publishing job was failing.

6. Idempotency Under Failure

Network failures mid-batch could leave the system in an inconsistent state — some properties published with new prices, others stuck on stale prices. Retries risked duplicate API calls to the Hotel Pricing Gateway.

Solutions & Design Decisions

Demand-Driven Price Snapshots (Reverse-Driven Architecture)

Instead of building a proactive pipeline that pulls our own prices for every combination, I added a lightweight recording hook on the existing Hotel Pricing Gateway. Whenever Google Travel queries us for a price (which it does naturally for every hotel a user views), the gateway logs the property, date, channel, and the price it returned into a cheapest_price snapshot table. The Publisher Job then reads from this table instead of actively querying our platform.

The insight: Google's inbound query traffic is the demand signal. If no user ever searches a particular date, Google never asks about it, no snapshot gets recorded, and we never waste effort discounting it. The system automatically focuses pricing attention on the dates that actually matter.

Benefits:

  • Zero extra API load on our booking platform or wholesaler integrations
  • Natural demand filtering — we only act on dates with observed search interest
  • Freshness is automatic — snapshots refresh whenever Google re-queries, so stale prices self-correct
  • No polling schedules or cache invalidation logic to maintain
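As a sketch of that recording hook, the upsert below keeps one row per property × date × channel, so the latest answered price always wins. sqlite3 stands in for MySQL, and the table and column names are illustrative, not the production schema.

```python
import sqlite3

# Minimal sketch of the gateway-side recording hook.
# sqlite3 stands in for MySQL; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cheapest_price (
        property_id  INTEGER,
        checkin_date TEXT,
        channel      TEXT,
        price        INTEGER,
        updated_at   TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (property_id, checkin_date, channel)
    )
""")

def record_snapshot(property_id: int, checkin_date: str, channel: str, price: int) -> None:
    """Called whenever Google Travel queries a price: upsert so the
    latest answered price always wins and stale rows self-correct."""
    conn.execute(
        """INSERT INTO cheapest_price (property_id, checkin_date, channel, price)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(property_id, checkin_date, channel)
           DO UPDATE SET price = excluded.price, updated_at = CURRENT_TIMESTAMP""",
        (property_id, checkin_date, channel, price),
    )

# Two inbound queries for the same slot leave exactly one row, with the newer price.
record_snapshot(42, "2025-06-01", "mobile", 120)
record_snapshot(42, "2025-06-01", "mobile", 110)
row = conn.execute("SELECT COUNT(*), MIN(price) FROM cheapest_price").fetchone()
print(row)  # (1, 110)
```

The composite primary key is what makes the hook idempotent: re-querying a slot refreshes it instead of growing the table.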

Batch API Integration (Batch Request Consolidation)

Replaced individual API calls with the Hotel Pricing Gateway's batch endpoint. Instead of hundreds of thousands of sequential HTTP calls, the system now groups properties into batch payloads (up to 500 items per call) — reducing API requests by roughly 500× and cutting the end-to-end publish cycle from hours to minutes.
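A minimal sketch of that consolidation, assuming only the 500-items-per-call limit stated above (the HTTP publish call itself is omitted):

```python
from typing import Iterator, List

MAX_BATCH = 500  # the gateway batch endpoint's per-call item limit

def batches(items: List[dict], size: int = MAX_BATCH) -> Iterator[List[dict]]:
    """Split per-property price updates into payloads of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Illustrative: 250,000 individual updates collapse into 500 HTTP calls.
updates = [{"property_id": i, "discount_pct": 10} for i in range(250_000)]
payloads = list(batches(updates))
print(len(payloads))      # 500
print(len(payloads[-1]))  # 500
```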

Two-Pointer Pagination Algorithm

Designed a cursor-based pagination approach using two pointers (last-seen ID + created timestamp) instead of OFFSET/LIMIT. This guarantees stable iteration even when the scraper is actively inserting rows into the same table. The algorithm never skips or double-processes records regardless of concurrent write volume.

SSE Log Streaming with Three-Layer Fallback

Built a real-time log pipeline: a custom Logback appender captures log events into a local in-memory buffer, publishes them to Redis Pub/Sub, and SseEmitter endpoints stream them to the browser. The three layers (local memory, Redis, SSE) provide graceful degradation — if Redis is down, local logs are still available; if a pod restarts, Redis retains recent history.

Redis + DB Dual-Layer Idempotency

Each batch operation generates a deterministic idempotency key. Redis provides fast deduplication for the common case (TTL-based), while the database serves as the durable fallback. This ensures exactly-once semantics for the gateway API calls even across pod restarts.

Chunk-Isolated Transactions

Each batch of 100 properties runs in its own transaction (REQUIRES_NEW propagation). If one chunk fails, it does not roll back the entire batch — only the affected 100 properties are retried. This dramatically reduces blast radius and improves recovery time.

Results & Impact

Performance

  • Publishing: ~500× fewer API requests via batch endpoint (up to 500 items per call); end-to-end publish cycle dropped from hours to minutes
  • Planner Job: processes 500K+ tasks per run in ~20 minutes; bulk INSERT…SELECT moves millions of rows
  • Steady daily cadence: 8 scheduled runs (scraper 2×, planner 2×, publisher 4×) with 100% success in the observed week

Operational Impact

  • Eliminated manual price comparison workflow entirely
  • Pricing adjustments now reflect competitor changes within hours, not days
  • Operations staff reassigned from manual pricing to higher-value work

Reliability

  • Zero data loss from mid-batch failures (dual-layer idempotency)
  • Chunk isolation limits blast radius to 100 properties per failure
  • SSE log streaming provides real-time visibility across all pods

Learnings

Batch APIs Change Everything

The single biggest performance win was not clever code — it was switching from individual to batch API calls. Before optimizing algorithms, always check whether the external service offers a bulk endpoint. Consolidating individual calls into batch requests dwarfed every other optimization combined.

Design for Concurrent Writes from Day One

Offset-based pagination is a trap when tables are under active write load. The two-pointer approach costs almost nothing to implement upfront but saves enormous debugging pain later. I now default to cursor-based pagination for any table with concurrent writers.

Chunk Isolation is Worth the Complexity

REQUIRES_NEW transactions per chunk add complexity, but the operational benefit is massive. Being able to tell an operator "only 100 properties need re-processing instead of the full 500K" transforms incident response from panic to routine.

Inline Parsing Distributes Load Better Than Batch Parsing

Initially, the scraper collected all raw HTML and parsed it in a separate batch step. Switching to inline parsing (parse immediately after each fetch) distributed the database write load more evenly across the scraping window, eliminating the spike that caused lock contention.
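A toy sketch of that difference, with stand-in fetch/parse/write functions rather than the real scraper code. Recording the order of operations shows writes clustered at the end in the batch shape, and alternating evenly with fetches in the inline shape:

```python
# Toy sketch contrasting batch parsing with inline parsing.
# fetch/parse/write_row are stand-ins, not the real scraper functions.
events = []

def fetch(url: str) -> str:
    events.append(("fetch", url))
    return f"<html>{url}</html>"

def parse(html: str) -> str:
    return html.removeprefix("<html>").removesuffix("</html>")

def write_row(url: str) -> None:
    events.append(("write", url))

def scrape_batch(urls):
    """Old shape: all fetches first, then a burst of DB writes at the end."""
    raw = [fetch(u) for u in urls]
    for html in raw:
        write_row(parse(html))

def scrape_inline(urls):
    """New shape: each fetch is followed immediately by its write."""
    for u in urls:
        write_row(parse(fetch(u)))

urls = ["a", "b", "c"]
scrape_batch(urls)
batch_order = [kind for kind, _ in events]
events.clear()
scrape_inline(urls)
inline_order = [kind for kind, _ in events]
print(batch_order)   # ['fetch', 'fetch', 'fetch', 'write', 'write', 'write']
print(inline_order)  # ['fetch', 'write', 'fetch', 'write', 'fetch', 'write']
```

The write burst at the end of the batch shape is exactly what caused the lock contention described above; interleaving spreads the same writes across the whole scraping window.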

Deep Dive: System Design Decisions

Key Insight: Use the Demand Signal You Already Have

The single most impactful architectural decision in this system wasn't about scale, batching, or concurrency — it was about where the "our price" baseline comes from.

A naive design would spin up a second pipeline that proactively queries our booking platform for every property × date combination. That pipeline would be expensive (wholesaler API hops), fragile (cache invalidation, schedule drift), and wasteful (most dates never get booked).

Instead, I realized Google Travel is already querying us for prices every time a user views a hotel. That query traffic is a real-time demand signal. By adding a small recording hook on the Hotel Pricing Gateway, every inbound query becomes a snapshot in the cheapest_price table — no extra load, no cold dates, no polling.

This flips the architecture from pull-based (we actively query our own prices) to push-based (demand triggers recording). It is the reason the pricing pipeline can run on modest resources despite covering tens of thousands of properties.

Why a Monorepo?

Python excels at browser automation (Playwright) and rapid prototyping for scraping logic. Java excels at reliable batch processing with strong typing and transaction management. React provides the monitoring interface. A monorepo keeps deployment coordination simple — one commit can update the scraper parsing logic and the planner's task generation in lockstep.

Running 7.7 GB of Live Data on a Micro Instance

The entire production database runs on a Cloud SQL Micro tier — 640 MB InnoDB buffer pool, the cheapest managed tier available. Despite the modest hardware, the system processes 1.5M UPSERTs per day and handles peak runs of 500K+ rows.

Every design decision in this project exists to make that possible:

  • Chunked transactions (100 properties per chunk) keep working sets well under buffer pool size
  • Two-pointer pagination streams rows instead of loading the full result set
  • Bulk INSERT…SELECT avoids shuttling millions of rows through the application layer
  • 48-hour TTL on fetch_raw prevents debug data from ballooning the DB
  • Demand-driven snapshots remove an entire subsystem that would otherwise exist
  • Batch endpoint publishing means ~500× fewer TCP connections per run

Running this workload on a larger instance would be the easy fix. Running it on Micro is what forced the architecture to stay lean.

Scale in Production

  • Total dataset: 7.7 GB — 3.47M competitor prices + 2.62M pricing tasks
  • Daily write volume: ~1.5M UPSERTs into hotel_prices, ~1M into mc_planner_tasks
  • Peak single run: 500,000+ tasks processed by a single planner invocation
  • Daily cadence: 2 scrape runs/day (~50 min per run), 2 planner runs/day (~20 min per run), 4 publish runs/day
  • Infrastructure: Cloud SQL MySQL 8.4 · Micro tier · 640 MB InnoDB buffer pool
  • Publish efficiency: ~500× fewer API requests via batch endpoint integration (up to 500 items per call)

Two-Pointer Pagination

The core challenge: the scraper is continuously inserting rows while the planner reads from the same table. Traditional OFFSET/LIMIT shifts row positions on every insert, causing the planner to skip or double-count records.

Cursor-Based Pagination (Two-Pointer Approach)
-- Instead of OFFSET/LIMIT which breaks under concurrent writes:
-- SELECT * FROM prices ORDER BY id LIMIT 1000 OFFSET 5000  -- UNRELIABLE

-- Two-pointer approach: stable regardless of concurrent inserts
SELECT * FROM prices
WHERE (created_at > :lastTimestamp)
   OR (created_at = :lastTimestamp AND id > :lastId)
ORDER BY created_at ASC, id ASC
LIMIT 1000

Why Two Pointers Instead of One?

Using only id > :lastId assumes rows become visible in strict ID order, which concurrent inserts do not guarantee: a transaction holding a smaller auto-increment ID can commit after the cursor has already moved past it. Using only created_at > :lastTimestamp fails when multiple rows share the same timestamp. The combination of both fields guarantees a unique, stable sort order — the timestamp provides coarse ordering, and the ID breaks ties.
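A runnable check of this behavior, with sqlite3 standing in for MySQL and an illustrative table shape: rows are inserted between pages to simulate the scraper writing while the planner paginates, and the cursor still visits every row exactly once.

```python
import sqlite3

# sqlite3 stands in for MySQL; the table shape is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INTEGER PRIMARY KEY, created_at INTEGER)")
# Many rows share a timestamp, so the id tie-breaker actually matters.
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [(i, i // 10) for i in range(1, 101)],
)

def next_page(last_ts: int, last_id: int, limit: int = 25):
    return conn.execute(
        """SELECT id, created_at FROM prices
           WHERE created_at > ? OR (created_at = ? AND id > ?)
           ORDER BY created_at ASC, id ASC
           LIMIT ?""",
        (last_ts, last_ts, last_id, limit),
    ).fetchall()

seen, last_ts, last_id, inserts_left = [], -1, -1, 3
while True:
    page = next_page(last_ts, last_id)
    if not page:
        break
    seen.extend(row[0] for row in page)
    last_id, last_ts = page[-1]
    if inserts_left:  # concurrent writer: new rows land with later timestamps
        conn.execute("INSERT INTO prices VALUES (NULL, 999)")
        inserts_left -= 1

print(len(seen))                    # 103 (all 100 original rows plus 3 inserted)
print(len(seen) == len(set(seen)))  # True: nothing skipped or read twice
```

With OFFSET/LIMIT, the same mid-iteration inserts would shift page boundaries and skip or duplicate rows; the cursor pair is immune because it anchors on values, not positions.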

Chunk-Isolated Transaction Pattern

REQUIRES_NEW Per Chunk (Spring)
// Outer method: iterates over chunks, continues on failure.
// Deliberately NOT @Transactional: each chunk manages its own transaction,
// so no DB connection is held open across the whole run.
public void publishAll(List<Property> properties) {
    Lists.partition(properties, CHUNK_SIZE).forEach(chunk -> {  // Guava
        try {
            chunkPublisher.publishChunk(chunk); // REQUIRES_NEW
        } catch (Exception e) {
            log.error("Chunk failed, {} properties affected", chunk.size(), e);
            // Other chunks continue — blast radius is limited
        }
    });
}

// Inner method, on a separate Spring bean (chunkPublisher): a self-invocation
// would bypass the transactional proxy, so REQUIRES_NEW only applies when
// called through another bean.
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void publishChunk(List<Property> chunk) {
    // If this fails, only this chunk rolls back
    batchApi.publish(chunk);
    markAsPublished(chunk);
}

SSE Log Streaming Architecture

Real-time log visibility across Kubernetes pods is essential for operations. The three-layer design ensures logs are always accessible, even during partial infrastructure failures.

K8s Pod Lifecycle Awareness

SSE connections are long-lived, but pods are ephemeral. When a pod scales down or restarts, all SSE connections to that pod drop. The Redis layer acts as the bridge — any pod can serve historical logs from Redis, and new connections automatically pick up the stream regardless of which pod generated the original log entries.
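A minimal sketch of that degradation on the read path, with a fake in-process Redis standing in for the real client (all class and method names are illustrative):

```python
from collections import deque

# FakeRedis stands in for the real Redis client; names are illustrative.
class FakeRedis:
    def __init__(self):
        self.log = []
        self.up = True

    def push(self, line: str) -> None:
        if not self.up:
            raise ConnectionError("redis down")
        self.log.append(line)

    def recent(self, n: int) -> list:
        if not self.up:
            raise ConnectionError("redis down")
        return self.log[-n:]

class LogStream:
    def __init__(self, redis: FakeRedis):
        self.redis = redis
        self.local = deque(maxlen=1000)   # layer 1: per-pod ring buffer

    def append(self, line: str) -> None:
        self.local.append(line)           # always keep a local copy
        try:
            self.redis.push(line)         # layer 2: share across pods
        except ConnectionError:
            pass                          # degrade silently; local copy survives

    def recent(self, n: int = 100) -> list:
        try:
            return self.redis.recent(n)   # preferred: cross-pod history
        except ConnectionError:
            return list(self.local)[-n:]  # fallback: this pod's own logs

redis = FakeRedis()
stream = LogStream(redis)
for i in range(3):
    stream.append(f"line {i}")
print(stream.recent(2))  # ['line 1', 'line 2'] (served from Redis)
redis.up = False
print(stream.recent(2))  # ['line 1', 'line 2'] (served from the local buffer)
```

The SSE layer (not shown) sits on top of `recent()` plus a live subscription; the key property is that a Redis outage degrades the view to single-pod logs instead of losing it entirely.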

Poisson Jitter Rate Limiting

Scraper Rate Limiting Strategy
import random
import math

def poisson_delay(avg_interval_sec: float) -> float:
    """
    Generate delay using Poisson process (exponential distribution).
    Unlike fixed delays, this produces natural-looking request patterns
    that are harder for anti-bot systems to detect.
    """
    return -math.log(1.0 - random.random()) * avg_interval_sec

Why Poisson Instead of Random Uniform?

Fixed intervals with random jitter (e.g., 5 s ± 2 s) still produce a detectable pattern — the mean is constant and the distribution is uniform. Poisson-distributed intervals mimic natural human browsing behavior: mostly short gaps with occasional longer pauses. This significantly reduces the likelihood of being flagged by rate limiting systems.
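A quick sanity check of that shape (the function is repeated so the snippet runs standalone): the sample mean converges to avg_interval_sec, while most individual gaps are shorter than the mean, i.e. the long-tail pattern that makes the traffic look organic. For reference, the standard library's random.expovariate(1.0 / avg_interval_sec) draws from the same distribution.

```python
import math
import random

def poisson_delay(avg_interval_sec: float) -> float:
    # Same inverse-CDF sampling as the scraper code above.
    return -math.log(1.0 - random.random()) * avg_interval_sec

random.seed(42)  # fixed seed so the check is reproducible
samples = [poisson_delay(5.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)
short_frac = sum(s < 5.0 for s in samples) / len(samples)

print(abs(mean - 5.0) < 0.1)  # True: mean is close to 5 s
print(short_frac > 0.6)       # True: most gaps are shorter than the mean
```

For an exponential distribution, about 63% of samples fall below the mean (1 − 1/e), which is why the traffic reads as "mostly short gaps with occasional long pauses".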

Dual-Layer Idempotency

Redis + DB Idempotency Pattern
public boolean tryAcquire(String idempotencyKey) {
    // Layer 1: Redis (fast path, TTL-based expiry). If the key is already
    // present, this is definitely a duplicate: reject without touching the DB.
    Boolean acquired = redis.opsForValue()
        .setIfAbsent(idempotencyKey, "1", Duration.ofHours(24));
    if (!Boolean.TRUE.equals(acquired)) {
        return false;  // duplicate, rejected on the fast path
    }

    // Layer 2: DB (durable). Redis may have lost the key to eviction or
    // failover, so the INSERT IGNORE row is the authoritative record.
    // This preserves exactly-once even across pod restarts.
    return idempotencyRepository.insertIgnore(idempotencyKey);
}

Why Two Layers?

Redis alone is not durable — keys can be lost during failover or memory pressure. The database alone is too slow for the hot path (hundreds of thousands of checks per cycle). The dual-layer approach gives us the speed of Redis for the 99.9% case and the durability of MySQL for the edge cases that matter most.

Live Demo

Below is a fully interactive mock of the admin UI. All data shown is fictional for demonstration purposes.

Dynamic Pricing Admin UI (Mock Data)
Navigate between the Dashboard, Crawler, Planner, Pricing Rules, and Settings tabs. All operations run against an in-memory mock; no backend is required.

What You Can Try

  • Dashboard — compare scraped competitor prices against our price snapshots across check-in dates
  • Crawler — browse Seeds, Google Travel Tokens, UA Profiles, Cookies, Proxies, and Fetch Configs
  • Planner — inspect generated pricing tasks with status filtering (QUEUED / SUCCESS / FAILED)
  • Pricing Rules — view discount code mappings and per-channel strategy extras
  • Settings — review app-level configuration keys