Overview
A proof of concept commissioned by a major Taiwanese insurance client for their mobile remote-investment service, delivered as a Cisco + Dynasafe + partner engagement in October 2021. The business problem: when users reported slowness on the service, the network, application, and database teams each had their own tools — and each could demonstrate their own layer was healthy. The user-facing symptom stayed unexplained while the teams took turns pointing at each other.
The hypothesis: an end-to-end triage dashboard that surfaces network and application telemetry side by side can collapse the first 30 minutes of an incident from a cross-team argument into a clear "open AppDynamics" or "open ThousandEyes" direction.
My scope: I designed and built the backend pieces — a Spring Boot aggregation core (ServiceCore) and an Express proxy that fronts it — and co-developed the React dashboard frontend with a fellow engineer.
What it integrated with: the client's live AppDynamics and ThousandEyes installations — 4 ThousandEyes network paths and 7 AppDynamics tiles (2 infrastructure tiles for AP Server and Database + 5 business-application tiles covering key business flows).
My Role
I worked across three codebases in this PoC:
Aggregation core — ServiceCore (Spring Boot): designed the backend service structure, normalized data from AppDynamics and ThousandEyes into a shared telemetry model, and exposed a single aggregated API for the dashboard.
Edge proxy (Node.js + Express): built the thin Express layer that sits in front of ServiceCore — HTTPS termination, YAML-driven proxy target so ops can move ServiceCore without a rebuild.
React dashboard (co-developed): co-built the two-group component layout (AppData/* for AppDynamics tiles, ThousandEyes/* for network-path tiles), the polling and context state, and the drill-down pages. I focused on correctness fixes, shared state / polling behavior, and UI refinements, while my co-author led the initial component structure and Layer 2 drill-down pages.
Tech Stack
- Aggregation Core: ServiceCore, a Spring Boot (Java) service backed by PostgreSQL
- Edge Proxy: Node.js + Express, YAML-driven upstream configuration
- Frontend: React 17 SPA with Context-based state and 5-second polling
- Data Sources: AppDynamics REST API, ThousandEyes test API
Architecture
The PoC was implemented as three codebases with a deliberately simple architecture to fit the timeline:
Frontend SPA (React 17). Two component groups — AppData/* and ThousandEyes/*. The frontend polls a single aggregated telemetry endpoint every 5 seconds and renders both network and application health from the same response. First page is the traffic-light board; clicking a tile routes to its Layer 2 drill-down.
Edge proxy (Express). HTTPS termination, static hosting for the SPA build, and a reverse proxy that forwards the dashboard's single telemetry endpoint to ServiceCore. The upstream URL lives in config.yml so relocating ServiceCore is an ops change, not a rebuild.
Aggregation core (Spring Boot). Pulls from AppDynamics REST API (per application) and ThousandEyes test API (per path), normalizes the two very different shapes into a common telemetry model, and serves the combined envelope back.
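The normalization step can be sketched as follows. This is an illustrative TypeScript rendering (the actual ServiceCore is Java), and the upstream field names and thresholds are assumptions for the sketch, not the real shapes of the client APIs:

```typescript
// Hypothetical common per-item shape (the real ServiceCore model is Java).
type Status = "green" | "yellow" | "red";

interface TelemetryItem {
  id: string;                        // test name or application name
  status: Status;                    // traffic-light state for the Layer 1 board
  headline: number;                  // one headline metric (ms, % loss, ...)
  breakdown: Record<string, number>; // Layer 2 detail metrics
}

// Assumed, simplified upstream shapes, for illustration only.
interface TeTestResult { testName: string; loss: number; latencyMs: number; }
interface AppdMetric   { appName: string; health: "NORMAL" | "WARNING" | "CRITICAL"; artMs: number; }

function fromThousandEyes(r: TeTestResult): TelemetryItem {
  return {
    id: r.testName,
    // Illustrative thresholds; the real mapping came from Health Rules.
    status: r.loss > 5 ? "red" : r.loss > 1 ? "yellow" : "green",
    headline: r.latencyMs,
    breakdown: { loss: r.loss, latencyMs: r.latencyMs },
  };
}

function fromAppDynamics(m: AppdMetric): TelemetryItem {
  const map: Record<AppdMetric["health"], Status> =
    { NORMAL: "green", WARNING: "yellow", CRITICAL: "red" };
  return { id: m.appName, status: map[m.health], headline: m.artMs, breakdown: { artMs: m.artMs } };
}
```

Once both sources land in `TelemetryItem`, the dashboard never has to know which vocabulary an item came from.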
The triage pattern the whole stack delivers:
- Layer 1 — an end-to-end request-path flow diagram (Client → Internet → Firewall → AP Server → Database) with colored status on each node and each of the 4 network segments, plus a row of 5 business-application tiles below. First glance answers "which hop or which business app is unhealthy?"
- Layer 2 — top Health Rule violations under the red node / segment / tile. Operator gets a direction: Lock, CPU, GC, OOM, retransmission, latency spike.
- Layer 3 — click-through to AppDynamics or ThousandEyes for the actual root-cause investigation. The dashboard directs investigation rather than replacing the specialist tools.
Key Challenges
1. Two telemetry sources with very different vocabularies
AppDynamics talks applications (transactions, nodes, tiers, JVM metrics); ThousandEyes talks network (agents, tests, paths, hops). Putting them behind one API without making the frontend render two different trees meant normalizing both sources into a single shared shape at the persistence layer.
2. Client pain points, not generic monitoring
The client came with concrete narratives: DB lock source takes hours to trace, JVM OOM arrives without warning, packet retransmission is high but the cause is unclear, request stage timings are missing. The PoC had to answer these, not produce a generic Grafana.
3. Three codebases shipped under a PoC deadline
Frontend, proxy, and backend were separate repos with two authors on the frontend. Keeping them independent meant the API contract between tiers had to be pinned down early so neither side blocked the other.
4. Demo-ready reliability in a short window
The live partner demo ran against real AppDynamics + ThousandEyes data on a 5-second refresh; "it would work if everything connected" was not an acceptable bar.
Solutions & Design Decisions
One aggregated-telemetry envelope
Pinning the envelope shape at the start of the project let the frontend and backend iterate independently. The response has two top-level keys — one per upstream source — because AppDynamics and ThousandEyes have different native shapes; but both arrays follow the same per-item structure (id, status, headline metric, breakdown) so the UI renders them with shared components.
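As a concrete sketch of that envelope (values and field names are hypothetical, but the structure follows what the text describes: two top-level keys, shared per-item shape):

```typescript
// Hypothetical snapshot: two top-level keys, one per upstream source.
const snapshot = {
  te: [ // one entry per ThousandEyes network path
    { id: "client-internet", status: "green", headline: 42, breakdown: { latencyMs: 42, lossPct: 0 } },
  ],
  ad: [ // one entry per AppDynamics application tile
    { id: "login-flow", status: "red", headline: 2100, breakdown: { artMs: 2100, errorRatePct: 8.5 } },
  ],
};

// Because both arrays share the per-item structure (id, status, headline,
// breakdown), one tile component can render items from either source.
const allTiles = [...snapshot.te, ...snapshot.ad];
```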
Pain-point-driven demo narrative
The demo was structured around the client's own operational pain points, with each scenario showing how the dashboard narrowed the problem to the right domain before linking into the relevant specialist tool.
The dashboard narrows; the specialist tool diagnoses
Layer 1 narrows the technical domain. Layer 2 surfaces the top Health Rule violations. Layer 3 is a click-through into AppDynamics or ThousandEyes. Keeping the dashboard's scope small — direct attention instead of duplicating what the specialist tools already do — was what made shipping on the PoC timeline possible.
YAML-driven proxy target
The Express proxy reads a tiny config.yml (upstream URL, port) at boot. When ServiceCore moved hosts mid-engagement, the change was a config edit + systemctl restart, not a rebuild. Small, but in a PoC environment where IPs change without notice it matters.
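Roughly what that boot-time read looks like. The keys and values here are illustrative, and a hand-rolled flat parser stands in for the YAML library the real proxy would use:

```typescript
// config.yml shape (illustrative keys and values):
//   upstream: http://10.20.0.5:8080   <- where ServiceCore currently lives
//   port: 443
//
// A real proxy would read the file with fs.readFileSync and parse it with
// js-yaml; a tiny flat "key: value" parser is enough to show the idea.
function parseFlatYaml(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split("\n")) {
    const m = line.match(/^(\w+):\s*(.+?)\s*$/);
    if (m) out[m[1]] = m[2];
  }
  return out;
}

const cfg = parseFlatYaml("upstream: http://10.20.0.5:8080\nport: 443");
// The proxy forwards /api/telemetry to cfg.upstream, so relocating
// ServiceCore is a config edit plus a restart, never a rebuild.
```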
PoC trade-offs
Several choices in this system are PoC-grade, not production-grade, and the team knew that:
- TLS was terminated at the proxy with a self-signed certificate for demo speed; a production rollout would use the client's existing ingress or F5.
- Polling was chosen over SSE / WebSockets to keep the PoC simple and demo-friendly.
- No caching layer was introduced between ServiceCore and the upstream APIs because the PoC traffic volume did not require it.
Results & Impact
Delivered
- Three-codebase stack operational end-to-end: React → Express proxy → ServiceCore → AppDynamics / ThousandEyes
- Live integration with 4 ThousandEyes network paths + 7 AppDynamics tiles (2 infrastructure + 5 business-application tiles covering key business flows)
- 5-second live refresh during demo
What the demo showed
- A single screen that narrows an unhealthy state to the right technical domain in seconds
- Click-through to the specialist tool for root-cause investigation
- An integration pattern that complements rather than replaces existing tooling
Learnings
Pin the contract, iterate the rest
Freezing the telemetry contract early let frontend and backend move independently, which reduced coordination overhead and kept the PoC moving under a tight timeline.
A triage dashboard's job is to point, not to replace
The biggest scope trap was "let's put all the AppDynamics detail in the dashboard." The much smaller, correct scope was Layer 1 (narrow) + Layer 2 (top violations) + a deep link out. The specialist tools already do root-cause analysis better than we would.
Separate the proxy from the data service
Keeping the Express layer thin — HTTPS, static hosting, config-driven upstream — let ServiceCore stay a pure data service with no knowledge of TLS or deployment shape. The split was a few hours of work and saved significant rework when the demo host changed.
Clear ownership matters in collaborative frontend work
With two authors on the React repo, the cleanest split was component-level: one person takes a component end-to-end, the other reviews. Mixing the same file in parallel produced merge noise not worth the theoretical parallelism.
Polling was the right complexity for a PoC
A 5-second setInterval over the telemetry endpoint covers the "the dashboard updates" UX without any WebSocket plumbing or proxy-side pub/sub. For a PoC, additional complexity would have been cost without a benefit on this timeline.
Deep Dive
The single API contract
One fetch, one envelope. Both arrays share a per-item shape so the UI renders them with common tile components.
```typescript
type TelemetrySnapshot = {
  te: ThousandEyesPoint[]; // one entry per ThousandEyes test
  ad: AppDynamicsPoint[];  // one entry per AppDynamics application
};
```
```typescript
const fetchApi = () => {
  fetch("/api/telemetry")
    .then((r) => r.json())
    .then(({ te, ad }) => {
      setThousandeyeData(te);
      setAppDynamic(ad);
    })
    .catch((err) => setApiError(err.message));
};

useEffect(() => {
  fetchApi();                                     // initial load
  const id = setInterval(fetchApi, intervalTime); // 5s
  return () => clearInterval(id);                 // stop polling on unmount
}, []);
```
Edge proxy — the thin part
The Express layer is deliberately minimal: HTTPS termination, the React build served as static assets, and a single reverse-proxy hop to ServiceCore. The upstream target lives in config.yml, so moving ServiceCore mid-engagement was a config edit + systemctl restart, not a rebuild.
Pain-point → demo scenario
The demo script was organized around the client's own pain narratives rather than a feature tour:
| Client pain point | Dashboard scenario |
|---|---|
| DB lock source is hard to trace | DB tile red · Health Rule: Lock · drill into AppD for the slow SQL |
| JVM OOM arrives without warning | Server tile red · Health Rule: GC / CPU · drill into AppD JVM memory |
| Packet retransmission high, cause unclear | Network tile red · drill into ThousandEyes path and device telemetry |
| Request T1–Tn stage timing is missing | Layer 2 per-stage timing · drill into AppD transaction snapshot |
| Slow transactions have no "whose fault" answer | End-to-end correlation across firewall · web→AP · AP→DB · internet paths |
Aggregation core — ServiceCore
A Spring Boot service backed by PostgreSQL that aggregates AppDynamics and ThousandEyes data into a unified telemetry model for the dashboard API.
Live Demo
An interactive mock of the dashboard. Same Layer 1 + Layer 2 structure as the original PoC — same 4 ThousandEyes paths, same 7 AppDynamics tiles, same 5-second polling cadence — driven by an in-browser telemetry engine. All values are fictional.
Try it
- Follow the red light. The flow diagram narrows the problem to a specific hop (Internet / Firewall / Web→AP / AP→DB / AP Server / Database) and the business-app row surfaces any red tile. Click any node, segment, or tile for the Layer 2 drill-down.
- A few signals start in a degraded state by design, so the dashboard always presents a realistic triage path to explore.
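A minimal sketch of how such an in-browser engine can keep a signal degraded by design (the names, thresholds, and flicker rule are all illustrative, not the demo's actual code):

```typescript
type Status = "green" | "yellow" | "red";

interface Signal { id: string; degraded: boolean }

// Seeds chosen so some tiles start red, guaranteeing a triage path to explore.
const signals: Signal[] = [
  { id: "ap-db-segment", degraded: true },  // starts red by design
  { id: "login-flow", degraded: false },
];

function nextStatus(s: Signal, tick: number): Status {
  // Degraded signals stay mostly red, flickering yellow occasionally
  // so the board looks live rather than frozen.
  if (s.degraded) return tick % 10 < 8 ? "red" : "yellow";
  return "green";
}
```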