Overview
A proof of concept commissioned by a major Taiwanese insurance client for their mobile remote-investment service, delivered as a Cisco + Dynasafe + partner engagement in October 2021. The business problem: when users reported slowness on the service, the network, application, and database teams each had their own tools — and each could demonstrate their own layer was healthy. The user-facing symptom stayed unexplained while the teams took turns pointing at each other.
The hypothesis: an end-to-end triage dashboard that surfaces network and application telemetry side by side can collapse the first 30 minutes of an incident from a cross-team argument into a clear "open AppDynamics" or "open ThousandEyes" direction.
My scope: I designed and built the backend pieces — a Spring Boot aggregation core (ServiceCore) and an Express proxy that fronts it — and co-developed the React dashboard frontend with a fellow engineer.
What it integrated with: the client's live AppDynamics and ThousandEyes installations — 4 ThousandEyes network paths and 7 AppDynamics tiles (2 infrastructure tiles for AP Server and Database + 5 business-application tiles covering key business flows).
My Role
I worked across three codebases in this PoC:
Aggregation core — ServiceCore (Spring Boot): designed the backend service structure, normalized data from AppDynamics and ThousandEyes into a shared telemetry model, and exposed a single aggregated API for the dashboard.
Edge proxy (Node.js + Express): built the thin Express layer that sits in front of ServiceCore — HTTPS termination, YAML-driven proxy target so ops can move ServiceCore without a rebuild.
React dashboard (co-developed): co-built the two-group component layout (AppData/* for AppDynamics tiles, ThousandEyes/* for network-path tiles), the polling and context state, and the drill-down pages. I focused on correctness fixes, shared state / polling behavior, and UI refinements, while my co-author led the initial component structure and Layer 2 drill-down pages.
Tech Stack
- Aggregation Core: ServiceCore, a Spring Boot (Java) service backed by PostgreSQL
- Edge Proxy: Node.js + Express, YAML-driven upstream configuration
- Frontend: React 17 SPA with Context-based state and 5-second polling
- Data Sources: AppDynamics REST API, ThousandEyes test API
Architecture
The PoC was implemented as three codebases with a deliberately simple architecture to fit the timeline:
Frontend SPA (React 17). Two component groups — AppData/* and ThousandEyes/*. The frontend polls a single aggregated telemetry endpoint every 5 seconds and renders both network and application health from the same response. First page is the traffic-light board; clicking a tile routes to its Layer 2 drill-down.
Edge proxy (Express). HTTPS termination, static hosting for the SPA build, and a reverse proxy that forwards the dashboard's single telemetry endpoint to ServiceCore. The upstream URL lives in config.yml so relocating ServiceCore is an ops change, not a rebuild.
Aggregation core (Spring Boot). Pulls from AppDynamics REST API (per application) and ThousandEyes test API (per path), normalizes the two very different shapes into a common telemetry model, and serves the combined envelope back.
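The normalization step can be sketched as follows. This is an illustrative TypeScript rendering (the actual ServiceCore is Java), and the upstream field names and thresholds are assumptions for the sketch, not the real shapes of the client APIs:

```typescript
// Hypothetical common per-item shape (the real ServiceCore model is Java).
type Status = "green" | "yellow" | "red";

interface TelemetryItem {
  id: string;                        // test name or application name
  status: Status;                    // traffic-light state for the Layer 1 board
  headline: number;                  // one headline metric (ms, % loss, ...)
  breakdown: Record<string, number>; // Layer 2 detail metrics
}

// Assumed, simplified upstream shapes, for illustration only.
interface TeTestResult { testName: string; loss: number; latencyMs: number; }
interface AppdMetric   { appName: string; health: "NORMAL" | "WARNING" | "CRITICAL"; artMs: number; }

function fromThousandEyes(r: TeTestResult): TelemetryItem {
  return {
    id: r.testName,
    // Illustrative thresholds; the real mapping came from Health Rules.
    status: r.loss > 5 ? "red" : r.loss > 1 ? "yellow" : "green",
    headline: r.latencyMs,
    breakdown: { loss: r.loss, latencyMs: r.latencyMs },
  };
}

function fromAppDynamics(m: AppdMetric): TelemetryItem {
  const map: Record<AppdMetric["health"], Status> =
    { NORMAL: "green", WARNING: "yellow", CRITICAL: "red" };
  return { id: m.appName, status: map[m.health], headline: m.artMs, breakdown: { artMs: m.artMs } };
}
```

Once both sources land in `TelemetryItem`, the dashboard never has to know which vocabulary an item came from.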
The triage pattern the whole stack delivers:
- Layer 1 — an end-to-end request-path flow diagram (Client → Internet → Firewall → AP Server → Database) with colored status on each node and each of the 4 network segments, plus a row of 5 business-application tiles below. First glance answers "which hop or which business app is unhealthy?"
- Layer 2 — top Health Rule violations under the red node / segment / tile. Operator gets a direction: Lock, CPU, GC, OOM, retransmission, latency spike.
- Layer 3 — click-through to AppDynamics or ThousandEyes for the actual root-cause investigation. The dashboard directs investigation rather than replacing the specialist tools.
Key Challenges
1. Two telemetry sources with very different vocabularies
AppDynamics talks applications (transactions, nodes, tiers, JVM metrics); ThousandEyes talks network (agents, tests, paths, hops). Putting them behind one API without making the frontend render two different trees meant normalizing both sources into a single shared shape at the persistence layer.
2. Client pain points, not generic monitoring
The client came with concrete narratives: DB lock source takes hours to trace, JVM OOM arrives without warning, packet retransmission is high but the cause is unclear, request stage timings are missing. The PoC had to answer these, not produce a generic Grafana.
3. Three codebases shipped under a PoC deadline
Frontend, proxy, and backend were separate repos with two authors on the frontend. Keeping them independent meant the API contract between tiers had to be pinned down early so neither side blocked the other.
4. Demo-ready reliability in a short window
The live partner demo ran against real AppDynamics + ThousandEyes data on a 5-second refresh; "it would work if everything connected" was not an acceptable bar.
Solutions & Design Decisions
One aggregated-telemetry envelope
Pinning the envelope shape at the start of the project let the frontend and backend iterate independently. The response has two top-level keys — one per upstream source — because AppDynamics and ThousandEyes have different native shapes; but both arrays follow the same per-item structure (id, status, headline metric, breakdown) so the UI renders them with shared components.
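As a concrete sketch of that envelope (values and field names are hypothetical, but the structure follows what the text describes: two top-level keys, shared per-item shape):

```typescript
// Hypothetical snapshot: two top-level keys, one per upstream source.
const snapshot = {
  te: [ // one entry per ThousandEyes network path
    { id: "client-internet", status: "green", headline: 42, breakdown: { latencyMs: 42, lossPct: 0 } },
  ],
  ad: [ // one entry per AppDynamics application tile
    { id: "login-flow", status: "red", headline: 2100, breakdown: { artMs: 2100, errorRatePct: 8.5 } },
  ],
};

// Because both arrays share the per-item structure (id, status, headline,
// breakdown), one tile component can render items from either source.
const allTiles = [...snapshot.te, ...snapshot.ad];
```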
Pain-point-driven demo narrative
The demo was structured around the client's own operational pain points, with each scenario showing how the dashboard narrowed the problem to the right domain before linking into the relevant specialist tool.
The dashboard narrows; the specialist tool diagnoses
Layer 1 narrows the technical domain. Layer 2 surfaces the top Health Rule violations. Layer 3 is a click-through into AppDynamics or ThousandEyes. Keeping the dashboard's scope small — direct attention instead of duplicating what the specialist tools already do — was what made shipping on the PoC timeline possible.
YAML-driven proxy target
The Express proxy reads a tiny config.yml (upstream URL, port) at boot. When ServiceCore moved hosts mid-engagement, the change was a config edit + systemctl restart, not a rebuild. Small, but in a PoC environment where IPs change without notice it matters.
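Roughly what that boot-time read looks like. The keys and values here are illustrative, and a hand-rolled flat parser stands in for the YAML library the real proxy would use:

```typescript
// config.yml shape (illustrative keys and values):
//   upstream: http://10.20.0.5:8080   <- where ServiceCore currently lives
//   port: 443
//
// A real proxy would read the file with fs.readFileSync and parse it with
// js-yaml; a tiny flat "key: value" parser is enough to show the idea.
function parseFlatYaml(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split("\n")) {
    const m = line.match(/^(\w+):\s*(.+?)\s*$/);
    if (m) out[m[1]] = m[2];
  }
  return out;
}

const cfg = parseFlatYaml("upstream: http://10.20.0.5:8080\nport: 443");
// The proxy forwards /api/telemetry to cfg.upstream, so relocating
// ServiceCore is a config edit plus a restart, never a rebuild.
```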
PoC trade-offs
Several choices in this system are PoC-grade, not production-grade, and the team knew that:
- TLS was terminated at the proxy with a self-signed certificate for demo speed; a production rollout would use the client's existing ingress or F5.
- Polling was chosen over SSE / WebSockets to keep the PoC simple and demo-friendly.
- No caching layer was introduced between ServiceCore and the upstream APIs because the PoC traffic volume did not require it.
Results & Impact
Delivered
- Three-codebase stack operational end-to-end: React → Express proxy → ServiceCore → AppDynamics / ThousandEyes
- Live integration with 4 ThousandEyes network paths + 7 AppDynamics tiles (2 infrastructure + 5 business-application tiles covering key business flows)
- 5-second live refresh during demo
What the demo showed
- A single screen that narrows an unhealthy state to the right technical domain in seconds
- Click-through to the specialist tool for root-cause investigation
- An integration pattern that complements rather than replaces existing tooling
Learnings
Pin the contract, iterate the rest
Freezing the telemetry contract early let frontend and backend move independently, which reduced coordination overhead and kept the PoC moving under a tight timeline.
A triage dashboard's job is to point, not to replace
The biggest scope trap was "let's put all the AppDynamics detail in the dashboard." The much smaller, correct scope was Layer 1 (narrow) + Layer 2 (top violations) + a deep link out. The specialist tools already do root-cause analysis better than we would.
Separate the proxy from the data service
Keeping the Express layer thin — HTTPS, static hosting, config-driven upstream — let ServiceCore stay a pure data service with no knowledge of TLS or deployment shape. The split was a few hours of work and saved significant rework when the demo host changed.
Clear ownership matters in collaborative frontend work
With two authors on the React repo, the cleanest split was component-level: one person takes a component end-to-end, the other reviews. Mixing the same file in parallel produced merge noise not worth the theoretical parallelism.
Polling was the right complexity for a PoC
A 5-second setInterval over the telemetry endpoint covers the "the dashboard updates" UX without any WebSocket plumbing or proxy-side pub/sub. For a PoC, additional complexity would have been cost without a benefit on this timeline.
Deep Dive
The single API contract
One fetch, one envelope. Both arrays share a per-item shape so the UI renders them with common tile components.
```typescript
type TelemetrySnapshot = {
  te: ThousandEyesPoint[]; // one entry per ThousandEyes test
  ad: AppDynamicsPoint[];  // one entry per AppDynamics application
};
```
```typescript
const fetchApi = () => {
  fetch("/api/telemetry")
    .then((r) => r.json())
    .then(({ te, ad }) => {
      setThousandeyeData(te);
      setAppDynamic(ad);
    })
    .catch((err) => setApiError(err.message));
};

useEffect(() => {
  fetchApi();                                     // initial load
  const id = setInterval(fetchApi, intervalTime); // 5s
  return () => clearInterval(id);                 // stop polling on unmount
}, []);
```
Edge proxy — the thin part
The Express layer is deliberately minimal: HTTPS termination, the React build served as static assets, and a single reverse-proxy hop to ServiceCore. The upstream target lives in config.yml, so moving ServiceCore mid-engagement was a config edit + systemctl restart, not a rebuild.
Pain-point → demo scenario
The demo script was organized around the client's own pain narratives rather than a feature tour:
| Client pain point | Dashboard scenario |
|---|---|
| DB lock source is hard to trace | DB tile red · Health Rule: Lock · drill into AppD for the slow SQL |
| JVM OOM arrives without warning | Server tile red · Health Rule: GC / CPU · drill into AppD JVM memory |
| Packet retransmission high, cause unclear | Network tile red · drill into ThousandEyes path and device telemetry |
| Request T1–Tn stage timing is missing | Layer 2 per-stage timing · drill into AppD transaction snapshot |
| Slow transactions have no "whose fault" answer | End-to-end correlation across firewall · web→AP · AP→DB · internet paths |
Aggregation core — ServiceCore
A Spring Boot service backed by PostgreSQL that aggregates AppDynamics and ThousandEyes data into a unified telemetry model for the dashboard API.
Live Demo
An interactive mock of the dashboard. Same Layer 1 + Layer 2 structure as the original PoC — same 4 ThousandEyes paths, same 7 AppDynamics tiles, same 5-second polling cadence — driven by an in-browser telemetry engine. All values are fictional.
Try it
- Follow the red light. The flow diagram narrows the problem to a specific hop (Internet / Firewall / Web→AP / AP→DB / AP Server / Database) and the business-app row surfaces any red tile. Click any node, segment, or tile for the Layer 2 drill-down.
- A few signals start in a degraded state by design, so the dashboard always presents a realistic triage path to explore.
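A minimal sketch of how such an in-browser engine can keep a signal degraded by design (the names, thresholds, and flicker rule are all illustrative, not the demo's actual code):

```typescript
type Status = "green" | "yellow" | "red";

interface Signal { id: string; degraded: boolean }

// Seeds chosen so some tiles start red, guaranteeing a triage path to explore.
const signals: Signal[] = [
  { id: "ap-db-segment", degraded: true },  // starts red by design
  { id: "login-flow", degraded: false },
];

function nextStatus(s: Signal, tick: number): Status {
  // Degraded signals stay mostly red, flickering yellow occasionally
  // so the board looks live rather than frozen.
  if (s.degraded) return tick % 10 < 8 ? "red" : "yellow";
  return "green";
}
```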