Data / Analytics / Dashboard

NGINX One Observability

Reimagining how engineers monitor, compare, and diagnose NGINX instances across fleets.

Role

Lead Product Designer

Team

2 PMs, 10+ Engineers

Timeline

4 Months

Focus

Data Viz, UX Strategy

NGINX One Final Design
The Business Driver

The 3rd-Party Observability Platform Gap.

Customers love NGINX for its performance but rely on third-party platforms such as Datadog and Grafana for visibility. This fragmentation creates a strategic gap: we own the traffic, but not the insight.

Stakeholder Insights

"This complexity isn't just a UX problem—it's a value leak. Customers have been asking for this for years. By closing this visibility gap, we reclaim the debugging workflow and turn observability into a sticky feature that prevents churn."

Users

Empowering the Experts

Platform Ops

"I need to see overall system health at a glance to spot issues early without diving into raw metrics."

Application Engineer

"When performance degrades, I need to pinpoint which config change caused it so I can fix it fast."

SecOps

"I need to validate that security policies are active and blocking threats effectively."

The User Journey

We mapped the critical path for all three personas to a unified "Observe-Diagnose-Resolve" loop.

1

Observe

Identify abnormal patterns or potential risks through clear visual signals (e.g., spikes, error trends, CVE alerts).

2

Diagnose

Drill into specific instances or metrics to uncover root causes, with contextual views and linked data.

3

Resolve

Follow direct links to relevant solutions or knowledge base articles to act quickly and with confidence.

The Challenge

Data Without Diagnosis

The legacy dashboard was packed with data but lacked hierarchy. It could flag that an error occurred, but never why. Engineers saw spikes in 500 errors but had to leave NGINX and lean on 3rd-party plugins to find the root cause. This forced constant context-switching to answer simple questions:

  • "Is my instance healthy or failing?"
  • "Where is the traffic spike coming from?"
  • "Which config change caused this error?"

Information Overload

Scanning rows of raw numbers induces high cognitive load during incidents.

Context Switching

Users must mentally stitch together isolated data points to find root causes.

Legacy Dashboard

Fragmented Metrics

The Goal

How do we make hidden data visible and actionable?

When a server is on fire, engineers don't have time to analyze 50 charts. Our goal was to give them the answer—not just the data—in under 5 seconds.

We needed to move from "showing everything" to "showing what matters." This meant designing a system that cuts through the noise and instantly points SREs to the root cause of an outage.

User Insight

"When a critical error happens, help me find why it happened, and where to look to investigate."

Instant Situational Awareness

We prioritized anomaly detection over raw metrics. By visually exaggerating spikes and errors, we ensure that critical issues jump out at the user, reducing the cognitive load during high-pressure incidents.

Actionable Insights

Every visualization is a pathway to a solution. We designed the "Data Explorer" to not only display trends but to allow engineers to drill down into specific requests and logs without losing context.

Instant Situational Awareness

Dashboard

The first layer of defense is the Instance Dashboard. We replaced standard charts with high-density sparklines that prioritize trend direction over raw values.

NGINX One Dashboard Widgets

Grouping by Intent

Metrics are categorized by intent: Utilization (Health), Status (Security), and Traffic (Throughput).

Sparklines over Charts

Sparklines show trends rather than precise values, saving 60% of vertical space while highlighting anomalies.

Progressive Disclosure

Secondary details stay hidden until the user hovers, keeping the initial scan clean and focused on critical signals.

Actionable Insights

Data Explorer

When a spike is detected, the Data Explorer takes over. Unlike static reports, this interactive tool allows SREs to drill down from a global anomaly to a specific request in three clicks.

The interaction model is tuned for speed, visually exaggerating outliers so root causes can't hide in the noise.

NGINX One Data Explorer Animation
Process

Iterative Co-creation Workflow

This project began as a cross-functional design exploration that brought together PM insights, engineering prototypes, and design experimentation. Instead of working from a fixed scope, we adopted an iterative co-creation workflow to define priorities, validate feasibility, and refine visual patterns in real time.

🎨

Design

Synthesized cross-team ideas, ran iterative design sessions, and simplified complex data into a clear, actionable experience.

⚙️

Engineering

Evolved a hackathon prototype into a working system, testing data refresh rates and technical limits in real time.

💬

Product

Distilled customer pain points to define which metrics mattered most and what "glanceable" really meant.

Exploration

Mapping the Data Landscape

Before defining the 4-layer model, I conducted extensive mapping exercises to understand the relationships between NGINX's vast metrics ecosystem. These sketches helped identify the natural clusters that eventually formed our data strategy.

Fig 3. Data Flow & Layer Mapping

Fig 4. Initial Taxonomy of NGINX Plus Dashboard Data

Fig 5. Instance Metrics Hierarchy Tree

System Architecture

The Observability Blueprint

Before designing screens, I co-created this map with engineering to define the observability ecosystem. We needed to ensure every metric had a clear lineage and every user action had a feasible destination.

This Concept Map (Fig 7) became our shared source of truth, aligning design intent with technical reality.

Fig 7. System Concept Map — Co-created with Engineering

Strategy

Potential Directions

Navigating conflicting stakeholder priorities was key. Engineering pushed for a comprehensive technical view, while Product Management wanted a fast, safe MVP. My role was to synthesize these into a scalable design solution.

Engineering's View

Option 1: Infinite Map

A comprehensive technical vision to visualize every connection. While powerful, it risked overwhelming users and faced severe performance hurdles.

All-in-one solution

Scalability & Performance risks

PM's View

Option 2: Simple Charts

The "safe" MVP route. Fast to build and performant, but offered little competitive value and failed to solve the core diagnostic problem.

High Performance

Low value add vs. competitors

SELECTED
My Synthesis

Option 3: Data Explorer

I aligned the team on a balanced approach: a flexible explorer that leverages the Design System to handle complexity without sacrificing performance.

Flexible & fits all user needs

Scalable & Performant

Fig 8. Engineering Hackathon: Infinite Map Concept

Fig 9. Engineering Hackathon: Traffic Flow Visualization

Fig 10. PM Concept: Simple Charts & Sankey

Fig 11. PM Concept: Basic Status Check

Fig 12. Option 3: Exploring Existing Visualization Library (Charts)

Fig 13. Option 3: Exploring Existing Visualization Library (Sankey & Metrics)

Detailed Design

NGINX Open Source & NGINX Plus Metrics

A clean, easy-to-scan combined view of traffic, health, CVEs, and resource usage. Clear hierarchy, color, and compact charts make key information instantly visible.

Instance Dashboard Overview

Overview

Instance status, utilization trend, and network traffic trend sparklines.

View Switcher

Seamlessly toggle between Overview, Metrics, and Configuration views.

Time Range

Global time controls to correlate metrics across different periods.

Detailed Metrics Views

Beyond the high-level summary, I designed specialized views for each metric category. These detailed screens allow engineers to drill down into specific data points—Traffic, Utilization, Connections, and Requests—without losing context.

Traffic Analysis

Detailed breakdown of throughput trends, bandwidth usage, and latency metrics across instances.

Bytes In/Out Detail

Latency Analysis

Launch & Future

Launch & Future Improvements

The NGINX One Console launched in public preview in 2025. Early feedback has been positive, with users highlighting the "Data Explorer" as a significant improvement for rapid diagnosis, transforming what used to be a multi-tool hunt into a streamlined workflow.

Roadmap Priorities

Intelligent Config Tuning

Analyze traffic patterns to suggest specific configuration changes that improve performance and security.

Customizable Dashboards

Allow users to build their own views based on specific team needs, moving beyond the "one size fits all" default.

Ecosystem Integrations

Seamless workflows with PagerDuty, Slack, and Jira to streamline incident response and team collaboration.

Takeaways

Lessons Learned

01

Clarity Over Complexity

When visualizing large-scale data, clarity is more valuable than visual novelty. Clear hierarchy, consistent scales, and recognizable patterns help users act faster and trust the system.

02

Design Systems Are Evolving Tools

A design system isn’t a rulebook—it’s a living framework. Extending it for new visualization needs keeps consistency without stifling innovation.

03

Data Has a Story to Tell

UX design plays a key role in uncovering meaning from massive, complex data. Helping users see trends and connections turns raw telemetry into insight.

© 2026 Xiaowei Chen