DkubeX

Overview

DkubeX is an MLOps platform that gives data science teams a unified interface for managing the full ML lifecycle — from dataset versioning and experiment tracking to model training on Kubernetes clusters and deployment pipelines.

I joined as the frontend architect at a point where the product had grown fast and accumulated significant UI debt: inconsistent components, no shared design system, and no test coverage on the component layer. My mandate was to build the foundation that would let the team move faster without breaking things.

My Role

I owned the frontend architecture end-to-end: design system, component library, pipeline visualization, and the testing framework.

Designed and built a 60+ component Storybook library from scratch, covering inputs, data tables, status indicators, modals, and the complex pipeline visualization components
Built the pipeline DAG visualization — a custom directed acyclic graph renderer using SVG that could handle 50+ node pipelines with real-time status polling
Established the Redux store architecture for ML job state — defined slice boundaries, normalization patterns, and the polling middleware that kept job status fresh without hammering the API
Wrote Jest + RTL test suites for all shared components; set up MSW (Mock Service Worker) for API mocking in tests
Onboarded two junior engineers onto the component library patterns
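To make "normalized job state" concrete, here is a minimal, dependency-free sketch. Jobs are stored once by id and pipelines reference them by id, so a status update touches exactly one record. The `Job` fields, action names, and selector are hypothetical, and plain reducer functions stand in for whatever Redux helpers the real store used.

```typescript
// Hypothetical job shape: the real API's field names are not documented here.
type JobStatus = "Pending" | "Running" | "Succeeded" | "Failed";

interface Job {
  id: string;
  pipelineId: string;
  status: JobStatus;
}

// Normalized slice: each job is stored once, keyed by id, plus an ordered id list.
interface JobsState {
  byId: Record<string, Job>;
  allIds: string[];
}

type JobsAction =
  | { type: "jobs/received"; jobs: Job[] }
  | { type: "jobs/statusChanged"; id: string; status: JobStatus };

const initialJobsState: JobsState = { byId: {}, allIds: [] };

function hasJob(byId: Record<string, Job>, id: string): boolean {
  return Object.prototype.hasOwnProperty.call(byId, id);
}

function jobsReducer(state: JobsState = initialJobsState, action: JobsAction): JobsState {
  switch (action.type) {
    case "jobs/received": {
      // Upsert: new ids are appended, existing entries are replaced.
      const byId = { ...state.byId };
      const allIds = state.allIds.slice();
      for (const job of action.jobs) {
        if (!hasJob(byId, job.id)) allIds.push(job.id);
        byId[job.id] = job;
      }
      return { byId, allIds };
    }
    case "jobs/statusChanged": {
      // Return the same object when nothing changes, so subscribers that
      // compare by reference can skip re-rendering.
      if (!hasJob(state.byId, action.id)) return state;
      const job = state.byId[action.id];
      if (job.status === action.status) return state;
      return {
        ...state,
        byId: { ...state.byId, [action.id]: { ...job, status: action.status } },
      };
    }
    default:
      return state;
  }
}

// Selector: derive one pipeline's jobs from the normalized store.
function selectJobsForPipeline(state: JobsState, pipelineId: string): Job[] {
  return state.allIds
    .map((id) => state.byId[id])
    .filter((job) => job.pipelineId === pipelineId);
}
```

Redux Toolkit's `createEntityAdapter` produces the same shape (as `ids`/`entities`) along with ready-made CRUD reducers; the hand-written version only makes the structure visible.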

Architecture

React SPA
  ├── Redux store (normalized job/pipeline state)
  ├── Pipeline DAG renderer (custom SVG)
  └── Storybook component library
       │
       ▼
REST API (Node.js)
       │
       ▼
Kubernetes job scheduler
  ├── Training jobs (GPU nodes)
  ├── Data preprocessing pipelines
  └── Model serving (inference endpoints)

The frontend communicated with a REST API that fronted the Kubernetes control plane. The most complex part was the pipeline visualization: nodes represented K8s jobs, edges represented data dependencies, and the whole graph needed to update live as jobs transitioned through Pending → Running → Succeeded/Failed.
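Because polled responses can arrive out of order, one way to keep the graph consistent is to treat the Pending → Running → Succeeded/Failed lifecycle as monotonic and ignore any update that would move a job backwards. The sketch below assumes that rule (it is not stated in the text) and ranks the states, so a job that finished between two polls can jump straight from Pending to a terminal state.

```typescript
type JobStatus = "Pending" | "Running" | "Succeeded" | "Failed";

// Lifecycle order; Succeeded and Failed share the terminal rank.
const RANK: Record<JobStatus, number> = {
  Pending: 0,
  Running: 1,
  Succeeded: 2,
  Failed: 2,
};

function isTerminal(status: JobStatus): boolean {
  return RANK[status] === 2;
}

// Accept a polled status only if it moves the job forward; stale or
// conflicting responses (e.g. Running after Succeeded) are dropped.
function applyPolledStatus(current: JobStatus, polled: JobStatus): JobStatus {
  return RANK[polled] > RANK[current] ? polled : current;
}
```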

Key Challenges & Decisions

DAG visualization without a graph library. We evaluated D3 and React Flow, but both imposed constraints on node layout that didn't fit our data shape (pipelines could be wide, with many parallel branches). I built a custom SVG renderer with a layered layout algorithm: assign each node a column based on its topological depth, distribute nodes vertically within each column, then draw Bezier edges between them. The result was a layout that felt natural for ML pipelines and required no third-party graph dependency.
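A minimal sketch of the three layout steps described above. It assumes "topological depth" means longest-path depth from any source node, computed with Kahn's algorithm, so a node reached by both a long and a short branch lands in the later column. The `Edge`/`Point` types, spacing constants, and function names are hypothetical, and refinements a production renderer would likely need (crossing reduction, variable node sizes) are not modeled.

```typescript
interface Edge { from: string; to: string; }
interface Point { x: number; y: number; }

// Hypothetical spacing constants, in pixels.
const COLUMN_WIDTH = 220;
const ROW_HEIGHT = 80;

// Step 1: column = longest-path depth, via Kahn's topological sort.
// Assumes every edge endpoint appears in `nodes`.
function computeDepths(nodes: string[], edges: Edge[]): Map<string, number> {
  const indegree = new Map<string, number>();
  const children = new Map<string, string[]>();
  for (const n of nodes) { indegree.set(n, 0); children.set(n, []); }
  for (const e of edges) {
    indegree.set(e.to, indegree.get(e.to)! + 1);
    children.get(e.from)!.push(e.to);
  }
  const depth = new Map<string, number>();
  const queue = nodes.filter((n) => indegree.get(n) === 0);
  for (const n of queue) depth.set(n, 0);
  let processed = 0;
  while (queue.length > 0) {
    const n = queue.shift()!;
    processed++;
    for (const child of children.get(n)!) {
      depth.set(child, Math.max(depth.get(child) ?? 0, depth.get(n)! + 1));
      const remaining = indegree.get(child)! - 1;
      indegree.set(child, remaining);
      if (remaining === 0) queue.push(child);
    }
  }
  if (processed !== nodes.length) throw new Error("graph has a cycle");
  return depth;
}

// Step 2: stack each column's nodes vertically, centered on y = 0.
function layoutNodes(nodes: string[], edges: Edge[]): Map<string, Point> {
  const depth = computeDepths(nodes, edges);
  const columns = new Map<number, string[]>();
  for (const n of nodes) {
    const d = depth.get(n)!;
    if (!columns.has(d)) columns.set(d, []);
    columns.get(d)!.push(n);
  }
  const positions = new Map<string, Point>();
  columns.forEach((col, d) => {
    col.forEach((n, i) => {
      positions.set(n, { x: d * COLUMN_WIDTH, y: (i - (col.length - 1) / 2) * ROW_HEIGHT });
    });
  });
  return positions;
}

// Step 3: cubic Bezier edge as an SVG path "d" string.
function edgePath(a: Point, b: Point): string {
  const midX = (a.x + b.x) / 2;
  return `M ${a.x} ${a.y} C ${midX} ${a.y}, ${midX} ${b.y}, ${b.x} ${b.y}`;
}
```

Both control points sit at the horizontal midpoint, at the source's and target's heights respectively, so every edge leaves and enters its nodes horizontally, which suits left-to-right pipelines with many parallel branches.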

Real-time status with bounded polling. ML jobs can run for hours. Streaming the entire job lifecycle over SSE wasn't an option, but we also didn't want to hammer the API with 1s polling. The solution was a polling middleware that polls every 2s while a job is Running, backs off exponentially once the job's state has gone 5 minutes without changing (capped at 30s), and resets to the fast interval when the user refocuses the tab. This kept the UI responsive during active jobs and quiet during long-running ones.
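The backoff policy described above can be expressed as a pure function of how long a job's status has been unchanged, plus a small tracker that the middleware consults. Only the 2s base, the 5-minute threshold, the 30s cap, and the focus reset come from the text; the doubling-per-minute schedule, the `PollSchedule` class, and the handling of Pending jobs are assumptions for illustration.

```typescript
type JobStatus = "Pending" | "Running" | "Succeeded" | "Failed";

// From the text: 2s while active, back off after 5 minutes unchanged, cap at 30s.
const BASE_INTERVAL_MS = 2_000;
const BACKOFF_AFTER_MS = 5 * 60_000;
const MAX_INTERVAL_MS = 30_000;
// Assumption: the interval doubles for each further minute without a change.
const BACKOFF_STEP_MS = 60_000;

function nextPollDelay(msSinceLastChange: number): number {
  if (msSinceLastChange < BACKOFF_AFTER_MS) return BASE_INTERVAL_MS;
  const doublings = Math.floor((msSinceLastChange - BACKOFF_AFTER_MS) / BACKOFF_STEP_MS) + 1;
  return Math.min(BASE_INTERVAL_MS * Math.pow(2, doublings), MAX_INTERVAL_MS);
}

// Hypothetical per-job tracker that a polling middleware would consult.
class PollSchedule {
  private lastStatus: JobStatus | null = null;
  private lastChangeAt: number;

  constructor(now: number) {
    this.lastChangeAt = now;
  }

  // A status different from the last one resets the backoff clock.
  record(status: JobStatus, now: number): void {
    if (status !== this.lastStatus) {
      this.lastStatus = status;
      this.lastChangeAt = now;
    }
  }

  // Tab focus drops back to the fast interval.
  resetOnFocus(now: number): void {
    this.lastChangeAt = now;
  }

  // null means stop polling because the job reached a terminal state.
  // Assumption: Pending jobs follow the same schedule as Running ones.
  nextDelay(now: number): number | null {
    if (this.lastStatus === "Succeeded" || this.lastStatus === "Failed") return null;
    return nextPollDelay(now - this.lastChangeAt);
  }
}
```

In the browser, `resetOnFocus` would be called from a `focus` or `visibilitychange` listener and `nextDelay` would feed a `setTimeout`; that wiring is omitted so the sketch runs outside a DOM. Keeping the delay calculation pure also makes the backoff curve straightforward to unit-test.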

Storybook as the source of truth. The previous workflow had designers sharing Figma frames that diverged from the actual UI. We flipped this: Storybook became the source of truth. Every component had stories for all meaningful states, and the Storybook URL was what got shared in design reviews. This closed the Figma-to-code gap and gave QA a stable visual regression surface.

Impact & Outcomes

Shared component library (Storybook) meant new features could reuse existing components rather than rebuilding from scratch
The SVG-based DAG visualization handled complex pipelines of 50+ nodes with smooth zoom and pan
Jest + RTL test coverage on the component library meant regressions surfaced before staging

Visualization

DkubeX — ML Pipeline Architecture