Overview
DocMind is an AI-native document intelligence platform that takes raw documents — PDFs, scanned forms, invoices — and produces structured, queryable output through an automated pipeline: OCR extraction, intelligent chunking, LLM enrichment, and confidence-scored field mapping.
My mandate covered the entire frontend and the BFF layer that stitched the pipeline services together. This was the team's first AI product, which meant there was no playbook — just the pipeline, the data shapes, and a hard deadline.
My Role
I led the frontend architecture and owned the Next.js BFF (API routes) that served as the integration boundary between the React UI and the upstream AI services.
Key responsibilities:
- Designed the real-time pipeline status UI — users could watch their document move through each stage (upload → OCR → chunking → enrichment → review)
- Built the confidence scoring UX — a custom field-level indicator that let reviewers immediately spot which extracted values needed human verification
- Architected the BFF layer, normalizing responses from Textract, LiteLLM, and Langfuse into a single typed schema the UI consumed (a sketch of that schema follows this list)
- Integrated Langfuse for LLM observability — every extraction request was traced, which dramatically shortened debugging cycles during the first weeks of production
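A rough sketch of what that normalized schema looked like in spirit; the type and function names here (`ExtractedField`, `fromTextract`) are illustrative, not the production identifiers:

```typescript
// Hypothetical shape of the normalized field schema the BFF returned to the UI.
type FieldSource = "textract" | "llm";

interface ExtractedField {
  documentId: string;
  key: string;          // e.g. "invoice_number" (illustrative)
  value: string;
  confidence: number;   // normalized to the 0–1 range
  source: FieldSource;
  traceId?: string;     // Langfuse trace ID when the value came from LLM enrichment
}

// Textract reports Confidence as a 0–100 percentage; normalize it once here
// so the UI never has to care which upstream service produced the value.
function fromTextract(
  documentId: string,
  key: string,
  value: string,
  confidencePercent: number
): ExtractedField {
  return {
    documentId,
    key,
    value,
    confidence: confidencePercent / 100,
    source: "textract",
  };
}
```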
Architecture
User uploads document
│
▼
S3 (raw storage)
│
▼
AWS Textract (OCR + layout analysis)
│
▼
Chunking service (split by semantic block)
│
▼
LiteLLM gateway → LLM enrichment (field extraction, classification)
│
▼
Redis pub/sub (pipeline status events)
│
▼
Next.js BFF (SSE stream to client)
│
▼
React UI (live status + review interface)
The BFF subscribed to Redis channels per document ID and forwarded pipeline events to the browser as a Server-Sent Events stream. No WebSocket — SSE was simpler to deploy and sufficient for unidirectional status updates.
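A minimal sketch of that relay, written as a Next.js API route with the ioredis client; the channel naming (`pipeline:<documentId>`) and the event payload are assumptions for illustration, not the production code:

```typescript
// pages/api/documents/[id]/events.ts
import type { NextApiRequest, NextApiResponse } from "next";
import Redis from "ioredis";

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const documentId = req.query.id as string;
  const channel = `pipeline:${documentId}`;

  // Open the SSE stream.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache, no-transform",
    Connection: "keep-alive",
  });

  // A dedicated connection is needed: a subscribed ioredis client
  // cannot issue other commands.
  const subscriber = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
  await subscriber.subscribe(channel);

  // Forward each pipeline event as one SSE message.
  subscriber.on("message", (_channel, message) => {
    res.write(`data: ${message}\n\n`);
  });

  // Tear down the subscription when the browser disconnects.
  req.on("close", () => {
    subscriber.quit();
    res.end();
  });
}
```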
Key Challenges & Decisions
Confidence scoring UX. Textract reports per-field confidence as a 0–100 percentage, which the BFF normalized to a 0–1 score. The question was: how do you surface these values without overwhelming reviewers? We settled on a three-tier system displayed as inline badge chips: green (auto-accept, ≥ 0.92), amber (review, 0.75–0.92), red (flag, < 0.75). This let reviewers focus their attention rather than scanning every field.
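In code, the tiering was essentially a threshold function feeding a badge component. The sketch below is illustrative; the component, label, and class names are hypothetical:

```tsx
type Tier = "green" | "amber" | "red";

function tierFor(confidence: number): Tier {
  if (confidence >= 0.92) return "green"; // auto-accept
  if (confidence >= 0.75) return "amber"; // needs review
  return "red";                           // flag for manual verification
}

function ConfidenceBadge({ confidence }: { confidence: number }) {
  const tier = tierFor(confidence);
  const label = { green: "auto", amber: "review", red: "flag" }[tier];
  return <span className={`badge badge--${tier}`}>{label}</span>;
}
```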
Streaming status without polling. The original design had the frontend polling a status endpoint every 2 seconds. I replaced this with an SSE endpoint in the BFF that pushed events as the pipeline progressed. The result was a live progress stepper (Upload → OCR → Enrichment → Ready) with no polling overhead and much faster perceived completion.
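On the client, this reduces to a small hook around `EventSource`; the endpoint path and event payload shape (`{ stage }`) below are assumptions for illustration:

```tsx
import { useEffect, useState } from "react";

const STAGES = ["upload", "ocr", "enrichment", "ready"] as const;
type Stage = (typeof STAGES)[number];

function usePipelineStage(documentId: string): Stage {
  const [stage, setStage] = useState<Stage>("upload");

  useEffect(() => {
    const source = new EventSource(`/api/documents/${documentId}/events`);
    source.onmessage = (event) => {
      const { stage: next } = JSON.parse(event.data) as { stage: Stage };
      setStage(next);
      if (next === "ready") source.close(); // terminal stage: stop listening
    };
    return () => source.close(); // clean up on unmount or document change
  }, [documentId]);

  return stage;
}
```

The progress stepper then just renders `STAGES` and highlights everything up to the current stage.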
LiteLLM as a gateway. We needed to swap LLM providers without changing application code. Routing everything through LiteLLM gave us provider-agnostic prompting and a single place to manage rate limits, fallbacks, and cost tracking.
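Because LiteLLM exposes an OpenAI-compatible endpoint, callers could talk to it with a standard client. The sketch below assumes the `openai` Node SDK; the base URL, model alias, and prompt are placeholders:

```typescript
import OpenAI from "openai";

// Point a standard OpenAI client at the LiteLLM proxy instead of a provider.
const llm = new OpenAI({
  baseURL: process.env.LITELLM_PROXY_URL ?? "http://litellm:4000",
  apiKey: process.env.LITELLM_API_KEY ?? "",
});

export async function extractFields(chunkText: string) {
  // The model name is an alias resolved by LiteLLM, so swapping providers or
  // adding fallbacks is a gateway config change, not an application change.
  const completion = await llm.chat.completions.create({
    model: "docmind-extraction",
    messages: [
      { role: "system", content: "Extract invoice fields as JSON." },
      { role: "user", content: chunkText },
    ],
  });
  return completion.choices[0]?.message?.content;
}
```

The Langfuse tracing mentioned earlier can also be wired in at this gateway layer through LiteLLM's callback configuration, which keeps observability out of application code as well.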
Impact & Outcomes
- Confidence scoring UI meant reviewers could focus on flagged fields rather than auditing every extracted value
- SSE-based pipeline status significantly reduced perceived latency compared to the prior polling approach
- Langfuse tracing gave the team visibility into LLM calls in production, making extraction failures much faster to diagnose
- Established the LLM integration pattern (BFF → LiteLLM → Langfuse) that was reused on the next project
Visualization
Animated pipeline diagram coming in Phase 3 — will show document tokens flowing from S3 → Textract → LiteLLM → Redis → UI in real time.