MVP to Mission-Critical: The Idea Behind My LLM Gateway Rewrite

June 23, 2026 19:09

2 min read

Velqor Inc

My first LLM gateway had a bug that could have leaked user data across tenants.

So I rebuilt the whole thing. 👇

v0.1.0 was a classic MVP: FastAPI + LiteLLM, PII scrubbing, semantic caching. It worked — for one user, in a perfect world.

Then I looked at it through a production lens:
❌ Shared global cache → cross-tenant data leaks
❌ Static API keys → no real identity
❌ Zero budget controls → one bad client = $$$ gone
❌ 200ms safety overhead → developers will bypass you

Every one of those had the same root cause: the gateway had no concept of identity. It was built for a user, not users.

v0.2.0 ships with:
✅ JWT-based multi-tenancy with isolated Redis Stack indices per tenant
✅ Hybrid inference: vLLM (PagedAttention) for self-hosted, LiteLLM for cloud routing
✅ Atomic rate limiting + USD budget caps via Redis Lua scripts
✅ <5ms p50 overhead, so safety layers fit inside the budget
✅ 225+ tests, because production reliability is a silent feature
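To make the atomic rate limiting and budget cap idea concrete, here is a minimal sketch, not the gateway's actual code. It assumes a redis-py style client that exposes `eval(script, numkeys, *keys_and_args)`. The key names, the fixed-window limiter, the integer micro-USD accounting, and the `admit` helper are all hypothetical choices made for illustration.

```python
# Hypothetical Lua script: one fixed-window rate check plus one budget check.
# Returns 0 = ok, 1 = rate limited, 2 = over budget.
ADMIT_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if count > tonumber(ARGV[2]) then
  return 1
end
local spent = tonumber(redis.call('GET', KEYS[2]) or '0')
if spent + tonumber(ARGV[3]) > tonumber(ARGV[4]) then
  return 2
end
redis.call('INCRBY', KEYS[2], ARGV[3])
return 0
"""

_VERDICTS = {0: "ok", 1: "rate_limited", 2: "over_budget"}


def admit(client, tenant_id, cost_micro_usd, *, window_s=60,
          max_requests=100, cap_micro_usd=50_000_000):
    """Ask Redis whether this tenant may spend `cost_micro_usd` right now.

    `client` is anything with a redis-py compatible `eval` method.
    Costs are integers in micro-USD to avoid float drift (an assumption
    of this sketch, not something the post specifies).
    """
    keys = [f"tenant:{tenant_id}:rate", f"tenant:{tenant_id}:spend"]
    code = client.eval(
        ADMIT_LUA, len(keys), *keys,
        window_s, max_requests, cost_micro_usd, cap_micro_usd,
    )
    return _VERDICTS[int(code)]
```

Because Redis runs a script as a single atomic unit, two concurrent requests from the same tenant cannot both pass the budget check and overspend together. In this sketch the estimated cost is reserved up front; reconciling it against the real token cost after the call is left out.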

Think of it as an open-source alternative to Portkey or Kong AI Gateway — built around multi-tenant isolation and sub-5ms overhead.
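As a rough illustration of what multi-tenant isolation can look like at the cache layer, the sketch below derives a per-tenant index name and key prefix from already-verified JWT claims. The `tenant_id` claim name, the naming scheme, and the `cache_scope_for` helper are assumptions for this example; signature verification is expected to happen before this function is called.

```python
import re
from dataclasses import dataclass

# Restrict tenant ids to a safe alphabet so they cannot smuggle
# separators or wildcards into Redis key or index names.
_TENANT_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


@dataclass(frozen=True)
class TenantCacheScope:
    index_name: str   # hypothetical per-tenant search index name
    key_prefix: str   # hypothetical prefix for that tenant's cache keys


def cache_scope_for(claims: dict) -> TenantCacheScope:
    """Map verified JWT claims to an isolated cache scope.

    Fails closed: a token without a usable tenant id gets no cache
    scope at all, instead of falling back to a shared one.
    """
    tenant_id = claims.get("tenant_id")
    if not isinstance(tenant_id, str) or not _TENANT_ID.fullmatch(tenant_id):
        raise PermissionError("token carries no usable tenant_id claim")
    return TenantCacheScope(
        index_name=f"idx:cache:{tenant_id}",
        key_prefix=f"cache:{tenant_id}:",
    )
```

The point of the design is that every cache read and write takes its scope from the token rather than from a global default, so a lookup for one tenant never searches another tenant's entries.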

The biggest lesson? Moving from MVP to production isn't about adding features. It's about adding constraints.

Full architecture deep dive linked in the comments 👇

Would love feedback from anyone shipping LLM infrastructure.

Full write-up here: https://orhunkupeli.hashnode.dev/mvp-to-mission-critical-the-idea-behind-my-llm-gateway-rewrite

Keywords

#LLM #AIEngineering #SystemDesign #MLOps #Backend