Building NPCs That Feel Alive
A multimodal NPC engine with real-time voice, emotion, memory, and sub-second latency at scale.
The problem
Yumio needed game characters that could hold natural conversations with voice input, emotional responses, character-specific knowledge, and synchronized animations.
The hard part wasn't the demo; it was building an end-to-end system that could serve thousands of concurrent players with production-grade latency and stability.
What I built / changed
- Integrated speech-to-text and text-to-speech for real-time voice conversations.
- Added emotion classification to drive character reactions and pacing.
- Built tool-calling agent workflows for controllable behavior.
- Implemented a vector database memory layer for character context and recall.
- Designed safety layers for user inputs and outputs.
- Optimized RPC and WebSocket architecture to maintain sub-1-second p95 latency at scale.
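The memory layer above can be sketched as a minimal recall loop: store per-character (text, embedding) pairs and return the top-k most similar entries for a query embedding. This is an illustrative sketch, not the production code; the `CharacterMemory` class, its method names, and the toy 2-dimensional embeddings are all hypothetical, and a real system would use a vector database with precomputed embeddings.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class CharacterMemory:
    """Hypothetical per-character memory store.

    Keeps (text, embedding) pairs and recalls the k entries whose
    embeddings are most similar to a query embedding.
    """

    def __init__(self):
        self.entries = []  # list of (text, embedding) tuples

    def remember(self, text, embedding):
        self.entries.append((text, embedding))

    def recall(self, query_embedding, k=2):
        # Rank stored entries by cosine similarity to the query.
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(entry[1], query_embedding),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]
```

In production the brute-force sort would be replaced by an approximate nearest-neighbor index, but the contract is the same: write memories as they happen, read the most relevant few back into the character's prompt context.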
Result
Production-ready NPC system handling thousands of simultaneous conversations with sub-1-second p95 latency.
Stack / concepts
LLM agents · STT/TTS · Vector DB · WebSockets
Interested in discussing engineering challenges?
Get in touch