Building NPCs That Feel Alive
A multimodal NPC engine with real-time voice, emotion, memory, and sub-second latency at scale.
The problem
Yumio needed game characters that could hold natural conversations with voice input, emotional responses, character-specific knowledge, and synchronized animations.
The hard part wasn't the demo; it was building an end-to-end system that could serve thousands of concurrent players with production-grade latency and stability.
What I built / changed
- Integrated speech-to-text and text-to-speech for real-time voice conversations.
- Added emotion classification to drive character reactions and pacing.
- Built tool-calling agent workflows for controllable behavior.
- Implemented a vector database memory layer for character context and recall.
- Designed safety layers for user inputs and outputs.
- Optimized RPC and WebSocket architecture to maintain sub-1-second p95 latency at scale.
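The memory layer above can be sketched as a minimal recall loop: store per-character (text, embedding) pairs and return the top-k most similar entries for a query embedding. This is an illustrative sketch, not the production code; the `CharacterMemory` class, its method names, and the toy 2-dimensional embeddings are all hypothetical, and a real system would use a vector database with precomputed embeddings.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class CharacterMemory:
    """Hypothetical per-character memory store.

    Keeps (text, embedding) pairs and recalls the k entries whose
    embeddings are most similar to a query embedding.
    """

    def __init__(self):
        self.entries = []  # list of (text, embedding) tuples

    def remember(self, text, embedding):
        self.entries.append((text, embedding))

    def recall(self, query_embedding, k=2):
        # Rank stored entries by cosine similarity to the query.
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(entry[1], query_embedding),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]
```

In production the brute-force sort would be replaced by an approximate nearest-neighbor index, but the contract is the same: write memories as they happen, read the most relevant few back into the character's prompt context.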
Result
Production-ready NPC system handling thousands of simultaneous conversations with sub-1-second p95 latency.
Stack / concepts
LLM agents · STT/TTS · Vector DB · WebSockets
Interested in discussing engineering challenges?
Get in touch