Fixing Therapist Note Generation

A production LLM ops rebuild that improved quality, reduced rejection, and delivered direct revenue impact.

The problem

Mentalyc's AI-generated therapy notes were inconsistent and therapists were rejecting outputs.

The system needed quality, reliability, and cost control — not just better prompts.

Tightened prompts and improved retrieval pipelines.
Added guardrails and structured output constraints.
Created an evaluation harness with real therapist feedback loops.
Built curated datasets for regression testing.
Implemented cost controls via intelligent model routing and token-aware truncation.

30% revenue impact through higher note acceptance rates.

LLM opsPrompt engineeringEval harnessesCost optimization

Interested in discussing engineering challenges?