Andreas Sapountzis

AI Engineer

I build AI systems that actually work

Five years shipping production LLM ops, agents, and ML pipelines at scale. From neuroscience research to multi-modal NPC engines. I solve hard problems.

Worked with

Selected work

Outsmarting Microsoft's $500/Month Tax

Einbliq's data pipeline uses Azure Durable Functions—long-running tasks that can take 30+ minutes. The problem: KEDA (Kubernetes event-driven autoscaling) only sees the entry point, not the actual work happening inside. So it thinks nothing is running and kills the whole pipeline mid-execution. Microsoft's "official" solution? A $500/month integration. For a pipeline that costs $10/month to run.

I built a workaround. Added a "Keep Alive" endpoint that each task pings via HTTP every few minutes. Now KEDA sees constant traffic, knows work is happening, and doesn't kill anything. Same result, zero extra cost. Microsoft gets nothing.

Azure Durable Functions KEDA Cost optimization

Making Everything Fast and Cheap

Beyond the KEDA hack, I redesigned the entire pipeline stack: parallelized data loaders, moved from App Service Plans (always-on, expensive) to Container Apps (scale to zero, pay only when running), optimized queries with lazy evaluation to fetch less data, and tuned memory allocation.

The hardware improved too—moved from a resource-constrained B3 dev plan to proper specs. Everything got faster, and because we scale to zero when idle, costs dropped dramatically.

Result: 86% cost reduction, 88% latency improvement.

Polars Container Apps Lazy evaluation Infrastructure

Discovering Hidden Streaming Patterns

Einbliq's data had signal, but it wasn't clear which patterns mattered.

I approached it differently: labeled session-level data from multiple angles (QoE issues, error codes, playback duration, etc.) and used machine learning not to predict but to discover: finding the natural splits and rules that separate good sessions from problematic ones. It's pattern mining, not classification. This revealed dozens of real insights about what causes streaming issues.

ML Pattern discovery Statistical analysis

Validating Patterns at Scale with AI

The patterns kept multiplying. Too many for engineers to manually validate. So I built an LLM agent that acts as a gatekeeper. It gets access to raw session data, evaluates each detected pattern for significance, and only escalates the ones that actually matter.

Instead of engineers drowning in alerts, they get a curated list of real, actionable issues worth investigating.
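The gatekeeper shape can be sketched independently of any particular model. In this illustration the `judge` function stands in for the LLM agent — in production it would inspect raw session data behind each pattern and return a significance score — and it is injected so the escalation logic itself is testable. All names and thresholds here are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pattern:
    rule: str
    support: int  # number of sessions matching the rule


def triage(patterns: List[Pattern],
           judge: Callable[[Pattern], float],
           threshold: float = 0.7) -> List[Pattern]:
    """Escalate only the patterns the judge scores as significant.

    `judge` stands in for the LLM agent; it returns a score in [0, 1].
    """
    return [p for p in patterns if judge(p) >= threshold]


# Hypothetical heuristic judge, for illustration only.
patterns = [
    Pattern("rebuffer_ratio > 2.5", support=400),
    Pattern("duration_s < 3", support=2),
]
escalated = triage(patterns, judge=lambda p: 1.0 if p.support > 50 else 0.1)
```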

Result: 90% reduction in triage effort.

LLM agents RAG Pattern validation

Measuring Environmental Impact of Streaming

No one was tracking the carbon footprint of digital streaming at device-level granularity. I pioneered the first model to measure environmental impact across a distribution network of 1M+ user devices. Built an end-to-end analytics pipeline, created Grafana dashboards for real-time visualization, and enabled content providers to make data-driven sustainability decisions.

Result: Industry-first environmental analytics at device-level detail.

Big Data Grafana Analytics

Fixing Therapist Note Generation

Mentalyc's AI-generated therapy notes were inconsistent and therapists were rejecting outputs. I rebuilt their LLM operations from the ground up: tightened prompts, improved retrieval pipelines, added guardrails, and created evaluation harnesses with real therapist feedback loops. Built curated datasets for regression testing and implemented cost controls through intelligent model routing and token-aware truncation. All without sacrificing quality.
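Two of those cost controls are concrete enough to sketch. Model names, thresholds, and the head/tail split are illustrative assumptions, not the production configuration:

```python
from typing import List


def route_model(prompt_tokens: int, needs_reasoning: bool) -> str:
    """Route each request to the cheapest model that can handle it.

    Model names and the token threshold are hypothetical.
    """
    if needs_reasoning:
        return "large-model"
    if prompt_tokens <= 4000:
        return "small-model"
    return "mid-model"


def truncate_tokens(tokens: List[int], budget: int, keep_head: float = 0.3) -> List[int]:
    """Token-aware truncation: keep the head and tail of a transcript and
    drop the middle, on the (illustrative) assumption that openings and
    closings carry the most note-relevant content."""
    if len(tokens) <= budget:
        return tokens
    head = int(budget * keep_head)
    tail = budget - head
    return tokens[:head] + tokens[-tail:]
```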

Result: 30% revenue impact through higher note acceptance rates.

LLM ops Prompt engineering Eval harnesses Cost optimization

Building NPCs That Feel Alive

Yumio needed game characters that could hold natural conversations with voice input, emotional responses, character-specific knowledge, and synchronized animations. I built Dreamia, a multimodal intelligence engine from scratch: integrated speech-to-text, text-to-speech, emotion classification, LLM tool-calling agents, vector databases for character memory, and safety layers. Optimized RPC and WebSocket architecture to serve thousands of concurrent players with sub-1-second p95 latency.

Result: Production-ready NPC system handling thousands of simultaneous conversations.

LLM agents STT/TTS Vector DB WebSockets

Scaling Presales Research with AI

AIREV needed to automate lead scoring and report generation at scale. I built Revolution Engine, an end-to-end text classification service for training and deploying BERT models, and PresalesAI, a GPT-4-powered research tool using chain-of-thought reasoning. Designed MapReduce-style LLM pipelines for processing long documents and structured PostgreSQL schemas for data management.
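The MapReduce-over-an-LLM pattern is worth a sketch: split a long document into overlapping chunks, summarize each chunk (map), then summarize the summaries (reduce). The chunk sizes are illustrative, and `summarize` stands in for the model call so the pipeline shape can be tested without an API key.

```python
from typing import Callable, List


def chunk(text: str, size: int = 1000, overlap: int = 100) -> List[str]:
    """Split a long document into overlapping windows (sizes illustrative)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def map_reduce_summarize(text: str, summarize: Callable[[str], str]) -> str:
    """MapReduce over an LLM: summarize chunks, then summarize the summaries.

    `summarize` is a stand-in for the model call (e.g. a GPT-4 prompt),
    injected here so the pipeline is runnable offline.
    """
    partials = [summarize(c) for c in chunk(text)]  # map step
    return summarize("\n".join(partials))           # reduce step
```

The overlap keeps sentences that straddle a chunk boundary from being cut in half; for documents whose partial summaries still exceed the context window, the reduce step would recurse.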

Result: 20% faster lead conversion, 35% higher sales close rates.

BERT GPT-4 LangChain MapReduce

Decoding Brain Activity at Scale

Harvard Medical School and FORTH needed to analyze communication patterns in neuronal activity: 10,000+ neurons generating 80GB+ of calcium imaging data. I led an 8-person data science team to build preprocessing pipelines, develop advanced ML algorithms for pattern detection, and create visualization tools for neuroscience researchers. Applied statistical multivariate analysis and neural networks to extract insights from massive biological datasets.

Result: Published research on spontaneous brain activity patterns.

Big Data ML Neural networks Research

Trading Crypto with Reinforcement Learning

For my master's thesis, I built an AI that learned to trade cryptocurrencies and optimize portfolios across multiple assets. Combined state-of-the-art reinforcement learning (Recurrent PPO) with financial market knowledge to create a strategy that remained profitable through both bull and bear markets. Handled dynamic asset allocation, risk management, and temporal dependencies in volatile markets.

Result: Profitable strategy across market uptrends and downtrends.

Reinforcement Learning PyTorch Time series

How I work

Production over prototypes

I build systems that scale. Everything I ship is instrumented, observable, and designed for real-world constraints. Latency budgets, cost limits, privacy requirements. If it can't run in production reliably, it's not done.

Results, not hype

I optimize for measurable outcomes. Faster pipelines, lower costs, higher acceptance rates. The tech stack matters less than whether users' lives get better and businesses hit their goals.

Deep technical ownership

From model fine-tuning to database schemas to infrastructure autoscaling. I handle the full stack. I don't just integrate APIs; I architect systems that solve the hard problems beneath the surface.

Iterate with evidence

I ship fast, measure rigorously, and iterate based on data. Evaluation harnesses, feedback loops, A/B tests, monitoring dashboards. Every decision is grounded in what actually works, not what sounds clever.

Background

Andreas Sapountzis

I'm an AI & Machine Learning Engineer with 5+ years building production systems. Started in neuroscience research analyzing brain activity patterns, then moved into streaming analytics at scale, and now focus on LLM operations and agentic systems. I've worked across healthcare, gaming, enterprise tools, and scientific research — wherever there are hard technical problems worth solving.

M.Eng. in Electrical and Computer Engineering from Aristotle University of Thessaloniki. Based remotely, working with teams worldwide.

LLM systems Agents & RAG ML pipelines Data engineering Cloud infrastructure

What drives me

  • Reliability over novelty — Production systems that just work.
  • Make it observable — If you can't measure it, you can't improve it.
  • Respect constraints — Budget, latency, privacy matter.

Get in touch

Working on something interesting? Drop me a line.