From Alerting to Inference: Metrics Never Stopped Mattering

Tue, 07 Apr 2026 00:00:00 +0000

Your LLM is slow. Users are complaining. Queues are growing. Someone on the team is already profiling the model, looking at batch sizes, considering a bigger GPU.

Nine times out of ten, the answer is already in the metrics.

I’ve spent most of my career staring at metrics. Bare-metal servers, Kubernetes clusters, managed services on public cloud. And if there’s one thing I keep re-learning, it’s that the infrastructure is lying to you, and you’re not asking the right questions.

This isn’t a new lesson. Same lesson, different domain.

Kubernetes on Andrea Cervesato

From Alerting to Inference: Metrics Never Stopped Mattering