Monitoring and Observability for LLM Applications

What to Monitor

Performance Metrics

  • Latency: Time to first token (TTFT), total response time, streaming throughput (measured in the sketch after this list)
  • Throughput: Requests per second, tokens per second
  • Error rates: API errors, timeouts, rate limits hit
  • Availability: Uptime of your service and upstream LLM APIs
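
A minimal sketch of capturing the first two latency metrics around a streamed call, assuming the official OpenAI Python SDK (the model name is a placeholder for whichever model you run):

```python
import time

from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()

def timed_completion(prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Stream a completion and record TTFT plus total response time."""
    start = time.perf_counter()
    first_token_at = None
    chunks = []

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:  # some chunks carry no content delta
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter()  # first token arrived
            chunks.append(delta)

    end = time.perf_counter()
    return {
        "response": "".join(chunks),
        "ttft_s": (first_token_at or end) - start,  # time to first token
        "total_s": end - start,                     # total response time
    }
```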

Quality Metrics

  • User satisfaction: Thumbs up/down, NPS scores, task completion rates
  • Automated quality: LLM-as-judge scoring on random samples (sketched after this list)
  • Hallucination rate: How often the model generates factually incorrect or unsupported information
  • Relevance: Whether responses actually answer the user's question
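
Here is one way to run LLM-as-judge scoring on a random sample. It assumes each logged interaction is a dict with "question" and "response" keys and that the judge model is reachable through the OpenAI SDK; the model name and rubric are illustrative, not prescriptive.

```python
import json
import random

from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """Rate how well the response answers the question, from 1 (irrelevant)
to 5 (fully answers it). Reply with JSON: {{"score": <1-5>, "reason": "<one sentence>"}}

Question: {question}
Response: {response}"""

def judge_sample(interactions: list[dict], sample_size: int = 20) -> float:
    """Score a random sample of logged interactions and return the mean."""
    sample = random.sample(interactions, min(sample_size, len(interactions)))
    scores = []
    for item in sample:
        result = client.chat.completions.create(
            model="gpt-4o",  # placeholder judge model
            response_format={"type": "json_object"},
            messages=[{"role": "user", "content": JUDGE_PROMPT.format(**item)}],
        )
        scores.append(json.loads(result.choices[0].message.content)["score"])
    return sum(scores) / len(scores)
```

Tracking this mean over time turns a fuzzy quality question into a plottable metric.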

Cost Metrics

  • Tokens consumed (input vs. output) per request, user, and feature
  • Cost per successful interaction (computed in the sketch after this list)
  • Cache hit rates and savings
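
Cost per interaction falls out of the token counts you are already logging. The sketch below uses illustrative per-million-token prices; substitute your provider's current rates, and note that treating cached input tokens as free is a simplification (many providers discount rather than waive them).

```python
# Illustrative prices per million tokens; substitute your provider's rates.
PRICE_PER_M = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},  # placeholder figures
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_input_tokens: int = 0) -> float:
    """Dollar cost of one request, counting cached input tokens as free
    (a simplification) to surface cache savings alongside spend."""
    price = PRICE_PER_M[model]
    billable_input = input_tokens - cached_input_tokens
    return (billable_input * price["input"]
            + output_tokens * price["output"]) / 1_000_000
```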

Observability Tools

Tool            | Focus            | Key Feature
LangSmith       | LLM tracing      | Full trace visualization, evaluation
Helicone        | LLM proxy        | Logging, caching, rate limiting
Langfuse        | Open-source      | Tracing, scoring, prompt management
Datadog/Grafana | Infrastructure   | Dashboards, alerts, APM integration
Arize Phoenix   | ML observability | Embedding drift, retrieval analysis
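
Proxy-style tools are often the lowest-effort way in: you change the base URL and every request is logged with no other code changes. The sketch below wires the OpenAI SDK through Helicone; the base URL and Helicone-Auth header follow Helicone's documented pattern, but verify both against the current docs, and the key is a placeholder.

```python
from openai import OpenAI

# Route all OpenAI traffic through Helicone's proxy for logging and caching.
# Base URL and header name follow Helicone's docs; confirm before relying on them.
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={"Helicone-Auth": "Bearer <HELICONE_API_KEY>"},  # placeholder
)

# Application code is unchanged; observability lives in the proxy layer.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Hello"}],
)
```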

Alerting Strategy

  • P0 (immediate): Service down, error rate > 10%, LLM API unreachable (see the sketch after this list)
  • P1 (urgent): Latency spike > 2x baseline, cost anomaly, quality drop
  • P2 (next business day): Cache hit rate declining, approaching rate limits
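
These tiers translate directly into a severity check you can run per monitoring window. The sketch below uses the thresholds from this list as-is; the field names and window shape are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WindowStats:
    """Metrics aggregated over one monitoring window (e.g., five minutes)."""
    error_rate: float       # fraction of failed requests, 0..1
    p95_latency_s: float    # 95th-percentile latency this window
    baseline_p95_s: float   # long-running baseline for the same metric
    llm_api_reachable: bool

def severity(stats: WindowStats) -> Optional[str]:
    """Map window stats to the alert tiers above (illustrative thresholds)."""
    if not stats.llm_api_reachable or stats.error_rate > 0.10:
        return "P0"  # page immediately
    if stats.p95_latency_s > 2 * stats.baseline_p95_s:
        return "P1"  # urgent, same day
    return None  # P2 signals (cache trends, quota headroom) suit batch review
```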

Best Practice: Trace Everything

For every LLM interaction, log: timestamp, user ID, model used, full prompt (if safe), completion, token counts, latency, tool calls, and any errors. This data is invaluable for debugging, optimization, and compliance.
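
As one way to put this into practice, the sketch below emits each interaction as a JSON log line. The field names are one reasonable schema, not a standard; adapt them to your log pipeline, and redact the prompt wherever it may contain sensitive data.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("llm.trace")

def log_llm_interaction(*, user_id: str, model: str, prompt: str, completion: str,
                        input_tokens: int, output_tokens: int, latency_s: float,
                        tool_calls: list | None = None,
                        error: str | None = None) -> None:
    """Emit one structured trace record per LLM call as a JSON log line."""
    record = {
        "trace_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "model": model,
        "prompt": prompt,  # omit or hash when it isn't safe to store
        "completion": completion,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_s": latency_s,
        "tool_calls": tool_calls or [],
        "error": error,
    }
    logger.info(json.dumps(record))  # JSON lines are easy to ship and query
```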

🌼 Daisy+ in Action: Full Observability

Daisy+ monitors AI operations through Odoo's built-in logging, Railway's deployment metrics, and structured webhook events. Every digital employee action creates an auditable trail — messages in Discuss, notes on tasks, email logs — making it easy to review what the AI did and why. When something goes wrong, you can trace the exact sequence: the incoming trigger, the AI's reasoning, the tool calls it made, and the final outcome.
