# Monitoring and Observability for LLM Applications
## What to Monitor
### Performance Metrics
- Latency: Time to first token (TTFT), total response time, streaming throughput (a measurement sketch follows this list)
- Throughput: Requests per second, tokens per second
- Error rates: API errors, timeouts, rate limits hit
- Availability: Uptime of your service and upstream LLM APIs
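A minimal sketch of capturing these numbers around a streaming call. `stream_completion` is a hypothetical client function that yields tokens; substitute your provider's streaming API.

```python
import time
from typing import Iterable, Iterator

def timed_stream(stream: Iterable[str]) -> Iterator[str]:
    """Wrap a token stream and record TTFT, total latency, and throughput."""
    start = time.perf_counter()
    first_token_at = None
    token_count = 0
    for token in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first token
        token_count += 1
        yield token
    end = time.perf_counter()
    ttft = (first_token_at or end) - start
    total = end - start
    tokens_per_sec = token_count / total if total > 0 else 0.0
    # Emit to your metrics backend instead of printing in production.
    print(f"ttft={ttft:.3f}s total={total:.3f}s throughput={tokens_per_sec:.1f} tok/s")

# Usage (stream_completion is a hypothetical streaming client):
# for token in timed_stream(stream_completion(prompt="Hello")):
#     render(token)
```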
### Quality Metrics
- User satisfaction: Thumbs up/down, NPS scores, task completion rates
- Automated quality: LLM-as-judge scoring on random samples (a sampling sketch follows this list)
- Hallucination rate: How often the model generates incorrect or unsupported information
- Relevance: Whether responses actually answer the user's question
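One way to implement LLM-as-judge sampling, as a sketch: score a small random fraction of traffic against a rubric covering relevance and accuracy. `call_judge_model` is a hypothetical wrapper around whichever model you use as the judge.

```python
import json
import random

JUDGE_PROMPT = """Rate the response on a 1-5 scale for relevance and factual accuracy.
Question: {question}
Response: {response}
Reply with JSON: {{"relevance": <int>, "accuracy": <int>}}"""

def maybe_score(question: str, response: str, sample_rate: float = 0.05) -> dict | None:
    """Score a small random sample of traffic with an LLM judge."""
    if random.random() > sample_rate:
        return None  # skip most requests to control judging cost
    # call_judge_model is a hypothetical wrapper around your judge model's API.
    raw = call_judge_model(JUDGE_PROMPT.format(question=question, response=response))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None  # judge output wasn't valid JSON; count as a scoring failure
```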
### Cost Metrics
- Tokens consumed (input vs. output) per request, user, and feature (a cost sketch follows this list)
- Cost per successful interaction
- Cache hit rates and savings
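A sketch of the arithmetic, assuming per-million-token pricing; the prices shown are placeholders, not any provider's actual rates.

```python
# Per-1M-token prices in USD; placeholder values, check your provider's pricing.
PRICES = {
    "example-model": {"input": 3.00, "output": 15.00},  # assumed rates
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute the dollar cost of one request from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def cost_per_success(costs: list[float], successes: int) -> float:
    """Cost per successful interaction: total spend over successful outcomes."""
    return sum(costs) / successes if successes else float("inf")
```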
## Observability Tools
| Tool | Focus | Key Feature |
|---|---|---|
| LangSmith | LLM tracing | Full trace visualization, evaluation |
| Helicone | LLM proxy | Logging, caching, rate limiting |
| Langfuse | Open-source | Tracing, scoring, prompt management |
| Datadog/Grafana | Infrastructure | Dashboards, alerts, APM integration |
| Arize Phoenix | ML observability | Embedding drift, retrieval analysis |
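Most of these tools can ingest OpenTelemetry traces, so instrumenting once keeps you portable across backends. A minimal sketch of emitting one span per LLM call; the `llm.*` attribute names and `call_model` are illustrative assumptions, not an official convention.

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-app")

def traced_completion(prompt: str, model: str) -> str:
    # One span per LLM call; backends such as Datadog, Grafana, or Langfuse
    # (via their OTel endpoints) can ingest and visualize these traces.
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("llm.model", model)
        span.set_attribute("llm.prompt_length", len(prompt))
        response = call_model(prompt, model)  # hypothetical client call
        span.set_attribute("llm.completion_length", len(response))
        return response
```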
## Alerting Strategy
- P0 (immediate): Service down, error rate > 10%, LLM API unreachable
- P1 (urgent): Latency spike > 2x baseline (a baseline check is sketched after this list), cost anomaly, quality drop
- P2 (next business day): Cache hit rate declining, approaching rate limits
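A sketch of the P1 latency rule: compare a rolling window of recent latencies against a baseline and fire at 2x. The window size and the `page_oncall` notification hook are assumptions.

```python
from collections import deque
from statistics import median

class LatencyAlert:
    """Fire when recent median latency exceeds 2x a fixed baseline."""

    def __init__(self, baseline_s: float, window: int = 100, factor: float = 2.0):
        self.baseline_s = baseline_s
        self.factor = factor
        self.samples: deque[float] = deque(maxlen=window)

    def record(self, latency_s: float) -> bool:
        self.samples.append(latency_s)
        if len(self.samples) < self.samples.maxlen:
            return False  # wait for a full window to avoid noisy alerts
        if median(self.samples) > self.factor * self.baseline_s:
            # page_oncall is a hypothetical notification hook (PagerDuty, Slack, ...).
            page_oncall(f"P1: median latency {median(self.samples):.2f}s > "
                        f"{self.factor}x baseline {self.baseline_s:.2f}s")
            return True
        return False
```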
## Best Practice: Trace Everything
For every LLM interaction, log the timestamp, user ID, model used, full prompt (if safe), completion, token counts, latency, tool calls, and any errors, as in the record sketch below. This data is invaluable for debugging, optimization, and compliance.
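A sketch of one such record emitted as a structured JSON line, which most log collectors can ingest; the field names are illustrative.

```python
from dataclasses import asdict, dataclass, field
import json
import time
import uuid

@dataclass
class LLMTrace:
    """One structured record per LLM interaction, mirroring the fields above."""
    user_id: str
    model: str
    prompt: str          # omit or redact where privacy rules require
    completion: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    tool_calls: list = field(default_factory=list)
    error: str | None = None
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)

def log_trace(trace: LLMTrace) -> None:
    # Replace with your log pipeline (stdout JSON lines work with most collectors).
    print(json.dumps(asdict(trace)))
```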
## 🌼 Daisy+ in Action: Full Observability
Daisy+ monitors AI operations through Odoo's built-in logging, Railway's deployment metrics, and structured webhook events. Every digital employee action creates an auditable trail — messages in Discuss, notes on tasks, email logs — making it easy to review what the AI did and why. When something goes wrong, you can trace the exact sequence: the incoming trigger, the AI's reasoning, the tool calls it made, and the final outcome.
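For illustration only, a hypothetical shape for one of those structured webhook events; the field names are invented here and are not Daisy+'s actual schema.

```python
# Hypothetical event payload; illustrative field names, not Daisy+'s real schema.
event = {
    "event": "ai.action.completed",
    "trigger": {"type": "incoming_email", "record": "crm.lead,42"},
    "reasoning_summary": "Lead asked for pricing; drafted reply and logged a note.",
    "tool_calls": [
        {"tool": "mail.compose", "status": "ok"},
        {"tool": "project.task.note", "status": "ok"},
    ],
    "outcome": "reply_sent",
}
```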