Observability API
Monitor provider health, rate limits, token usage, cost, latency, and full request traces in real time.
Overview
The Observability API gives you live visibility into how the backend is performing — circuit breaker states, rate limit headroom, per-model metrics, and full execution traces for debugging.
All endpoints require authentication:
Authorization: Bearer YOUR_API_KEY
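A minimal sketch of an authenticated call, using only the Python standard library. The base URL and the helper names `build_request` and `get_json` are assumptions made for illustration; only the `Authorization: Bearer` header comes from this page.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address of your deployment.
BASE_URL = "https://api.example.com"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the Bearer token header."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def get_json(path: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(path, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a real key and base URL, `get_json("/circuit-state", "YOUR_API_KEY")` would return the parsed response as a dictionary.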
Provider Health
Circuit Breaker State
GET /circuit-state
Returns the circuit breaker state for each configured AI provider.
{
"providers": {
"gemini": { "state": "closed", "failure_count": 0, "last_failure": null },
"groq": { "state": "open", "failure_count": 5, "last_failure": "2026-03-21T14:55:00Z", "retry_after": "2026-03-21T15:00:00Z" },
"mistral": { "state": "half_open", "failure_count": 2, "last_failure": "2026-03-21T14:58:00Z" }
}
}
| State | Meaning |
|-------|---------|
| closed | Provider healthy — requests flow normally |
| open | Provider failing — requests are blocked until retry_after |
| half_open | Testing recovery — one probe request allowed through |
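One way a client might act on these states is sketched below. It treats only `closed` providers as routable and computes how long an `open` breaker has left before `retry_after`. Skipping `half_open` providers is an assumption of this sketch, not documented behavior, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Sample payload copied from the /circuit-state example.
SAMPLE = {
    "providers": {
        "gemini": {"state": "closed", "failure_count": 0, "last_failure": None},
        "groq": {"state": "open", "failure_count": 5,
                 "last_failure": "2026-03-21T14:55:00Z",
                 "retry_after": "2026-03-21T15:00:00Z"},
        "mistral": {"state": "half_open", "failure_count": 2,
                    "last_failure": "2026-03-21T14:58:00Z"},
    }
}


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def routable(payload: dict) -> list:
    """Names of providers whose breaker is closed."""
    return [name for name, info in payload["providers"].items()
            if info["state"] == "closed"]


def seconds_until_retry(provider: dict, now: datetime) -> Optional[float]:
    """Seconds until an open breaker may be retried, or None if not open."""
    if provider["state"] != "open" or not provider.get("retry_after"):
        return None
    remaining = (parse_ts(provider["retry_after"]) - now).total_seconds()
    return max(0.0, remaining)
```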
Rate Limit Status
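GET /rate-status
Returns the remaining request and token quota for each provider, along with the time at which the quota resets.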
{
"providers": {
"gemini": { "requests_remaining": 580, "tokens_remaining": 950000, "reset_at": "2026-03-21T15:00:00Z" },
"groq": { "requests_remaining": 28, "tokens_remaining": 120000, "reset_at": "2026-03-21T14:59:00Z" }
}
}
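A short sketch of how a client could flag providers that are close to their limits. The thresholds and the function name `low_headroom` are arbitrary choices for illustration, not values defined by the API.

```python
# Sample payload copied from the /rate-status example.
RATE_STATUS = {
    "providers": {
        "gemini": {"requests_remaining": 580, "tokens_remaining": 950000,
                   "reset_at": "2026-03-21T15:00:00Z"},
        "groq": {"requests_remaining": 28, "tokens_remaining": 120000,
                 "reset_at": "2026-03-21T14:59:00Z"},
    }
}


def low_headroom(payload: dict, min_requests: int = 50,
                 min_tokens: int = 100_000) -> list:
    """Names of providers below either the request or the token threshold."""
    flagged = []
    for name, info in payload["providers"].items():
        if (info["requests_remaining"] < min_requests
                or info["tokens_remaining"] < min_tokens):
            flagged.append(name)
    return flagged
```

With the sample data, `groq` is flagged because 28 remaining requests is below the assumed threshold of 50.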
Combined Provider Status
GET /providers/status
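Returns circuit state and rate limits for all providers in a single response. The example shows one provider entry, while healthy_count and total_count cover every configured provider.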
{
"providers": {
"gemini": {
"circuit": "closed",
"requests_remaining": 580,
"tokens_remaining": 950000,
"healthy": true
}
},
"healthy_count": 5,
"total_count": 7
}
Metrics
Aggregated Metrics
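GET /metrics
Returns aggregated request counts, error rate, and latency percentiles for the reporting period, with a per-model breakdown.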
{
"period": "1h",
"total_requests": 1240,
"error_rate": 0.012,
"latency": {
"p50_ms": 420,
"p95_ms": 1840
},
"by_model": {
"gemini-2.0-flash": { "requests": 830, "errors": 8, "p50_ms": 380 },
"deepseek-chat": { "requests": 410, "errors": 7, "p50_ms": 510 }
}
}
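The per-model counts can be turned into per-model error rates with a few lines of code. This is a sketch over the example payload, and the helper names are hypothetical.

```python
# Sample payload copied from the /metrics example.
METRICS = {
    "period": "1h",
    "total_requests": 1240,
    "error_rate": 0.012,
    "latency": {"p50_ms": 420, "p95_ms": 1840},
    "by_model": {
        "gemini-2.0-flash": {"requests": 830, "errors": 8, "p50_ms": 380},
        "deepseek-chat": {"requests": 410, "errors": 7, "p50_ms": 510},
    },
}


def error_rates(metrics: dict) -> dict:
    """Map each model to errors / requests (0.0 when it has no requests)."""
    return {
        model: (stats["errors"] / stats["requests"] if stats["requests"] else 0.0)
        for model, stats in metrics["by_model"].items()
    }


def worst_model(metrics: dict) -> str:
    """Name of the model with the highest error rate."""
    rates = error_rates(metrics)
    return max(rates, key=rates.get)
```

In the example, the per-model errors (8 + 7) over 1240 total requests give roughly 0.012, which matches the top-level error_rate.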
Usage Dashboard
GET /usage
Real-time per-model usage breakdown including token counts, cost, and latency.
{
"window": "24h",
"models": [
{
"model": "gemini-2.0-flash",
"requests": 2840,
"input_tokens": 4200000,
"output_tokens": 980000,
"cost_usd": 1.24,
"avg_latency_ms": 395
}
],
"totals": {
"requests": 3100,
"input_tokens": 4600000,
"output_tokens": 1050000,
"cost_usd": 1.87
}
}
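As an illustration of working with this payload, the sketch below derives each model's share of total cost and its cost per 1,000 requests. Both helper names are hypothetical, and the figures come from the example response rather than real usage.

```python
# Sample payload copied from the /usage example.
USAGE = {
    "window": "24h",
    "models": [
        {"model": "gemini-2.0-flash", "requests": 2840,
         "input_tokens": 4200000, "output_tokens": 980000,
         "cost_usd": 1.24, "avg_latency_ms": 395},
    ],
    "totals": {"requests": 3100, "input_tokens": 4600000,
               "output_tokens": 1050000, "cost_usd": 1.87},
}


def cost_share(usage: dict) -> dict:
    """Fraction of total cost attributed to each listed model."""
    total = usage["totals"]["cost_usd"]
    return {m["model"]: (m["cost_usd"] / total if total else 0.0)
            for m in usage["models"]}


def cost_per_1k_requests(model: dict) -> float:
    """Average cost in USD of 1,000 requests to this model."""
    if not model["requests"]:
        return 0.0
    return 1000 * model["cost_usd"] / model["requests"]
```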
Request History
GET /history?limit=20
Returns a summary of the most recent requests, up to the number set by the limit query parameter: model used, token counts, latency, and outcome.
{
"requests": [
{
"request_id": "req_abc123",
"model": "gemini-2.0-flash",
"input_tokens": 1240,
"output_tokens": 387,
"latency_ms": 412,
"status": "success",
"timestamp": "2026-03-21T14:59:01Z"
}
]
}
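A sketch of filtering the history for requests worth a closer look. Only the `success` status appears in the example, so this code treats any other value as a failure; the `error` status, the second and third sample entries, and the latency threshold are assumptions added for illustration.

```python
# First entry is from the /history example; the other two are hypothetical.
HISTORY = {
    "requests": [
        {"request_id": "req_abc123", "model": "gemini-2.0-flash",
         "input_tokens": 1240, "output_tokens": 387, "latency_ms": 412,
         "status": "success", "timestamp": "2026-03-21T14:59:01Z"},
        {"request_id": "req_slow", "model": "deepseek-chat",
         "input_tokens": 900, "output_tokens": 200, "latency_ms": 2300,
         "status": "success", "timestamp": "2026-03-21T14:58:40Z"},
        {"request_id": "req_failed", "model": "deepseek-chat",
         "input_tokens": 500, "output_tokens": 0, "latency_ms": 150,
         "status": "error", "timestamp": "2026-03-21T14:58:10Z"},
    ]
}


def needs_attention(history: dict, max_latency_ms: int = 1000) -> list:
    """IDs of requests that did not succeed or exceeded the latency budget."""
    return [
        r["request_id"]
        for r in history["requests"]
        if r["status"] != "success" or r["latency_ms"] > max_latency_ms
    ]
```

The returned IDs can then be passed to /trace for a detailed view of each request.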
Traces
Request Trace
GET /trace?request_id=req_abc123
Returns the full execution DAG for a single request. Every model call, tool call, retrieval, and ensemble vote is recorded as a node, and each node's parent field holds the id of the node it follows.
{
"request_id": "req_abc123",
"duration_ms": 1240,
"nodes": [
{ "id": "n1", "type": "retrieval", "label": "RAG query", "duration_ms": 18 },
{ "id": "n2", "type": "model_call", "label": "gemini-2.0-flash", "duration_ms": 412, "parent": "n1" },
{ "id": "n3", "type": "tool_call", "label": "read_file", "duration_ms": 3, "parent": "n2" },
{ "id": "n4", "type": "model_call", "label": "gemini-2.0-flash", "duration_ms": 807, "parent": "n3" }
]
}
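The node list can be rebuilt into a tree for display. The sketch below assumes each node has at most one parent, as in the example; a trace with multiple parents per node would need a different structure. The function names are hypothetical.

```python
# Sample payload copied from the /trace example.
TRACE = {
    "request_id": "req_abc123",
    "duration_ms": 1240,
    "nodes": [
        {"id": "n1", "type": "retrieval", "label": "RAG query", "duration_ms": 18},
        {"id": "n2", "type": "model_call", "label": "gemini-2.0-flash",
         "duration_ms": 412, "parent": "n1"},
        {"id": "n3", "type": "tool_call", "label": "read_file",
         "duration_ms": 3, "parent": "n2"},
        {"id": "n4", "type": "model_call", "label": "gemini-2.0-flash",
         "duration_ms": 807, "parent": "n3"},
    ],
}


def render(trace: dict) -> list:
    """Return the trace as indented lines, one per node, roots first."""
    children = {}
    for node in trace["nodes"]:
        # Nodes without a "parent" key are grouped under None (roots).
        children.setdefault(node.get("parent"), []).append(node)

    lines = []

    def walk(parent_id, depth):
        for node in children.get(parent_id, []):
            indent = "  " * depth
            lines.append(
                f"{indent}{node['type']} {node['label']} ({node['duration_ms']} ms)"
            )
            walk(node["id"], depth + 1)

    walk(None, 0)
    return lines


def slowest(trace: dict) -> dict:
    """The node with the largest duration_ms."""
    return max(trace["nodes"], key=lambda n: n["duration_ms"])
```

In the example the four node durations sum to the request's duration_ms of 1240, and the slowest node is the second model call, `n4`.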
Recent Traces
GET /traces?limit=10
Summaries of the most recent request traces — useful for spotting slow or failed requests at a glance.
Model Catalog
Available Models
GET /models
Lists all configured models with current health status.
{
"models": [
{ "id": "gemini-2.0-flash", "provider": "gemini", "healthy": true, "latency_p50_ms": 380 },
{ "id": "deepseek-chat", "provider": "deepseek", "healthy": true, "latency_p50_ms": 510 },
{ "id": "groq-llama", "provider": "groq", "healthy": false, "circuit": "open" }
]
}
Full Catalog
GET /models/catalog
Complete model catalog with capability metadata.
{
"models": [
{
"id": "gemini-2.0-flash",
"provider": "gemini",
"context_window": 1000000,
"speed": "fast",
"quality": "high",
"cost_per_1m_input_tokens_usd": 0.10,
"cost_per_1m_output_tokens_usd": 0.40,
"supports_tools": true,
"supports_vision": true
}
]
}
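The per-token prices in the catalog make a rough cost estimate possible. The sketch below assumes cost is linear in tokens at the listed rates; actual billing may include factors that this page does not describe. The function name is hypothetical.

```python
# Catalog entry copied from the /models/catalog example.
ENTRY = {
    "id": "gemini-2.0-flash",
    "provider": "gemini",
    "context_window": 1000000,
    "cost_per_1m_input_tokens_usd": 0.10,
    "cost_per_1m_output_tokens_usd": 0.40,
}


def estimate_cost_usd(entry: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD, assuming linear per-token pricing."""
    return (
        input_tokens * entry["cost_per_1m_input_tokens_usd"]
        + output_tokens * entry["cost_per_1m_output_tokens_usd"]
    ) / 1_000_000


def fits_context(entry: dict, input_tokens: int, output_tokens: int) -> bool:
    """Whether prompt plus completion stays within the context window."""
    return input_tokens + output_tokens <= entry["context_window"]
```

For a request with 1,240 input tokens and 387 output tokens, the estimate is about 0.00028 USD.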
Poll /providers/status in your dashboard to surface provider degradation to users before they hit errors. Circuit breaker state changes are a leading indicator of upstream issues.
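A polling loop along these lines might look like the sketch below. It compares the `circuit` field between successive /providers/status snapshots and yields each change. The `fetch_status` callable, the 15-second interval, and the function names are assumptions; plug in whatever HTTP client you already use.

```python
import time
from typing import Callable, Iterator, Optional, Tuple


def circuit_changes(prev: dict, curr: dict) -> list:
    """List (provider, old_state, new_state) for each changed circuit."""
    changes = []
    for name, info in curr.get("providers", {}).items():
        old = prev.get("providers", {}).get(name, {}).get("circuit")
        new = info.get("circuit")
        if old != new:
            changes.append((name, old, new))
    return changes


def watch(
    fetch_status: Callable[[], dict],
    interval_s: float = 15.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, Optional[str], str]]:
    """Poll fetch_status and yield circuit changes between snapshots."""
    prev: dict = {}
    polls = 0
    while max_polls is None or polls < max_polls:
        curr = fetch_status()
        if prev:  # the first snapshot is only a baseline
            yield from circuit_changes(prev, curr)
        prev = curr
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_s)
```

A dashboard could iterate over `watch(...)` in a background thread and raise a banner whenever a provider moves from `closed` to `open`.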