Observability API
Monitor provider health, rate limits, token usage, cost, latency, and full request traces in real time.
Overview
The Observability API gives you live visibility into backend performance: circuit breaker states, rate limit headroom, per-model metrics, and full execution traces for debugging.
All endpoints require authentication:
Authorization: Bearer YOUR_API_KEY
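As a minimal sketch of attaching this header from Python, the helper below builds an authenticated request with the standard library. The base URL is taken from the curl example on this page; the helper name `build_request` is hypothetical, and the endpoints are assumed to accept GET.

```python
import urllib.parse
import urllib.request

# Base URL taken from the curl example on this page; adjust if yours differs.
BASE_URL = "https://api.misar.dev"


def build_request(path, api_key, params=None):
    """Build a GET request carrying the Bearer token (hypothetical helper)."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )


# Building the request does not touch the network.
req = build_request("/traces", "YOUR_API_KEY", {"limit": 10})
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` and decoding the body with `json.load` would perform the actual call.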
Circuit Breaker State
/circuit-state
Returns the circuit breaker state for each configured AI provider.
Response fields
providers (object): Map of provider name to circuit info: state, failure_count, last_failure, and retry_after (when open).
Circuit state values:
closed: Provider healthy; requests flow normally.
open: Provider failing; requests are blocked until retry_after.
half_open: Testing recovery; one probe request allowed through.
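One way a client might act on these states is sketched here. It assumes the payload shape of the /circuit-state example response and treats retry_after as an ISO 8601 UTC timestamp; the function names and the skip policy are hypothetical, not part of the API.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse timestamps like 2026-03-21T15:00:00Z into aware datetimes."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def usable_providers(payload, now):
    """Return providers a client could still try (hypothetical policy).

    closed and half_open circuits are kept; open circuits are skipped
    until their retry_after time has passed.
    """
    usable = []
    for name, info in payload["providers"].items():
        if info["state"] == "open":
            retry_after = info.get("retry_after")
            if retry_after is None or now < parse_ts(retry_after):
                continue
        usable.append(name)
    return usable


# Sample data copied from the example response for this endpoint.
sample = {
    "providers": {
        "gemini": {"state": "closed", "failure_count": 0, "last_failure": None},
        "groq": {
            "state": "open",
            "failure_count": 5,
            "last_failure": "2026-03-21T14:55:00Z",
            "retry_after": "2026-03-21T15:00:00Z",
        },
        "mistral": {
            "state": "half_open",
            "failure_count": 2,
            "last_failure": "2026-03-21T14:58:00Z",
        },
    }
}
now = datetime(2026, 3, 21, 14, 59, tzinfo=timezone.utc)
print(usable_providers(sample, now))
```

The sketch keeps half_open providers, but only a single probe request is let through in that state, so a stricter client may prefer to skip them as well.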
{
  "providers": {
    "gemini": { "state": "closed", "failure_count": 0, "last_failure": null },
    "groq": { "state": "open", "failure_count": 5, "last_failure": "2026-03-21T14:55:00Z", "retry_after": "2026-03-21T15:00:00Z" },
    "mistral": { "state": "half_open", "failure_count": 2, "last_failure": "2026-03-21T14:58:00Z" }
  }
}
Rate Limit Status
/rate-status
Returns remaining request and token budget per provider with reset times.
Response fields
providers (object): Map of provider name to rate info: requests_remaining, tokens_remaining, and reset_at.
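A sketch of how these fields might drive routing decisions: the function flags providers whose remaining budget falls under a threshold. The thresholds and the name `low_headroom` are illustrative assumptions, not part of the API.

```python
def low_headroom(payload, min_requests=50, min_tokens=100_000):
    """Return {provider: reset_at} for providers under either threshold."""
    flagged = {}
    for name, info in payload["providers"].items():
        if (info["requests_remaining"] < min_requests
                or info["tokens_remaining"] < min_tokens):
            flagged[name] = info["reset_at"]
    return flagged


# Sample data copied from the example response for this endpoint.
sample = {
    "providers": {
        "gemini": {
            "requests_remaining": 580,
            "tokens_remaining": 950000,
            "reset_at": "2026-03-21T15:00:00Z",
        },
        "groq": {
            "requests_remaining": 28,
            "tokens_remaining": 120000,
            "reset_at": "2026-03-21T14:59:00Z",
        },
    }
}
print(low_headroom(sample))
```

Returning reset_at alongside the provider name lets a caller decide whether to wait for the reset or route elsewhere.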
{
  "providers": {
    "gemini": { "requests_remaining": 580, "tokens_remaining": 950000, "reset_at": "2026-03-21T15:00:00Z" },
    "groq": { "requests_remaining": 28, "tokens_remaining": 120000, "reset_at": "2026-03-21T14:59:00Z" }
  }
}
Combined Provider Status
/providers/status
Single endpoint combining circuit state and rate limits for all providers.
Response fields
providers (object): Map of provider name to combined info: circuit, requests_remaining, tokens_remaining, healthy.
healthy_count (number): Number of currently healthy providers.
total_count (number): Total number of configured providers.
{
  "providers": {
    "gemini": {
      "circuit": "closed",
      "requests_remaining": 580,
      "tokens_remaining": 950000,
      "healthy": true
    }
  },
  "healthy_count": 5,
  "total_count": 7
}
Aggregated Metrics
/metrics
Returns aggregated request metrics for the current period.
Response fields
period (string): Aggregation window, e.g. 1h.
total_requests (number): Total requests in the period.
error_rate (number): Fraction of requests that errored.
latency (object): Latency percentiles: p50_ms, p95_ms.
by_model (object): Per-model breakdown: requests, errors, p50_ms.
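The top-level error_rate is a fraction across all models; a per-model rate can be derived from by_model in the same way. The following is a minimal sketch under that assumption, with a hypothetical function name.

```python
def error_rates(metrics):
    """Map each model to errors / requests, skipping models with no traffic."""
    return {
        model: stats["errors"] / stats["requests"]
        for model, stats in metrics["by_model"].items()
        if stats["requests"] > 0
    }


# Sample data copied from the example response for this endpoint.
sample = {
    "period": "1h",
    "total_requests": 1240,
    "error_rate": 0.012,
    "latency": {"p50_ms": 420, "p95_ms": 1840},
    "by_model": {
        "gemini-2.0-flash": {"requests": 830, "errors": 8, "p50_ms": 380},
        "deepseek-chat": {"requests": 410, "errors": 7, "p50_ms": 510},
    },
}
rates = error_rates(sample)
worst = max(rates, key=rates.get)
print(worst, round(rates[worst], 4))
```

In the sample, the per-model errors sum to 15 of 1240 requests, which matches the reported error_rate of 0.012 after rounding.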
{
  "period": "1h",
  "total_requests": 1240,
  "error_rate": 0.012,
  "latency": {
    "p50_ms": 420,
    "p95_ms": 1840
  },
  "by_model": {
    "gemini-2.0-flash": { "requests": 830, "errors": 8, "p50_ms": 380 },
    "deepseek-chat": { "requests": 410, "errors": 7, "p50_ms": 510 }
  }
}
Usage Dashboard
/usage
Real-time per-model usage breakdown including token counts, cost, and latency.
Response fields
window (string): Reporting window, e.g. 24h.
models (Array<object>): Per-model usage: model, requests, input_tokens, output_tokens, cost_usd, avg_latency_ms.
totals (object): Aggregate totals: requests, input_tokens, output_tokens, cost_usd.
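To compare models on spend, a blended cost per million tokens can be derived from these fields. This is a sketch with a hypothetical helper name; it works on a models entry or on totals, since both carry input_tokens, output_tokens, and cost_usd.

```python
def cost_per_million_tokens(entry):
    """Blended USD cost per 1M tokens (input and output combined)."""
    tokens = entry["input_tokens"] + entry["output_tokens"]
    if tokens == 0:
        return 0.0
    return entry["cost_usd"] / tokens * 1_000_000


# Values copied from the example response for this endpoint.
gemini = {
    "model": "gemini-2.0-flash",
    "requests": 2840,
    "input_tokens": 4200000,
    "output_tokens": 980000,
    "cost_usd": 1.24,
    "avg_latency_ms": 395,
}
totals = {
    "requests": 3100,
    "input_tokens": 4600000,
    "output_tokens": 1050000,
    "cost_usd": 1.87,
}
print(f"{cost_per_million_tokens(gemini):.3f}")
print(f"{cost_per_million_tokens(totals):.3f}")
```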
{
  "window": "24h",
  "models": [
    {
      "model": "gemini-2.0-flash",
      "requests": 2840,
      "input_tokens": 4200000,
      "output_tokens": 980000,
      "cost_usd": 1.24,
      "avg_latency_ms": 395
    }
  ],
  "totals": {
    "requests": 3100,
    "input_tokens": 4600000,
    "output_tokens": 1050000,
    "cost_usd": 1.87
  }
}
Request History
/history
Returns a summary of the last N requests: model used, token counts, latency, and outcome.
Query parameters
limit (number, query): Number of recent requests to return.
Response fields
requests (Array<object>): Each entry includes request_id, model, input_tokens, output_tokens, latency_ms, status, timestamp.
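A sketch of scanning the history for requests worth a closer look. Only the status value "success" appears in the example response, so this assumes any other value indicates a problem; the latency threshold, the function name, and the second sample entry are invented for illustration.

```python
def needs_attention(history, max_latency_ms=1000):
    """Return request_ids that did not succeed or exceeded the threshold."""
    return [
        r["request_id"]
        for r in history["requests"]
        if r["status"] != "success" or r["latency_ms"] > max_latency_ms
    ]


sample = {
    "requests": [
        {  # copied from the example response for this endpoint
            "request_id": "req_abc123",
            "model": "gemini-2.0-flash",
            "input_tokens": 1240,
            "output_tokens": 387,
            "latency_ms": 412,
            "status": "success",
            "timestamp": "2026-03-21T14:59:01Z",
        },
        {  # hypothetical slow request, invented for this sketch
            "request_id": "req_slow01",
            "model": "gemini-2.0-flash",
            "input_tokens": 9000,
            "output_tokens": 2100,
            "latency_ms": 2300,
            "status": "success",
            "timestamp": "2026-03-21T14:59:30Z",
        },
    ]
}
print(needs_attention(sample))
```

Each returned request_id can then be passed to /trace for the full execution breakdown.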
{
  "requests": [
    {
      "request_id": "req_abc123",
      "model": "gemini-2.0-flash",
      "input_tokens": 1240,
      "output_tokens": 387,
      "latency_ms": 412,
      "status": "success",
      "timestamp": "2026-03-21T14:59:01Z"
    }
  ]
}
Request Trace
/trace
Full execution DAG for a single request: every model call, tool call, retrieval, and ensemble vote is recorded as a node.
Query parameters
request_id (string, query, required): The request to trace.
Response fields
request_id (string): The traced request id.
duration_ms (number): Total request duration in milliseconds.
nodes (Array<object>): Execution nodes, each with id, type, label, duration_ms, and parent.
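Because each node names its parent, the chain of steps leading to any node can be rebuilt by following parent links. The sketch below assumes root nodes omit parent, as the first node in the example response does; both function names are hypothetical.

```python
def slowest_node(trace):
    """Return the node with the largest duration_ms."""
    return max(trace["nodes"], key=lambda n: n["duration_ms"])


def path_to(trace, node_id):
    """Follow parent links from node_id to the root; return ids root-first."""
    by_id = {n["id"]: n for n in trace["nodes"]}
    path = []
    current = node_id
    while current is not None:
        path.append(current)
        current = by_id[current].get("parent")  # None once the root is reached
    return path[::-1]


# Sample data copied from the example response for this endpoint.
trace = {
    "request_id": "req_abc123",
    "duration_ms": 1240,
    "nodes": [
        {"id": "n1", "type": "retrieval", "label": "RAG query", "duration_ms": 18},
        {"id": "n2", "type": "model_call", "label": "gemini-2.0-flash", "duration_ms": 412, "parent": "n1"},
        {"id": "n3", "type": "tool_call", "label": "read_file", "duration_ms": 3, "parent": "n2"},
        {"id": "n4", "type": "model_call", "label": "gemini-2.0-flash", "duration_ms": 807, "parent": "n3"},
    ],
}
slow = slowest_node(trace)
print(slow["id"], path_to(trace, slow["id"]))
```

In the sample the node durations sum to the top-level duration_ms, so the slowest node accounts for most of the request time.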
{
  "request_id": "req_abc123",
  "duration_ms": 1240,
  "nodes": [
    { "id": "n1", "type": "retrieval", "label": "RAG query", "duration_ms": 18 },
    { "id": "n2", "type": "model_call", "label": "gemini-2.0-flash", "duration_ms": 412, "parent": "n1" },
    { "id": "n3", "type": "tool_call", "label": "read_file", "duration_ms": 3, "parent": "n2" },
    { "id": "n4", "type": "model_call", "label": "gemini-2.0-flash", "duration_ms": 807, "parent": "n3" }
  ]
}
Recent Traces
/traces
Summaries of the most recent request traces, useful for spotting slow or failed requests at a glance.
Query parameters
limit (number, query): Number of recent traces to return.
curl "https://api.misar.dev/traces?limit=10" \
  -H "Authorization: Bearer YOUR_API_KEY"
Available Models
/models
Lists all configured models with current health status.
Response fields
models (Array<object>): Each entry includes id, provider, healthy, latency_p50_ms, and circuit (when unhealthy).
{
  "models": [
    { "id": "gemini-2.0-flash", "provider": "gemini", "healthy": true, "latency_p50_ms": 380 },
    { "id": "deepseek-chat", "provider": "deepseek", "healthy": true, "latency_p50_ms": 510 },
    { "id": "groq-llama", "provider": "groq", "healthy": false, "circuit": "open" }
  ]
}
Full Catalog
/models/catalog
Complete model catalog with capability metadata.
Response fields
models (Array<object>): Each entry includes id, provider, context_window, speed, quality, cost_per_1m_input_tokens_usd, cost_per_1m_output_tokens_usd, supports_tools, supports_vision.
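The per-million-token prices in a catalog entry can be used to estimate what a single request would cost. This is a minimal sketch that assumes cost scales linearly with token counts; the function name is hypothetical.

```python
def estimate_cost_usd(entry, input_tokens, output_tokens):
    """Estimate request cost from a catalog entry's per-1M-token prices."""
    return (
        input_tokens / 1_000_000 * entry["cost_per_1m_input_tokens_usd"]
        + output_tokens / 1_000_000 * entry["cost_per_1m_output_tokens_usd"]
    )


# Catalog entry copied from the example response for this endpoint.
entry = {
    "id": "gemini-2.0-flash",
    "provider": "gemini",
    "context_window": 1000000,
    "speed": "fast",
    "quality": "high",
    "cost_per_1m_input_tokens_usd": 0.10,
    "cost_per_1m_output_tokens_usd": 0.40,
    "supports_tools": True,
    "supports_vision": True,
}

# Token counts borrowed from the /history example entry.
cost = estimate_cost_usd(entry, input_tokens=1240, output_tokens=387)
print(f"{cost:.7f}")
```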
{
  "models": [
    {
      "id": "gemini-2.0-flash",
      "provider": "gemini",
      "context_window": 1000000,
      "speed": "fast",
      "quality": "high",
      "cost_per_1m_input_tokens_usd": 0.10,
      "cost_per_1m_output_tokens_usd": 0.40,
      "supports_tools": true,
      "supports_vision": true
    }
  ]
}
Poll /providers/status in your dashboard to surface provider degradation to users before they hit errors. Circuit breaker state changes are a leading indicator of upstream issues.
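A polling loop could compare each /providers/status response with the previous one and report only what changed. The comparison step is sketched here as a pure function; its name is hypothetical, and the two sample payloads are invented to show a transition.

```python
def circuit_changes(previous, current):
    """Compare two /providers/status payloads.

    Returns a list of (provider, old_circuit, new_circuit) tuples for
    providers whose circuit value differs; old_circuit is None for
    providers that were absent from the previous payload.
    """
    changes = []
    for name, info in current["providers"].items():
        old = previous["providers"].get(name, {}).get("circuit")
        new = info["circuit"]
        if old != new:
            changes.append((name, old, new))
    return changes


# Hypothetical consecutive poll results, shaped like the example response.
before = {"providers": {"gemini": {"circuit": "closed", "healthy": True}}}
after = {
    "providers": {
        "gemini": {"circuit": "open", "healthy": False},
        "groq": {"circuit": "closed", "healthy": True},
    }
}
for name, old, new in circuit_changes(before, after):
    print(f"{name}: {old} -> {new}")
```

A dashboard might call this on a fixed interval and raise a notice whenever a provider moves from closed to open.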