Architecture
High-level overview of how Misar Code's VS Code extension, backend, RAG pipeline, and tool execution model fit together.
System Overview
┌─────────────────────────────────────────────────────────────┐
│                      VS Code Extension                      │
│                                                             │
│  ┌──────────────┐  ┌─────────────┐   ┌───────────────────┐  │
│  │  Chat Panel  │  │Inline Compl.│   │  Diff / CodeLens  │  │
│  └──────┬───────┘  └──────┬──────┘   └─────────┬─────────┘  │
│         │                 │                    │            │
│         └─────────────────┴────────────────────┘            │
│                           │                                 │
│      WebSocket (wss://api.misar.dev/ws/chat/v2)             │
│                           │                                 │
└───────────────────────────┼─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                       FastAPI Backend                       │
│                                                             │
│  ┌──────────────────┐     ┌──────────────────────────────┐  │
│  │  Multi-Model     │     │         RAG Pipeline         │  │
│  │  Router          │     │   ┌──────────────────────┐   │  │
│  │  ┌────────────┐  │     │   │ tree-sitter chunker  │   │  │
│  │  │ Cascading  │  │     │   │ BGE-small embeddings │   │  │
│  │  │ fallback   │  │     │   │ BM25 + pgvector      │   │  │
│  │  └────────────┘  │     │   └──────────────────────┘   │  │
│  └────────┬─────────┘     └──────────────────────────────┘  │
│           │                                                 │
│  ┌────────▼──────────────────────────────────────────────┐  │
│  │                    Permission Gate                    │  │
│  │   dangerous tools → require approval                  │  │
│  │   read tools      → parallel execution                │  │
│  │   write tools     → serial execution                  │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└──────────────┬──────────────────────────────────────────────┘
               │
       ┌───────▼────────┐
       │  AI Providers  │
       │                │
       │  Gemini        │
       │  DeepSeek      │
       │  Groq          │
       │  Mistral       │
       │  OpenRouter    │
       └────────────────┘
Key Components
VS Code Extension
The extension is the user-facing layer. It owns:
- The chat panel (sidebar or bottom panel): renders streamed responses, tool progress, and diff previews
- Inline completions: intercepts VS Code's completion provider API and queries /complete
- Tool execution: when the backend requests a tool call, the extension runs it locally (file reads, writes, shell commands) and returns the result over the same WebSocket connection
- Context injection: before each message, the extension collects the active file, selected text, type definitions, and related tests and includes them in the payload
Backend
A FastAPI application that handles all AI model communication:
- WebSocket handler: maintains the agent loop and streams content blocks back to the extension
- Multi-model router: tries providers in priority order, falling back automatically on rate limits or errors. Circuit breakers stop requests to providers that are failing.
- RAG injection: before forwarding a message to the model, the backend queries the vector index and prepends the top-ranked chunks to the system context
- Tool orchestration: the backend sends tool_use events; the extension executes them and returns results; the loop continues until message_stop
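The router's cascading fallback and circuit breakers can be sketched as follows. This is a minimal illustration under stated assumptions, not the backend's actual code: the `Router` and `CircuitBreaker` classes, the `ProviderError` exception, and the threshold and cooldown values are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class ProviderError(Exception):
    """Hypothetical error raised on a rate limit or upstream failure."""


@dataclass
class CircuitBreaker:
    # Assumed values; the real thresholds are not documented here.
    failure_threshold: int = 3
    cooldown_seconds: float = 30.0
    failures: int = 0
    opened_at: Optional[float] = None

    def allows_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let a trial request through.
        return now - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # open (or re-open) the circuit


@dataclass
class Router:
    # Providers in priority order: (name, function taking a prompt).
    providers: list
    breakers: dict = field(default_factory=dict)
    clock: Callable[[], float] = time.monotonic

    def complete(self, prompt: str) -> tuple:
        for name, call in self.providers:
            breaker = self.breakers.setdefault(name, CircuitBreaker())
            if not breaker.allows_request(self.clock()):
                continue  # circuit open: skip without sending a request
            try:
                reply = call(prompt)
            except ProviderError:
                breaker.record_failure(self.clock())
                continue  # fall back to the next provider
            breaker.record_success()
            return name, reply
        raise ProviderError("all providers failed or are circuit-open")
```

Injecting the clock keeps the breaker deterministic in tests; a production router would also need per-provider timeouts and streaming, which this sketch leaves out.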
RAG Pipeline
Workspace files (uploaded via POST /rag/index)
        │
        ▼
tree-sitter parser
        ├── L1 chunk: file-level summary
        ├── L2 chunk: symbol (function / class / type)
        └── L3 chunk: sub-block (branch / loop / expression)
                │
                ├──► BGE-small embedding → pgvector (dense index)
                └──► tsvector GIN index  → BM25 (lexical index)
                          │
                hybrid retrieval (RRF merge)
                          │
                top-K ranked chunks → injected into system context
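The RRF (reciprocal rank fusion) merge step can be illustrated with a short sketch. It assumes each retriever returns an ordered list of chunk IDs and uses the conventional smoothing constant k = 60. The function name `rrf_merge`, its parameters, and the sample IDs are hypothetical, and the pipeline's real constant and top-K value are not stated here.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_k=5):
    """Fuse several ranked lists of chunk IDs with reciprocal rank fusion.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher scores rank first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by chunk ID for determinism.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_k]


dense_hits = ["a", "b", "c"]    # e.g. from the pgvector index
lexical_hits = ["b", "d", "a"]  # e.g. from the lexical index
print(rrf_merge([dense_hits, lexical_hits], top_k=3))  # ['b', 'a', 'd']
```

Because RRF works on ranks rather than raw scores, the dense and lexical retrievers do not need comparable score scales.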
Tool Execution Model
Tools are executed in the extension, not on the server. The backend only describes what tool to call; the extension has full access to the local filesystem, terminal, and workspace.
Backend                            Extension
   │                                   │
   │── tool_use: read_file ──────────► │
   │                                   │── fs.readFile() ──► disk
   │                                   │◄─ file contents ───
   │◄── tool_result ────────────────── │
   │                                   │
Parallel vs serial execution:
- Read-only tools (read_file, glob, search): executed in parallel for speed
- Write tools (write_file, edit_file, run_command): executed serially to prevent conflicts
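The scheduling rule above can be sketched with `asyncio`. The logic is language-agnostic and Python is used only for illustration. The function `run_tool_calls` and the `execute` callback are hypothetical. Two behaviours are assumptions of this sketch, not documented behaviour: the read batch runs before the writes, and unrecognised tools are treated as writes.

```python
import asyncio

READ_ONLY_TOOLS = {"read_file", "glob", "search"}


async def run_tool_calls(calls, execute):
    """Run (tool_name, args) pairs; return results in request order.

    `execute` is an async callable taking (tool_name, args).
    """
    results = [None] * len(calls)
    read_idx = [i for i, (name, _) in enumerate(calls) if name in READ_ONLY_TOOLS]
    read_set = set(read_idx)
    write_idx = [i for i in range(len(calls)) if i not in read_set]

    # Read-only tools cannot conflict, so run them concurrently.
    read_results = await asyncio.gather(*(execute(*calls[i]) for i in read_idx))
    for i, result in zip(read_idx, read_results):
        results[i] = result

    # Everything else runs one at a time, in request order.
    for i in write_idx:
        results[i] = await execute(*calls[i])
    return results
```

Defaulting unknown tools to the serial path is the conservative choice: a misclassified read only costs speed, while a misclassified write could cause conflicting edits.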
Context Window Management
| Threshold | Action |
|-----------|--------|
| Under 60K tokens | No action: full context preserved |
| 60K–100K tokens | Oldest turns summarised and replaced with a summary snippet |
| Over 100K tokens | Hard trim: oldest turns removed after summarisation |
The agent detects when context has been compacted and fires the compact session-start hook event so hooks can re-inject critical facts.
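The threshold table maps directly onto a small decision function. The sketch below assumes "K" means 1,000 tokens and that the 60K–100K band includes both endpoints. The `Compaction` enum and `compaction_action` function are hypothetical names.

```python
from enum import Enum


class Compaction(Enum):
    NONE = "none"            # full context preserved
    SUMMARISE = "summarise"  # oldest turns replaced by a summary
    HARD_TRIM = "hard_trim"  # oldest turns removed after summarisation


SUMMARISE_AT = 60_000   # assumed: 60K = 60,000 tokens
HARD_TRIM_AT = 100_000  # assumed: 100K = 100,000 tokens


def compaction_action(token_count: int) -> Compaction:
    """Pick the compaction step for the current context size."""
    if token_count < SUMMARISE_AT:
        return Compaction.NONE
    if token_count <= HARD_TRIM_AT:
        return Compaction.SUMMARISE
    return Compaction.HARD_TRIM
```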
WebSocket Reliability
The extension maintains a persistent WebSocket connection with automatic reconnection:
| Attempt | Backoff |
|---------|---------|
| 1 | 1 second |
| 2 | 2 seconds |
| 3 | 4 seconds |
| 4 | 8 seconds |
| 5 | 16 seconds |
After 5 failed attempts the extension surfaces a reconnection error in the chat panel. The session is preserved — messages sent while disconnected are queued and replayed on reconnect.
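The backoff schedule doubles on each attempt, and the queue-and-replay behaviour can be modelled with a simple FIFO. The sketch below is illustrative only, with Python standing in for the extension's own language. `backoff_seconds`, `OutboundQueue`, and the `send` callback are hypothetical names.

```python
from collections import deque

MAX_ATTEMPTS = 5


def backoff_seconds(attempt: int) -> int:
    """Delay before reconnection attempt 1..5: 1, 2, 4, 8, 16 seconds."""
    if not 1 <= attempt <= MAX_ATTEMPTS:
        raise ValueError("attempt must be between 1 and %d" % MAX_ATTEMPTS)
    return 2 ** (attempt - 1)


class OutboundQueue:
    """Buffers messages while disconnected and replays them in order."""

    def __init__(self, send):
        self._send = send        # callable that writes to the socket
        self._pending = deque()  # messages waiting for a connection
        self.connected = False

    def submit(self, message):
        if self.connected:
            self._send(message)
        else:
            self._pending.append(message)

    def on_disconnect(self):
        self.connected = False

    def on_reconnect(self):
        self.connected = True
        # Replay queued messages oldest-first before sending new ones.
        while self._pending:
            self._send(self._pending.popleft())
```

Summing the schedule gives 31 seconds of waiting in total before the reconnection error appears, not counting the time each attempt itself takes.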
All tool execution happens locally in the extension process. The backend never has direct access to your filesystem or terminal — it only sends instructions and receives results.