Misar.io Documentation

High-level overview of how Misar Code's VS Code extension, backend, RAG pipeline, and tool execution model fit together.

System Overview

┌─────────────────────────────────────────────────────────────┐
│                      VS Code Extension                      │
│                                                             │
│  ┌──────────────┐  ┌─────────────┐  ┌───────────────────┐  │
│  │  Chat Panel  │  │Inline Compl.│  │  Diff / CodeLens  │  │
│  └──────┬───────┘  └──────┬──────┘  └─────────┬─────────┘  │
│         │                 │                    │            │
│         └─────────────────┴────────────────────┘            │
│                           │                                 │
│              WebSocket (wss://api.misar.dev/ws/chat/v2)     │
│                           │                                 │
└───────────────────────────┼─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                      FastAPI Backend                        │
│                                                             │
│  ┌──────────────────┐   ┌──────────────────────────────┐   │
│  │  Multi-Model     │   │  RAG Pipeline                │   │
│  │  Router          │   │  ┌──────────────────────┐    │   │
│  │  ┌────────────┐  │   │  │ tree-sitter chunker  │    │   │
│  │  │ Cascading  │  │   │  │ BGE-small embeddings │    │   │
│  │  │ fallback   │  │   │  │ BM25 + pgvector      │    │   │
│  │  └────────────┘  │   │  └──────────────────────┘    │   │
│  └────────┬─────────┘   └──────────────────────────────┘   │
│           │                                                 │
│  ┌────────▼──────────────────────────────────────────────┐  │
│  │              Permission Gate                          │  │
│  │  dangerous tools → require approval                   │  │
│  │  read tools → parallel execution                      │  │
│  │  write tools → serial execution                       │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└──────────────┬──────────────────────────────────────────────┘
               │
       ┌───────▼────────┐
       │   AI Providers  │
       │                 │
       │  Gemini         │
       │  DeepSeek       │
       │  Groq           │
       │  Mistral        │
       │  OpenRouter     │
       └─────────────────┘

Key Components

VS Code Extension

The extension is the user-facing layer. It owns:

The chat panel (sidebar or bottom panel): renders streamed responses, tool progress, and diff previews
Inline completions: registers an inline completion provider with VS Code and queries /complete for suggestions
Tool execution: when the backend requests a tool call, the extension runs it locally (file reads, writes, shell commands) and returns the result over the same WebSocket connection
Context injection: before each message, the extension collects the active file, selected text, type definitions, and related tests and includes them in the payload

Backend

A FastAPI application that handles all AI model communication:

WebSocket handler — maintains the agent loop, streams content blocks back to the extension
Multi-model router — tries providers in priority order with automatic fallback on rate limits or errors. Circuit breakers prevent requests to failing providers.
RAG injection — before forwarding a message to the model, the backend queries the vector index and prepends the top-ranked chunks to the system context
Tool orchestration — the backend sends tool_use events; the extension executes them and returns results; the loop continues until message_stop
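
The router's priority-order fallback and circuit breakers can be sketched roughly as follows. This is a minimal illustration, not Misar's actual implementation: the `Router`, `CircuitBreaker`, and `ProviderError` names, the failure threshold of 3, and the 30-second cooldown are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class ProviderError(Exception):
    """Hypothetical error raised by a provider call on rate limits or failures."""


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3      # assumed value, not from the docs
    cooldown_seconds: float = 30.0  # assumed value, not from the docs
    failures: int = 0
    opened_at: Optional[float] = None

    def allows_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let a trial request through.
        return now - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # open (or re-open) the circuit


@dataclass
class Router:
    # (name, call) pairs listed in priority order.
    providers: list
    breakers: dict = field(default_factory=dict)
    clock: Callable[[], float] = time.monotonic

    def complete(self, prompt: str) -> tuple:
        for name, call in self.providers:
            breaker = self.breakers.setdefault(name, CircuitBreaker())
            if not breaker.allows_request(self.clock()):
                continue  # circuit is open: skip this provider entirely
            try:
                result = call(prompt)
            except ProviderError:
                breaker.record_failure(self.clock())
                continue  # fall through to the next provider
            breaker.record_success()
            return name, result
        raise ProviderError("all providers failed or are circuit-broken")
```

With this shape, a provider that keeps failing stops receiving traffic after a few errors, and requests go straight to the next provider until the cooldown expires.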

RAG Pipeline

Workspace files (uploaded via POST /rag/index)
        │
        ▼
  tree-sitter parser
  ├── L1 chunk: file-level summary
  ├── L2 chunk: symbol (function / class / type)
  └── L3 chunk: sub-block (branch / loop / expression)
        │
        ├──► BGE-small embedding → pgvector (dense index)
        └──► tsvector GIN index  → BM25 (lexical index)
                    │
              hybrid retrieval (RRF merge)
                    │
              top-K ranked chunks → injected into system context
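
The RRF (Reciprocal Rank Fusion) merge step in the diagram above combines the dense and lexical rankings by summing `1 / (k + rank)` for each chunk across both lists. A minimal sketch, assuming chunk IDs are strings and using the commonly cited constant `k = 60`; the function name, `top_k` default, and the value of `k` used by Misar are assumptions:

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_k=5):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ranked[:top_k]]


dense_hits = ["a", "b", "c"]    # e.g. results from the pgvector index
lexical_hits = ["b", "d", "a"]  # e.g. results from the lexical index
print(rrf_merge([dense_hits, lexical_hits], top_k=3))
```

A chunk that appears near the top of both lists outranks one that appears in only a single list, which is the main reason to fuse by rank rather than by raw score.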

Tool Execution Model

Tools are executed in the extension, not on the server. The backend only describes what tool to call; the extension has full access to the local filesystem, terminal, and workspace.

Backend                          Extension
  │                                  │
  │── tool_use: read_file ──────────► │
  │                                  │── fs.readFile() ──► disk
  │                                  │◄─ file contents ───
  │◄── tool_result ─────────────────  │
  │                                  │
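
The round trip above, together with the approval step from the permission gate, might look like the sketch below. Python is used purely for illustration; a VS Code extension would implement the same logic in TypeScript or JavaScript. The event field names (`id`, `name`, `input`, `tool_use_id`, `is_error`, `content`), the `DANGEROUS_TOOLS` set, and the `approve` callback are hypothetical, since the wire format is not specified here.

```python
from pathlib import Path
from typing import Callable

# Assumption: which tools count as "dangerous" is not listed in this document.
DANGEROUS_TOOLS = {"run_command"}


def read_file(args: dict) -> str:
    """Local handler for the read_file tool."""
    return Path(args["path"]).read_text(encoding="utf-8")


# Registry mapping tool names to local handlers (only one shown here).
TOOLS = {"read_file": read_file}


def handle_tool_use(event: dict, approve: Callable[[dict], bool]) -> dict:
    """Run one tool_use event locally and build the tool_result reply."""
    name = event["name"]
    reply = {"type": "tool_result", "tool_use_id": event["id"]}
    if name in DANGEROUS_TOOLS and not approve(event):
        return {**reply, "is_error": True, "content": "denied by user"}
    handler = TOOLS.get(name)
    if handler is None:
        return {**reply, "is_error": True, "content": f"unknown tool: {name}"}
    try:
        return {**reply, "is_error": False, "content": handler(event["input"])}
    except OSError as exc:
        # Report filesystem errors back to the backend instead of crashing.
        return {**reply, "is_error": True, "content": str(exc)}
```

Returning errors as ordinary `tool_result` payloads keeps the agent loop running, so the model can react to a failed or denied call.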

Parallel vs serial execution:

Read-only tools (read_file, glob, search) — executed in parallel for speed
Write tools (write_file, edit_file, run_command) — executed serially to prevent conflicts
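
One way to schedule such a batch is to run consecutive read-only calls concurrently and await each write on its own, preserving the original order of results. The sketch below uses Python's `asyncio` for illustration only; `run_batch` and the `execute` callback are hypothetical names, and the grouping strategy is an assumption rather than the extension's documented behaviour.

```python
import asyncio

READ_ONLY = {"read_file", "glob", "search"}


async def run_batch(calls, execute):
    """Run (tool_name, args) pairs: reads in parallel, writes one at a time."""
    results = []
    pending_reads = []

    async def flush_reads():
        # Run all queued read-only calls concurrently; gather keeps their order.
        if pending_reads:
            batch = [execute(name, args) for name, args in pending_reads]
            results.extend(await asyncio.gather(*batch))
            pending_reads.clear()

    for name, args in calls:
        if name in READ_ONLY:
            pending_reads.append((name, args))
        else:
            # Finish earlier reads first so a write never races with them.
            await flush_reads()
            results.append(await execute(name, args))
    await flush_reads()
    return results
```

Flushing queued reads before each write means a read never observes a file halfway through a later edit, at the cost of some lost parallelism around writes.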

Context Window Management

| Threshold | Action |
|-----------|--------|
| Under 60K tokens | No action; full context preserved |
| 60K–100K tokens | Oldest turns summarised and replaced with a summary snippet |
| Over 100K tokens | Hard trim: oldest turns removed after summarisation |

The agent detects when context has been compacted and fires the compact session-start hook event so hooks can re-inject critical facts.
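
The thresholds can be read as a simple decision function. The sketch below assumes "K" means 1,000 tokens and that both ends of the 60K–100K band trigger summarisation; the `Action` enum and function name are illustrative only.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"            # full context preserved
    SUMMARISE = "summarise"  # oldest turns replaced with a summary snippet
    HARD_TRIM = "hard_trim"  # oldest turns removed after summarisation


SOFT_LIMIT = 60_000   # assumes 60K = 60,000 tokens
HARD_LIMIT = 100_000  # assumes 100K = 100,000 tokens


def compaction_action(token_count: int) -> Action:
    """Pick the context-management action for the current token count."""
    if token_count < SOFT_LIMIT:
        return Action.NONE
    if token_count <= HARD_LIMIT:
        return Action.SUMMARISE
    return Action.HARD_TRIM
```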

WebSocket Reliability

The extension maintains a persistent WebSocket connection with automatic reconnection:

| Attempt | Backoff |
|---------|---------|
| 1 | 1 second |
| 2 | 2 seconds |
| 3 | 4 seconds |
| 4 | 8 seconds |
| 5 | 16 seconds |

After 5 failed attempts the extension surfaces a reconnection error in the chat panel. The session is preserved — messages sent while disconnected are queued and replayed on reconnect.
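
The backoff schedule doubles from a 1-second base, and queued messages are flushed in order once a connection succeeds. A rough sketch of that behaviour follows, in Python for illustration only (the extension itself would be TypeScript or JavaScript). The `ReconnectingClient` class, its injected `connect`, `send`, and `sleep` callables, and the choice to wait before each attempt are assumptions.

```python
from collections import deque


def backoff_delays(max_attempts=5, base_seconds=1.0):
    """Exponential backoff: 1, 2, 4, 8, 16 seconds for attempts 1 to 5."""
    return [base_seconds * 2 ** (attempt - 1) for attempt in range(1, max_attempts + 1)]


class ReconnectingClient:
    def __init__(self, connect, send, sleep):
        self._connect = connect  # hypothetical: returns True on success
        self._send = send        # hypothetical: sends one message
        self._sleep = sleep      # injected so tests need not really wait
        self.outbox = deque()    # messages queued while disconnected
        self.connected = False

    def send_message(self, message):
        if self.connected:
            self._send(message)
        else:
            self.outbox.append(message)

    def reconnect(self) -> bool:
        for delay in backoff_delays():
            self._sleep(delay)
            if self._connect():
                self.connected = True
                # Replay queued messages in the order they were written.
                while self.outbox:
                    self._send(self.outbox.popleft())
                return True
        return False  # caller surfaces the reconnection error in the UI
```

Under these assumptions the worst case before the error appears is 1 + 2 + 4 + 8 + 16 = 31 seconds of waiting, plus the time spent on the connection attempts themselves.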

All tool execution happens locally in the extension process. The backend never has direct access to your filesystem or terminal — it only sends instructions and receives results.
