RAG API
Index workspace files and retrieve relevant code chunks using hybrid BM25 + vector search powered by PostgreSQL and pgvector.
Overview
The RAG (Retrieval-Augmented Generation) API indexes your workspace files and retrieves the most relevant code chunks before each agent response. This gives the agent accurate, up-to-date knowledge of your codebase without consuming the full context window.
Architecture
Workspace files
      │
      ▼
tree-sitter chunker
  ├── L1: file-level summary
  ├── L2: symbol-level (functions, classes, types)
  └── L3: sub-block (branches, loops, expressions)
      │
      ▼
BGE-small embeddings + BM25 full-text index
      │                       │
      └──── hybrid search ────┘
                │
                ▼
      ranked chunk results
Storage: PostgreSQL with the pgvector extension. Hybrid retrieval combines dense vector similarity, which captures semantic matches, with BM25 lexical scoring, which captures exact identifiers and keywords.
Index Files
POST /rag/index
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
Request
{
  "workspace_id": "my-project",
  "files": [
    {
      "path": "src/auth/session.ts",
      "content": "import { cookies } from 'next/headers';\n..."
    },
    {
      "path": "src/lib/db.ts",
      "content": "import { createClient } from '@supabase/supabase-js';\n..."
    }
  ]
}
| Field | Type | Description |
|-------|------|-------------|
| workspace_id | string | Unique identifier for the workspace |
| files | array | Files to index. Each item has path (string) and content (string) |
Limits
| Limit | Value |
|-------|-------|
| Max files per request | 200 |
| Max file size | 512 KB |
| Max chunks per workspace | 10,000 |
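The limits above suggest batching on the client side. The following is a minimal sketch, not part of the documented API: the helper name `batch_files` is hypothetical, and it assumes that the size limit applies to the UTF-8 encoded content and that 1 KB means 1024 bytes.

```python
from typing import Iterator

MAX_FILES_PER_REQUEST = 200   # documented limit
MAX_FILE_BYTES = 512 * 1024   # documented limit, assuming 1 KB = 1024 bytes


def batch_files(files: list[dict]) -> Iterator[list[dict]]:
    """Yield request-sized batches, skipping files over the size limit.

    Each file is a dict with the "path" and "content" keys used by
    POST /rag/index. Assumption: size is measured on UTF-8 bytes.
    """
    batch: list[dict] = []
    for f in files:
        if len(f["content"].encode("utf-8")) > MAX_FILE_BYTES:
            continue  # too large to index; a real tool would likely log this
        batch.append(f)
        if len(batch) == MAX_FILES_PER_REQUEST:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

Each yielded batch can then be sent as the `files` array of one index request.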
Response
{
  "indexed": 2,
  "chunks_created": 47,
  "duration_ms": 312
}
Re-indexing a file that was previously indexed replaces the old chunks automatically. You do not need to delete before re-indexing.
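As an illustration of the index call, here is a sketch that builds the request with Python's standard library. The base URL `https://api.example.com` is a placeholder, since the host is not given in this document, and `build_index_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder host (assumption); substitute the base URL of your deployment.
BASE_URL = "https://api.example.com"


def build_index_request(api_key: str, workspace_id: str, files: list[dict]) -> urllib.request.Request:
    """Build, but do not send, a POST /rag/index request."""
    body = json.dumps({"workspace_id": workspace_id, "files": files}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/rag/index",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Sending it, for example from a script:
# with urllib.request.urlopen(build_index_request(key, "my-project", files)) as resp:
#     summary = json.load(resp)  # expected keys: indexed, chunks_created, duration_ms
```

Because re-indexing replaces old chunks, the same call can be repeated for changed files without a delete step.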
Query
POST /rag/query
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
Request
{
  "workspace_id": "my-project",
  "query": "how are user sessions validated",
  "mode": "hybrid",
  "limit": 10
}
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| workspace_id | string | — | Workspace to search |
| query | string | — | Natural language or code query |
| mode | string | "hybrid" | Retrieval mode: hybrid, vector, or lexical |
| limit | number | 10 | Maximum number of chunks to return |
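The field table can be mirrored in a small payload builder. This is a sketch under assumptions: `build_query_payload` is a hypothetical name, and the client-side checks (rejecting unknown modes and non-positive limits) are a convenience, not documented server behavior.

```python
VALID_MODES = {"hybrid", "vector", "lexical"}


def build_query_payload(workspace_id: str, query: str,
                        mode: str = "hybrid", limit: int = 10) -> dict:
    """Return the JSON body for POST /rag/query, using the documented defaults."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if limit < 1:
        # Assumption: a limit below 1 is never meaningful, so fail early.
        raise ValueError("limit must be a positive integer")
    return {"workspace_id": workspace_id, "query": query, "mode": mode, "limit": limit}
```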
Retrieval Modes
| Mode | Description | Best For |
|------|-------------|----------|
| hybrid | BM25 + vector, scores merged with RRF | General use — best accuracy |
| vector | Dense embedding similarity only | Semantic / conceptual queries |
| lexical | BM25 full-text only | Exact identifier or keyword search |
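To make the hybrid row concrete, here is a sketch of Reciprocal Rank Fusion (RRF) over two ranked lists. It illustrates the general technique, not this service's implementation: the constant `k = 60` is a commonly used default and an assumption here, and the chunk IDs are made up.

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge best-first ranked lists of chunk IDs with Reciprocal Rank Fusion.

    A chunk's fused score is the sum of 1 / (k + rank) over every list it
    appears in, with rank starting at 1. Higher fused scores rank first.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical chunk IDs, one list per retriever.
bm25_ranking = ["session.ts#validateSession", "db.ts#createClient", "auth.ts#login"]
vector_ranking = ["auth.ts#login", "session.ts#validateSession", "cookies.ts#read"]
fused = rrf_merge([bm25_ranking, vector_ranking])
```

RRF uses only rank positions, so BM25 scores and vector similarities never need to be put on a common scale. Chunks that both retrievers rank highly rise to the top.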
Response
{
  "chunks": [
    {
      "path": "src/auth/session.ts",
      "content": "export async function validateSession(token: string) {",
      "level": "L2",
      "score": 0.94,
      "start_line": 12,
      "end_line": 34
    }
  ],
  "query_ms": 18
}
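A client might turn this response into typed objects before passing chunks to an agent. The sketch below assumes each chunk carries exactly the six fields shown above; `Chunk`, `parse_chunks`, and `format_for_prompt` are hypothetical names, and the `min_score` filter is an optional client-side choice.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    content: str
    level: str        # "L1", "L2", or "L3"
    score: float
    start_line: int
    end_line: int


def parse_chunks(response: dict, min_score: float = 0.0) -> list[Chunk]:
    """Convert a /rag/query response into Chunk objects, best score first.

    Assumption: each item has exactly the six fields of Chunk; extra
    fields would raise a TypeError here.
    """
    chunks = [Chunk(**item) for item in response["chunks"]]
    kept = [c for c in chunks if c.score >= min_score]
    return sorted(kept, key=lambda c: c.score, reverse=True)


def format_for_prompt(chunk: Chunk) -> str:
    """Render a chunk with a location header for use as agent context."""
    header = f"# {chunk.path}:{chunk.start_line}-{chunk.end_line} ({chunk.level})"
    return f"{header}\n{chunk.content}"
```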
Status
POST /rag/status
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
{ "workspace_id": "my-project" }
Response
{
  "workspace_id": "my-project",
  "chunk_count": 1842,
  "file_count": 63,
  "last_indexed": "2026-03-21T14:32:00Z",
  "index_size_kb": 4096
}
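The status response can drive simple housekeeping, such as deciding whether an index is stale or close to the documented 10,000-chunk limit. This is a sketch: `needs_reindex`, `chunk_headroom`, and the 24-hour staleness threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta

MAX_CHUNKS_PER_WORKSPACE = 10_000  # documented limit


def needs_reindex(status: dict, now: datetime,
                  max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True if the index is older than max_age (assumed threshold)."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    last = datetime.fromisoformat(status["last_indexed"].replace("Z", "+00:00"))
    return now - last > max_age


def chunk_headroom(status: dict) -> int:
    """Number of chunks that can still be added before the workspace limit."""
    return MAX_CHUNKS_PER_WORKSPACE - status["chunk_count"]
```

Pass a timezone-aware `now`, for example `datetime.now(timezone.utc)`, since the parsed timestamp carries a UTC offset.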
The VS Code extension handles indexing automatically when misar.autoContext is enabled. Use the API directly when building custom tooling or CI pipelines that need to pre-index large workspaces.
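For a CI pipeline, the first step is usually gathering files from the checkout. Below is a sketch that walks a directory and produces the `path` and `content` pairs expected by the index endpoint. The function name `collect_files`, the suffix filter, and the decision to skip files that are not valid UTF-8 are assumptions for illustration.

```python
from pathlib import Path

MAX_FILE_BYTES = 512 * 1024  # documented per-file limit, assuming 1 KB = 1024 bytes


def collect_files(root: str, suffixes: tuple[str, ...] = (".ts", ".tsx", ".py")) -> list[dict]:
    """Read source files under root into {"path", "content"} dicts.

    Paths are relative to root and use forward slashes.
    """
    root_path = Path(root)
    files: list[dict] = []
    for p in sorted(root_path.rglob("*")):
        if not p.is_file() or p.suffix not in suffixes:
            continue
        if p.stat().st_size > MAX_FILE_BYTES:
            continue  # over the size limit; skip rather than fail the job
        try:
            content = p.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # binary or non-UTF-8 file
        files.append({"path": p.relative_to(root_path).as_posix(), "content": content})
    return files
```

The resulting list can be split into requests of at most 200 files and posted to `/rag/index`.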