RAG API
Index workspace files and retrieve relevant code chunks using hybrid BM25 + vector search powered by PostgreSQL and pgvector.
Overview
The RAG (Retrieval-Augmented Generation) API indexes your workspace files and retrieves the most relevant code chunks before each agent response. This gives the agent accurate, up-to-date knowledge of your codebase without consuming the full context window.
Architecture
Workspace files
        │
        ▼
tree-sitter chunker
  ├── L1: file-level summary
  ├── L2: symbol-level (functions, classes, types)
  └── L3: sub-block (branches, loops, expressions)
        │
        ▼
BGE-small embeddings + BM25 full-text index
        │                        │
        └──── hybrid search ─────┘
                    │
                    ▼
          ranked chunk results
Storage: PostgreSQL with the pgvector extension. Hybrid retrieval combines dense vector similarity with BM25 lexical scoring, so results cover both semantic matches and exact identifier or keyword matches.
Index Files
POST /rag/index
Index workspace files into the RAG store.
Request body
- workspace_id (string, body, required): Unique identifier for the workspace.
- files (array, body, required): Files to index. Each item has path (string) and content (string).
Response fields
- indexed (number): Number of files indexed.
- chunks_created (number): Number of chunks created across the indexed files.
- duration_ms (number): Indexing duration in milliseconds.
Limits
- Max files per request: 200
- Max file size: 512 KB
- Max chunks per workspace: 10,000
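A large workspace has to be split across several requests to stay within these limits. The following is a minimal sketch of one way to do that on the client side. The helper name `batch_files` is hypothetical, and it assumes that "512 KB" means 512 * 1024 bytes of UTF-8 encoded content, which this page does not specify.

```python
from typing import Dict, Iterable, Iterator, List

MAX_FILES_PER_REQUEST = 200   # from the Limits list
MAX_FILE_BYTES = 512 * 1024   # assumption: 512 KB measured as UTF-8 bytes


def batch_files(files: Iterable[Dict[str, str]]) -> Iterator[List[Dict[str, str]]]:
    """Yield batches of at most MAX_FILES_PER_REQUEST files, skipping oversized files."""
    batch: List[Dict[str, str]] = []
    for f in files:
        if len(f["content"].encode("utf-8")) > MAX_FILE_BYTES:
            continue  # over the per-file limit; a real tool would probably log this
        batch.append(f)
        if len(batch) == MAX_FILES_PER_REQUEST:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Each yielded batch can be sent as the files array of one index request. The per-workspace chunk limit cannot be checked this way, because the number of chunks a file produces is only known after indexing.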
Example request

POST /rag/index
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "workspace_id": "my-project",
  "files": [
    {
      "path": "src/auth/session.ts",
      "content": "import { cookies } from 'next/headers';\n..."
    },
    {
      "path": "src/lib/db.ts",
      "content": "import { createClient } from '@supabase/supabase-js';\n..."
    }
  ]
}

Example response

{
  "indexed": 2,
  "chunks_created": 47,
  "duration_ms": 312
}

Re-indexing a file that was previously indexed replaces the old chunks automatically. You do not need to delete before re-indexing.
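For custom tooling, the index request can be built with the Python standard library alone. This is a sketch under stated assumptions: `BASE_URL` is a placeholder because this page does not give the API host, and the function names are hypothetical. Only the path, headers, and body fields come from the reference above.

```python
import json
import urllib.request
from typing import Dict, List

BASE_URL = "https://api.example.com"  # placeholder host, not taken from this page


def build_index_request(api_key: str, workspace_id: str,
                        files: List[Dict[str, str]]) -> urllib.request.Request:
    """Build (but do not send) a POST /rag/index request."""
    body = json.dumps({"workspace_id": workspace_id, "files": files}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/rag/index",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def index_files(api_key: str, workspace_id: str, files: List[Dict[str, str]]) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_index_request(api_key, workspace_id, files)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because re-indexing replaces old chunks, the same call can be repeated for changed files without a separate delete step.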
Query
POST /rag/query
Retrieve the most relevant code chunks for a natural language or code query.
Request body
- workspace_id (string, body, required): Workspace to search.
- query (string, body, required): Natural language or code query.
- mode (string, body, default "hybrid"): Retrieval mode, one of hybrid, vector, or lexical.
- limit (number, body, default 10): Maximum number of chunks to return.
Response fields
- chunks (array): Ranked chunk results. Each chunk includes path, content, level, score, start_line, and end_line.
- query_ms (number): Query duration in milliseconds.
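The request and response fields above map naturally onto a small typed helper. The sketch below is illustrative only: `Chunk`, `build_query_body`, and `parse_chunks` are hypothetical names, and the client-side mode check is an added convenience, not documented server behavior.

```python
from dataclasses import dataclass
from typing import List

VALID_MODES = ("hybrid", "vector", "lexical")


@dataclass
class Chunk:
    path: str
    content: str
    level: str      # chunk level, e.g. "L2" for a symbol-level chunk
    score: float
    start_line: int
    end_line: int


def build_query_body(workspace_id: str, query: str,
                     mode: str = "hybrid", limit: int = 10) -> dict:
    """Build the JSON body for POST /rag/query using the documented defaults."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown retrieval mode: {mode}")
    return {"workspace_id": workspace_id, "query": query, "mode": mode, "limit": limit}


def parse_chunks(response: dict) -> List[Chunk]:
    """Convert the chunks array of a query response into Chunk objects."""
    return [
        Chunk(
            path=c["path"],
            content=c["content"],
            level=c["level"],
            score=c["score"],
            start_line=c["start_line"],
            end_line=c["end_line"],
        )
        for c in response["chunks"]
    ]
```

Reading fields by name, rather than unpacking each chunk dictionary directly, keeps the parser working if the service later adds fields to a chunk.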
Example request

POST /rag/query
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "workspace_id": "my-project",
  "query": "how are user sessions validated",
  "mode": "hybrid",
  "limit": 10
}

Example response

{
  "chunks": [
    {
      "path": "src/auth/session.ts",
      "content": "export async function validateSession(token: string) {",
      "level": "L2",
      "score": 0.94,
      "start_line": 12,
      "end_line": 34
    }
  ],
  "query_ms": 18
}

Retrieval Modes
| Mode | Description | Best For |
|---|---|---|
| hybrid | BM25 + vector, results merged with Reciprocal Rank Fusion (RRF) | General use (best accuracy) |
| vector | Dense embedding similarity only | Semantic / conceptual queries |
| lexical | BM25 full-text only | Exact identifier or keyword search |
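To illustrate how hybrid mode can merge two rankings, here is a minimal sketch of Reciprocal Rank Fusion. It is not the service's implementation. The function name `rrf_merge` is hypothetical, and the constant k = 60 is the value commonly used for RRF, not a value documented on this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def rrf_merge(rankings: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of chunk ids: each list adds 1 / (k + rank) to an id's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


bm25_ranking = ["a", "b", "c"]     # hypothetical lexical results
vector_ranking = ["b", "d", "a"]   # hypothetical embedding results
print(rrf_merge([bm25_ranking, vector_ranking]))  # ['b', 'a', 'd', 'c']
```

RRF uses only rank positions, so the BM25 and vector scores do not need to be on the same scale. Chunks that appear in both lists, such as "a" and "b" here, rise above chunks found by only one retriever.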
Status
POST /rag/status
Return indexing statistics for a workspace.
Request body
- workspace_id (string, body, required): Workspace to report on.
Response fields
- workspace_id (string): The workspace identifier.
- chunk_count (number): Total number of chunks stored for the workspace.
- file_count (number): Number of indexed files.
- last_indexed (string): ISO-8601 timestamp of the last indexing operation.
- index_size_kb (number): Size of the index in kilobytes.
Example request

POST /rag/status
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{ "workspace_id": "my-project" }

Example response

{
  "workspace_id": "my-project",
  "chunk_count": 1842,
  "file_count": 63,
  "last_indexed": "2026-03-21T14:32:00Z",
  "index_size_kb": 4096
}

The VS Code extension handles indexing automatically when misar.autoContext is enabled. Use the API directly when building custom tooling or CI pipelines that need to pre-index large workspaces.
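A CI pipeline might use the status response to decide whether to re-index and how much room is left under the per-workspace chunk limit. The sketch below works on an already decoded status response. The helper names and the 24-hour staleness threshold are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_CHUNKS_PER_WORKSPACE = 10_000  # from the Limits list for /rag/index


def needs_reindex(status: dict, now: datetime,
                  max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True when last_indexed is older than max_age (threshold is an assumption)."""
    # Replace the trailing "Z" so older Python versions can parse the timestamp
    last = datetime.fromisoformat(status["last_indexed"].replace("Z", "+00:00"))
    return now - last > max_age


def remaining_chunk_budget(status: dict) -> int:
    """Return how many more chunks fit under the per-workspace limit."""
    return max(0, MAX_CHUNKS_PER_WORKSPACE - status["chunk_count"])


status = {
    "workspace_id": "my-project",
    "chunk_count": 1842,
    "file_count": 63,
    "last_indexed": "2026-03-21T14:32:00Z",
    "index_size_kb": 4096,
}
now = datetime(2026, 3, 22, 15, 0, tzinfo=timezone.utc)
print(needs_reindex(status, now))       # True
print(remaining_chunk_budget(status))   # 8158
```

Passing `now` explicitly keeps the check deterministic in tests. A pipeline would typically pass `datetime.now(timezone.utc)`.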