# How It Works
RAG Engine transforms your documents into a searchable knowledge base through a six-stage pipeline.
## The Ingestion Pipeline
When you upload a document, RAG Engine processes it through these stages:
### Upload

The file is uploaded to durable storage. A source record is created in the database with status `queued`, and an ingestion job is placed on the processing queue.

**Deduplication:** Files are hashed (SHA-256) before upload. If the same file is uploaded again, the existing source is re-processed rather than creating a duplicate.
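Conceptually, the hash check works like the sketch below; the in-memory index and the `upload` helper are illustrative stand-ins for the actual database lookup:

```python
import hashlib

def compute_file_hash(data: bytes) -> str:
    """Return the SHA-256 digest used as the deduplication key."""
    return hashlib.sha256(data).hexdigest()

# Illustrative in-memory index of previously seen hashes -> source IDs.
seen_sources: dict[str, str] = {}

def upload(data: bytes, new_source_id: str) -> str:
    """Return the source ID to (re)process: the existing one on a hash match, else a new one."""
    digest = compute_file_hash(data)
    if digest in seen_sources:
        return seen_sources[digest]       # duplicate: re-process the existing source
    seen_sources[digest] = new_source_id  # first upload: create a new source
    return new_source_id
```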
### Parse

A background worker picks up the job and downloads the file, then selects the appropriate parser based on the file's MIME type.
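As a sketch, parser selection can be modeled as a MIME-type dispatch table. The parser functions and the supported types below are hypothetical — the actual parser set is not enumerated here:

```python
# Hypothetical MIME-type -> parser dispatch; the real parser list may differ.
def parse_pdf(data: bytes) -> str: ...
def parse_docx(data: bytes) -> str: ...
def parse_plain_text(data: bytes) -> str:
    return data.decode("utf-8")

PARSERS = {
    "application/pdf": parse_pdf,
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": parse_docx,
    "text/plain": parse_plain_text,
}

def select_parser(mime_type: str):
    """Look up the parser for a MIME type, rejecting unsupported types."""
    try:
        return PARSERS[mime_type]
    except KeyError:
        raise ValueError(f"Unsupported MIME type: {mime_type}")
```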
### Chunk
The extracted text is split into smaller passages (chunks) optimized for embedding and retrieval. The chunker uses token-aware splitting with configurable parameters:
| Parameter | Default | Description |
|---|---|---|
| max_tokens | 512 | Maximum tokens per chunk |
| overlap_tokens | 50 | Token overlap between adjacent chunks |
Overlap ensures that information at chunk boundaries is not lost. Each chunk retains a reference to its source document and position.
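A minimal sketch of this splitting strategy, assuming the text is already tokenized (the real chunker uses a proper tokenizer rather than a plain token list):

```python
def chunk_tokens(tokens: list[str], max_tokens: int = 512,
                 overlap_tokens: int = 50) -> list[list[str]]:
    """Split a token sequence into windows of max_tokens, with each
    window sharing overlap_tokens with its predecessor."""
    if overlap_tokens >= max_tokens:
        raise ValueError("overlap_tokens must be smaller than max_tokens")
    step = max_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # last window already reaches the end of the text
    return chunks
```

Because each window starts `overlap_tokens` before the previous one ends, a sentence straddling a boundary appears whole in at least one chunk.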
### Embed

Each chunk is converted into a 1024-dimensional vector using an embedding model. Embeddings are generated in batches of 25 chunks for efficiency.
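Batched embedding can be sketched as follows; `embed_batch` is a stand-in for the real model call:

```python
EMBEDDING_DIM = 1024
BATCH_SIZE = 25

def embed_batch(texts: list[str]) -> list[list[float]]:
    """Stand-in for the embedding model call: one 1024-dim vector per text."""
    return [[0.0] * EMBEDDING_DIM for _ in texts]

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed chunks in fixed-size batches instead of one request per chunk."""
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), BATCH_SIZE):
        vectors.extend(embed_batch(chunks[i:i + BATCH_SIZE]))
    return vectors
```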
### Index

Chunks and their embedding vectors are stored in PostgreSQL with the pgvector extension. Each chunk is associated with its source document, tenant, and position metadata. The source status updates to `ready`.
### Search

When you (or an agent) run a query, the query text is embedded using the same model. pgvector performs a cosine similarity search across all indexed chunks for the tenant, returning the top-k most relevant passages with their similarity scores and source metadata.
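Conceptually, this ranking is a cosine-similarity top-k over the indexed vectors. A small in-memory sketch (pgvector performs the equivalent inside PostgreSQL, with index acceleration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], index: dict[str, list[float]],
          k: int = 5) -> list[tuple[str, float]]:
    """Rank indexed chunk vectors by similarity to the query embedding."""
    scored = [(chunk_id, cosine_similarity(query, vec))
              for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```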
```sql
-- Simplified; the actual query filters by tenant and uses pgvector operators
SELECT chunk_text,
       source_id,
       1 - (embedding <=> query_embedding) AS similarity_score
FROM chunks
ORDER BY embedding <=> query_embedding
LIMIT 5;
```

## Sync vs Async Ingestion
RAG Engine supports two ingestion paths:
### Synchronous (Text)

`POST /api/rag/sources/ingest`

Plain text content is parsed, chunked, embedded, and indexed in a single request. Returns the source ID and chunk count immediately.
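A sketch of calling this endpoint with the standard library. The payload field names (`name`, `content`) are assumptions — check the API reference for the actual schema:

```python
import json
import urllib.request

def build_ingest_request(base_url: str, content: str,
                         name: str) -> urllib.request.Request:
    """Build the synchronous ingest request; send it with urllib.request.urlopen().

    The JSON response carries the source ID and chunk count."""
    body = json.dumps({"name": name, "content": content}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/rag/sources/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```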
### Asynchronous (File Upload)

`POST /api/rag/sources/upload`

Files are stored and queued for background processing (HTTP 202 Accepted). Poll `GET /sources/{source_id}` for status updates.
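The poll loop can be sketched as follows; `fetch_status` stands in for the `GET /sources/{source_id}` call, and the `processing` and `failed` statuses are assumptions beyond the documented `queued` and `ready`:

```python
import time

def poll_until_done(fetch_status, source_id: str, interval: float = 1.0,
                    max_attempts: int = 30) -> str:
    """Poll the source status until it reaches a terminal state.

    fetch_status(source_id) -> str is a stand-in for the HTTP status call;
    'processing' and 'failed' are assumed states, not documented ones."""
    for _ in range(max_attempts):
        status = fetch_status(source_id)
        if status in ("ready", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"source {source_id} still processing after {max_attempts} polls")
```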
## Storage vs RAG Engine
MCP Gateway Pro includes both Media & Storage (raw file access) and RAG Engine (knowledge base). They solve different problems:
| | Storage | RAG Engine |
|---|---|---|
| Purpose | Raw file access for agents | Semantic search across content |
| Processing | Files stored as-is | Parsed, chunked, embedded, indexed |
| Agent access | Read full file contents | Search for relevant passages |
| Best for | Reference docs, images, videos | Large doc collections, Q&A |
| MCP tools | `media__list_media`, `media__get_media` | `knowledge_query`, `knowledge_ingest`, ... |
You can use both at the same time. Store reference files in Storage for direct access, and index your knowledge base documents in RAG Engine for semantic search.