How It Works

RAG Engine transforms your documents into a searchable knowledge base through a six-stage pipeline.

The Ingestion Pipeline

When you upload a document, RAG Engine processes it through these stages:

1. Upload

The file is uploaded to durable storage. A source record is created in the database with status queued, and an ingestion job is placed on the processing queue.

Deduplication: Files are hashed (SHA-256) before upload. If the same file is uploaded again, the existing source is re-processed instead of a duplicate record being created.
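
The dedup key can be sketched as a plain SHA-256 digest of the file bytes; the function name here is illustrative, not the engine's actual code:

```python
import hashlib

def file_sha256(data: bytes) -> str:
    """Hex SHA-256 digest used as the deduplication key (illustrative)."""
    return hashlib.sha256(data).hexdigest()

# Two uploads of the same bytes produce the same key, so the second
# upload can be routed to the existing source record instead of a new one.
key_a = file_sha256(b"quarterly report")
key_b = file_sha256(b"quarterly report")
key_c = file_sha256(b"different file")
```

Hashing the raw bytes (rather than the filename) means a renamed copy of the same file still deduplicates.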

2. Parse

A background worker picks up the job and downloads the file. The appropriate parser is selected based on the file's MIME type:

Format   Extraction
PDF      Page-by-page text extraction
DOCX     Paragraph extraction with styles
HTML     Tag stripping, content extraction
CSV      Row-by-row with headers
JSON     Key-value flattening
Images   OCR text extraction
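
A minimal sketch of dispatch by MIME type, assuming a registry of parsers; the parser names and the exact MIME types shown are illustrative:

```python
# Hypothetical parser registry keyed by MIME type; the real worker's
# parser names and supported types may differ.
PARSERS = {
    "application/pdf": "pdf_parser",
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document": "docx_parser",
    "text/html": "html_parser",
    "text/csv": "csv_parser",
    "application/json": "json_parser",
    "image/png": "ocr_parser",
}

def select_parser(mime_type: str) -> str:
    """Return the parser name for a MIME type, or fail loudly."""
    try:
        return PARSERS[mime_type]
    except KeyError:
        raise ValueError(f"unsupported MIME type: {mime_type}")
```

Failing on unknown types keeps unsupported files from silently producing empty chunks.
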

3. Chunk

The extracted text is split into smaller passages (chunks) optimized for embedding and retrieval. The chunker uses token-aware splitting with configurable parameters:

Parameter        Default  Description
max_tokens       512      Maximum tokens per chunk
overlap_tokens   50       Token overlap between adjacent chunks

Overlap ensures that information at chunk boundaries is not lost. Each chunk retains a reference to its source document and position.
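
The overlap scheme can be sketched on a pre-tokenized input; a real implementation would use the embedding model's tokenizer rather than an arbitrary token list:

```python
def chunk_tokens(tokens, max_tokens=512, overlap_tokens=50):
    """Split a token list into chunks of up to max_tokens, where each
    chunk begins with the last overlap_tokens of the previous chunk."""
    if overlap_tokens >= max_tokens:
        raise ValueError("overlap_tokens must be smaller than max_tokens")
    step = max_tokens - overlap_tokens  # new tokens contributed per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # last chunk already reached the end of the input
    return chunks
```

With the defaults, each chunk shares its first 50 tokens with the tail of the previous one, so a sentence straddling a boundary appears whole in at least one chunk.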

4. Embed

Each chunk is converted into a 1024-dimensional vector using an embedding model. Embeddings are generated in batches for efficiency.

Dimensions   1024
Batch size   25 chunks
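
Batching can be sketched as simple slicing; the batch size of 25 matches the figure above, while the function name is illustrative:

```python
def batched(items, batch_size=25):
    """Yield successive batches of at most batch_size items.
    Each batch would go to the embedding model in one call."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# e.g. 60 chunks embed in three calls: 25, 25, and 10 chunks.
```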

5. Index

Chunks and their embedding vectors are stored in PostgreSQL with the pgvector extension. Each chunk is associated with its source document, tenant, and position metadata. The source status updates to ready.
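
One way to picture an indexed row; the field names are assumptions for illustration, not the engine's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ChunkRow:
    """Illustrative shape of one indexed chunk in PostgreSQL."""
    chunk_text: str
    embedding: list      # 1024-dimensional vector from the embed stage
    source_id: str       # link back to the source document
    tenant_id: str       # keeps each tenant's search space separate
    position: int        # chunk index within the source document

row = ChunkRow("...", [0.0] * 1024, "src_1", "tenant_a", 0)
```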

6. Search

When you (or an agent) run a query, the query text is embedded using the same model. pgvector performs a cosine similarity search across all indexed chunks for the tenant, returning the top-k most relevant passages with their similarity scores and source metadata.

-- Simplified; <=> is pgvector's cosine distance operator
SELECT chunk_text,
       source_id,
       1 - (embedding <=> query_embedding) AS similarity_score
FROM chunks
WHERE tenant_id = :tenant_id
ORDER BY embedding <=> query_embedding
LIMIT 5;

Sync vs Async Ingestion

RAG Engine supports two ingestion paths:

Synchronous (Text)

POST /api/rag/sources/ingest

Plain text content is parsed, chunked, embedded, and indexed in a single request. Returns the source ID and chunk count immediately.

Asynchronous (File Upload)

POST /api/rag/sources/upload

Files are stored and queued for background processing (HTTP 202). Poll GET /sources/{source_id} for status updates.
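
A polling helper for the async path might look like this sketch. The status values queued and ready come from this page; the failed status, the function name, and the timeout behavior are assumptions:

```python
import time

def wait_until_ready(fetch_status, poll_interval=2.0, max_polls=150):
    """Poll until the source reaches a terminal state.

    fetch_status stands in for a call to GET /sources/{source_id};
    it should return the source's current status string.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("ready", "failed"):  # assumed terminal states
            return status
        time.sleep(poll_interval)
    raise TimeoutError("source did not finish processing in time")
```

In practice you would pass a closure that issues the GET request and extracts the status field from the JSON response.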

Storage vs RAG Engine

MCP Gateway Pro includes both Media & Storage (raw file access) and RAG Engine (knowledge base). They solve different problems:

              Storage                              RAG Engine
Purpose       Raw file access for agents           Semantic search across content
Processing    Files stored as-is                   Parsed, chunked, embedded, indexed
Agent access  Read full file contents              Search for relevant passages
Best for      Reference docs, images, videos       Large doc collections, Q&A
MCP tools     media__list_media, media__get_media  knowledge_query, knowledge_ingest, ...

You can use both at the same time. Store reference files in Storage for direct access, and index your knowledge base documents in RAG Engine for semantic search.