How It Works

RAG Engine transforms your documents into a searchable knowledge base through a six-stage pipeline.

The Ingestion Pipeline

When you upload a document, RAG Engine processes it through these stages:

1. Upload

The file is uploaded to durable storage. A source record is created in the database with status queued, and an ingestion job is placed on the processing queue.

Deduplication: Files are hashed (SHA-256) before upload. If the same file is uploaded again, the existing source is re-processed instead of a duplicate record being created.
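
The dedup key can be sketched as a plain SHA-256 digest of the file bytes; the function name here is illustrative, not the engine's actual code:

```python
import hashlib

def file_sha256(data: bytes) -> str:
    """Hex SHA-256 digest used as the deduplication key (illustrative)."""
    return hashlib.sha256(data).hexdigest()

# Two uploads of the same bytes produce the same key, so the second
# upload can be routed to the existing source record instead of a new one.
key_a = file_sha256(b"quarterly report")
key_b = file_sha256(b"quarterly report")
key_c = file_sha256(b"different file")
```

Hashing the raw bytes (rather than the filename) means a renamed copy of the same file still deduplicates.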

2. Parse

A background worker picks up the job and downloads the file. The appropriate parser is selected based on the file's MIME type:

Format   Extraction
PDF      Page-by-page text extraction
DOCX     Paragraph extraction with styles
HTML     Tag stripping, content extraction
CSV      Row-by-row with headers
JSON     Key-value flattening
Images   OCR text extraction
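
A minimal sketch of dispatch by MIME type, assuming a registry of parsers; the parser names and the exact MIME types shown are illustrative:

```python
# Hypothetical parser registry keyed by MIME type; the real worker's
# parser names and supported types may differ.
PARSERS = {
    "application/pdf": "pdf_parser",
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document": "docx_parser",
    "text/html": "html_parser",
    "text/csv": "csv_parser",
    "application/json": "json_parser",
    "image/png": "ocr_parser",
}

def select_parser(mime_type: str) -> str:
    """Return the parser name for a MIME type, or fail loudly."""
    try:
        return PARSERS[mime_type]
    except KeyError:
        raise ValueError(f"unsupported MIME type: {mime_type}")
```

Failing on unknown types keeps unsupported files from silently producing empty chunks.
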

3. Chunk

The extracted text is split into smaller passages (chunks) optimized for embedding and retrieval. The chunker uses token-aware splitting with configurable parameters:

Parameter        Default  Description
max_tokens       512      Maximum tokens per chunk
overlap_tokens   50       Token overlap between adjacent chunks

Overlap ensures that information at chunk boundaries is not lost. Each chunk retains a reference to its source document and position.
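
The overlap scheme can be sketched on a pre-tokenized input; a real implementation would use the embedding model's tokenizer rather than an arbitrary token list:

```python
def chunk_tokens(tokens, max_tokens=512, overlap_tokens=50):
    """Split a token list into chunks of up to max_tokens, where each
    chunk begins with the last overlap_tokens of the previous chunk."""
    if overlap_tokens >= max_tokens:
        raise ValueError("overlap_tokens must be smaller than max_tokens")
    step = max_tokens - overlap_tokens  # new tokens contributed per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # last chunk already reached the end of the input
    return chunks
```

With the defaults, each chunk shares its first 50 tokens with the tail of the previous one, so a sentence straddling a boundary appears whole in at least one chunk.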

4. Embed

Each chunk is converted into a 1024-dimensional vector using an embedding model. Embeddings are generated in batches for efficiency.

Dimensions   1024
Batch size   25 chunks
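
Batching can be sketched as simple slicing; the batch size of 25 matches the figure above, while the function name is illustrative:

```python
def batched(items, batch_size=25):
    """Yield successive batches of at most batch_size items.
    Each batch would go to the embedding model in one call."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# e.g. 60 chunks embed in three calls: 25, 25, and 10 chunks.
```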

5. Index

Chunks and their embedding vectors are stored in PostgreSQL with the pgvector extension. Each chunk is associated with its source document, tenant, and position metadata. The source status updates to ready.
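
One way to picture an indexed row; the field names are assumptions for illustration, not the engine's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ChunkRow:
    """Illustrative shape of one indexed chunk in PostgreSQL."""
    chunk_text: str
    embedding: list      # 1024-dimensional vector from the embed stage
    source_id: str       # link back to the source document
    tenant_id: str       # keeps each tenant's search space separate
    position: int        # chunk index within the source document

row = ChunkRow("...", [0.0] * 1024, "src_1", "tenant_a", 0)
```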

6. Search

When you (or an agent) run a query, the query text is embedded using the same model. pgvector performs a cosine similarity search across all indexed chunks for the tenant, returning the top-k most relevant passages with their similarity scores and source metadata.

-- Simplified; <=> is pgvector's cosine distance operator
SELECT chunk_text,
       source_id,
       1 - (embedding <=> query_embedding) AS similarity_score
FROM chunks
WHERE tenant_id = :tenant_id
ORDER BY embedding <=> query_embedding
LIMIT 5;

Sync vs Async Ingestion

RAG Engine supports two ingestion paths:

Synchronous (Text)

POST /api/rag/sources/ingest

Plain text content is parsed, chunked, embedded, and indexed in a single request. Returns the source ID and chunk count immediately.

Asynchronous (File Upload)

POST /api/rag/sources/upload

Files are stored and queued for background processing (HTTP 202). Poll GET /sources/{source_id} for status updates.
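
A polling helper for the async path might look like this sketch. The status values queued and ready come from this page; the failed status, the function name, and the timeout behavior are assumptions:

```python
import time

def wait_until_ready(fetch_status, poll_interval=2.0, max_polls=150):
    """Poll until the source reaches a terminal state.

    fetch_status stands in for a call to GET /sources/{source_id};
    it should return the source's current status string.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("ready", "failed"):  # assumed terminal states
            return status
        time.sleep(poll_interval)
    raise TimeoutError("source did not finish processing in time")
```

In practice you would pass a closure that issues the GET request and extracts the status field from the JSON response.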

Storage vs RAG Engine

MCP Gateway Pro includes both Media & Storage (raw file access) and RAG Engine (knowledge base). They solve different problems:

              Storage                              RAG Engine
Purpose       Raw file access for agents           Semantic search across content
Processing    Files stored as-is                   Parsed, chunked, embedded, indexed
Agent access  Read full file contents              Search for relevant passages
Best for      Reference docs, images, videos       Large doc collections, Q&A
MCP tools     media__list_media, media__get_media  knowledge_query, knowledge_ingest, ...

You can use both at the same time. Store reference files in Storage for direct access, and index your knowledge base documents in RAG Engine for semantic search.