What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that lets AI models answer questions using your own documents and data, instead of relying solely on their training knowledge.
The Problem
Large language models are trained on public data up to a cutoff date. They can't answer questions about:
- Your internal documents, policies, and procedures
- Recent information published after their training cutoff
- Private data like customer records, contracts, or research
- Domain-specific knowledge unique to your organization
You could paste documents into every prompt, but that approach is slow, expensive, and limited by the model's context window. RAG solves this by retrieving only the relevant parts of your documents at query time.
How RAG Works
RAG has two phases: building a knowledge base (ingestion) and answering questions (retrieval).
1 Ingestion — Building the Knowledge Base
Parse
Documents are parsed into plain text. PDFs, Word docs, HTML, images (via OCR) — each format has a dedicated parser.
Chunk
Text is split into small, overlapping chunks (typically 512 tokens). This ensures relevant passages can be retrieved precisely.
Embed
Each chunk is converted to a vector (a list of numbers) using an embedding model. Semantically similar text produces similar vectors.
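The ingestion steps above can be sketched end to end. This is a toy illustration, not AppXen's actual pipeline: a regex stands in for a real document parser, chunks are counted in words rather than tokens, and a hashed bag-of-words function stands in for a neural embedding model.

```python
import hashlib
import math
import re

def parse(html):
    """Toy parser: strip HTML tags to recover plain text."""
    return re.sub(r"<[^>]+>", " ", html)

def chunk(text, size=20, overlap=5):
    """Split text into overlapping word-based chunks (real pipelines count tokens)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text, dims=64):
    """Toy embedding: hash each word into one of `dims` buckets, then normalize."""
    vec = [0.0] * dims
    for w in text.lower().split():
        vec[int(hashlib.md5(w.encode()).hexdigest(), 16) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Parse -> chunk -> embed -> index: the knowledge base is a list of
# (chunk, vector) pairs; a real system would store these in a vector database.
html = "<h1>Password policy</h1><p>" + "Reset passwords from account settings. " * 10 + "</p>"
text = parse(html)
knowledge_base = [(c, embed(c)) for c in chunk(text)]
```

Each stage maps directly to a step in the pipeline; swapping the toy functions for a real parser and embedding model leaves the structure unchanged.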
2 Retrieval — Answering Questions
Embed Query
The user's question is converted to a vector using the same embedding model.
Search
The query vector is compared against all stored chunk vectors. The most similar chunks are returned, ranked by relevance.
Generate
Retrieved chunks are passed to an AI model as context, which generates an answer grounded in your actual data.
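The retrieval steps above can be sketched the same way. As a stand-in for a real embedding model, this toy version counts words over a fixed vocabulary; a list comprehension stands in for the vector database, and the final LLM call is omitted, showing only the grounded prompt it would receive.

```python
import math

def build_vocab(texts):
    """Fixed vocabulary shared by indexing and querying."""
    return sorted({w.strip(".,?!").lower() for t in texts for w in t.split()})

def embed(text, vocab):
    """Map text to a unit vector of word counts over the vocabulary."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    vec = [float(words.count(v)) for v in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # inputs are unit vectors

chunks = [
    "Password resets are handled from the account settings page.",
    "The cafeteria serves lunch between 11:30 and 13:30.",
    "VPN access requires a ticket to the IT service desk.",
]
vocab = build_vocab(chunks)
index = [(c, embed(c, vocab)) for c in chunks]

# Embed the query with the same model, rank chunks by similarity,
# then build a prompt grounded in the top results.
query = "How do I reset my password in account settings?"
qvec = embed(query, vocab)
ranked = sorted(index, key=lambda item: cosine(qvec, item[1]), reverse=True)
context = "\n".join(c for c, _ in ranked[:2])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The key property is that the query goes through the same embedding function as the chunks did at ingestion time; otherwise the vectors would not be comparable.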
Key Concepts
Embeddings
An embedding is a vector (array of numbers) that captures the meaning of text. Texts with similar meanings produce vectors that are close together in vector space.
"How do I reset my password?" → [0.12, -0.45, 0.78, ...]
"Password recovery steps" → [0.11, -0.43, 0.76, ...] ← similar!
"What's for lunch today?" → [0.89, 0.23, -0.15, ...] ← different
Vector Database
A specialized database optimized for storing and searching vectors. Instead of exact keyword matching, it finds the most semantically similar results. AppXen uses pgvector — the PostgreSQL vector extension — for fast approximate nearest-neighbor search.
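"Similar" here usually means cosine similarity, the comparison a vector search typically performs under the hood. Using just the first three numbers of the illustrative vectors above (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 or below for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

reset    = [0.12, -0.45, 0.78]   # "How do I reset my password?"
recovery = [0.11, -0.43, 0.76]   # "Password recovery steps"
lunch    = [0.89,  0.23, -0.15]  # "What's for lunch today?"

print(round(cosine(reset, recovery), 2))  # close to 1.0
print(round(cosine(reset, lunch), 2))     # low, here negative
```

A vector database such as pgvector computes this kind of comparison across millions of stored vectors using approximate nearest-neighbor indexes rather than a full scan.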
Chunking
Documents are split into small pieces (chunks) because embedding models work best on shorter text, and smaller chunks mean more precise retrieval. Chunks overlap slightly so context isn't lost at boundaries. AppXen defaults to 512 tokens per chunk with 50 tokens of overlap.
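A minimal sketch of overlapping chunking with AppXen's default numbers. For readability it counts words rather than model tokens, so the sizes are approximate:

```python
def chunk(text, size=512, overlap=50):
    """Split text into chunks of `size` words, each sharing `overlap`
    words with the previous chunk so boundary context isn't lost."""
    words = text.split()
    step = size - overlap  # advance 462 words per chunk at the defaults
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks

# A 1200-word document yields chunks starting at words 0, 462, and 924.
doc = " ".join(f"w{i}" for i in range(1200))
parts = chunk(doc)
```

The 50-word overlap means a sentence straddling a chunk boundary still appears whole in at least one chunk.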
RAG Engine vs Storage
AppXen offers two ways to make files available to AI agents. Choose based on how agents need to use the content.
RAG Engine
Documents are parsed, chunked, embedded, and indexed. Agents search across content semantically.
- Best for large document collections
- Agents find relevant passages automatically
- Works across hundreds of documents at once
- Returns ranked, scored results
Storage
Files are stored as-is. Agents can list and read files directly via MCP tools.
- Best for reference files agents read verbatim
- Images, videos, and binary files
- When you need the exact original content
- Simpler — no processing pipeline
Where AppXen RAG Engine Fits
AppXen RAG Engine is a managed RAG pipeline that integrates directly with MCP Gateway. It handles the entire process:
- Multi-format ingestion — Upload PDFs, Word docs, HTML, Markdown, CSV, JSON, plain text, and images (with OCR)
- Automatic pipeline — Parse, chunk, embed, and index (pgvector) with no configuration
- MCP integration — Exposed as a built-in MCP server so any connected AI agent can search your knowledge base
- Dashboard — Upload documents, manage sources, run test queries, and monitor stats from the console