Quick RAG ⚡
🚀 Production-ready RAG (Retrieval-Augmented Generation) for JavaScript & React
Built on official Ollama & LM Studio SDKs.
🎉 v2.0.3 Released! Performance improvements with batch embedding, rate limiting, and better error handling for large documents! See CHANGELOG.md for details.
✨ Features
- 🎯 Official SDKs - Built on the ollama and @lmstudio/sdk packages
- ⚡ 5x Faster - Parallel batch embedding with rate limiting
- 📄 Document Loaders - PDF, Word, Excel, Text, Markdown, URLs
- 🔪 Smart Chunking - Intelligent text splitting with overlap
- 🏷️ Metadata Filtering - Filter by document properties
- 🔍 Query Explainability - See WHY documents were retrieved (unique!)
- 🎨 Dynamic Prompts - 10 built-in templates + full customization
- 🧠 Weighted Decision Making - Multi-criteria document scoring (NEW!)
- 🎯 Heuristic Reasoning - Pattern learning and query optimization (NEW!)
- 📊 Batch Processing - Efficient handling of large document sets (v2.0.3!)
- 🚦 Rate Limiting - Prevents server overload with configurable concurrency (v2.0.3!)
- 🔄 CRUD Operations - Add, update, delete documents on the fly
- 🎯 Smart Retrieval - Dynamic topK parameter
- 🌊 Streaming Support - Real-time AI responses (official SDK feature)
- 🔧 Zero Config - Works with React, Next.js, Vite, Node.js
- 🎨 Multiple Providers - Ollama & LM Studio support
- 🛠️ All SDK Features - Tool calling, vision, agents, and more
- 💪 Type Safe - Full TypeScript support
- ✅ Production Ready - Thoroughly tested and documented
📦 Installation
npm install quick-rag
This package includes:
- ✅ Official ollama SDK (0.6.2+)
- ✅ Official @lmstudio/sdk (1.5.0+)
- ✅ RAG components (vector store, retrieval, embeddings)
Prerequisites:
- Ollama installed and running, OR
- LM Studio installed with server enabled
- Models:
  - Ollama: ollama pull granite4:3b and ollama pull embeddinggemma:latest
  - LM Studio: Any LLM model + an embedding model (e.g., text-embedding-embeddinggemma-300m)
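Quick sanity check (optional): before wiring anything up, you can confirm the Ollama server is reachable with a plain fetch. This hits Ollama's standard /api/tags endpoint, is not part of quick-rag, and assumes Node 18+ for the built-in fetch:
// Lists the models installed in your local Ollama instance.
const res = await fetch('http://127.0.0.1:11434/api/tags');
const { models } = await res.json();
console.log(models.map(m => m.name)); // should include your LLM and embedding models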
📖 Starting a New React Project? Check out the detailed setup guide in QUICKSTART_REACT.md!
🆕 What's New
🚀 v2.0.3 - Performance & Stability (Latest!)
- ✅ Batch Embedding - Process large document sets efficiently (20+ chunks at once)
- ✅ Rate Limiting - Configurable concurrency control (prevents server overload)
- ✅ Better Error Handling - Improved network error messages and retry logic
- ✅ Progress Tracking - Enhanced progress callbacks for batch processing
🎉 v2.0.0 - Major Release
🎯 Decision Engine (NEW!)
Revolutionary AI-powered retrieval system with multi-criteria weighted scoring, heuristic reasoning, and adaptive learning. See full documentation below.
🔍 Query Explainability (NEW!)
Industry-first feature to understand WHY documents were retrieved. See full documentation below.
🎨 Dynamic Prompt Management (NEW!)
10 built-in templates + full customization for different response styles and use cases. See full documentation below.
💬 Conversation History & Export (NEW!)
Track and export conversation sessions with metadata and statistics. See example/12-conversation-history-and-export.js.
🔄 Multi-Provider Auto-Detection (NEW!)
Automatically detect and switch between Ollama and LM Studio providers. See example/04-test-both-providers.js.
✅ Function-based Filters
Advanced filtering with custom logic - filter documents using JavaScript functions:
const results = await retriever.getRelevant('latest AI news', 5, {
filter: (meta) => {
return meta.year === 2024 &&
meta.tags.includes('AI') &&
meta.difficulty !== 'beginner';
}
});
📽️ PowerPoint Support
Load .pptx and .ppt files with officeparser:
import { loadDocument } from 'quick-rag';
const pptDoc = await loadDocument('./presentation.pptx');
📁 Organized Examples
12 comprehensive examples covering all features:
- Basic Usage (Ollama & LM Studio)
- Document Loading (PDF, Word, Excel)
- Metadata Filtering
- Streaming Responses
- Advanced Filtering
- Query Explainability
- Prompt Management
- Decision Engine (Simple & Real-World)
- Conversation History & Export
🆕 Previous Features (v1.1.x)
📝 Internationalization Update
- Translated all example files to English for better international accessibility
- example/10-decision-engine-simple.js - Smart Document Selection example
- example/11-decision-engine-pdf-real-world.js - Real-world PDF scenario example
🧠 Decision Engine (v1.1.0)
Revolutionary AI-powered retrieval system - The most advanced RAG retrieval available!
Quick RAG now includes a Decision Engine that goes far beyond simple cosine similarity. It combines:
- 🎯 Multi-Criteria Weighted Scoring - 5 factors evaluated together
- 🧠 Heuristic Reasoning - Pattern-based query optimization
- 📈 Adaptive Learning - Learns from user feedback
- 🔍 Full Transparency - See exactly why each document was selected
Multi-Criteria Scoring
5 weighted factors beyond similarity:
- 📊 Semantic Similarity (50%) - Cosine similarity score
- 🔤 Keyword Match (20%) - Term matching in document
- 📅 Recency (15%) - Document freshness with exponential decay
- ⭐ Source Quality (10%) - Source reliability (official=1.0, research=0.9, blog=0.7, forum=0.6)
- 🎯 Context Relevance (5%) - Contextual fit
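To make the weighting concrete, here is a minimal sketch in plain JavaScript: each factor's contribution is its score times its weight, and the contributions are summed. (Illustrative only; the library's actual scorer may normalize differently.)
// Illustrative weighted sum: contribution = score * weight.
const weights = { semanticSimilarity: 0.50, keywordMatch: 0.20, recency: 0.15, sourceQuality: 0.10, contextRelevance: 0.05 };
const scores  = { semanticSimilarity: 0.85, keywordMatch: 0.67, recency: 0.95, sourceQuality: 0.90, contextRelevance: 1.00 };
const weightedScore = Object.keys(weights)
  .reduce((sum, factor) => sum + scores[factor] * weights[factor], 0);
console.log(weightedScore.toFixed(3)); // ≈ 0.84 with these example numbers
SmartRetriever applies this kind of weighting for you and exposes the per-factor breakdown: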
import { SmartRetriever, DEFAULT_WEIGHTS } from 'quick-rag';
// Create smart retriever with default weights
const smartRetriever = new SmartRetriever(basicRetriever);
// Or customize weights for your use case
const smartRetriever = new SmartRetriever(basicRetriever, {
weights: {
semanticSimilarity: 0.35,
keywordMatch: 0.20,
recency: 0.30, // Higher for news sites
sourceQuality: 0.10,
contextRelevance: 0.05
}
});
// Get results with decision transparency
const response = await smartRetriever.getRelevant('latest AI news', 3);
// See scoring breakdown for each document
console.log(response.results[0]);
// {
// text: "...",
// weightedScore: 0.742,
// scoreBreakdown: {
// semanticSimilarity: { score: 0.85, weight: 0.35, contribution: 0.298 },
// keywordMatch: { score: 0.67, weight: 0.20, contribution: 0.134 },
// recency: { score: 0.95, weight: 0.30, contribution: 0.285 },
// sourceQuality: { score: 0.90, weight: 0.10, contribution: 0.090 },
// contextRelevance: { score: 1.00, weight: 0.05, contribution: 0.050 }
// }
// }
// Decision context shows WHY these results
console.log(response.decisions);
// {
// weights: { ... },
// appliedRules: ["boost-recent-for-news"],
// suggestions: [
// "Time-sensitive query detected. Prioritizing recent documents.",
// "Consider using filters if you need older historical content."
// ]
// }
Heuristic Reasoning
Pattern-based optimization that learns:
// Enable learning mode
const smartRetriever = new SmartRetriever(basicRetriever, {
enableLearning: true,
enableHeuristics: true
});
// Add custom rules
smartRetriever.heuristicEngine.addRule(
'boost-documentation',
(query, context) => query.includes('documentation'),
(query, context) => {
context.adjustWeight('sourceQuality', 0.15); // Increase quality weight
return { adjusted: true, reason: 'Documentation query prioritizes quality' };
},
5 // Priority
);
// Provide feedback to enable learning
smartRetriever.provideFeedback(query, results, {
rating: 5, // 1-5 rating
hasFilters: true, // User applied filters
comment: 'Perfect results!'
});
// System learns successful patterns
const insights = smartRetriever.getInsights();
console.log(insights.heuristics.successfulPatterns);
// ["latest", "documentation", "official release"]
// Export learned knowledge
const knowledge = smartRetriever.exportKnowledge();
// Import to another instance
newRetriever.importKnowledge(knowledge);
Scenario Customization
Different weights for different use cases:
// News Platform - Recency Priority
const newsRetriever = new SmartRetriever(basicRetriever, {
weights: {
semanticSimilarity: 0.30,
keywordMatch: 0.20,
recency: 0.40, // 🔥 High recency
sourceQuality: 0.05,
contextRelevance: 0.05
}
});
// Documentation Site - Quality Priority
const docsRetriever = new SmartRetriever(basicRetriever, {
weights: {
semanticSimilarity: 0.35,
keywordMatch: 0.20,
recency: 0.10,
sourceQuality: 0.30, // 🔥 High quality
contextRelevance: 0.05
}
});
// Research Platform - Balanced
const researchRetriever = new SmartRetriever(basicRetriever, {
weights: DEFAULT_WEIGHTS // Balanced approach
});
Real-World Example
See example/11-decision-engine-pdf-real-world.js for a complete example with:
- PDF document loading
- Multiple source types (official, blog, research, forum)
- 3 different scenarios (news, documentation, research)
- RAG generation with quality metrics
- Decision transparency and explanations
Benefits:
- ✅ More accurate retrieval than pure similarity
- ✅ Adapts to different content types automatically
- ✅ Learns from user interactions
- ✅ Fully explainable decisions
- ✅ Customizable for any use case
- ✅ Production-ready with proven patterns
🔍 Query Explainability (v1.1.0)
Understand WHY documents were retrieved - A first-of-its-kind feature!
const results = await retriever.getRelevant('What is Ollama?', 3, {
explain: true
});
// Each result includes detailed explanation:
console.log(results[0].explanation);
// {
// queryTerms: ["ollama", "local", "ai"],
// matchedTerms: ["ollama", "local"],
// matchCount: 2,
// matchRatio: 0.67,
// cosineSimilarity: 0.856,
// relevanceFactors: {
// termMatches: 2,
// semanticSimilarity: 0.856,
// coverage: "67%"
// }
// }
Use cases: Debug searches, optimize queries, validate accuracy, explain results to users
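For example, a tiny debug helper (a sketch that assumes the explanation shape shown above) can turn these fields into readable log lines:
// Pretty-print why each document was retrieved.
// Assumes results were fetched with { explain: true }.
function logExplanations(results) {
  for (const r of results) {
    const e = r.explanation;
    console.log(
      `matched ${e.matchCount}/${e.queryTerms.length} query terms (${e.matchedTerms.join(', ')}), ` +
      `similarity ${e.cosineSimilarity.toFixed(3)}, coverage ${e.relevanceFactors.coverage}`
    );
  }
}
logExplanations(results);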
🎨 Dynamic Prompt Management (v1.1.0)
10 built-in templates + full customization
// Quick template selection
await generateWithRAG(client, model, query, docs, {
template: 'conversational' // or: technical, academic, code, etc.
});
// System prompts for role definition
await generateWithRAG(client, model, query, docs, {
systemPrompt: 'You are a helpful programming tutor',
template: 'instructional'
});
// Advanced: Reusable PromptManager
import { createPromptManager } from 'quick-rag';
const promptMgr = createPromptManager({
systemPrompt: 'You are an expert engineer',
template: 'technical'
});
await generateWithRAG(client, model, query, docs, {
promptManager: promptMgr
});
Templates: default, conversational, technical, academic, code, concise, detailed, qa, instructional, creative
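To compare styles side by side, you can run the same query against a few templates (a sketch using the generateWithRAG call shown above; the model name and retrieved docs are placeholders):
// Generate the same answer with three different built-in templates.
const docs = await retriever.getRelevant('How do I deploy my app?', 3);
for (const template of ['concise', 'technical', 'conversational']) {
  const answer = await generateWithRAG(client, 'granite4:tiny-h', 'How do I deploy my app?', docs, { template });
  console.log(`--- ${template} ---\n${answer}\n`);
}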
🚀 Quick Start
Option 1: With Official Ollama SDK (Recommended)
import {
OllamaRAGClient,
createOllamaRAGEmbedding,
InMemoryVectorStore,
Retriever
} from 'quick-rag';
// 1. Initialize client (official SDK)
const client = new OllamaRAGClient({
host: 'http://127.0.0.1:11434'
});
// 2. Setup embedding
const embed = createOllamaRAGEmbedding(client, 'embeddinggemma');
// 3. Create vector store
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);
// 4. Add documents
await vectorStore.addDocument({
text: 'Ollama provides local LLM hosting.'
});
// 5. Query with streaming (official SDK feature!)
const results = await retriever.getRelevant('What is Ollama?', 2);
const context = results.map(d => d.text).join('\n');
const response = await client.chat({
model: 'granite4:tiny-h',
messages: [{
role: 'user',
content: `Context: ${context}\n\nQuestion: What is Ollama?`
}],
stream: true, // Official SDK streaming!
});
// Stream response
for await (const part of response) {
process.stdout.write(part.message?.content || '');
}
Option 2: React with Vite
💡 Starting from scratch? Check out the detailed step-by-step guide in QUICKSTART_REACT.md!
Step 1: Create your project
npm create vite@latest my-rag-app -- --template react
cd my-rag-app
npm install quick-rag express concurrently
Step 2: Create backend proxy (server.js in project root)
import express from 'express';
import { OllamaRAGClient } from 'quick-rag';
const app = express();
app.use(express.json());
const client = new OllamaRAGClient({ host: 'http://127.0.0.1:11434' });
app.post('/api/generate', async (req, res) => {
const { model = 'granite4:tiny-h', messages } = req.body;
const response = await client.chat({ model, messages, stream: false });
res.json({ response: response.message.content });
});
app.post('/api/embed', async (req, res) => {
const { model = 'embeddinggemma', input } = req.body;
const response = await client.embed(model, input);
res.json(response);
});
app.listen(3001, () => console.log('🚀 Server: http://127.0.0.1:3001'));
Step 3: Configure Vite proxy (vite.config.js)
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
export default defineConfig({
plugins: [react()],
server: {
proxy: {
'/api': {
target: 'http://127.0.0.1:3001',
changeOrigin: true
}
}
}
});
Step 4: Update package.json scripts
{
"scripts": {
"dev": "concurrently \"npm:server\" \"npm:client\"",
"server": "node server.js",
"client": "vite"
}
}
Step 5: Use in your React component (src/App.jsx)
import { useState, useEffect } from 'react';
import { useRAG, initRAG, createBrowserModelClient } from 'quick-rag';
const docs = [
{ id: '1', text: 'React is a JavaScript library for building user interfaces.' },
{ id: '2', text: 'Ollama provides local LLM hosting.' },
{ id: '3', text: 'RAG combines retrieval with AI generation.' }
];
export default function App() {
const [rag, setRAG] = useState(null);
const [query, setQuery] = useState('');
const { run, loading, response, docs: results } = useRAG({
retriever: rag?.retriever,
modelClient: createBrowserModelClient(),
model: 'granite4:tiny-h'
});
useEffect(() => {
initRAG(docs, {
baseEmbeddingOptions: {
useBrowser: true,
baseUrl: '/api/embed',
model: 'embeddinggemma'
}
}).then(core => setRAG(core));
}, []);
return (
<div style={{ padding: 40 }}>
<h1>🤖 RAG Demo</h1>
<input
value={query}
onChange={e => setQuery(e.target.value)}
placeholder="Ask something..."
style={{ width: 300, padding: 10 }}
/>
<button onClick={() => run(query)} disabled={loading}>
{loading ? 'Thinking...' : 'Ask AI'}
</button>
{results && (
<div>
<h3>📚 Retrieved:</h3>
{results.map(d => <p key={d.id}>{d.text}</p>)}
</div>
)}
{response && (
<div>
<h3>✨ Answer:</h3>
<p>{response}</p>
</div>
)}
</div>
);
}
Step 6: Run your app
npm run dev
Open http://localhost:5173 🎉
Option 3: Next.js (Pages Router)
Step 1: Create API routes
// pages/api/generate.js
import { OllamaClient } from 'quick-rag';
export default async function handler(req, res) {
const client = new OllamaClient();
const { model = 'granite4:tiny-h', prompt } = req.body;
const response = await client.generate(model, prompt);
res.json({ response });
}
// pages/api/embed.js
import { OllamaClient } from 'quick-rag';
export default async function handler(req, res) {
const client = new OllamaClient();
const { model = 'embeddinggemma', input } = req.body;
const response = await client.embed(model, input);
res.json(response);
}
Step 2: Use in your page (same React component as above)
Option 4: Vanilla JavaScript (Node.js)
Simple approach with official Ollama SDK:
import {
OllamaRAGClient,
createOllamaRAGEmbedding,
InMemoryVectorStore,
Retriever
} from 'quick-rag';
// 1. Initialize client
const client = new OllamaRAGClient();
// 2. Setup embedding
const embed = createOllamaRAGEmbedding(client, 'embeddinggemma');
// 3. Create vector store and retriever
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);
// 4. Add documents
await vectorStore.addDocuments([
{ text: 'JavaScript is a programming language.' },
{ text: 'Python is great for data science.' },
{ text: 'Rust is a systems programming language.' }
]);
// 5. Query
const query = 'What is JavaScript?';
const results = await retriever.getRelevant(query, 2);
// 6. Generate answer
const context = results.map(d => d.text).join('\n');
const response = await client.chat({
model: 'granite4:tiny-h',
messages: [{
role: 'user',
content: `Context:\n${context}\n\nQuestion: ${query}\n\nAnswer:`
}]
});
// Clean output
console.log('📚 Retrieved:', results.map(d => d.text));
console.log('🤖 Answer:', response.message.content);
Output:
📚 Retrieved: [
'JavaScript is a programming language.',
'Python is great for data science.'
]
🤖 Answer: JavaScript is a programming language that allows developers
to write code and implement functionality in web browsers...
Option 5: LM Studio 🎨
Use LM Studio instead of Ollama with OpenAI-compatible API:
import {
LMStudioRAGClient,
createLMStudioRAGEmbedding,
InMemoryVectorStore,
Retriever,
generateWithRAG
} from 'quick-rag';
// 1. Initialize LM Studio client
const client = new LMStudioRAGClient();
// 2. Setup embedding (use your embedding model from LM Studio)
const embed = createLMStudioRAGEmbedding(client, 'nomic-embed-text-v1.5');
// 3. Create vector store and retriever
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);
// 4. Add documents
await vectorStore.addDocuments([
{ text: 'LM Studio is a desktop app for running LLMs locally.' },
{ text: 'It provides an OpenAI-compatible API.' },
{ text: 'You can use models like Llama, Mistral, and more.' }
]);
// 5. Query with RAG
const results = await retriever.getRelevant('What is LM Studio?', 2);
const answer = await generateWithRAG(
client,
'qwen/qwen3-4b-2507', // or your model name
'What is LM Studio?',
results
);
console.log('Answer:', answer);
Prerequisites for LM Studio:
- Download and install LM Studio
- Download a language model (e.g., Llama 3.2, Mistral)
- Download an embedding model (e.g., nomic-embed-text)
- Start the local server:
Developer > Local Server (default: http://localhost:1234)
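Quick sanity check (optional): the local server exposes an OpenAI-compatible API, so a plain fetch to /v1/models (not part of quick-rag) confirms it is running and shows which models are available:
// Lists the models LM Studio currently serves on its local server.
const res = await fetch('http://localhost:1234/v1/models');
const { data } = await res.json();
console.log(data.map(m => m.id));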
For React projects: Import from 'quick-rag/react' to use hooks:
import { useRAG } from 'quick-rag/react';
// or
import { useRAG } from 'quick-rag'; // Also works in React projects
📖 API Reference
React Hook: useRAG
const { run, loading, response, docs, streaming, error } = useRAG({
retriever, // Retriever instance
modelClient, // Model client (OllamaClient or BrowserModelClient)
model // Model name (e.g., 'granite4:tiny-h')
});
// Ask a question
await run('What is React?');
// With options
await run('What is React?', {
topK: 5, // Number of documents to retrieve
stream: true, // Enable streaming
onDelta: (chunk, fullText) => console.log(chunk)
});
Core Functions
Initialize RAG
const { retriever, store, mrl } = await initRAG(documents, {
defaultDim: 128, // Embedding dimension
k: 2, // Default number of results
mrlBaseDim: 768, // Base embedding dimension
baseEmbeddingOptions: {
useBrowser: true, // Use browser-safe fetch
baseUrl: '/api/embed', // Embedding endpoint
model: 'embeddinggemma' // Embedding model
}
});
Generate with RAG
const result = await generateWithRAG({
retriever,
modelClient,
model,
query: 'Your question',
topK: 3 // Optional: override default k
});
// Returns: { docs, response, prompt }
VectorStore API
const store = new InMemoryVectorStore(embeddingFn, { defaultDim: 128 });
// Add documents
await store.addDocument({ id: '1', text: 'Document text' });
// Add multiple documents with batch processing (v2.0.3!)
await store.addDocuments([{ id: '1', text: '...' }], {
dim: 128,
batchSize: 20, // Process 20 chunks at a time
maxConcurrent: 5, // Max 5 concurrent requests
onProgress: (current, total) => {
console.log(`Progress: ${current}/${total}`);
}
});
// Query
const results = await store.similaritySearch('query', k, queryDim);
// CRUD
const doc = store.getDocument('id');
const all = store.getAllDocuments();
await store.updateDocument('id', 'new text', { meta: 'data' });
store.deleteDocument('id');
store.clear();
Batch Processing for Large Documents (v2.0.3):
// Process large PDFs efficiently
const chunks = chunkDocuments([largePDF], { chunkSize: 1000, overlap: 100 });
await store.addDocuments(chunks, {
batchSize: 20, // Process 20 chunks per batch
maxConcurrent: 5, // Max 5 concurrent embedding requests
onProgress: (current, total) => {
console.log(`Embedding progress: ${current}/${total} (${Math.round(current/total*100)}%)`);
}
});
Model Clients
Browser (with proxy)
const client = createBrowserModelClient({
endpoint: '/api/generate' // Your proxy endpoint
});
Node.js (direct)
const client = new OllamaClient({
baseUrl: 'http://127.0.0.1:11434/api'
});
💡 Examples
CRUD Operations
// Add document dynamically
await store.addDocument({
id: 'new-doc',
text: 'TypeScript adds types to JavaScript.'
});
// Add multiple documents with batch processing (v2.0.3!)
await store.addDocuments([
{ id: 'doc1', text: 'First document' },
{ id: 'doc2', text: 'Second document' }
], {
batchSize: 10, // Process in batches
maxConcurrent: 5, // Rate limiting
onProgress: (current, total) => {
console.log(`Added ${current}/${total} documents`);
}
});
// Update existing
await store.updateDocument('1', 'React 19 is the latest version.', {
version: '19',
updated: Date.now()
});
// Delete
store.deleteDocument('2');
// Query all
const allDocs = store.getAllDocuments();
console.log(`Total documents: ${allDocs.length}`);
Dynamic Retrieval
// Ask with different topK values
const result1 = await run('What is JavaScript?', { topK: 1 }); // Get 1 doc
const result2 = await run('What is JavaScript?', { topK: 5 }); // Get 5 docs
Streaming Responses
await run('Explain React hooks', {
stream: true,
onDelta: (chunk, fullText) => {
console.log('New chunk:', chunk);
// Update UI in real-time
}
});
Custom Embedding Models
// Use different embedding models
const rag = await initRAG(docs, {
baseEmbeddingOptions: {
useBrowser: true,
baseUrl: '/api/embed',
model: 'nomic-embed-text' // or 'mxbai-embed-large', etc.
}
});
More examples: Check the example/ folder for complete demos.
📄 Document Loaders (v0.7.4+)
Load documents from various formats and use them with RAG!
Supported Formats
| Format | Function | Requires |
|---|---|---|
| PDF (.pdf) | loadPDF() | npm install pdf-parse |
| Word (.docx) | loadWord() | npm install mammoth |
| Excel (.xlsx) | loadExcel() | npm install xlsx |
| Text (.txt) | loadText() | Built-in ✅ |
| JSON | loadJSON() | Built-in ✅ |
| Markdown | loadMarkdown() | Built-in ✅ |
| Web URLs | loadURL() | Built-in ✅ |
Quick Start
Load PDF:
import { loadPDF, chunkDocuments } from 'quick-rag';
// Load PDF
const pdf = await loadPDF('./document.pdf');
console.log(`Loaded ${pdf.meta.pages} pages`);
// Chunk and add to RAG
const chunks = chunkDocuments([pdf], {
chunkSize: 500,
overlap: 50
});
await store.addDocuments(chunks);
Load from URL:
import { loadURL } from 'quick-rag';
const doc = await loadURL('https://example.com', {
extractText: true // Convert HTML to plain text
});
await store.addDocuments([doc]);
Load Directory:
import { loadDirectory } from 'quick-rag';
// Load all supported documents from a folder
const docs = await loadDirectory('./documents', {
extensions: ['.pdf', '.docx', '.txt', '.md'],
recursive: true
});
console.log(`Loaded ${docs.length} documents`);
// Chunk and add to vector store
const chunks = chunkDocuments(docs, { chunkSize: 500 });
await store.addDocuments(chunks);
Auto-Detect Format:
import { loadDocument } from 'quick-rag';
// Automatically detects file type
const doc = await loadDocument('./file.pdf');
// Works with: .pdf, .docx, .xlsx, .txt, .md, .json
Installation
# Core package (includes text, JSON, markdown, URL loaders)
npm install quick-rag
# Optional: PDF support
npm install pdf-parse
# Optional: Word support
npm install mammoth
# Optional: Excel support
npm install xlsx
# Or install all at once:
npm install quick-rag pdf-parse mammoth xlsx
Complete Example
import {
loadPDF,
loadDirectory,
chunkDocuments,
InMemoryVectorStore,
Retriever,
OllamaRAGClient,
createOllamaRAGEmbedding,
generateWithRAG
} from 'quick-rag';
// Load documents
const pdf = await loadPDF('./research.pdf');
const docs = await loadDirectory('./articles');
// Combine and chunk
const allDocs = [pdf, ...docs];
const chunks = chunkDocuments(allDocs, {
chunkSize: 500,
overlap: 50
});
// Setup RAG
const client = new OllamaRAGClient();
const embed = createOllamaRAGEmbedding(client, 'embeddinggemma');
const store = new InMemoryVectorStore(embed);
const retriever = new Retriever(store);
// Add to vector store
await store.addDocuments(chunks);
// Query
const results = await retriever.getRelevant('What is the main topic?', 3);
const answer = await generateWithRAG(client, 'granite4:tiny-h',
'What is the main topic?', results);
console.log(answer);
See full example: example/advanced/document-loading-example.js
❓ Troubleshooting
| Problem | Solution |
|---|---|
| 🚫 CORS errors | Use a proxy server (Express/Next.js API routes) |
| 🔌 Connection refused | Ensure Ollama is running: ollama serve |
| 📦 Models not found | Pull models: ollama pull granite4:tiny-h && ollama pull embeddinggemma |
| 🌐 404 on /api/embed | Check your proxy configuration in vite.config.js or API routes |
| 💻 Windows IPv6 issues | Use 127.0.0.1 instead of localhost |
| 📦 Module not found | Check imports: use 'quick-rag' not 'quick-rag/...' |
Note: v0.6.5+ automatically detects and uses the correct API (generate or chat) for any model.
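If it is unclear which of these applies, wrapping your first request in a try/catch usually makes the failure obvious. A minimal sketch (the message mapping mirrors the table above and is only a heuristic):
// Minimal diagnostic wrapper around a first request.
try {
  const reply = await client.chat({
    model: 'granite4:tiny-h',
    messages: [{ role: 'user', content: 'ping' }]
  });
  console.log('OK:', reply.message.content);
} catch (err) {
  const msg = String(err?.cause ?? err);
  if (/ECONNREFUSED|fetch failed/i.test(msg)) {
    console.error('Cannot reach the model server. Is Ollama (ollama serve) or LM Studio running?');
  } else if (/404/.test(msg)) {
    console.error('Endpoint not found. Check your proxy configuration (vite.config.js or API routes).');
  } else {
    console.error('Request failed:', err);
  }
}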
📚 Learn More
- Examples: /example folder with working demos
- Changelog: CHANGELOG.md - version history
- Ollama Models: ollama.ai/library
- Issues: GitHub Issues
📄 License
MIT © Emre Developer
Made with ❤️ for the JavaScript & AI community