Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document processing

Package Exports

  • @dooor-ai/cortexdb
  • @dooor-ai/cortexdb/schema-decorators

Readme

CortexDB TypeScript SDK

██████╗  ██████╗  ██████╗  ██████╗ ██████╗ 
██╔══██╗██╔═══██╗██╔═══██╗██╔═══██╗██╔══██╗
██║  ██║██║   ██║██║   ██║██║   ██║██████╔╝
██║  ██║██║   ██║██║   ██║██║   ██║██╔══██╗
██████╔╝╚██████╔╝╚██████╔╝╚██████╔╝██║  ██║
╚═════╝  ╚═════╝  ╚═════╝  ╚═════╝ ╚═╝  ╚═╝

Official TypeScript/JavaScript SDK for CortexDB


What is CortexDB?

CortexDB is a multi-modal RAG (Retrieval Augmented Generation) platform that combines traditional database capabilities with vector search and advanced document processing. It enables you to:

  • Store structured and unstructured data in a unified database
  • Automatically extract text from documents (PDF, DOCX, XLSX) using Docling
  • Generate embeddings for semantic search using various providers (OpenAI, Gemini, etc.)
  • Perform hybrid search combining filters with vector similarity
  • Build RAG applications with automatic chunking and vectorization

CortexDB handles the complex infrastructure of vector databases (Qdrant), object storage (MinIO), and traditional databases (PostgreSQL) behind a simple API.

Features

  • Multi-modal document processing: Upload PDFs, DOCX, XLSX files and automatically extract text with OCR fallback
  • Semantic search: Vector-based search using embeddings from OpenAI, Gemini, or custom providers
  • Automatic chunking: Smart text splitting optimized for RAG applications
  • Flexible schema: Define collections with typed fields (string, number, boolean, file, array)
  • Hybrid queries: Combine exact filters with semantic search
  • Storage control: Choose where each field is stored (PostgreSQL, Qdrant, MinIO)
  • Type-safe: Full TypeScript support with comprehensive type definitions
  • Modern API: Async/await using native fetch (Node.js 18+)
  • Infra management: Database (client.databases) and embedding provider (client.embeddingProviders) APIs built-in
  • 🆕 TypeScript Decorators: Define schemas using decorators (like TypeORM) with full IDE support - see Schema Decorators Guide

Installation

npm install @dooor-ai/cortexdb

Or with yarn:

yarn add @dooor-ai/cortexdb

Or with pnpm:

pnpm add @dooor-ai/cortexdb

Quick Start

import { CortexClient, FieldType, StoreLocation } from '@dooor-ai/cortexdb';

async function main() {
  // Initialize with database in connection string
  const client = new CortexClient('cortexdb://my-api-key@localhost:8000/production');

  // Create a collection with vectorization enabled
  await client.collections.create(
    'documents',
    [
      { name: 'title', type: FieldType.STRING },
      { name: 'content', type: FieldType.TEXT, vectorize: true },
      { name: 'published_at', type: FieldType.DATETIME, store_in: [StoreLocation.POSTGRES] }
    ],
    'your-embedding-provider-id'  // Required when vectorize=true
    // database parameter is optional here since we set 'production' as default
  );

  // Create a record
  const record = await client.records.create('documents', {
    title: 'Introduction to AI',
    content: 'Artificial intelligence is transforming how we build software...'
  });

  // Semantic search - finds relevant content by meaning, not just keywords
  const results = await client.records.search(
    'documents',
    'How is AI changing software development?',
    undefined,  // filters
    10  // limit - database parameter optional since we have default
  );

  results.results.forEach(result => {
    console.log(`Score: ${result.score.toFixed(4)}`);
    console.log(`Title: ${result.record.data.title}`);
    console.log(`Content: ${result.record.data.content}\n`);
  });

  await client.close();
}

main().catch(console.error);

Project-Specific Typing

The SDK becomes fully type-safe once you apply your YAML schema with the Dooor CLI:

npx dooor schema apply          # reads dooor/schemas by default and generates types in dooor/generated/

This command creates dooor/generated/cortex-schema.ts and automatically augments the SDK types. After the file exists in your project, you can keep importing CortexClient from @dooor-ai/cortexdb; TypeScript will infer the fields/collections defined in your YAML. Invalid field names or missing required properties inside client.records.create('my_collection', {...}) now trigger compile-time errors, Prisma-style.

If you need an explicit factory, the generated file also exports createCortexClient() and TypedCortexClient helpers.

ℹ️ The CLI also drops a lightweight .d.ts shim in node_modules/@dooor-ai/cortexdb/generated/schema.d.ts, so TypeScript picks up your schema automatically—no need to tweak tsconfig.json.
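
To make the compile-time checking concrete, here is a minimal sketch of how a schema map can constrain a create call. `ExampleSchema` and `typedCreate` are hypothetical names invented for this illustration; they are not part of the SDK, and the real generated file may be shaped differently.

```typescript
// Hypothetical schema map: collection name -> record shape.
// The real generated file (dooor/generated/cortex-schema.ts) may look different.
interface ExampleSchema {
  tool_calls: { chatId: string; description: string; createdAt: string };
  documents: { title: string; content?: string };
}

// A generic signature ties the payload type to the collection name.
function typedCreate<K extends keyof ExampleSchema>(
  collection: K,
  data: ExampleSchema[K],
): { collection: K; data: ExampleSchema[K] } {
  // A real client would send an HTTP request here; this stub only echoes its input.
  return { collection, data };
}

const createdCall = typedCreate('tool_calls', {
  chatId: 'chat-123',
  description: 'RAG invocation summary',
  createdAt: new Date().toISOString(),
});
console.log(createdCall.collection);

// Both calls below fail type-checking if uncommented:
// typedCreate('tool_calls', { chatId: 'chat-123' });           // missing required fields
// typedCreate('documents', { title: 'x', author: 'nobody' });  // unknown field 'author'
```

The key design choice is that the collection name is a type parameter, so the compiler can look up the matching record shape before any request is made.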

Prisma-like Records Delegates

Once the schema is generated, you can call collections with property access instead of passing strings:

// Fully typed
const record = await client.records.tool_calls.create({
  chatId: "chat-123",
  description: "RAG invocation summary",
  createdAt: new Date().toISOString(),
});

// String form still available when you need something dynamic
await client.records.create("tool_calls", {
  chatId,
  description,
  createdAt,
});
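
One way property-style access like this can be layered on top of a string-based API is a `Proxy` that turns each property name into a collection name. The sketch below is an assumption about the general technique, not a description of how the SDK implements it; `withDelegates`, `ExampleDelegates`, and `stubCreate` are hypothetical, and the stub is synchronous for brevity while the real calls are async.

```typescript
type RecordData = Record<string, unknown>;
type CreateFn = (collection: string, data: RecordData) => string;

interface CollectionDelegate {
  create(data: RecordData): string;
}

// Hypothetical list of collections known from a schema.
interface ExampleDelegates {
  tool_calls: CollectionDelegate;
  documents: CollectionDelegate;
}

function withDelegates(createRecord: CreateFn): ExampleDelegates {
  // Any property access (e.g. `.tool_calls`) yields a delegate bound to that name.
  return new Proxy({} as ExampleDelegates, {
    get(_target, prop): CollectionDelegate {
      const collection = String(prop);
      return { create: (data: RecordData) => createRecord(collection, data) };
    },
  });
}

// Stub standing in for the string-based create call; it records the routing.
const routedCalls: string[] = [];
const stubCreate: CreateFn = (collection, data) => {
  routedCalls.push(collection);
  return `${collection}:${Object.keys(data).join(',')}`;
};

const delegates = withDelegates(stubCreate);
const receipt = delegates.tool_calls.create({ chatId: 'chat-123', description: 'summary' });
console.log(receipt); // tool_calls:chatId,description
```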

Usage

Initialize Client

import { CortexClient } from '@dooor-ai/cortexdb';

// Using connection string with database (recommended)
const client = new CortexClient('cortexdb://my-api-key@localhost:8000/production');

// Without database in connection string (must pass database to each method)
const clientWithoutDefault = new CortexClient('cortexdb://my-api-key@localhost:8000');

// Production (HTTPS auto-detected)
const productionClient = new CortexClient('cortexdb://my-key@api.cortexdb.com/production');

// Using options object (alternative)
const optionsClient = new CortexClient({
  baseUrl: 'http://localhost:8000',
  apiKey: 'your-api-key',
  database: 'production',  // Optional: set default database
  timeout: 1800000,        // Optional: override timeout (default = 30 min to cover large uploads)
  waitUntilComplete: true, // Optional: keep SDK waiting for async ingestion to finish (default = true)
});

Connection String Format: cortexdb://[api_key@]host[:port][/database]

Benefits:

  • Single string configuration
  • Easy to store in environment variables
  • Familiar pattern (like PostgreSQL, MongoDB, Redis)
  • Auto-detects HTTP vs HTTPS
  • Optional database specification for multi-tenant isolation
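
To illustrate the format, the following sketch splits a connection string into its parts. `parseConnectionString` and `ParsedConnection` are hypothetical names, not SDK exports. The HTTP/HTTPS rule used here (plain HTTP for local hosts, HTTPS otherwise) is only a guess that matches the examples above; the SDK's actual detection logic is not documented in this README.

```typescript
interface ParsedConnection {
  apiKey: string | undefined;
  host: string;
  port: number | undefined;
  database: string | undefined;
  baseUrl: string;
}

// Mirrors the documented shape: cortexdb://[api_key@]host[:port][/database]
const CONNECTION_PATTERN =
  /^cortexdb:\/\/(?:([^@\/]+)@)?([^:\/@]+)(?::(\d+))?(?:\/([^\/?#]+))?\/?$/;

function parseConnectionString(connection: string): ParsedConnection {
  const match = CONNECTION_PATTERN.exec(connection);
  if (!match) {
    throw new Error(`Invalid connection string: ${connection}`);
  }
  const [, apiKey, host, port, database] = match;
  if (!host) {
    throw new Error('Connection string is missing a host');
  }
  // ASSUMPTION: plain HTTP for local hosts, HTTPS for everything else.
  const isLocal = host === 'localhost' || host === '127.0.0.1';
  const scheme = isLocal ? 'http' : 'https';
  const baseUrl = `${scheme}://${host}${port ? `:${port}` : ''}`;
  return {
    apiKey,
    host,
    port: port ? Number(port) : undefined,
    database,
    baseUrl,
  };
}

const localConn = parseConnectionString('cortexdb://my-api-key@localhost:8000/production');
console.log(localConn.baseUrl, localConn.database); // http://localhost:8000 production
```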

Database Parameter:

  • If you specify a database in the connection string or options, it becomes the default for all operations
  • You can override the default database on a per-method basis
  • If no default database is set, you must pass the database parameter to each method
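
The precedence described above can be sketched as a small helper. `resolveDatabase` is a hypothetical function, and throwing when no database is available is an assumption; the README does not say how the SDK reacts in that case.

```typescript
// A per-call database wins over the client default.
function resolveDatabase(
  perCall: string | undefined,
  clientDefault: string | undefined,
): string {
  const database = perCall ?? clientDefault;
  if (!database) {
    // ASSUMPTION: the real SDK may report this situation differently.
    throw new Error('No database given and no default database configured');
  }
  return database;
}

console.log(resolveDatabase('staging', 'production')); // staging
console.log(resolveDatabase(undefined, 'production')); // production
```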

Async File Uploads & Processing

Large documents (PDFs, DOCXs, etc.) are ingested asynchronously to avoid timeouts. When you call client.records.create(...) the gateway now responds immediately with a payload like:

{
  "id": "rec_123",
  "status": "pending",
  "processing_state": {
    "record_id": "rec_123",
    "status": "pending",
    "processed_chunks": 0,
    "total_chunks": 0
  }
}

By default the SDK keeps polling the processing_state endpoint until the background worker finishes and only then resolves with the final CreateRecordResponse. That preserves backward compatibility with existing backends that expect a fully processed record once create() returns.

You can control this behavior:

// Return immediately (HTTP 202) and poll manually later
const pending = await client.records.create(
  'documents',
  { title: 'Async', content: '...' },
  undefined,
  { waitUntilComplete: false }
);

// Later in your workflow…
const status = await client.records.getStatus('documents', pending.id);
if (status?.status === 'completed') {
  const finalRecord = await client.records.waitForCompletion('documents', pending.id);
}

Useful options:

  • waitUntilComplete (default true): let the SDK poll automatically.
  • pollingIntervalMs (default 5000): change how often the SDK checks status.
  • timeoutMs (default 30 min): upper bound for the auto-poll loop.

Under the hood the SDK calls GET /records/{id}/status until the worker updates the processing_state to completed or failed. You can also call that endpoint directly via client.records.getStatus(...) to drive custom progress indicators.
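
The polling behavior described above can be approximated with a loop like the one below. This is a sketch of the general pattern, not the SDK's code: `pollUntilDone`, `StatusSnapshot`, and the fake status source are hypothetical, and only the option names `pollingIntervalMs` and `timeoutMs` and the `completed`/`failed` states come from this README.

```typescript
interface PollOptions {
  pollingIntervalMs: number;
  timeoutMs: number;
}

interface StatusSnapshot {
  status: string; // e.g. 'pending', 'completed', 'failed'
  processed_chunks: number;
  total_chunks: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilDone(
  fetchStatus: () => Promise<StatusSnapshot>,
  options: PollOptions,
): Promise<StatusSnapshot> {
  const deadline = Date.now() + options.timeoutMs;
  for (;;) {
    const snapshot = await fetchStatus();
    // Stop as soon as the worker reports a terminal state.
    if (snapshot.status === 'completed' || snapshot.status === 'failed') {
      return snapshot;
    }
    // Give up if the next wait would run past the deadline.
    if (Date.now() + options.pollingIntervalMs > deadline) {
      throw new Error(`Processing did not finish within ${options.timeoutMs} ms`);
    }
    await sleep(options.pollingIntervalMs);
  }
}

// Fake status source: reports 'pending' twice, then 'completed'.
function makeFakeStatusSource(): () => Promise<StatusSnapshot> {
  let calls = 0;
  return async () => {
    calls += 1;
    const done = calls >= 3;
    return {
      status: done ? 'completed' : 'pending',
      processed_chunks: done ? 4 : calls,
      total_chunks: 4,
    };
  };
}

pollUntilDone(makeFakeStatusSource(), { pollingIntervalMs: 10, timeoutMs: 1000 })
  .then((finalSnapshot) => console.log(finalSnapshot.status, finalSnapshot.processed_chunks));
```

In an application, `fetchStatus` would wrap `client.records.getStatus(...)`, and the intermediate snapshots could feed a progress indicator.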

Databases

// Create database
await client.databases.create({ name: 'ai_docs', description: 'Knowledge base' });

// List databases
const databases = await client.databases.list();

// Delete database
await client.databases.delete('ai_docs');

Embedding Providers

await client.embeddingProviders.create({
  name: 'Gemini Flash',
  provider: 'gemini',
  embedding_model: 'models/text-embedding-004',
  api_key: process.env.GEMINI_API_KEY!,
});

const providers = await client.embeddingProviders.list();

Collections

Collections define the schema for your data. Each collection can have multiple fields with different types and storage options.

import { FieldType, StoreLocation } from '@dooor-ai/cortexdb';

// Create collection with vectorization (database optional when a default is set)
const collection = await client.collections.create(
  'articles',
  [
    {
      name: 'title',
      type: FieldType.STRING
    },
    {
      name: 'content',
      type: FieldType.TEXT,
      vectorize: true  // Enable semantic search on this field
    },
    {
      name: 'year',
      type: FieldType.INT,
      store_in: [StoreLocation.POSTGRES, StoreLocation.QDRANT_PAYLOAD]
    }
  ],
  'embedding-provider-id',  // Required when any field has vectorize=true
  'production'  // Database name (or omit if default database is set)
);

// List collections (uses default database if set, or pass specific database)
const collections = await client.collections.list('production');

// Get collection schema
const schema = await client.collections.get('articles', 'production');

// Delete collection and all its records
await client.collections.delete('articles', 'production');

// If you set a default database in the client, you can omit it:
const defaultClient = new CortexClient('cortexdb://key@host:8000/production');
const defaultCollections = await defaultClient.collections.list();  // Uses 'production'

Records

Records are the actual data stored in collections. They must match the collection schema.

import fs from 'node:fs';

// Create record (with optional file upload and database)
const created = await client.records.create(
  'articles',
  {
    title: 'Machine Learning Basics',
    content: 'Machine learning is a subset of AI focused on learning from data...',
    year: 2024,
  },
  {
    attachment: fs.readFileSync('ml-intro.pdf'),
  },
  'production'  // Database name
);

// Get record by ID
const fetched = await client.records.get('articles', created.id, 'production');

// Update record
const updated = await client.records.update('articles', created.id, {
  year: 2025,
}, 'production');

// Delete record
await client.records.delete('articles', created.id, 'production');

// List records with filters/pagination
const results = await client.records.list('articles', {
  limit: 10,
  offset: 0,
  filters: { year: { $gte: 2023 } },
});

Schema CLI (YAML)

Install the CLI (recommended in devDependencies):

npm install --save-dev dooor

Use the unified dooor CLI to synchronize declarative schemas. Also install the "Dooor Tools" extension in VS Code/Cursor for real-time validation (Open VSX).

# Check differences between local YAML and CortexDB
npx dooor schema diff --dir dooor/schemas

# Create collections that don't exist yet
npx dooor schema apply --dir dooor/schemas

# Apply without generating types (by default apply already generates them)
npx dooor schema apply --no-generate-types

# Generate TypeScript types for use in services
npx dooor schema generate-types --dir dooor/schemas --out src/generated/cortex-schema.ts

Automatic Collection Typing

After synchronizing the schema, the CLI generates dooor/generated/cortex-schema.ts with derived types. Provide this schema to the SDK to get Prisma-like autocomplete and validation:

import { CortexClient } from '@dooor-ai/cortexdb';
import type {
  CortexGeneratedSchema,
  CollectionCreateInput,
} from '../dooor/generated/cortex-schema';

const client = new CortexClient<CortexGeneratedSchema>(
  process.env.CORTEXDB_CONNECTION!,
);

const payload: CollectionCreateInput<'tool_calls'> = {
  chatId,
  workspaceId,
  toolName,
  description,
  toolOutput,
  createdAt: new Date().toISOString(),
};

await client.records.create('tool_calls', payload);

Generics propagate to records.update, records.list, records.get, and records.search. If you prefer the old dynamic mode, instantiate new CortexClient() without the generic parameter.

Set CORTEXDB_CONNECTION (e.g., cortexdb://key@host:8000) or the CORTEXDB_BASE_URL + CORTEXDB_API_KEY variables before running commands. If no directory is specified, the CLI automatically looks in dooor/schemas.

To avoid repeating flags, configure dooor/config.yaml at the project root:

cortexdb:
  connection: env(CORTEXDB_CONNECTION)
  defaultEmbeddingProvider: default-provider

schema:
  dir: dooor/schemas
  typesOut: dooor/generated/cortex-schema.ts

You can override with dooor/config.local.yaml or point to another path via DOOOR_CONFIG.

Semantic Search

Semantic search finds records by meaning, not just exact keyword matches. It uses vector embeddings to understand context.

// Basic semantic search
const results = await client.records.search(
  'articles',
  'machine learning fundamentals',
  undefined,
  10
);

// Search with filters - combine semantic search with exact matches
const filteredResults = await client.records.search(
  'articles',
  'neural networks',
  {
    year: 2024,
    category: 'AI'
  },
  5
);

// Process results - ordered by relevance score
filteredResults.results.forEach(result => {
  console.log(`Score: ${result.score.toFixed(4)}`);  // Higher = more relevant
  console.log(`Title: ${result.record.data.title}`);
  console.log(`Year: ${result.record.data.year}`);
});
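
For intuition about what a filtered semantic search does, here is a toy in-memory version: apply exact-match filters, then rank the remaining records by cosine similarity to the query embedding. Everything here (`StoredDoc`, `hybridSearch`, the two-dimensional embeddings) is hypothetical, and the order in which CortexDB applies filters and vector search internally is not stated in this README.

```typescript
type FieldValue = string | number | boolean;

interface StoredDoc {
  id: string;
  data: Record<string, FieldValue>;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

function hybridSearch(
  docs: StoredDoc[],
  queryEmbedding: number[],
  exactFilters: Record<string, FieldValue>,
  limit: number,
): Array<{ id: string; score: number }> {
  return docs
    // Keep only records whose fields equal every filter value.
    .filter((doc) => Object.keys(exactFilters).every((key) => doc.data[key] === exactFilters[key]))
    // Score the survivors against the query vector.
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(doc.embedding, queryEmbedding) }))
    // Higher score = more relevant.
    .sort((left, right) => right.score - left.score)
    .slice(0, limit);
}

const toyDocs: StoredDoc[] = [
  { id: 'a', data: { year: 2024 }, embedding: [1, 0] },
  { id: 'b', data: { year: 2024 }, embedding: [0.6, 0.8] },
  { id: 'c', data: { year: 2023 }, embedding: [1, 0] },
];

const toyHits = hybridSearch(toyDocs, [1, 0], { year: 2024 }, 5);
console.log(toyHits.map((hit) => hit.id)); // [ 'a', 'b' ]
```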

Working with Files

CortexDB can process documents and automatically extract text for vectorization.

// Create collection with file field
await client.collections.create(
  'documents',
  [
    { name: 'title', type: FieldType.STRING },
    {
      name: 'document',
      type: FieldType.FILE,
      vectorize: true  // Extract text and create embeddings
    }
  ],
  'embedding-provider-id'
);

// Note: File upload support is currently available in the REST API
// TypeScript SDK file upload will be added in a future version
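
Once text has been extracted from a file, it is split into chunks before embedding. CortexDB's actual splitter is not described in this README; the sketch below is a naive fixed-size sliding window with overlap, shown only to illustrate the idea. `chunkText` and its parameters are hypothetical.

```typescript
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error('chunkSize must be positive');
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error('overlap must be between 0 and chunkSize - 1');
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the window has reached the end of the text.
    if (start + chunkSize >= text.length) {
      break;
    }
  }
  return chunks;
}

console.log(chunkText('abcdefghij', 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

The overlap keeps a little shared context between neighboring chunks, so a sentence cut at a boundary still appears whole in at least one of them when the overlap is large enough.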

Filter Operators

// Exact match filters
const results = await client.records.list('articles', {
  filters: {
    category: 'technology',
    published: true,
    year: 2024
  }
});

// Combine multiple filters
const filtered = await client.records.list('articles', {
  filters: {
    year: 2024,
    category: 'AI',
    author: 'John Doe'
  },
  limit: 20
});
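
As a rough mental model of how such filters select records, the sketch below evaluates a filter object against plain data, assuming all conditions must hold (AND semantics). It covers exact matches and the `$gte` operator used in the Records example; `matchesFilters` is hypothetical, and no other operators are implied to exist.

```typescript
type Scalar = string | number | boolean;
type FilterCondition = Scalar | { $gte: number };

function matchesFilters(
  data: Record<string, Scalar>,
  filters: Record<string, FilterCondition>,
): boolean {
  // ASSUMPTION: every condition must match (logical AND).
  return Object.keys(filters).every((field) => {
    const condition = filters[field];
    const actual = data[field];
    if (typeof condition === 'object') {
      // Range condition: numeric field greater than or equal to the bound.
      return typeof actual === 'number' && actual >= condition.$gte;
    }
    // Plain value: exact match.
    return actual === condition;
  });
}

const sampleArticles: Array<Record<string, Scalar>> = [
  { title: 'A', year: 2024, category: 'AI' },
  { title: 'B', year: 2022, category: 'AI' },
  { title: 'C', year: 2024, category: 'databases' },
];

const exactMatches = sampleArticles.filter((article) =>
  matchesFilters(article, { year: 2024, category: 'AI' }));
const recentMatches = sampleArticles.filter((article) =>
  matchesFilters(article, { year: { $gte: 2023 } }));

console.log(exactMatches.map((article) => article.title));  // [ 'A' ]
console.log(recentMatches.map((article) => article.title)); // [ 'A', 'C' ]
```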

Error Handling

The SDK provides specific error types for different failure scenarios.

import {
  CortexDBError,
  CortexDBNotFoundError,
  CortexDBValidationError,
  CortexDBConnectionError,
  CortexDBTimeoutError
} from '@dooor-ai/cortexdb';

try {
  const record = await client.records.get('articles', 'invalid-id');
} catch (error) {
  if (error instanceof CortexDBNotFoundError) {
    console.log('Record not found');
  } else if (error instanceof CortexDBValidationError) {
    console.log('Invalid data:', error.message);
  } else if (error instanceof CortexDBConnectionError) {
    console.log('Connection failed:', error.message);
  } else if (error instanceof CortexDBTimeoutError) {
    console.log('Request timed out:', error.message);
  } else if (error instanceof CortexDBError) {
    console.log('General error:', error.message);
  }
}

Examples

Check the examples/ directory for complete working examples.

Run an example:

npx ts-node -O '{"module":"commonjs"}' examples/quickstart.ts

Development

Setup

# Clone repository
git clone https://github.com/yourusername/cortexdb
cd cortexdb/clients/typescript

# Install dependencies
npm install

# Build
npm run build

Scripts

# Build TypeScript
npm run build

# Build in watch mode
npm run build:watch

# Clean build artifacts
npm run clean

# Lint code
npm run lint

# Format code
npm run format

Requirements

  • Node.js >= 18.0.0 (for native fetch support)
  • CortexDB gateway running locally or remotely
  • Embedding provider configured (OpenAI, Gemini, etc.) if using vectorization

Architecture

CortexDB integrates multiple technologies:

  • PostgreSQL: Stores structured data and metadata
  • Qdrant: Vector database for semantic search
  • MinIO: Object storage for files
  • Docling: Advanced document processing and text extraction

The SDK abstracts this complexity into a simple, unified API.

Advanced RAG Strategies (v0.4.0+)

CortexDB now supports multiple RAG strategies to improve search quality and relevance. Choose the strategy that best fits your use case:

Available Strategies

  • SIMPLE: Basic vector similarity search (default)
  • MULTI_QUERY: Generate multiple query variations and combine results using Reciprocal Rank Fusion
  • HYDE: Generate hypothetical documents and use them for improved retrieval
  • RERANK: Use LLM to rerank search results by relevance
  • FUSION: Combine multi-query expansion with LLM reranking
  • CONTEXTUAL_QUERY: Reformulate queries based on conversation context
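
Reciprocal Rank Fusion, mentioned under MULTI_QUERY, merges several ranked lists by giving each item a score of 1 / (k + rank) per list and summing. The sketch below shows the standard formula with the commonly used constant k = 60; whether CortexDB uses this constant or any additional weighting is not stated here, and `reciprocalRankFusion` is a hypothetical helper.

```typescript
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((left, right) => right.score - left.score);
}

// Three result lists, e.g. one per generated query variation.
const fusedRanking = reciprocalRankFusion([
  ['a', 'b', 'c'],
  ['b', 'a', 'd'],
  ['b', 'c', 'a'],
]);
console.log(fusedRanking.map((entry) => entry.id)); // [ 'b', 'a', 'c', 'd' ]
```

Items that appear near the top of several lists accumulate the highest scores, which is why `b` outranks `a` here even though `a` leads the first list.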

Setup AI Providers

Before using advanced strategies, configure an AI provider:

// Create an AI provider for query expansion/reranking
const aiProvider = await client.aiProviders.create({
  name: "Gemini Flash",
  provider: "gemini",
  api_key: "your-gemini-api-key",
  model: "gemini-2.5-flash",
  enabled: true,
});

// List providers
const providers = await client.aiProviders.list();

// Update provider
await client.aiProviders.update(aiProvider.id, {
  model: "gemini-2.0-flash",
});

Advanced Search

import { RAGStrategy } from '@dooor-ai/cortexdb';

// Simple search (default)
const simpleResults = await client.records.searchAdvanced('documents', {
  query: 'What is machine learning?',
  limit: 10,
  strategy: RAGStrategy.SIMPLE,
});

// Multi-query with automatic query expansion
const multiQueryResults = await client.records.searchAdvanced('documents', {
  query: 'What is machine learning?',
  limit: 10,
  strategy: RAGStrategy.MULTI_QUERY,
  strategyConfig: {
    num_queries: 5, // Generate 5 query variations
  },
  aiProviderName: "Gemini Flash", // Use provider by name
});

// HyDE: Generate hypothetical document for better retrieval
const hydeResults = await client.records.searchAdvanced('documents', {
  query: 'Explain neural networks',
  limit: 10,
  strategy: RAGStrategy.HYDE,
  strategyConfig: {
    document_length: 200, // Length of hypothetical document
  },
  aiProviderName: "Gemini Flash",
});

// Rerank: Use LLM to reorder results by relevance
const rerankResults = await client.records.searchAdvanced('documents', {
  query: 'Benefits of deep learning',
  limit: 10,
  strategy: RAGStrategy.RERANK,
  strategyConfig: {
    initial_k: 50, // Fetch 50 results then rerank to top 10
  },
  aiProviderName: "Gemini Flash",
});

// Fusion: Best of both worlds (multi-query + reranking)
const fusionResults = await client.records.searchAdvanced('documents', {
  query: 'How does AI work?',
  limit: 10,
  strategy: RAGStrategy.FUSION,
  strategyConfig: {
    num_queries: 5,
    initial_k: 50,
  },
  aiProviderName: "Gemini Flash",
});

// Contextual: Reformulate query based on conversation history
const contextualResults = await client.records.searchAdvanced('documents', {
  query: 'What about its applications?',
  limit: 10,
  strategy: RAGStrategy.CONTEXTUAL_QUERY,
  strategyConfig: {
    context: [
      'Previous: What is machine learning?',
      'Answer: Machine learning is a subset of AI...',
    ],
  },
  aiProviderName: "Gemini Flash",
});

// Access results (the strategy is a property of the response, so log it once)
console.log(`Strategy used: ${fusionResults.strategy_used}`);
fusionResults.results.forEach(result => {
  console.log(`Score: ${result.score}`);
  console.log(`Content: ${result.record.content}`);
});
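
The two-stage idea behind RERANK (fetch `initial_k` candidates, then reorder and keep the top `limit`) can be sketched as follows. The relevance scorer here is a trivial keyword counter standing in for the LLM call; `rerank`, `Candidate`, and `keywordScorer` are hypothetical and do not reflect how CortexDB scores results.

```typescript
interface Candidate {
  id: string;
  text: string;
  vectorScore: number;
}

function rerank(
  candidates: Candidate[],
  scoreRelevance: (text: string) => number,
  initialK: number,
  limit: number,
): Candidate[] {
  // Stage 1: shortlist the best initialK candidates by vector similarity.
  const shortlist = candidates
    .slice()
    .sort((left, right) => right.vectorScore - left.vectorScore)
    .slice(0, initialK);
  // Stage 2: reorder the shortlist with the (more expensive) scorer.
  return shortlist
    .map((candidate) => ({ candidate, relevance: scoreRelevance(candidate.text) }))
    .sort((left, right) => right.relevance - left.relevance)
    .slice(0, limit)
    .map((entry) => entry.candidate);
}

// Stand-in for an LLM judgment: count how many query words the text contains.
const queryWords = ['benefits', 'deep', 'learning'];
const keywordScorer = (text: string): number =>
  queryWords.filter((word) => text.indexOf(word) !== -1).length;

const rerankPool: Candidate[] = [
  { id: 'x', text: 'deep learning', vectorScore: 0.5 },
  { id: 'y', text: 'cooking recipes', vectorScore: 0.9 },
  { id: 'z', text: 'benefits of deep learning', vectorScore: 0.7 },
  { id: 'w', text: 'deep sea', vectorScore: 0.1 },
];

const rerankedTop = rerank(rerankPool, keywordScorer, 3, 2);
console.log(rerankedTop.map((candidate) => candidate.id)); // [ 'z', 'x' ]
```

A larger `initialK` gives the scorer more candidates to rescue, at the price of more scoring work per query.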

Collection-Specific Delegates

The advanced search is also available on collection delegates:

// Using the facade pattern
const results = await client.records.documents.searchAdvanced({
  query: 'Machine learning applications',
  strategy: RAGStrategy.FUSION,
  aiProviderName: "Gemini Flash",
});

Performance Tips

  • SIMPLE: Fastest, use for basic semantic search
  • MULTI_QUERY: Slower than SIMPLE roughly in proportion to num_queries (about 5x with 5 query variations)
  • HYDE: Similar cost to MULTI_QUERY, works well for question-style queries
  • RERANK: Moderate cost, great for accuracy improvement
  • FUSION: Highest cost and latency, best quality
  • CONTEXTUAL_QUERY: Use for conversational interfaces

For more details, see RAG Strategies Documentation.

License

MIT License - see LICENSE for details.