Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document processing


CortexDB TypeScript SDK

Official TypeScript/JavaScript client for CortexDB.

What is CortexDB?

CortexDB is a multi-modal RAG (Retrieval Augmented Generation) platform that combines traditional database capabilities with vector search and advanced document processing. It enables you to:

  • Store structured and unstructured data in a unified database
  • Automatically extract text from documents (PDF, DOCX, XLSX) using Docling
  • Generate embeddings for semantic search using various providers (OpenAI, Gemini, etc.)
  • Perform hybrid search combining filters with vector similarity
  • Build RAG applications with automatic chunking and vectorization

CortexDB handles the complex infrastructure of vector databases (Qdrant), object storage (MinIO), and traditional databases (PostgreSQL) behind a simple API.

Features

  • Multi-modal document processing: Upload PDF, DOCX, and XLSX files and automatically extract text, with OCR as a fallback
  • Semantic search: Vector-based search using embeddings from OpenAI, Gemini, or custom providers
  • Automatic chunking: Smart text splitting optimized for RAG applications
  • Flexible schema: Define collections with typed fields (string, number, boolean, file, array)
  • Hybrid queries: Combine exact filters with semantic search
  • Storage control: Choose where each field is stored (PostgreSQL, Qdrant, MinIO)
  • Type-safe: Full TypeScript support with comprehensive type definitions
  • Modern API: Async/await using native fetch (Node.js 18+)
  • Infra management: Database (client.databases) and embedding provider (client.embeddingProviders) APIs built-in

Installation

npm install @dooor-ai/cortexdb

Or with yarn:

yarn add @dooor-ai/cortexdb

Or with pnpm:

pnpm add @dooor-ai/cortexdb

Quick Start

import { CortexClient, FieldType, StoreLocation } from '@dooor-ai/cortexdb';

async function main() {
  const client = new CortexClient({
    baseUrl: 'http://localhost:8000'
  });

  // Create a collection with vectorization enabled
  await client.collections.create(
    'documents',
    [
      { name: 'title', type: FieldType.STRING },
      { name: 'content', type: FieldType.TEXT, vectorize: true },
      { name: 'published_at', type: FieldType.DATETIME, store_in: [StoreLocation.POSTGRES] }
    ],
    'your-embedding-provider-id'  // Required when vectorize=true
  );

  // Create a record
  const record = await client.records.create('documents', {
    title: 'Introduction to AI',
    content: 'Artificial intelligence is transforming how we build software...'
  });

  // Semantic search - finds relevant content by meaning, not just keywords
  const results = await client.records.search(
    'documents',
    'How is AI changing software development?',
    undefined,
    10
  );

  results.results.forEach(result => {
    console.log(`Score: ${result.score.toFixed(4)}`);
    console.log(`Title: ${result.record.data.title}`);
    console.log(`Content: ${result.record.data.content}\n`);
  });

  await client.close();
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});

Project-Specific Typing

The SDK becomes fully type-safe once you apply your YAML schema with the Dooor CLI:

npx dooor schema apply          # reads dooor/schemas by default and generates types in dooor/generated/

This command creates dooor/generated/cortex-schema.ts and automatically augments the SDK types. After the file exists in your project, you can keep importing CortexClient from @dooor-ai/cortexdb; TypeScript will infer the fields/collections defined in your YAML. Invalid field names or missing required properties inside client.records.create('my_collection', {...}) now trigger compile-time errors, Prisma-style.

If you need an explicit factory, the generated file also exports createCortexClient() and TypedCortexClient helpers.

ℹ️ The CLI also drops a lightweight .d.ts shim in node_modules/@dooor-ai/cortexdb/generated/schema.d.ts, so TypeScript picks up your schema automatically—no need to tweak tsconfig.json.

Prisma-like Records Delegates

Once the schema is generated, you can call collections with property access instead of passing strings:

// Fully typed
const record = await client.records.tool_calls.create({
  chatId: "chat-123",
  description: "RAG invocation summary",
  createdAt: new Date().toISOString(),
});

// String form still available when you need something dynamic
await client.records.create("tool_calls", {
  chatId,
  description,
  createdAt,
});
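Property access such as client.records.tool_calls can be built on a JavaScript Proxy that treats any unknown property as a collection name. The sketch below illustrates the idea with an in-memory store and a hand-written Schema type. makeRecords, the Schema shape, and the sequential id format are all hypothetical, and this is not the SDK's actual implementation.

```typescript
// Hypothetical schema: in a real project this shape would come from the generated cortex-schema.ts.
type Schema = {
  tool_calls: { chatId: string; description: string; createdAt: string };
};

type Stored<T> = T & { id: string };

// A delegate bound to a single collection.
interface Delegate<T> {
  create(data: T): Stored<T>;
}

// String form plus one property per collection, mirroring the two call styles above.
type Records<S> = {
  create<K extends keyof S & string>(collection: K, data: S[K]): Stored<S[K]>;
} & { [K in keyof S]: Delegate<S[K]> };

function makeRecords<S>(): Records<S> {
  const store = new Map<string, unknown[]>();
  let nextId = 1;

  // Shared implementation used by both call styles (in-memory only).
  const create = (collection: string, data: object) => {
    const record = { ...data, id: String(nextId++) };
    const rows = store.get(collection) ?? [];
    rows.push(record);
    store.set(collection, rows);
    return record;
  };

  // Any string property other than "create" is treated as a collection name.
  return new Proxy({ create } as object, {
    get(target, prop, receiver) {
      if (prop === "create" || typeof prop !== "string") {
        return Reflect.get(target, prop, receiver);
      }
      return { create: (data: object) => create(prop, data) };
    },
  }) as unknown as Records<S>;
}
```

Because both call styles are typed from the same Schema, an unknown field passed in either form is rejected by the compiler as an excess property, which mirrors the compile-time checks described above.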

Usage

Initialize Client

import { CortexClient } from '@dooor-ai/cortexdb';

// Using connection string (recommended)
const client = new CortexClient('cortexdb://localhost:8000');

// With API key
const authedClient = new CortexClient('cortexdb://my-api-key@localhost:8000');

// Production (HTTPS auto-detected)
const prodClient = new CortexClient('cortexdb://my-key@api.cortexdb.com');

// Using options object (alternative)
const configuredClient = new CortexClient({
  baseUrl: 'http://localhost:8000',
  apiKey: 'your-api-key',
  timeout: 60000  // 60 seconds
});

Connection String Format:
cortexdb://[api_key@]host[:port]

Benefits:

  • Single string configuration
  • Easy to store in environment variables
  • Familiar pattern (like PostgreSQL, MongoDB, Redis)
  • Auto-detects HTTP vs HTTPS
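The connection-string format can be parsed with a small regular expression. The sketch below is hypothetical: the SDK's actual rule for choosing HTTP vs HTTPS is not documented here, so treating localhost-style hosts as HTTP and every other host as HTTPS is an assumption, as is the parseConnectionString name.

```typescript
interface ParsedConnection {
  apiKey?: string;
  baseUrl: string;
}

// Assumption: plain HTTP for local hosts, HTTPS for everything else.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "0.0.0.0"]);

// Parses cortexdb://[api_key@]host[:port] into an optional API key and an HTTP(S) base URL.
function parseConnectionString(conn: string): ParsedConnection {
  const match = /^cortexdb:\/\/(?:([^@\/]+)@)?([^:\/]+)(?::(\d+))?\/?$/.exec(conn);
  if (!match) {
    throw new Error(`Invalid CortexDB connection string: ${conn}`);
  }
  const [, apiKey, host, port] = match;
  const scheme = LOCAL_HOSTS.has(host) ? "http" : "https";
  const baseUrl = port ? `${scheme}://${host}:${port}` : `${scheme}://${host}`;
  return apiKey ? { apiKey, baseUrl } : { baseUrl };
}
```

For example, parseConnectionString('cortexdb://my-key@api.cortexdb.com') yields the key my-key and the base URL https://api.cortexdb.com.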

Databases

// Create database
await client.databases.create({ name: 'ai_docs', description: 'Knowledge base' });

// List databases
const databases = await client.databases.list();

// Delete database
await client.databases.delete('ai_docs');

Embedding Providers

await client.embeddingProviders.create({
  name: 'Gemini Flash',
  provider: 'gemini',
  embedding_model: 'models/text-embedding-004',
  api_key: process.env.GEMINI_API_KEY!,
});

const providers = await client.embeddingProviders.list();

Collections

Collections define the schema for your data. Each collection can have multiple fields with different types and storage options.

import { FieldType, StoreLocation } from '@dooor-ai/cortexdb';

// Create collection with vectorization
const collection = await client.collections.create(
  'articles',
  [
    {
      name: 'title',
      type: FieldType.STRING
    },
    {
      name: 'content',
      type: FieldType.TEXT,
      vectorize: true  // Enable semantic search on this field
    },
    {
      name: 'year',
      type: FieldType.INT,
      store_in: [StoreLocation.POSTGRES, StoreLocation.QDRANT_PAYLOAD]
    }
  ],
  'embedding-provider-id'  // Required when any field has vectorize=true
);

// List collections
const collections = await client.collections.list();

// Get collection schema
const schema = await client.collections.get('articles');

// Delete collection and all its records
await client.collections.delete('articles');
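The rule in the example above, that an embedding provider ID is required as soon as any field sets vectorize: true, can be checked client-side before calling collections.create. The helper below is a hypothetical pre-flight check with its own minimal FieldDef type; it is not part of the SDK, whose field definitions may carry more properties.

```typescript
// Hypothetical local type; the SDK's own field definition type may differ.
interface FieldDef {
  name: string;
  type: string;
  vectorize?: boolean;
}

// Returns a list of problems (empty when the definition looks valid).
function validateCollection(fields: FieldDef[], embeddingProviderId?: string): string[] {
  const problems: string[] = [];
  const vectorized = fields.filter((f) => f.vectorize).map((f) => f.name);
  if (vectorized.length > 0 && !embeddingProviderId) {
    problems.push(
      `Fields [${vectorized.join(", ")}] set vectorize=true but no embedding provider ID was given`,
    );
  }
  return problems;
}
```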

Records

Records are the actual data stored in collections. They must match the collection schema.

import fs from 'node:fs';

// Create record (with optional file upload)
const created = await client.records.create(
  'articles',
  {
    title: 'Machine Learning Basics',
    content: 'Machine learning is a subset of AI focused on learning from data...',
    year: 2024,
  },
  {
    attachment: fs.readFileSync('ml-intro.pdf'),
  }
);

// Get record by ID
const fetched = await client.records.get('articles', created.id);

// Update record
const updated = await client.records.update('articles', created.id, {
  year: 2025,
});

// Delete record
await client.records.delete('articles', created.id);

// List records with filters/pagination
const results = await client.records.list('articles', {
  limit: 10,
  offset: 0,
  filters: { year: { $gte: 2023 } },
});
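To read an entire collection, the limit and offset parameters can be driven in a loop. The helper below is a hypothetical sketch that takes a page-fetching callback instead of calling the SDK directly, because the exact shape of the records.list response is not documented in this README; the callback is where you would map that response into { items }.

```typescript
// Hypothetical page shape; map the real records.list response into this form.
interface Page<T> {
  items: T[];
}

type FetchPage<T> = (limit: number, offset: number) => Promise<Page<T>>;

// Requests successive pages until a short (or empty) page signals the end.
async function fetchAll<T>(fetchPage: FetchPage<T>, pageSize = 100): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; ; offset += pageSize) {
    const page = await fetchPage(pageSize, offset);
    all.push(...page.items);
    if (page.items.length < pageSize) {
      return all;
    }
  }
}
```

When the total is an exact multiple of pageSize, this makes one extra request that returns an empty page; that is the usual trade-off when the endpoint does not report a total count.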

Schema CLI (YAML)

Install the CLI (recommended as a dev dependency):

npm install --save-dev dooor

Use the unified dooor CLI to sync declarative schemas. For real-time validation, also install the “Dooor Tools” extension for VS Code/Cursor (available on Open VSX).

# Show differences between the local YAML and CortexDB
npx dooor schema diff --dir dooor/schemas

# Create collections that do not exist yet
npx dooor schema apply --dir dooor/schemas

# Apply without generating types (apply generates them by default)
npx dooor schema apply --no-generate-types

# Generate TypeScript types for use in your services
npx dooor schema generate-types --dir dooor/schemas --out src/generated/cortex-schema.ts

Automatic Collection Typing

After syncing the schema, the CLI generates dooor/generated/cortex-schema.ts with the derived types. Pass that schema to the SDK to get Prisma-like autocomplete and validation:

import { CortexClient } from '@dooor-ai/cortexdb';
import type {
  CortexGeneratedSchema,
  CollectionCreateInput,
} from '../dooor/generated/cortex-schema';

const client = new CortexClient<CortexGeneratedSchema>(
  process.env.CORTEXDB_CONNECTION!,
);

const payload: CollectionCreateInput<'tool_calls'> = {
  chatId,
  workspaceId,
  toolName,
  description,
  toolOutput,
  createdAt: new Date().toISOString(),
};

await client.records.create('tool_calls', payload);

The generics propagate to records.update, records.list, records.get, and records.search. If you prefer the old dynamic mode, instantiate new CortexClient() without a generic parameter.

Set CORTEXDB_CONNECTION (e.g., cortexdb://key@host:8000), or the CORTEXDB_BASE_URL and CORTEXDB_API_KEY variables, before running the commands. If no directory is given, the CLI looks in dooor/schemas automatically.

To avoid repeating flags, configure dooor/config.yaml at the project root:

cortexdb:
  connection: env(CORTEXDB_CONNECTION)
  defaultEmbeddingProvider: default-provider

schema:
  dir: dooor/schemas
  typesOut: dooor/generated/cortex-schema.ts

You can override these settings with dooor/config.local.yaml, or point to a different config file via DOOOR_CONFIG.

Semantic Search

Semantic search finds records by meaning, not just exact keyword matches. It uses vector embeddings to understand context.

// Basic semantic search
const results = await client.records.search(
  'articles',
  'machine learning fundamentals',
  undefined,
  10
);

// Search with filters - combine semantic search with exact matches
const filteredResults = await client.records.search(
  'articles',
  'neural networks',
  {
    year: 2024,
    category: 'AI'
  },
  5
);

// Process results - ordered by relevance score
filteredResults.results.forEach(result => {
  console.log(`Score: ${result.score.toFixed(4)}`);  // Higher = more relevant
  console.log(`Title: ${result.record.data.title}`);
  console.log(`Year: ${result.record.data.year}`);
});
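Conceptually, a filtered semantic search narrows candidates by exact field values and then ranks what remains by vector similarity. The in-memory sketch below illustrates that idea with cosine similarity. It is not how CortexDB executes queries (vectors live in Qdrant), and whether the server's score is cosine similarity is an assumption; Doc, Hit, and hybridSearch are hypothetical names.

```typescript
interface Doc {
  id: string;
  data: Record<string, unknown>;
  embedding: number[];
}

interface Hit {
  id: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); assumes equal-length vectors, returns 0 for zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep docs whose fields equal every filter value, then rank by similarity (higher = more relevant).
function hybridSearch(docs: Doc[], queryEmbedding: number[], filters: Record<string, unknown>, limit: number): Hit[] {
  return docs
    .filter((doc) => Object.entries(filters).every(([key, value]) => doc.data[key] === value))
    .map((doc) => ({ id: doc.id, score: cosine(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```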

Working with Files

CortexDB can process documents and automatically extract text for vectorization.

// Create collection with file field
await client.collections.create(
  'documents',
  [
    { name: 'title', type: FieldType.STRING },
    {
      name: 'document',
      type: FieldType.FILE,
      vectorize: true  // Extract text and create embeddings
    }
  ],
  'embedding-provider-id'
);

// Note: File upload support is currently available in the REST API
// TypeScript SDK file upload will be added in a future version
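After text is extracted, it has to be split into pieces small enough to embed. CortexDB's own splitter is only described as "smart", so the fixed-size, character-based window with overlap below is a deliberately simple stand-in rather than the platform's algorithm; chunkText is a hypothetical name and the size and overlap values are up to you.

```typescript
// Splits text into windows of at most `size` characters; each window overlaps the previous one by `overlap`.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) {
      break; // this window already reaches the end of the text
    }
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a boundary present in both neighboring chunks, at the cost of embedding some text twice.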

Filter Operators

// Exact match filters
const results = await client.records.list('articles', {
  filters: {
    category: 'technology',
    published: true,
    year: 2024
  }
});

// Combine multiple filters
const filtered = await client.records.list('articles', {
  filters: {
    year: 2024,
    category: 'AI',
    author: 'John Doe'
  },
  limit: 20
});
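The filters in these examples read as conjunctions: a record must satisfy every key. The sketch below spells out those semantics in memory, assuming multiple keys are ANDed together and that operator objects such as { $gte: 2023 } (used in the Records section) compare numerically. Only $gte appears in this README; $gt, $lt, and $lte are assumed by analogy and may not exist in the API. This illustrates the semantics and is not the server's filter engine.

```typescript
type Scalar = string | number | boolean;
// Only $gte is shown in this README; the other operators are assumptions.
type Condition = Scalar | { $gt?: number; $gte?: number; $lt?: number; $lte?: number };

// True when the record satisfies every filter key (implicit AND).
function matches(record: Record<string, unknown>, filters: Record<string, Condition>): boolean {
  return Object.entries(filters).every(([field, cond]) => {
    const value = record[field];
    if (typeof cond !== "object") {
      return value === cond; // exact match
    }
    if (typeof value !== "number") {
      return false; // range operators only apply to numbers in this sketch
    }
    return (
      (cond.$gt === undefined || value > cond.$gt) &&
      (cond.$gte === undefined || value >= cond.$gte) &&
      (cond.$lt === undefined || value < cond.$lt) &&
      (cond.$lte === undefined || value <= cond.$lte)
    );
  });
}
```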

Error Handling

The SDK provides specific error types for different failure scenarios.

import {
  CortexDBError,
  CortexDBNotFoundError,
  CortexDBValidationError,
  CortexDBConnectionError,
  CortexDBTimeoutError
} from '@dooor-ai/cortexdb';

try {
  const record = await client.records.get('articles', 'invalid-id');
} catch (error) {
  if (error instanceof CortexDBNotFoundError) {
    console.log('Record not found');
  } else if (error instanceof CortexDBValidationError) {
    console.log('Invalid data:', error.message);
  } else if (error instanceof CortexDBConnectionError) {
    console.log('Connection failed:', error.message);
  } else if (error instanceof CortexDBTimeoutError) {
    console.log('Request timed out:', error.message);
  } else if (error instanceof CortexDBError) {
    console.log('General error:', error.message);
  }
}
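Connection and timeout errors are often transient, while not-found and validation errors are not. A small retry wrapper with exponential backoff lets you retry only the former. It is sketched below as a generic helper; the name withRetry and the default attempt count and delays are illustrative choices, not SDK features.

```typescript
// Retries `fn` while `isRetryable(error)` is true, doubling the delay after each failed attempt.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxAttempts || !isRetryable(error)) {
        throw error;
      }
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With the SDK, the predicate could be (e) => e instanceof CortexDBTimeoutError || e instanceof CortexDBConnectionError, so not-found and validation errors still fail immediately.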

Examples

Check the examples/ directory for complete working examples.

Run examples:

npx ts-node -O '{"module":"commonjs"}' examples/quickstart.ts

Development

Setup

# Clone repository
git clone https://github.com/yourusername/cortexdb
cd cortexdb/clients/typescript

# Install dependencies
npm install

# Build
npm run build

Scripts

# Build TypeScript
npm run build

# Build in watch mode
npm run build:watch

# Clean build artifacts
npm run clean

# Lint code
npm run lint

# Format code
npm run format

Requirements

  • Node.js >= 18.0.0 (for native fetch support)
  • CortexDB gateway running locally or remotely
  • Embedding provider configured (OpenAI, Gemini, etc.) if using vectorization

Architecture

CortexDB integrates multiple technologies:

  • PostgreSQL: Stores structured data and metadata
  • Qdrant: Vector database for semantic search
  • MinIO: Object storage for files
  • Docling: Advanced document processing and text extraction

The SDK abstracts this complexity into a simple, unified API.

Advanced RAG Strategies (v0.4.0+)

CortexDB now supports multiple RAG strategies to improve search quality and relevance. Choose the strategy that best fits your use case:

Available Strategies

  • SIMPLE: Basic vector similarity search (default)
  • MULTI_QUERY: Generate multiple query variations and combine results using Reciprocal Rank Fusion
  • HYDE: Generate hypothetical documents and use them for improved retrieval
  • RERANK: Use an LLM to rerank search results by relevance
  • FUSION: Combine multi-query expansion with LLM reranking
  • CONTEXTUAL_QUERY: Reformulate queries based on conversation context
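MULTI_QUERY is described as merging the result lists of several query variations with Reciprocal Rank Fusion. RRF scores each document as the sum of 1 / (k + rank) over every list it appears in, so documents ranked well by many variations rise to the top. The sketch below uses the commonly cited k = 60; whether CortexDB uses that constant is an assumption, and reciprocalRankFusion is a hypothetical name.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), with 1-based ranks.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries(), ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}
```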

Setup AI Providers

Before using advanced strategies, configure an AI provider:

// Create an AI provider for query expansion/reranking
const aiProvider = await client.aiProviders.create({
  name: "Gemini Flash",
  provider: "gemini",
  api_key: "your-gemini-api-key",
  model: "gemini-1.5-flash",
  enabled: true,
});

// List providers
const providers = await client.aiProviders.list();

// Update provider
await client.aiProviders.update(aiProvider.id, {
  model: "gemini-2.0-flash",
});

Using Advanced Search

import { RAGStrategy } from '@dooor-ai/cortexdb';

// Simple search (default)
const simpleResults = await client.records.searchAdvanced('documents', {
  query: 'What is machine learning?',
  limit: 10,
  strategy: RAGStrategy.SIMPLE,
});

// Multi-query with automatic query expansion
const multiQueryResults = await client.records.searchAdvanced('documents', {
  query: 'What is machine learning?',
  limit: 10,
  strategy: RAGStrategy.MULTI_QUERY,
  strategyConfig: {
    num_queries: 5, // Generate 5 query variations
  },
  aiProviderName: "Gemini Flash", // Use provider by name
});

// HyDE: Generate hypothetical document for better retrieval
const hydeResults = await client.records.searchAdvanced('documents', {
  query: 'Explain neural networks',
  limit: 10,
  strategy: RAGStrategy.HYDE,
  strategyConfig: {
    document_length: 200, // Length of hypothetical document
  },
  aiProviderName: "Gemini Flash",
});

// Rerank: Use LLM to reorder results by relevance
const rerankResults = await client.records.searchAdvanced('documents', {
  query: 'Benefits of deep learning',
  limit: 10,
  strategy: RAGStrategy.RERANK,
  strategyConfig: {
    initial_k: 50, // Fetch 50 results then rerank to top 10
  },
  aiProviderName: "Gemini Flash",
});

// Fusion: Best of both worlds (multi-query + reranking)
const fusionResults = await client.records.searchAdvanced('documents', {
  query: 'How does AI work?',
  limit: 10,
  strategy: RAGStrategy.FUSION,
  strategyConfig: {
    num_queries: 5,
    initial_k: 50,
  },
  aiProviderName: "Gemini Flash",
});

// Contextual: Reformulate query based on conversation history
const contextualResults = await client.records.searchAdvanced('documents', {
  query: 'What about its applications?',
  limit: 10,
  strategy: RAGStrategy.CONTEXTUAL_QUERY,
  strategyConfig: {
    context: [
      'Previous: What is machine learning?',
      'Answer: Machine learning is a subset of AI...',
    ],
  },
  aiProviderName: "Gemini Flash",
});

// Access results (strategy_used describes the whole response, so log it once)
console.log(`Strategy used: ${fusionResults.strategy_used}`);
fusionResults.results.forEach(result => {
  console.log(`Score: ${result.score}`);
  console.log(`Content: ${result.record.data.content}`);
});
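RERANK's initial_k parameter describes a two-stage pipeline: retrieve a wider candidate set by vector similarity, rescore it with an LLM, and return the top limit results. The sketch below shows that control flow with a pluggable async scorer standing in for the LLM call; Candidate, twoStageRerank, and the scorer are hypothetical and do not reflect how the server prompts the model.

```typescript
interface Candidate {
  id: string;
  text: string;
  vectorScore: number;
}

// Stage 1 keeps the initialK best vector hits; stage 2 reorders them with a (slower) relevance scorer.
async function twoStageRerank(
  candidates: Candidate[],
  relevance: (text: string) => Promise<number>,
  initialK: number,
  limit: number,
): Promise<Array<{ id: string; score: number }>> {
  const shortlist = candidates
    .slice()
    .sort((a, b) => b.vectorScore - a.vectorScore)
    .slice(0, initialK);
  const rescored = await Promise.all(
    shortlist.map(async (c) => ({ id: c.id, score: await relevance(c.text) })),
  );
  return rescored.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Documents outside the initial_k shortlist are never rescored, which is why a larger initial_k can improve recall at the cost of more LLM work.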

Collection-Specific Delegates

The advanced search is also available on collection delegates:

// Using the facade pattern
const results = await client.records.documents.searchAdvanced({
  query: 'Machine learning applications',
  strategy: RAGStrategy.FUSION,
  aiProviderName: "Gemini Flash",
});

Performance Tips

  • SIMPLE: Fastest, use for basic semantic search
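  • MULTI_QUERY: Slower than SIMPLE; it runs one vector search per generated variation plus the LLM call that generates them, so cost grows with num_queries (5 in the example above)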
  • HYDE: Similar to multi-query, good for questions
  • RERANK: Moderate cost, great for accuracy improvement
  • FUSION: Highest cost and latency, best quality
  • CONTEXTUAL_QUERY: Use for conversational interfaces

For more details, see RAG Strategies Documentation.

License

MIT License - see LICENSE for details.