Package Exports
- @hpbyte/h-codex-core
- @hpbyte/h-codex-core/dist/src/index.js
This package does not declare an "exports" field, so the exports above have been automatically detected and optimized by JSPM. If any package subpath is missing, it is recommended to post an issue to the original package (@hpbyte/h-codex-core) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
@hpbyte/h-codex-core
Core package for h-codex semantic code indexing and search.
✨ Features
- AST-Based Chunking: Parse code using tree-sitter for intelligent chunk boundaries
- Semantic Embeddings: Generate embeddings using OpenAI text-embedding models
- File Discovery: Explore codebases with configurable ignore patterns
- Vector Search: Store and search embeddings in PostgreSQL with pgvector
🚀 Quick Start
Installation
```bash
pnpm add @hpbyte/h-codex-core
```
Environment Setup
Create a `.env` file with:
```env
LLM_API_KEY=your_llm_api_key_here
# Optional; defaults to the OpenAI base URL: https://api.openai.com/v1
LLM_BASE_URL=your_llm_base_url_here
EMBEDDING_MODEL=text-embedding-3-small
DB_CONNECTION_STRING=postgresql://postgres:password@localhost:5432/h-codex
```
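If you load this file with dotenv, a minimal startup check could look like the sketch below. The variable names mirror the file above; the `requireEnv` helper is hypothetical and not part of this package.
```ts
import 'dotenv/config' // populate process.env from .env

// Hypothetical helper: fail fast when a required variable is missing.
function requireEnv(name: string): string {
  const value = process.env[name]
  if (!value) throw new Error(`Missing required environment variable: ${name}`)
  return value
}

const config = {
  llmApiKey: requireEnv('LLM_API_KEY'),
  llmBaseUrl: process.env.LLM_BASE_URL ?? 'https://api.openai.com/v1',
  embeddingModel: process.env.EMBEDDING_MODEL ?? 'text-embedding-3-small',
  dbConnectionString: requireEnv('DB_CONNECTION_STRING'),
}
```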
Usage Example
```ts
import { indexer, semanticSearch } from '@hpbyte/h-codex-core'

// Index a codebase
const indexResult = await indexer.index('./path/to/codebase')
console.log(`Indexed ${indexResult.indexedFiles} files and ${indexResult.totalChunks} code chunks`)

// Search for code
const searchResults = await semanticSearch.search('database connection implementation')
console.log(searchResults)
```
🛠️ API Reference
Indexer
Indexes code repositories by exploring files, chunking code, and generating embeddings.
```ts
const stats = await indexer.index(
  path: string,                // Path to the codebase
  options?: {
    ignorePatterns?: string[], // Additional glob patterns to ignore
    maxChunkSize?: number      // Override the default chunk size
  }
): Promise<{
  indexedFiles: number, // Number of indexed files
  totalChunks: number   // Total code chunks created
}>
```
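For example, both options can be combined in a single call (the patterns and chunk size below are illustrative, not recommended defaults):
```ts
const stats = await indexer.index('./my-repo', {
  ignorePatterns: ['**/*.test.ts', 'vendor/**'], // skipped on top of the default ignores
  maxChunkSize: 1024,                            // override the default chunk size
})
console.log(`Indexed ${stats.indexedFiles} files into ${stats.totalChunks} chunks`)
```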
Semantic Search
Search indexed code using natural language queries.
```ts
const results = await semanticSearch.search(
  query: string,        // Natural language search query
  options?: {
    limit?: number,     // Max results to return (default: 10)
    threshold?: number  // Minimum similarity score (default: 0.5)
  }
): Promise<Array<{
  id: string,           // Chunk identifier
  content: string,      // Code content
  relativePath: string, // File path relative to the indexed root
  absolutePath: string, // Absolute file path
  language: string,     // Programming language
  startLine: number,    // Starting line in the file
  endLine: number,      // Ending line in the file
  score: number         // Similarity score (0-1)
}>>
```
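For example, a stricter query that trims the result set (illustrative values):
```ts
const results = await semanticSearch.search('retry logic for failed HTTP requests', {
  limit: 5,       // return at most five chunks
  threshold: 0.7, // drop matches scoring below 0.7
})
for (const result of results) {
  console.log(`${result.relativePath}:${result.startLine}-${result.endLine} (score ${result.score.toFixed(2)})`)
}
```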
🏗️ Architecture
Ingestion Pipeline
- Explorer (`ingestion/explorer/`) - Discover files in repositories
- Chunker (`ingestion/chunker/`) - Parse and chunk code using the AST
- Embedder (`ingestion/embedder/`) - Generate semantic embeddings
- Indexer (`ingestion/indexer/`) - Orchestrate the full ingestion pipeline
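Conceptually, the indexer chains the other three stages. The sketch below shows that orchestration with hypothetical stage interfaces; the real modules live under the directories above, and their exact signatures may differ.
```ts
// Hypothetical stage interfaces, for illustration only.
type Chunk = { content: string; relativePath: string; startLine: number; endLine: number }

interface Explorer { discover(path: string): Promise<string[]> }
interface Chunker { chunk(file: string): Promise<Chunk[]> }
interface Embedder { embed(chunks: Chunk[]): Promise<number[][]> }
interface ChunkStore { save(chunks: Chunk[], embeddings: number[][]): Promise<void> }

async function runPipeline(
  path: string,
  explorer: Explorer,
  chunker: Chunker,
  embedder: Embedder,
  store: ChunkStore,
) {
  const files = await explorer.discover(path) // 1. find candidate files
  let totalChunks = 0
  for (const file of files) {
    const chunks = await chunker.chunk(file)        // 2. AST-based chunking
    const embeddings = await embedder.embed(chunks) // 3. semantic embeddings
    await store.save(chunks, embeddings)            // 4. persist to Postgres/pgvector
    totalChunks += chunks.length
  }
  return { indexedFiles: files.length, totalChunks }
}
```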
Storage
- Repository (`storage/repository/`) - Database operations for chunks and embeddings
- Schema (`storage/schema/`) - Drizzle ORM schema definitions
- Migrations - Managed with Drizzle ORM
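As one illustration of what such a schema can look like with Drizzle and pgvector, here is a hypothetical `chunks` table; the column names are inferred from the search result shape above, and 1536 dimensions matches `text-embedding-3-small`. The package's actual schema may differ.
```ts
import { pgTable, serial, text, integer, vector, index } from 'drizzle-orm/pg-core'

// Hypothetical table definition, for illustration only.
export const chunks = pgTable(
  'chunks',
  {
    id: serial('id').primaryKey(),
    content: text('content').notNull(),
    relativePath: text('relative_path').notNull(),
    language: text('language').notNull(),
    startLine: integer('start_line').notNull(),
    endLine: integer('end_line').notNull(),
    embedding: vector('embedding', { dimensions: 1536 }).notNull(),
  },
  (table) => ({
    // HNSW index for fast approximate cosine-distance search
    embeddingIdx: index('embedding_idx').using('hnsw', table.embedding.op('vector_cosine_ops')),
  }),
)
```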
Search
- Semantic Search (`search/`) - Vector similarity search with filtering
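Under the hood, pgvector similarity search reduces to ordering by a distance operator. A minimal sketch with the `pg` client, assuming the hypothetical `chunks` table from the schema sketch above:
```ts
import { Client } from 'pg'

async function searchByEmbedding(client: Client, queryEmbedding: number[], limit = 10) {
  // <=> is pgvector's cosine distance operator; 1 - distance gives a similarity score.
  const { rows } = await client.query(
    `SELECT id, content, relative_path, start_line, end_line,
            1 - (embedding <=> $1::vector) AS score
       FROM chunks
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [JSON.stringify(queryEmbedding), limit],
  )
  return rows
}
```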
🧑‍💻 Development
```bash
# Install dependencies
pnpm install

# Run database migrations
pnpm run db:migrate

# Build the package
pnpm build

# Run in development mode with hot reload
pnpm dev
```
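The `db:migrate` script implies a drizzle-kit configuration. A minimal `drizzle.config.ts` for this layout might look like the sketch below; the schema path and output directory are assumptions, not taken from the package.
```ts
import { defineConfig } from 'drizzle-kit'

export default defineConfig({
  dialect: 'postgresql',
  schema: './src/storage/schema', // assumed: matches the storage/schema/ module above
  out: './drizzle',               // assumed migrations output directory
  dbCredentials: { url: process.env.DB_CONNECTION_STRING! },
})
```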
📄 License
This project is licensed under the MIT License.