Core indexing and search functionality for h-codex

Package Exports

  • @hpbyte/h-codex-core
  • @hpbyte/h-codex-core/dist/src/index.js

This package does not declare an exports field, so the exports above were automatically detected and optimized by JSPM. If a package subpath is missing, consider filing an issue with the original package (@hpbyte/h-codex-core) asking it to add an "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
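For reference, an "exports" field matching the entrypoint detected above could look like this in package.json (a sketch of what upstream might add, not the package's current configuration):

{
  "exports": {
    ".": "./dist/src/index.js"
  }
}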

Readme

@hpbyte/h-codex-core

Core package for h-codex semantic code indexing and search.

✨ Features

  • AST-Based Chunking: Parse code using tree-sitter for intelligent chunk boundaries (see the sketch after this list)
  • Semantic Embeddings: Generate embeddings using OpenAI text-embedding models
  • File Discovery: Explore codebases with configurable ignore patterns
  • Vector Search: Store and search embeddings in PostgreSQL with pgvector
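
To make the chunking idea concrete, here is a minimal sketch using node-tree-sitter that splits a file at top-level AST node boundaries. The grammar choice and function name are illustrative only; the package's actual chunker (ingestion/chunker/) may work differently.

// A minimal sketch of AST-based chunking with node-tree-sitter.
// Grammar choice and function name are illustrative, not this package's API.
import Parser from 'tree-sitter'
import TypeScript from 'tree-sitter-typescript'

const parser = new Parser()
parser.setLanguage(TypeScript.typescript)

// Split a source file at top-level AST nodes (functions, classes, imports, ...)
function chunkSource(source: string) {
  const tree = parser.parse(source)
  return tree.rootNode.children.map(node => ({
    content: source.slice(node.startIndex, node.endIndex),
    startLine: node.startPosition.row + 1, // tree-sitter rows are 0-based
    endLine: node.endPosition.row + 1,
  }))
}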

🚀 Quick Start

Installation

pnpm add @hpbyte/h-codex-core

Environment Setup

Create a .env file with:

LLM_API_KEY=your_llm_api_key_here
# Optional; defaults to the OpenAI base URL https://api.openai.com/v1
LLM_BASE_URL=your_llm_base_url_here
EMBEDDING_MODEL=text-embedding-3-small
DB_CONNECTION_STRING=postgresql://postgres:password@localhost:5432/h-codex
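
These variables are read from the process environment. One way to load the .env file before using the package (a sketch assuming the dotenv package is installed; h-codex-core may also pick up process.env directly):

// Load .env into process.env before using the package
import 'dotenv/config'
import { indexer } from '@hpbyte/h-codex-core'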

Usage Example

import { indexer, semanticSearch } from '@hpbyte/h-codex-core'

// Index a codebase
const indexResult = await indexer.index('./path/to/codebase')
console.log(`Indexed ${indexResult.indexedFiles} files and ${indexResult.totalChunks} code chunks`)

// Search for code
const searchResults = await semanticSearch.search('database connection implementation')
console.log(searchResults)

🛠️ API Reference

Indexer

Indexes code repositories by exploring files, chunking code, and generating embeddings.

const stats = await indexer.index(
  path: string,               // Path to the codebase
  options?: {
    ignorePatterns?: string[], // Additional glob patterns to ignore
    maxChunkSize?: number      // Override default chunk size
  }
): Promise<{
  indexedFiles: number,       // Number of indexed files
  totalChunks: number         // Total code chunks created
}>
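
For example, a call exercising the documented options (the path and values here are illustrative):

const stats = await indexer.index('./my-repo', {
  ignorePatterns: ['**/*.test.ts', 'dist/**'], // ignored in addition to defaults
  maxChunkSize: 512,                           // override the default chunk size
})
console.log(`${stats.indexedFiles} files, ${stats.totalChunks} chunks`)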

SemanticSearch

Search indexed code using natural language queries.

const results = await semanticSearch.search(
  query: string,                // Natural language search query
  options?: {
    limit?: number,             // Max results to return (default: 10)
    threshold?: number          // Minimum similarity score (default: 0.5)
  }
): Promise<Array<{
  id: string,                   // Chunk identifier
  content: string,              // Code content
  relativePath: string,         // File path relative to indexed root
  absolutePath: string,         // Absolute file path
  language: string,             // Programming language
  startLine: number,            // Starting line in file
  endLine: number,              // Ending line in file
  score: number                 // Similarity score (0-1)
}>>
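
For example, using the documented options and printing a few of the result fields (the query text is illustrative):

const results = await semanticSearch.search('how is the database connection pooled?', {
  limit: 5,       // max results to return
  threshold: 0.7, // minimum similarity score
})
for (const r of results) {
  console.log(`${r.relativePath}:${r.startLine}-${r.endLine}  score=${r.score.toFixed(2)}`)
}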

🏗️ Architecture

Ingestion Pipeline

  • Explorer (ingestion/explorer/) - Discover files in repositories
  • Chunker (ingestion/chunker/) - Parse and chunk code using AST
  • Embedder (ingestion/embedder/) - Generate semantic embeddings
  • Indexer (ingestion/indexer/) - Orchestrate the full ingestion pipeline
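
Conceptually, the stages compose as below. All names and signatures in this sketch are hypothetical; the real orchestration lives in ingestion/indexer/.

// Hypothetical stage interfaces; the real modules may differ.
interface Chunk { content: string; relativePath: string; startLine: number; endLine: number }
interface EmbeddedChunk extends Chunk { embedding: number[] }

declare const explorer: { discover(root: string): Promise<string[]> }
declare const chunker: { chunk(file: string): Promise<Chunk[]> }
declare const embedder: { embed(chunks: Chunk[]): Promise<EmbeddedChunk[]> }
declare const repository: { save(chunks: EmbeddedChunk[]): Promise<void> }

// Indexer: explore → chunk → embed → store
async function runPipeline(root: string): Promise<void> {
  const files = await explorer.discover(root)
  const chunks = (await Promise.all(files.map(f => chunker.chunk(f)))).flat()
  const embedded = await embedder.embed(chunks)
  await repository.save(embedded)
}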

Storage

  • Repository (storage/repository/) - Database operations for chunks and embeddings
  • Schema (storage/schema/) - Drizzle ORM schema definitions
  • Migrations - Managed with Drizzle ORM
  • Semantic Search (search/) - Vector similarity search with filtering
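
As a sketch of what a chunk table could look like with Drizzle's pgvector column type (the column names and the 1536 dimension, which matches text-embedding-3-small, are assumptions; the real schema lives in storage/schema/):

import { pgTable, text, integer, vector } from 'drizzle-orm/pg-core'

// Hypothetical shape of a chunks table with an embedding column
export const chunks = pgTable('chunks', {
  id: text('id').primaryKey(),
  content: text('content').notNull(),
  relativePath: text('relative_path').notNull(),
  language: text('language'),
  startLine: integer('start_line'),
  endLine: integer('end_line'),
  embedding: vector('embedding', { dimensions: 1536 }),
})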

🧑‍💻 Development

# Install dependencies
pnpm install

# Run database migrations
pnpm run db:migrate

# Build the package
pnpm build

# Run in development mode with hot reload
pnpm dev

📄 License

This project is licensed under the MIT License.