@asktext/core

TypeScript-first embedding and retrieval engine for voice-enabled Q&A on articles.

What it does

  • Text processing: Splits HTML/Markdown into semantic chunks with configurable overlap
  • Embeddings: Generates OpenAI embeddings for each chunk
  • Storage: Saves chunks + embeddings to your database (Prisma JSON, pgvector, or custom)
  • Retrieval: Semantic search to find relevant passages for user questions

Installation

npm install @asktext/core openai @prisma/client

Quick Start

1. Database Schema

Add to your schema.prisma:

model ArticleChunk {
  id         String   @id @default(cuid())
  postId     String
  chunkIndex Int
  content    String   @db.Text
  startChar  Int
  endChar    Int
  embedding  String   @db.Text   // JSON-encoded float[]

  @@index([postId, chunkIndex])
}

Run npx prisma db push.
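The embedding column holds the vector as JSON text, per the schema comment (// JSON-encoded float[]). A quick sketch of that encode/decode roundtrip — the helper names here are illustrative, not part of the package:

```typescript
// Encode a float vector into the JSON string stored in ArticleChunk.embedding,
// and decode it back for similarity search. Helper names are illustrative.
function encodeEmbedding(vector: number[]): string {
  return JSON.stringify(vector);
}

function decodeEmbedding(text: string): number[] {
  return JSON.parse(text) as number[];
}

const vector = [0.12, -0.5, 0.033];
const stored = encodeEmbedding(vector);   // what lands in the @db.Text column
const restored = decodeEmbedding(stored);
```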

2. Embed Articles

import { PrismaClient } from '@prisma/client';
import { OpenAIEmbedder, embedAndStore } from '@asktext/core';

const prisma = new PrismaClient();
const store = embedAndStore.createPrismaJsonStore(prisma);
const embedder = new OpenAIEmbedder({ 
  apiKey: process.env.OPENAI_API_KEY! 
});

// Call this when publishing/updating articles
export async function saveEmbeddings(postId: string, htmlContent: string) {
  await embedAndStore({ 
    articleId: postId, 
    htmlOrMarkdown: htmlContent, 
    embedder, 
    store 
  });
}

3. Retrieve Passages

import { retrievePassages } from '@asktext/core';

const passages = await retrievePassages({
  query: "How does binary search work?",
  store,
  embedder,
  filter: { postId: "article-123" },
  limit: 5
});
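A typical next step is to concatenate the top passages into a context block for an answer-generating model. The passage shape below is an assumption (check the package's ChunkWithScore type for the actual fields):

```typescript
// Assumed passage shape; check ChunkWithScore in @asktext/core for actual fields.
interface Passage {
  content: string;
  score: number;
}

// Join retrieved passages into a single numbered context block for an LLM prompt.
function buildContext(passages: Passage[]): string {
  return passages
    .map((p, i) => `[${i + 1}] ${p.content}`)
    .join('\n\n');
}

const context = buildContext([
  { content: 'Binary search halves the interval each step.', score: 0.91 },
  { content: 'It requires the input to be sorted.', score: 0.87 },
]);
```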

Configuration

Text Splitting

import { TextSplitter } from '@asktext/core';

const splitter = new TextSplitter({
  chunkSize: 1500,     // characters per chunk
  chunkOverlap: 200,   // overlap between chunks
  separators: ['\n\n', '\n', '. ', ' ']  // split priorities
});
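To make the chunkSize/chunkOverlap semantics concrete, here is a minimal fixed-size chunker. This is a simplified sketch, not the package's implementation — the real TextSplitter additionally prefers to break at the configured separators:

```typescript
type Chunk = { content: string; startChar: number; endChar: number };

// Simplified sketch: fixed-size chunks where each chunk after the first
// re-includes the last `chunkOverlap` characters of the previous one.
function splitFixed(text: string, chunkSize: number, chunkOverlap: number): Chunk[] {
  const chunks: Chunk[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ content: text.slice(start, end), startChar: start, endChar: end });
    if (end === text.length) break; // last chunk reached
  }
  return chunks;
}

// With the config above, chunk starts advance by 1500 - 200 = 1300 characters.
const chunks = splitFixed('a'.repeat(3000), 1500, 200);
```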

Custom Vector Store

Implement the VectorStore interface for your database:

interface VectorStore {
  saveChunks(chunks: ChunkWithEmbedding[]): Promise<void>;
  searchSimilar(embedding: number[], limit: number, filter?: any): Promise<ChunkWithScore[]>;
  deleteByArticleId(articleId: string): Promise<void>;
}
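For tests or small datasets, an in-memory implementation makes the contract concrete. The chunk types below are local stand-ins for the package's exported types, and cosine similarity is an assumption about how scores are computed:

```typescript
// Local stand-in types; the package exports its own ChunkWithEmbedding / ChunkWithScore.
interface ChunkWithEmbedding {
  articleId: string;
  content: string;
  embedding: number[];
}
interface ChunkWithScore extends ChunkWithEmbedding {
  score: number;
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Satisfies the three VectorStore methods with a plain array as the backing store.
class InMemoryVectorStore {
  private chunks: ChunkWithEmbedding[] = [];

  async saveChunks(chunks: ChunkWithEmbedding[]): Promise<void> {
    this.chunks.push(...chunks);
  }

  async searchSimilar(embedding: number[], limit: number, filter?: { articleId?: string }): Promise<ChunkWithScore[]> {
    return this.chunks
      .filter((c) => !filter?.articleId || c.articleId === filter.articleId)
      .map((c) => ({ ...c, score: cosine(embedding, c.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }

  async deleteByArticleId(articleId: string): Promise<void> {
    this.chunks = this.chunks.filter((c) => c.articleId !== articleId);
  }
}
```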

Environment Variables

OPENAI_API_KEY=sk-...          # Required for embeddings
DATABASE_URL=postgresql://...   # For Prisma store

Advanced Usage

Batch Processing

const articles = await getArticlesToProcess();

for (const article of articles) {
  await saveEmbeddings(article.id, article.content);
  console.log(`Processed: ${article.title}`);
}
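For large backfills, the sequential loop above can be slow, while unbounded parallelism risks OpenAI rate limits; a middle ground is small parallel batches. This sketch assumes a saveEmbeddings(id, content) like the one in Quick Start (stubbed here so the example is self-contained):

```typescript
interface Article { id: string; content: string; }

// Stub standing in for the saveEmbeddings() defined in Quick Start.
async function saveEmbeddings(postId: string, htmlContent: string): Promise<void> {}

// Process articles in parallel batches of `batchSize` to bound concurrent
// OpenAI requests; the batches themselves run sequentially.
async function processInBatches(articles: Article[], batchSize: number): Promise<number> {
  let processed = 0;
  for (let i = 0; i < articles.length; i += batchSize) {
    const batch = articles.slice(i, i + batchSize);
    await Promise.all(batch.map((a) => saveEmbeddings(a.id, a.content)));
    processed += batch.length;
  }
  return processed;
}
```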

Custom Embedder

import type { Embedder } from '@asktext/core';

class CustomEmbedder implements Embedder {
  async embed(texts: string[]): Promise<number[][]> {
    // Your embedding logic: return one vector per input text, in order
    throw new Error('implement with your embedding provider');
  }
}

Cost Estimation

  • 100k words ≈ 133k tokens ≈ $0.003 with text-embedding-3-small (at $0.02 per 1M tokens)
  • 1M words ≈ 1.33M tokens ≈ $0.03
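The arithmetic behind these estimates: tokens ≈ words × 4/3 is OpenAI's usual rule of thumb (1 token ≈ 0.75 words), and the per-token price is left as a parameter since pricing changes:

```typescript
// Rough embedding-cost estimate. tokensPerWord ≈ 4/3 is OpenAI's usual
// rule of thumb; pass the current price per 1M tokens for your model.
function estimateCostUSD(words: number, pricePerMillionTokens: number): number {
  const tokens = words * (4 / 3);
  return (tokens / 1_000_000) * pricePerMillionTokens;
}

const cost = estimateCostUSD(100_000, 0.02); // 100k words at $0.02 / 1M tokens
```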

License

MIT