@asktext/core

TypeScript-first embedding and retrieval engine for voice-enabled Q&A on articles.

What it does

  • Text processing: Splits HTML/Markdown into semantic chunks with configurable overlap
  • Embeddings: Generates OpenAI embeddings for each chunk
  • Storage: Saves chunks + embeddings to your database (Prisma JSON, pgvector, or custom)
  • Retrieval: Semantic search to find relevant passages for user questions

Installation

npm install @asktext/core openai @prisma/client

Quick Start

1. Database Schema

Add to your schema.prisma:

model ArticleChunk {
  id         String   @id @default(cuid())
  postId     String
  chunkIndex Int
  content    String   @db.Text
  startChar  Int
  endChar    Int
  embedding  String   @db.Text   // JSON-encoded float[]

  @@index([postId, chunkIndex])
}

Run npx prisma db push.
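The embedding column holds the vector as JSON text, per the schema comment (// JSON-encoded float[]). A quick sketch of that encode/decode roundtrip — the helper names here are illustrative, not part of the package:

```typescript
// Encode a float vector into the JSON string stored in ArticleChunk.embedding,
// and decode it back for similarity search. Helper names are illustrative.
function encodeEmbedding(vector: number[]): string {
  return JSON.stringify(vector);
}

function decodeEmbedding(text: string): number[] {
  return JSON.parse(text) as number[];
}

const vector = [0.12, -0.5, 0.033];
const stored = encodeEmbedding(vector);   // what lands in the @db.Text column
const restored = decodeEmbedding(stored);
```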

2. Embed Articles

import { PrismaClient } from '@prisma/client';
import { OpenAIEmbedder, embedAndStore } from '@asktext/core';

const prisma = new PrismaClient();
const store = embedAndStore.createPrismaJsonStore(prisma);
const embedder = new OpenAIEmbedder({ 
  apiKey: process.env.OPENAI_API_KEY! 
});

// Call this when publishing/updating articles
export async function saveEmbeddings(postId: string, htmlContent: string) {
  await embedAndStore({ 
    articleId: postId, 
    htmlOrMarkdown: htmlContent, 
    embedder, 
    store 
  });
}

3. Retrieve Passages

import { retrievePassages } from '@asktext/core';

const passages = await retrievePassages({
  query: "How does binary search work?",
  store,
  embedder,
  filter: { postId: "article-123" },
  limit: 5
});
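A typical next step is to concatenate the top passages into a context block for an answer-generating model. The passage shape below is an assumption (check the package's ChunkWithScore type for the actual fields):

```typescript
// Assumed passage shape; check ChunkWithScore in @asktext/core for actual fields.
interface Passage {
  content: string;
  score: number;
}

// Join retrieved passages into a single numbered context block for an LLM prompt.
function buildContext(passages: Passage[]): string {
  return passages
    .map((p, i) => `[${i + 1}] ${p.content}`)
    .join('\n\n');
}

const context = buildContext([
  { content: 'Binary search halves the interval each step.', score: 0.91 },
  { content: 'It requires the input to be sorted.', score: 0.87 },
]);
```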

Configuration

Text Splitting

import { TextSplitter } from '@asktext/core';

const splitter = new TextSplitter({
  chunkSize: 1500,     // characters per chunk
  chunkOverlap: 200,   // overlap between chunks
  separators: ['\n\n', '\n', '. ', ' ']  // split priorities
});
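To make the chunkSize/chunkOverlap semantics concrete, here is a minimal fixed-size chunker. This is a simplified sketch, not the package's implementation — the real TextSplitter additionally prefers to break at the configured separators:

```typescript
type Chunk = { content: string; startChar: number; endChar: number };

// Simplified sketch: fixed-size chunks where each chunk after the first
// re-includes the last `chunkOverlap` characters of the previous one.
function splitFixed(text: string, chunkSize: number, chunkOverlap: number): Chunk[] {
  const chunks: Chunk[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ content: text.slice(start, end), startChar: start, endChar: end });
    if (end === text.length) break; // last chunk reached
  }
  return chunks;
}

// With the config above, chunk starts advance by 1500 - 200 = 1300 characters.
const chunks = splitFixed('a'.repeat(3000), 1500, 200);
```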

Custom Vector Store

Implement the VectorStore interface for your database:

interface VectorStore {
  saveChunks(chunks: ChunkWithEmbedding[]): Promise<void>;
  searchSimilar(embedding: number[], limit: number, filter?: any): Promise<ChunkWithScore[]>;
  deleteByArticleId(articleId: string): Promise<void>;
}
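For tests or small datasets, an in-memory implementation makes the contract concrete. The chunk types below are local stand-ins for the package's exported types, and cosine similarity is an assumption about how scores are computed:

```typescript
// Local stand-in types; the package exports its own ChunkWithEmbedding / ChunkWithScore.
interface ChunkWithEmbedding {
  articleId: string;
  content: string;
  embedding: number[];
}
interface ChunkWithScore extends ChunkWithEmbedding {
  score: number;
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Satisfies the three VectorStore methods with a plain array as the backing store.
class InMemoryVectorStore {
  private chunks: ChunkWithEmbedding[] = [];

  async saveChunks(chunks: ChunkWithEmbedding[]): Promise<void> {
    this.chunks.push(...chunks);
  }

  async searchSimilar(embedding: number[], limit: number, filter?: { articleId?: string }): Promise<ChunkWithScore[]> {
    return this.chunks
      .filter((c) => !filter?.articleId || c.articleId === filter.articleId)
      .map((c) => ({ ...c, score: cosine(embedding, c.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }

  async deleteByArticleId(articleId: string): Promise<void> {
    this.chunks = this.chunks.filter((c) => c.articleId !== articleId);
  }
}
```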

Environment Variables

OPENAI_API_KEY=sk-...          # Required for embeddings
DATABASE_URL=postgresql://...   # For Prisma store

Advanced Usage

Batch Processing

const articles = await getArticlesToProcess();

for (const article of articles) {
  await saveEmbeddings(article.id, article.content);
  console.log(`Processed: ${article.title}`);
}
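For large backfills, the sequential loop above can be slow, while unbounded parallelism risks OpenAI rate limits; a middle ground is small parallel batches. This sketch assumes a saveEmbeddings(id, content) like the one in Quick Start (stubbed here so the example is self-contained):

```typescript
interface Article { id: string; content: string; }

// Stub standing in for the saveEmbeddings() defined in Quick Start.
async function saveEmbeddings(postId: string, htmlContent: string): Promise<void> {}

// Process articles in parallel batches of `batchSize` to bound concurrent
// OpenAI requests; the batches themselves run sequentially.
async function processInBatches(articles: Article[], batchSize: number): Promise<number> {
  let processed = 0;
  for (let i = 0; i < articles.length; i += batchSize) {
    const batch = articles.slice(i, i + batchSize);
    await Promise.all(batch.map((a) => saveEmbeddings(a.id, a.content)));
    processed += batch.length;
  }
  return processed;
}
```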

Custom Embedder

import type { Embedder } from '@asktext/core';

class CustomEmbedder implements Embedder {
  async embed(texts: string[]): Promise<number[][]> {
    // Your embedding logic: return one vector per input text, in order
    throw new Error('implement with your embedding provider');
  }
}

Cost Estimation

  • 100k words ≈ 133k tokens ≈ $0.003 with text-embedding-3-small (at $0.02 per 1M tokens)
  • 1M words ≈ 1.33M tokens ≈ $0.03
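The arithmetic behind these estimates: tokens ≈ words × 4/3 is OpenAI's usual rule of thumb (1 token ≈ 0.75 words), and the per-token price is left as a parameter since pricing changes:

```typescript
// Rough embedding-cost estimate. tokensPerWord ≈ 4/3 is OpenAI's usual
// rule of thumb; pass the current price per 1M tokens for your model.
function estimateCostUSD(words: number, pricePerMillionTokens: number): number {
  const tokens = words * (4 / 3);
  return (tokens / 1_000_000) * pricePerMillionTokens;
}

const cost = estimateCostUSD(100_000, 0.02); // 100k words at $0.02 / 1M tokens
```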

License

MIT