JSPM

  • ESM via JSPM
  • ES Module Entrypoint
  • Export Map
  • Keywords
  • License
  • Repository URL
  • TypeScript Types
  • README
  • Created
  • Published
  • Downloads 1
  • Score
    100M100P100Q30463F
  • License MIT

Hierarchical markdown chunking for RAG systems with AI-powered context summarization

Package Exports

  • treechunk
  • treechunk/summarizer/openai

Readme

TreeChunk

Contextual, hierarchical markdown chunking for RAG systems

WHAT?

Splits markdown documents into self-contained chunks that contain (hopefully) enough contextual information about what part of the document contained the chunk that it's useful for generation.

There's a static demo of this in action here: sgnt.ai/treechunk-demo.

Synopsis

Programmatic:

import { TreeChunker, OpenAISummarizer } from 'treechunk';

const summarizer = new OpenAISummarizer('Technical documentation context');
const chunker = new TreeChunker(summarizer);

await chunker.makeChunks(documentNode, async (chunk, source) => {
  console.log(chunk); // The enriched chunk with context
  console.log(source); // The original markdown source for this section
});

Build a demo HTML page:

OPENAI_API_KEY=etc
tsx bin/demo.ts ./demo/Scamming.md "The document has come from the Wiki for an online crime game"

API

TreeChunker

  • new TreeChunker(summarizer) - Create chunker with a summarizer
  • makeChunks(node, onChunk, options?) - Process document, calling onChunk for each chunk
    • onChunk: (chunk: string, source: string) => Promise<void> - Callback receives:
      • chunk: The enriched chunk with hierarchical title and AI-generated context
      • source: The original markdown source for this section
    • options?: TreeChunkerOptions - Optional configuration:
      • dryRun?: boolean - When true, returns chunks without AI summaries (chunk equals source)

Summarizers

  • new OpenAISummarizer(context?, apiKey?) - OpenAI implementation
    • context: Optional string added to prompts
    • apiKey: Optional, defaults to OPENAI_API_KEY env var

Parser

  • parseMarkdown(markdown) - Parse markdown into DocumentNode tree
  • renderDocument(node) - Convert DocumentNode back to markdown

Prior Art / See Also

This was an independent -- but not novel -- discovery, by which I mean I built it and then went to try and find out what other people who'd also built it called theirs. It brings together the following ideas:

Todo / Next steps

  • Expand out the demo
  • Add raw chunk and location data to callback (low priority, as I don't need this)

License

MIT

If you use this, port this, whatever, I'd love it if you gave this project a shout-out.

Author

Peter Sergeant pete@sgnt.ai

This was built for Torn, whose Wiki the "Scamming" article is taken.