embedeer

A Node.js Embedding Tool
A Node.js tool for generating text embeddings using transformers.js with ONNX models from Hugging Face.
Supports batched input, parallel execution, isolated child-process workers (default) or in-process threads, quantization, optional GPU acceleration, and Hugging Face auth.
Features
- Downloads any Hugging Face feature-extraction model on first use (cached in ~/.embedeer/models)
- Isolated processes (default) — a worker crash cannot bring down the caller
- In-process threads — opt-in via mode: 'thread' for lower overhead
- Sequential execution when concurrency: 1
- Configurable batch size and concurrency
- GPU acceleration — optional CUDA (Linux x64) and DirectML (Windows x64), no extra packages needed
- Hugging Face API token support (--token / HF_TOKEN env var)
- Quantization via dtype (fp32 · fp16 · q8 · q4 · q4f16 · auto)
- Rich CLI: pull model, embed from file, dump output as JSON / TXT / SQL
Installation
npm install @jsilvanus/embedeer
GPU acceleration (CUDA on Linux x64, DirectML on Windows x64) is built into onnxruntime-node,
which ships as a transitive dependency. No additional packages are required.
For CUDA on Linux x64 you also need the CUDA 12 system libraries:
# Ubuntu / Debian
sudo apt install cuda-toolkit-12-6 libcudnn9-cuda-12
Programmatic API
Model management
Embedeer supports pre-caching and managing downloaded models.
- Pull (pre-cache) a model via the CLI:
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2
- Programmatic pre-cache using loadModel():
import { loadModel } from '@jsilvanus/embedeer';
const { modelName, cacheDir } = await loadModel('Xenova/all-MiniLM-L6-v2', {
token: 'hf_...', // optional HF token
dtype: 'q8', // optional quantization
cacheDir: '/my/cache', // optional override
});
Cache location: the default is ~/.embedeer/models. Override with the CLI --cache-dir option or the cacheDir argument to loadModel().
Removing cached models: delete the model directory from the cache. Example:
# Unix
rm -rf ~/.embedeer/models/Xenova-all-MiniLM-L6-v2
# PowerShell (Windows)
Remove-Item -Recurse -Force $env:USERPROFILE\.embedeer\models\Xenova-all-MiniLM-L6-v2
- Advanced: see src/model-management.js for low-level cache helpers.
Model compatibility (ONNX)
Embedeer runs models via onnxruntime-node. Models chosen from Hugging Face must provide an ONNX export compatible with ONNX Runtime, or be convertible to ONNX (see Optimum). If a model does not include an ONNX build, export it and place the ONNX files in your cache or publish them to the model repository so embedeer can load them.
Programmatic runtime & cache helpers
Two small runtime/cache helpers are available from the public API:
- getLoadedModels() — returns an array of model names currently loaded by active worker pools.
- deleteModel(modelName, { cacheDir? }) — removes cached model directories matching modelName.
Example:
import { getLoadedModels, deleteModel } from '@jsilvanus/embedeer';
// Synchronous list of models currently loaded by any running WorkerPool
console.log(getLoadedModels()); // e.g. ['Xenova/all-MiniLM-L6-v2']
// Remove a cached model from disk (async)
const removed = await deleteModel('Xenova/all-MiniLM-L6-v2');
console.log('removed?', removed);
Explainer — deterministic LLM interface
This feature was deprecated and moved to the npm package @jsilvanus/chattydeer in 1.3.0.
Embed texts (CPU — default)
import { Embedder } from '@jsilvanus/embedeer';
const embedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', {
batchSize: 32, // texts per worker task (default: 32)
concurrency: 2, // parallel workers (default: 2)
mode: 'process', // 'process' | 'thread' (default: 'process')
pooling: 'mean', // 'mean' | 'cls' | 'none' (default: 'mean')
normalize: true, // L2-normalise vectors (default: true)
token: 'hf_...', // HF API token (optional; also reads HF_TOKEN env)
dtype: 'q8', // quantization dtype (optional)
cacheDir: '/my/cache', // override model cache (default: ~/.embedeer/models)
});
const vectors = await embedder.embed(['Hello world', 'Foo bar baz']);
// → number[][] (one 384-dim vector per text for all-MiniLM-L6-v2)
await embedder.destroy(); // shut down worker processes
TypeScript example
The package includes TypeScript declarations so imports are typed automatically.
import { Embedder } from '@jsilvanus/embedeer';
async function main() {
const embedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', { batchSize: 32, concurrency: 2 });
const vectors = await embedder.embed(['Hello world', 'Foo bar baz']);
// vectors: number[][]
await embedder.destroy();
}
main().catch(console.error);
Programmatic profile generation (optional)
You can generate and save a per-user performance profile which Embedder.create() will
automatically apply. This is useful to pick the best batchSize / concurrency for your
machine without manual tuning.
import { Embedder } from '@jsilvanus/embedeer';
// Quick profile generation (writes ~/.embedeer/perf-profile.json)
await Embedder.generateAndSaveProfile({ mode: 'quick', device: 'cpu', sampleSize: 100 });
// Subsequent calls to Embedder.create() will auto-apply the saved profile by default.
Embed texts with GPU
import { Embedder } from '@jsilvanus/embedeer';
// Auto-detect GPU (falls back to CPU if no provider is installed)
const autoEmbedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', {
device: 'auto',
});
// Require GPU (throws if no provider is available)
const gpuEmbedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', {
device: 'gpu',
});
// Explicitly select an execution provider
const cudaEmbedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', {
provider: 'cuda', // 'cuda' | 'dml'
});
CLI
npx @jsilvanus/embedeer [options]
Model management (pull / cache model):
npx @jsilvanus/embedeer --model <name>
Embed texts (batch):
npx @jsilvanus/embedeer --model <name> --data "text1" "text2" ...
npx @jsilvanus/embedeer --model <name> --data '["text1","text2"]'
npx @jsilvanus/embedeer --model <name> --file texts.txt
echo '["t1","t2"]' | npx @jsilvanus/embedeer --model <name>
printf 'a\0b\0c' | npx @jsilvanus/embedeer --model <name> --delimiter '\0'
Interactive / streaming line-reader:
npx @jsilvanus/embedeer --model <name> --interactive --dump out.jsonl
cat big.txt | npx @jsilvanus/embedeer --model <name> -i --output csv --dump out.csv
Options:
-m, --model <name> Hugging Face model (default: Xenova/all-MiniLM-L6-v2)
-d, --data <text...> Text(s) or JSON array to embed
--file <path> Input file: JSON array or delimited texts
-D, --delimiter <str> Record separator for stdin/file (default: \n)
Escape sequences supported: \0 \n \t \r
-i, --interactive Interactive line-reader (see below)
--dump <path> Write output to file instead of stdout
--output <format> Output: json|jsonl|csv|txt|sql (default: json)
--with-text Include source text alongside each embedding
-b, --batch-size <n> Texts per worker batch (default: 32)
-c, --concurrency <n> Parallel workers (default: 2)
--mode process|thread Worker mode (default: process)
-p, --pooling <mode> mean|cls|none (default: mean)
--no-normalize Disable L2 normalisation
--dtype <type> Quantization: fp32|fp16|q8|q4|q4f16|auto
--token <tok> Hugging Face API token (or set HF_TOKEN env)
--cache-dir <path> Model cache directory (default: ~/.embedeer/models)
--device <mode> Compute device: auto|cpu|gpu (default: cpu)
--provider <name> Execution provider override: cpu|cuda|dml
-h, --help            Show this help
Input Sources
Texts can be provided in any of these ways (checked in order):
| Source | How |
|---|---|
| Inline args | --data "text1" "text2" "text3" |
| Inline JSON | --data '["text1","text2"]' |
| File | --file texts.txt (JSON array or one record per line) |
| Stdin | Pipe or redirect — auto-detected; TTY is skipped |
| Interactive | --interactive / -i — line-reader, embeds as you type |
Stdin auto-detection: when stdin is not a TTY (i.e. data is piped or redirected), embedeer reads it before deciding what to do. JSON arrays are accepted directly; otherwise records are split on the delimiter.
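As an illustration of the auto-detection above, here is a minimal sketch (not the package's actual parser) of how piped input could be turned into records:

```javascript
// Illustrative only: mirror the documented stdin behaviour.
// JSON arrays are accepted directly; otherwise input is split on the delimiter.
function parseStdin(raw, delimiter = '\n') {
  const trimmed = raw.trim();
  if (trimmed.startsWith('[')) {
    try {
      const arr = JSON.parse(trimmed);
      if (Array.isArray(arr)) return arr.map(String);
    } catch {
      // not valid JSON — fall through to delimiter splitting
    }
  }
  // Split on the delimiter and drop empty trailing records
  return raw.split(delimiter).filter((r) => r.length > 0);
}
```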
Interactive Line-Reader Mode (-i / --interactive)
The interactive mode opens a line-by-line reader that starts embedding as records arrive — ideal for pasting large datasets into a terminal or streaming data from another process.
# Open an interactive session (paste lines, Ctrl+D when done)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --interactive --dump embeddings.jsonl
# Stream a large file through interactive mode with CSV output
cat big.txt | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 \
--interactive --output csv --dump embeddings.csv
# Interactive with GPU, custom batch size, txt output
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 \
--interactive --device auto --batch-size 16 --output txt --dump vecs.txt
How it works:
| Event | What happens |
|---|---|
| Type a line, press Enter | Record is buffered |
| Buffer reaches --batch-size | Auto-flush: embed + append to output |
| Type an empty line | Manual flush: embed whatever is buffered |
| Ctrl+D (EOF) | Flush remaining records and exit |
| Ctrl+C | Flush remaining records and exit |
Behaviour notes:
- Progress messages (Batch N: M record(s) → file) always go to stderr — they never pollute piped output.
- When stdin is a TTY, a > prompt is shown on stderr.
- Output defaults to stdout if --dump is omitted; a tip is printed when running in TTY mode.
- --output json and --output sql are automatically promoted to jsonl since they produce complete documents that cannot be appended to incrementally.
- --output csv writes the dimension header (text,dim_0,dim_1,...) on the first batch only; subsequent batches append data rows.
- Each interactive session clears the --dump file on start so you always get a fresh output file.
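The flush rules above can be sketched as a small buffer (illustrative only; embedBatch is a hypothetical stand-in for the real embedding call):

```javascript
// Sketch of the documented interactive buffering: auto-flush at batchSize,
// manual flush on an empty line, final flush on EOF / Ctrl+C.
function makeLineBuffer(batchSize, embedBatch) {
  let buffer = [];
  const flush = () => {
    if (buffer.length === 0) return;
    embedBatch(buffer);
    buffer = [];
  };
  return {
    push(line) {
      if (line === '') return flush();         // empty line → manual flush
      buffer.push(line);
      if (buffer.length >= batchSize) flush(); // full buffer → auto-flush
    },
    end: flush,                                // EOF / Ctrl+C → final flush
  };
}
```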
Configurable delimiter (-D / --delimiter)
By default records in stdin and files are split on newline (\n). Use --delimiter to change it:
# Newline-delimited (default)
printf 'Hello\nWorld\n' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2
# Null-byte delimited — safe with filenames/texts that contain newlines
printf 'Hello\0World\0' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '\0'
# Tab-delimited
printf 'Hello\tWorld' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '\t'
# Custom multi-character delimiter
printf 'Hello|||World|||Foo' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '|||'
# File with null-byte delimiter
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --file records.bin --delimiter '\0'
# Integrate with find -print0 (handles filenames with spaces / newlines)
find ./docs -name '*.txt' -print0 | \
xargs -0 cat | \
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '\0'
Supported escape sequences in --delimiter:
| Sequence | Character |
|---|---|
| \0 | Null byte (U+0000) |
| \n | Newline (U+000A) |
| \t | Tab (U+0009) |
| \r | Carriage return (U+000D) |
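A minimal sketch of how these four sequences could be interpreted (illustrative; the CLI's actual parser may differ):

```javascript
// Map the documented escape sequences to their characters;
// anything else (including multi-character delimiters like '|||')
// passes through unchanged.
function unescapeDelimiter(raw) {
  const map = { '\\0': '\0', '\\n': '\n', '\\t': '\t', '\\r': '\r' };
  return map[raw] ?? raw;
}
```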
Output Formats
| Format | Description |
|---|---|
| json (default) | JSON array of float arrays: [[0.1,0.2,...],[...]] |
| json --with-text | JSON array of objects: [{"text":"...","embedding":[...]}] |
| jsonl | Newline-delimited JSON, one object per line: {"text":"...","embedding":[...]} |
| csv | CSV with header: text,dim_0,dim_1,...,dim_N |
| txt | Space-separated floats, one vector per line |
| txt --with-text | Tab-separated: <original text>\t<float float ...> |
| sql | INSERT INTO embeddings (text, vector) VALUES ...; |
Use --dump <path> to write the output to a file instead of stdout. Progress messages always go to stderr so they never interfere with piped output.
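For example, the documented csv layout (header text,dim_0,...,dim_N, then one row per embedding) could be produced like this (a sketch, not the CLI's internal writer):

```javascript
// Build a CSV string matching the documented header layout.
// Texts are JSON-quoted here so embedded commas survive;
// the real CLI may escape differently.
function toCsv(texts, vectors) {
  const dims = vectors[0].length;
  const header = ['text', ...Array.from({ length: dims }, (_, i) => `dim_${i}`)];
  const rows = texts.map((t, i) => [JSON.stringify(t), ...vectors[i]].join(','));
  return [header.join(','), ...rows].join('\n');
}
```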
Piping examples
MODEL=Xenova/all-MiniLM-L6-v2
# --- json (default) ---
# Embed and pretty-print with jq
echo '["Hello","World"]' | npx @jsilvanus/embedeer --model $MODEL | jq '.[0] | length'
# --- jsonl ---
# One object per line — pipe to jq, grep, awk, etc.
npx @jsilvanus/embedeer --model $MODEL --data "foo" "bar" --output jsonl
# Filter by similarity: extract embedding for downstream processing
npx @jsilvanus/embedeer --model $MODEL --data "query text" --output jsonl \
| jq -c '.embedding'
# Stream a large file and store as JSONL
npx @jsilvanus/embedeer --model $MODEL --file big.txt --output jsonl --dump out.jsonl
# --- json --with-text ---
# Keep the source text next to each vector (useful for building a search index)
npx @jsilvanus/embedeer --model $MODEL --output json --with-text \
--data "cat" "dog" "fish" \
| jq '.[] | {text, dims: (.embedding | length)}'
# --- csv ---
# Embed then open in Python/pandas
npx @jsilvanus/embedeer --model $MODEL --file texts.txt --output csv --dump vectors.csv
python3 -c "import pandas as pd; df = pd.read_csv('vectors.csv'); print(df.shape)"
# --- txt ---
# Raw floats — useful for awk/paste/numpy text loading
npx @jsilvanus/embedeer --model $MODEL --data "Hello" "World" --output txt \
| awk '{print NF, "dimensions"}'
# txt --with-text: original text + tab + floats, easy to parse
npx @jsilvanus/embedeer --model $MODEL --file texts.txt --output txt --with-text \
| while IFS=$'\t' read -r text vec; do echo "TEXT: $text"; done
# --- sql ---
# Generate INSERT statements for a vector DB or SQLite
npx @jsilvanus/embedeer --model $MODEL --file texts.txt --output sql --dump inserts.sql
sqlite3 mydb.sqlite < inserts.sql
# --- Chaining with other tools ---
# Embed stdin from another command
cat docs/*.txt | npx @jsilvanus/embedeer --model $MODEL --output jsonl > embeddings.jsonl
# Null-byte input from find (handles any filename or text with newlines)
find ./corpus -name '*.txt' -print0 \
| xargs -0 cat \
| npx @jsilvanus/embedeer --model $MODEL --delimiter '\0' --output jsonl
CLI Examples
# Pull a model (like ollama pull)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2
# Embed a few strings, output JSON (CPU)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --data "Hello" "World"
# Auto-detect GPU, fall back to CPU if unavailable
# (uses CUDA on Linux, DirectML on Windows, CPU everywhere else)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device auto --data "Hello"
# Require GPU (throws with install instructions if no provider found)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device gpu --data "Hello GPU"
# Explicit CUDA (Linux x64 — requires CUDA 12 system libraries)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --provider cuda --data "Hello CUDA"
# Explicit DirectML (Windows x64)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --provider dml --data "Hello DML"
# Embed from a file, dump SQL to disk
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 \
--file texts.txt --output sql --dump out.sql
# Use quantized model, in-process threads, private model with token
npx @jsilvanus/embedeer --model my-org/private-model \
--token hf_xxx --dtype q8 --mode thread \
--data "embed me"Using GPU
No additional packages are needed — onnxruntime-node (installed with @jsilvanus/embedeer) already
bundles the CUDA provider on Linux x64 and DirectML on Windows x64.
Linux x64 — NVIDIA CUDA:
# One-time: install CUDA 12 system libraries (Ubuntu/Debian)
sudo apt install cuda-toolkit-12-6 libcudnn9-cuda-12
# Auto-detect: uses CUDA here, CPU fallback on any other machine
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device auto --data "Hello"
# Hard-require CUDA (throws with diagnostic error if unavailable):
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device gpu --data "Hello GPU"
# Explicit CUDA provider:
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --provider cuda --data "Hello CUDA"
Windows x64 — DirectML (any GPU: NVIDIA / AMD / Intel):
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device auto --data "Hello"
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device gpu --data "Hello GPU"
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --provider dml --data "Hello DML"
GPU Acceleration
GPU support is built into onnxruntime-node (a dependency of @huggingface/transformers):
| Platform | Provider | Requirement |
|---|---|---|
| Linux x64 | CUDA | NVIDIA GPU + driver ≥ 525, CUDA 12 toolkit, cuDNN 9 |
| Windows x64 | DirectML | Any DirectX 12 GPU (most GPUs since 2016), Windows 10+ |
Provider selection logic
| device | provider | Behavior |
|---|---|---|
| cpu (default) | — | Always CPU |
| auto | — | Try GPU providers for the platform in order; silent CPU fallback |
| gpu | — | Try GPU providers; throw if none available |
| any | cuda | Load CUDA provider; throw if not available or not supported |
| any | dml | Load DirectML provider; throw if not available or not supported |
| any | cpu | Always CPU |
On Linux x64 the GPU provider order is cuda; on Windows x64 it is cuda → dml.
Testing
Run the project's tests locally:
# install deps
pnpm install
# run tests
pnpm test
# run tests with coverage
pnpm run coverage
CI is enabled via GitHub Actions (.github/workflows/ci.yml), which runs tests and collects coverage on push and pull requests.
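The device/provider decision table in "Provider selection logic" can be sketched in plain JavaScript (illustrative; the package's real resolveProvider may differ). Here available lists providers usable on the host and platformOrder is the platform's GPU preference order:

```javascript
// Sketch of the documented device/provider selection rules.
function resolveProvider(device, provider, available, platformOrder) {
  if (provider) {                     // explicit provider overrides device
    if (provider === 'cpu') return 'cpu';
    if (!available.includes(provider)) throw new Error(`${provider} not available`);
    return provider;
  }
  if (device === 'cpu' || device === undefined) return 'cpu'; // default: CPU
  const gpu = platformOrder.find((p) => available.includes(p));
  if (gpu) return gpu;                // first available GPU provider wins
  if (device === 'gpu') throw new Error('no GPU provider available');
  return 'cpu';                       // device: 'auto' → silent CPU fallback
}
```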
Performance Optimizations
Embedeer exposes runtime knobs and helper scripts to tune throughput for your host.
- Pre-load models: run loadModel(model, { dtype, cacheDir }) or use the bench scripts so workers start instantly without re-downloading models.
- Reuse Embedder instances: create a single Embedder and call embed() repeatedly instead of creating and destroying instances per batch.
- Batch size vs concurrency:
  - CPU: moderate batch sizes (16–64) with multiple workers (concurrency ≥ 2) usually give the best throughput.
  - GPU: larger batches (64–256) with low concurrency (1–2) are typically fastest.
- BLAS threading: avoid oversubscription by setting OMP_NUM_THREADS and MKL_NUM_THREADS to Math.floor(cpu_cores / concurrency) before starting workers.
- Device/provider: use cuda on Linux and dml (DirectML) on Windows when available; device: 'auto' will try providers and fall back to CPU.
- Automatic tuning: use bench/grid-search.js to sweep batchSize, concurrency, and dtype for your host and save the results. You can generate and persist a per-user profile and apply it automatically via the Embedder APIs.
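The BLAS-threading rule of thumb above can be computed like this (a sketch; pass the host core count, e.g. os.cpus().length in Node):

```javascript
// Divide cores across workers so each worker's BLAS pool does not
// oversubscribe the CPU (minimum of one thread per worker).
function blasThreadEnv(cpuCores, concurrency) {
  const threads = Math.max(1, Math.floor(cpuCores / concurrency));
  return {
    OMP_NUM_THREADS: String(threads),
    MKL_NUM_THREADS: String(threads),
  };
}

// e.g. with 8 cores and 2 workers → 4 BLAS threads per worker:
// Object.assign(process.env, blasThreadEnv(8, 2));
```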
Examples:
# CPU quick grid
node bench/grid-search.js --device cpu --sample-size 200 --out bench/grid-results-cpu.json
# GPU quick grid
node bench/grid-search.js --device gpu --sample-size 100 --out bench/grid-results-gpu.json
Programmatic profile generation (writes ~/.embedeer/perf-profile.json):
import { Embedder } from '@jsilvanus/embedeer';
await Embedder.generateAndSaveProfile({ mode: 'quick', device: 'cpu', sampleSize: 100 });
// Embedder.create() will auto-apply a saved per-user profile by default
How it works
embed(texts)
│
├─ split into batches of batchSize
│
└─ Promise.all(batches) ──► WorkerPool
│
├─ [process mode] ChildProcessWorker 0
│ resolveProvider(device, provider)
│ → pipeline('feature-extraction', model, { device: 'cuda' })
│ → embed batch A
│
└─ [process mode] ChildProcessWorker 1
resolveProvider(device, provider)
→ pipeline(...) → embed batch B
Workers load the model once at startup and reuse it for all batches.
Provider activation happens per-worker before the pipeline is created.
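The batching flow above can be sketched in plain JavaScript (assuming a per-batch embed function; this is an illustration, not the package's WorkerPool code):

```javascript
// Split texts into batchSize chunks and embed them concurrently,
// preserving input order in the flattened result.
async function embedAll(texts, batchSize, embedBatch) {
  const batches = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    batches.push(texts.slice(i, i + batchSize));
  }
  const results = await Promise.all(batches.map(embedBatch));
  return results.flat(); // one vector per input text, in order
}
```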
E2E testing
Note: HF authentication has not been tested.
Collaboration
You are welcome to suggest additions or open a PR, especially for performance-related improvements. Opened issues are also received with thanks.
License
MIT