eSpeak-NG WASM for Kokoro TTS text-to-phoneme conversion
Overview
@localmode/transformers provides model implementations for the interfaces defined in @localmode/core. It wraps Hugging Face Transformers.js to run ML inference locally in the browser.
Provider API
All models are created via the transformers provider object. Each factory method returns a model implementing a @localmode/core interface.
    import { rerank } from '@localmode/core';
    import { transformers } from '@localmode/transformers';

    const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');

    const { results } = await rerank({
      model: rerankerModel,
      query: 'What is machine learning?',
      documents: ['ML is a subset of AI...', 'Python is a language...'],
      topK: 5,
    });
    import { classify, extractEntities } from '@localmode/core';
    import { transformers } from '@localmode/transformers';

    const sentiment = await classify({
      model: transformers.classifier('Xenova/distilbert-base-uncased-finetuned-sst-2-english'),
      text: 'I love this product!',
    });

    const entities = await extractEntities({
      model: transformers.ner('Xenova/bert-base-NER'),
      text: 'John works at Microsoft in Seattle',
    });
Run ONNX-format language models in the browser with WebGPU acceleration:
    import { generateText, streamText } from '@localmode/core';
    import { transformers } from '@localmode/transformers';

    const model = transformers.languageModel('onnx-community/Qwen3.5-0.8B-ONNX');

    // Single-shot generation
    const { text } = await generateText({ model, prompt: 'What is 2+2?' });

    // Streaming generation
    const result = await streamText({ model, prompt: 'Write a haiku' });
    for await (const chunk of result.stream) {
      process.stdout.write(chunk.text);
    }
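The streaming example writes chunks with process.stdout.write, which exists in Node.js but not in browsers. The following is a minimal sketch for a browser context. It assumes only that the stream yields objects with a `text` field, as in the example above. `collectText`, `onChunk`, and `fakeStream` are hypothetical names for illustration and are not part of @localmode/core.

```typescript
// Hypothetical helper, not part of @localmode/core.
// Assumes each chunk carries a `text` string, as in the streaming example.
interface TextChunk {
  text: string;
}

async function collectText(
  stream: AsyncIterable<TextChunk>,
  onChunk?: (partial: string) => void,
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    full += chunk.text;
    // Report the text accumulated so far, e.g. to update a DOM node.
    onChunk?.(full);
  }
  return full;
}

// Stand-in stream for demonstration; a real one would come from streamText().
async function* fakeStream(): AsyncGenerator<TextChunk> {
  yield { text: 'Hello' };
  yield { text: ', ' };
  yield { text: 'world' };
}
```

In a page, this could be called as `await collectText(result.stream, (partial) => { output.textContent = partial; })`, where `output` is whatever element displays the response.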
| Method | Interface | Description |
| --- | --- | --- |
| transformers.languageModel(modelId) | LanguageModel | Text generation (ONNX, WebGPU/WASM) |
Recommended ONNX LLMs (16 curated models):
| Model | Size | Context | Vision |
| --- | --- | --- | --- |
| onnx-community/granite-4.0-350m-ONNX-web | ~120MB | 4K | No |
| onnx-community/Qwen3-0.6B-ONNX | ~570MB | 4K | No |
| onnx-community/Qwen3.5-0.8B-ONNX | ~500MB | 32K | Yes |
| onnx-community/granite-4.0-1b-ONNX-web | ~350MB | 4K | No |
| onnx-community/Llama-3.2-1B-Instruct-ONNX | ~380MB | 8K | No |
| onnx-community/TinyLlama-1.1B-Chat-v1.0-ONNX | ~350MB | 2K | No |
| onnx-community/Qwen2.5-Coder-1.5B-Instruct | ~450MB | 4K | No |
| onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX | ~500MB | 4K | No |
| onnx-community/Llama-3.2-3B-Instruct-ONNX | ~900MB | 8K | No |
| onnx-community/Qwen3-4B-ONNX | ~1.2GB | 4K | No |
| microsoft/Phi-3-mini-4k-instruct-onnx-web | ~1.2GB | 4K | No |
| onnx-community/Qwen3.5-2B-ONNX | ~1.5GB | 32K | Yes |
| onnx-community/gemma-4-E2B-it-ONNX | ~1.5GB | 128K | Yes |
| onnx-community/Phi-4-mini-instruct-web-q4f16 | ~2.3GB | 4K | No |
| onnx-community/Qwen3.5-4B-ONNX | ~2.5GB | 32K | Yes |
| onnx-community/gemma-4-E4B-it-ONNX | ~3GB | 128K | Yes |
Vision support: Qwen3.5, Qwen2.5-VL, Qwen3-VL, and Gemma 4 models support image input via their built-in vision encoder. Check model.supportsVision for feature detection. See Vision docs for usage.
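Choosing from the table usually comes down to a download budget and whether image input is needed. The sketch below shows one way to filter on those two criteria. The `CATALOG` array is a hand-copied subset of the rows above (GB values converted to approximate MB), and `CatalogEntry` and `findModels` are hypothetical names, not exports of @localmode/transformers. At runtime, model.supportsVision remains the feature check to rely on.

```typescript
// Hypothetical shape for a catalog row; not an export of @localmode/transformers.
interface CatalogEntry {
  id: string;
  sizeMB: number; // approximate download size, copied from the table
  context: string;
  vision: boolean;
}

// Subset of the recommended-models table, copied by hand for illustration.
const CATALOG: CatalogEntry[] = [
  { id: 'onnx-community/granite-4.0-350m-ONNX-web', sizeMB: 120, context: '4K', vision: false },
  { id: 'onnx-community/Llama-3.2-1B-Instruct-ONNX', sizeMB: 380, context: '8K', vision: false },
  { id: 'onnx-community/Qwen3.5-0.8B-ONNX', sizeMB: 500, context: '32K', vision: true },
  { id: 'onnx-community/Qwen3.5-2B-ONNX', sizeMB: 1500, context: '32K', vision: true },
  { id: 'onnx-community/gemma-4-E2B-it-ONNX', sizeMB: 1500, context: '128K', vision: true },
  { id: 'onnx-community/Phi-4-mini-instruct-web-q4f16', sizeMB: 2300, context: '4K', vision: false },
];

// Returns the entries that fit the budget (and support vision if required),
// smallest first. filter() creates a new array, so the catalog is not mutated.
function findModels(
  catalog: CatalogEntry[],
  budgetMB: number,
  needsVision: boolean,
): CatalogEntry[] {
  return catalog
    .filter((m) => m.sizeMB <= budgetMB && (!needsVision || m.vision))
    .sort((a, b) => a.sizeMB - b.sizeMB);
}
```

An id returned by `findModels` can then be passed to transformers.languageModel(modelId).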
Document-level OCR with table/formula recognition (~652MB)
onnx-community/LightOnOCR-2-1B-ONNX
Fast document OCR, 11 languages (~700MB)
Document QA
| Model | Description |
| --- | --- |
| onnx-community/Florence-2-base-ft | Document QA (~223MB) |
| Xenova/donut-base-finetuned-docvqa | Donut (~218MB) |
Model Constants
All recommended models are exported as constants for easy reference:
    import {
      transformers,
      MODELS, // All models organized by task
      EMBEDDING_MODELS,
      CLASSIFICATION_MODELS,
      ZERO_SHOT_MODELS,
      NER_MODELS,
      RERANKER_MODELS,
      SPEECH_TO_TEXT_MODELS,
      TEXT_TO_SPEECH_MODELS,
      IMAGE_CLASSIFICATION_MODELS,
      ZERO_SHOT_IMAGE_MODELS,
      IMAGE_CAPTION_MODELS,
      TRANSLATION_MODELS,
      SUMMARIZATION_MODELS,
      FILL_MASK_MODELS,
      QUESTION_ANSWERING_MODELS,
      OBJECT_DETECTION_MODELS,
      SEGMENTATION_MODELS,
      OCR_MODELS,
      DOCUMENT_QA_MODELS,
      IMAGE_TO_IMAGE_MODELS,
      IMAGE_FEATURE_MODELS,
      VAD_MODELS,
      TRANSFORMERS_LLM_MODELS,
      MULTIMODAL_EMBEDDING_MODELS,
      KOKORO_LANG_MAP,
    } from '@localmode/transformers';

    // Use with provider
    const model = transformers.embedding(EMBEDDING_MODELS.BGE_SMALL_EN);
Kokoro Voice Catalog
The KOKORO_VOICES export provides a catalog of 29 English voices with metadata for UI display:
    import { KOKORO_VOICES, KOKORO_DEFAULT_VOICE } from '@localmode/transformers';
    import type { KokoroVoice } from '@localmode/transformers';

    // Each voice has: id, name, language, languageLabel, gender
    const english = KOKORO_VOICES.filter((v) => v.language === 'en-US');
    const females = KOKORO_VOICES.filter((v) => v.gender === 'female');

    console.log(KOKORO_DEFAULT_VOICE); // 'af_heart'
Languages: American English, British English.
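For a voice picker UI, the catalog is typically grouped by its display label. The sketch below groups voices by `languageLabel`, using the five fields listed in the comment above. The `Voice` interface and the sample entries are hypothetical stand-ins so the snippet runs on its own (only the `af_heart` id comes from this page, and the `en-GB` code is an assumption). In an app, the real KOKORO_VOICES array would be passed instead.

```typescript
// Hypothetical local mirror of the voice shape described above
// (id, name, language, languageLabel, gender).
interface Voice {
  id: string;
  name: string;
  language: string;
  languageLabel: string;
  gender: string;
}

// Stand-in data for illustration only; not the real catalog contents.
const SAMPLE_VOICES: Voice[] = [
  { id: 'af_heart', name: 'Heart', language: 'en-US', languageLabel: 'American English', gender: 'female' },
  { id: 'example_us_male', name: 'Example A', language: 'en-US', languageLabel: 'American English', gender: 'male' },
  { id: 'example_gb_female', name: 'Example B', language: 'en-GB', languageLabel: 'British English', gender: 'female' },
];

// Groups voices under their display label, preserving catalog order.
function groupByLanguageLabel(voices: Voice[]): Record<string, Voice[]> {
  const groups: Record<string, Voice[]> = {};
  for (const voice of voices) {
    if (!groups[voice.languageLabel]) {
      groups[voice.languageLabel] = [];
    }
    groups[voice.languageLabel].push(voice);
  }
  return groups;
}
```

Each key of the result can serve as an option-group heading, with the voices beneath it as selectable entries.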
Advanced Usage
Custom Model Options
    const model = transformers.embedding('Xenova/bge-small-en-v1.5', {
      quantized: true, // Use quantized model (smaller, faster)
      device: 'webgpu', // Use WebGPU for acceleration (falls back to WASM)
    });
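The comment above notes that WebGPU falls back to WASM. If an app wants to know ahead of time which backend is likely to be used (for example, to show a notice before a large download), a feature check along these lines is one option. `pickDevice` and the `NavigatorLike` types are hypothetical helpers, not part of @localmode/transformers. The only platform call involved is the standard `navigator.gpu.requestAdapter()`, which resolves to null when no suitable adapter exists.

```typescript
type Device = 'webgpu' | 'wasm';

// Minimal structural types so the check can also run outside a browser.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Hypothetical helper: returns 'webgpu' only if an adapter can be obtained.
async function pickDevice(nav?: NavigatorLike): Promise<Device> {
  if (!nav?.gpu) {
    return 'wasm'; // WebGPU API not exposed at all
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    return 'wasm'; // treat any failure as "not available"
  }
}
```

In a browser this could be called as `await pickDevice(navigator as NavigatorLike)` and the result passed as the `device` option.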
Language Model Options
Language models accept additional settings via LanguageModelSettings:
    const model = transformers.languageModel('onnx-community/Qwen3.5-0.8B-ONNX', {
      contextLength: 32768,
      maxTokens: 1024,
      temperature: 0.7,
      device: 'webgpu',
      // dtype accepts a string or a per-component config object
      dtype: 'q4f16',
      // For multimodal models, use per-component dtype:
      // dtype: { embed_tokens: 'q4', vision_encoder: 'q4', decoder_model_merged: 'q4' },
    });
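To make the two accepted dtype forms concrete, the sketch below shows one plausible way a string-or-object setting can be resolved for a single component. This illustrates the idea only and is not the library's actual resolution logic: `DtypeSetting` and `dtypeFor` are hypothetical names, and the fallback value is an assumption.

```typescript
// Hypothetical type: either one dtype for everything, or one per component.
type DtypeSetting = string | Record<string, string>;

// Resolves the dtype for one component (e.g. 'vision_encoder').
// A plain string applies to every component; an object is looked up by key,
// with `fallback` (an assumed default) used for components it does not list.
function dtypeFor(setting: DtypeSetting, component: string, fallback = 'q4'): string {
  if (typeof setting === 'string') {
    return setting;
  }
  return setting[component] ?? fallback;
}

// Example: mixed precision for a multimodal model.
const mixed: DtypeSetting = { embed_tokens: 'q4', vision_encoder: 'fp16' };
const visionDtype = dtypeFor(mixed, 'vision_encoder');
const decoderDtype = dtypeFor(mixed, 'decoder_model_merged');
```

The per-component form is useful when one part of a multimodal model tolerates aggressive quantization less well than the others.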
Provider Options
Pass provider-specific options to core functions:
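    import { embed } from '@localmode/core';
    import { transformers } from '@localmode/transformers';

    const { embedding } = await embed({
      model: transformers.embedding('Xenova/bge-small-en-v1.5'),
      value: 'Hello world',
      providerOptions: {
        transformers: {
          // Any Transformers.js specific options
        },
      },
    });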