@localmode/transformers provides model implementations for the interfaces defined in @localmode/core. It wraps HuggingFace Transformers.js to enable local ML inference in the browser.
Provider API
All models are created via the transformers provider object. Each factory method returns a model implementing a @localmode/core interface.
```ts
import { rerank } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');

const { results } = await rerank({
  model: rerankerModel,
  query: 'What is machine learning?',
  documents: ['ML is a subset of AI...', 'Python is a language...'],
  topK: 5,
});
```
```ts
import { classify, extractEntities } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const sentiment = await classify({
  model: transformers.classifier('Xenova/distilbert-base-uncased-finetuned-sst-2-english'),
  text: 'I love this product!',
});

const entities = await extractEntities({
  model: transformers.ner('Xenova/bert-base-NER'),
  text: 'John works at Microsoft in Seattle',
});
```
Experimental: Uses Transformers.js v4 (preview release). The API may change.
Run ONNX-format language models in the browser with WebGPU acceleration:
```ts
import { generateText, streamText } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.languageModel('onnx-community/Qwen3.5-0.8B-ONNX');

// Single-shot generation
const { text } = await generateText({ model, prompt: 'What is 2+2?' });

// Streaming generation
const result = await streamText({ model, prompt: 'Write a haiku' });
for await (const chunk of result.stream) {
  process.stdout.write(chunk.text);
}
```
| Method | Interface | Description |
| --- | --- | --- |
| `transformers.languageModel(modelId)` | `LanguageModel` | Text generation (ONNX, WebGPU/WASM) |
Recommended ONNX LLMs:
| Model | Size | Context | Vision |
| --- | --- | --- | --- |
| `onnx-community/Qwen3.5-0.8B-ONNX` | ~500MB | 32K | Yes |
| `onnx-community/Qwen3.5-2B-ONNX` | ~1.5GB | 32K | Yes |
| `onnx-community/Qwen3.5-4B-ONNX` | ~2.5GB | 32K | Yes |
| `onnx-community/SmolLM2-360M-Instruct` | ~200MB | 2K | No |
| `onnx-community/SmolLM2-135M-Instruct` | ~80MB | 2K | No |
Vision support: Qwen3.5 models accept image input via their built-in vision encoder. Check `model.supportsVision` for feature detection; see the Vision docs for usage.
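The snippet below is a minimal feature-detection sketch. It assumes only what is stated above: `transformers.languageModel(...)` and a boolean `supportsVision` flag on the loaded model. The image-message format itself is covered in the Vision docs.

```ts
import { generateText } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.languageModel('onnx-community/Qwen3.5-0.8B-ONNX');

if (model.supportsVision) {
  // Safe to pass image input; see the Vision docs for the message format.
  console.log('This model accepts images.');
} else {
  // Text-only fallback
  const { text } = await generateText({ model, prompt: 'Summarize what ONNX is.' });
  console.log(text);
}
```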
All recommended models are exported as constants for easy reference:
```ts
import {
  transformers,
  MODELS, // All models organized by task
  EMBEDDING_MODELS,
  CLASSIFICATION_MODELS,
  ZERO_SHOT_MODELS,
  NER_MODELS,
  RERANKER_MODELS,
  SPEECH_TO_TEXT_MODELS,
  TEXT_TO_SPEECH_MODELS,
  IMAGE_CLASSIFICATION_MODELS,
  ZERO_SHOT_IMAGE_MODELS,
  IMAGE_CAPTION_MODELS,
  TRANSLATION_MODELS,
  SUMMARIZATION_MODELS,
  FILL_MASK_MODELS,
  QUESTION_ANSWERING_MODELS,
  OBJECT_DETECTION_MODELS,
  SEGMENTATION_MODELS,
  OCR_MODELS,
  DOCUMENT_QA_MODELS,
  IMAGE_TO_IMAGE_MODELS,
  IMAGE_FEATURE_MODELS,
} from '@localmode/transformers';

// Use with provider
const model = transformers.embedding(EMBEDDING_MODELS.BGE_SMALL_EN);
```
Advanced Usage
Custom Model Options
```ts
import { transformers } from '@localmode/transformers';

const model = transformers.embedding('Xenova/bge-small-en-v1.5', {
  quantized: true,  // Use quantized model (smaller, faster)
  device: 'webgpu', // Use WebGPU for acceleration (falls back to WASM)
});
```
Provider Options
Pass provider-specific options to core functions:
```ts
import { embed } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { embedding } = await embed({
  model: transformers.embedding('Xenova/bge-small-en-v1.5'),
  value: 'Hello world',
  providerOptions: {
    transformers: {
      // Any Transformers.js-specific options
    },
  },
});
```