HuggingFace Transformers.js provider for LocalMode AI Engine - run ML models locally in the browser.
Features
🚀 Browser-Native - Run ML models directly in the browser with WebGPU/WASM
🔒 Privacy-First - All processing happens locally, no data leaves the device
📦 Model Caching - Models are cached in IndexedDB for instant subsequent loads
⚡ Optimized - Uses quantized models for smaller size and faster inference
Installation
pnpm install @localmode/transformers @localmode/core @xenova/transformers
npm install @localmode/transformers @localmode/core @xenova/transformers

Quick Start
import { transformers } from '@localmode/transformers';
import {
  classify,
  extractEntities,
  transcribe,
  classifyImage,
  captionImage,
} from '@localmode/core';

// audioBlob and imageBlob below are Blob inputs supplied by your app
// (for example, from a file input or a recording).

// Text classification (sentiment analysis)
const sentiment = await classify({
  model: transformers.classifier('Xenova/distilbert-base-uncased-finetuned-sst-2-english'),
  text: 'I love this product!',
});
console.log(sentiment.label);

// Named entity recognition
const entities = await extractEntities({
  model: transformers.ner('Xenova/bert-base-NER'),
  text: 'John works at Microsoft in Seattle',
});
console.log(entities.entities);

// Speech-to-text
const transcription = await transcribe({
  model: transformers.speechToText('Xenova/whisper-tiny'),
  audio: audioBlob,
  returnTimestamps: true,
});
console.log(transcription.text);

// Image classification
const classification = await classifyImage({
  model: transformers.imageClassifier('Xenova/vit-base-patch16-224'),
  image: imageBlob,
});
console.log(classification.predictions);

// Image captioning
const caption = await captionImage({
  model: transformers.captioner('Xenova/blip-image-captioning-base'),
  image: imageBlob,
});
console.log(caption.caption);

Available Model Types
Text/NLP Models (P1)
| Method | Interface | Description |
| --- | --- | --- |
| transformers.classifier(modelId) | ClassificationModel | Text classification |
| transformers.zeroShotClassifier(modelId) | ZeroShotClassificationModel | Zero-shot text classification |
| transformers.ner(modelId) | NERModel | Named Entity Recognition |
| transformers.reranker(modelId) | RerankerModel | Document reranking |
| transformers.embedding(modelId) | EmbeddingModel | Text embeddings |
Text/NLP Models (P2)
| Method | Interface | Description |
| --- | --- | --- |
| transformers.translator(modelId) | TranslationModel | Text translation |
| transformers.summarizer(modelId) | SummarizationModel | Text summarization |
| transformers.fillMask(modelId) | FillMaskModel | Masked token prediction |
| transformers.questionAnswering(modelId) | QuestionAnsweringModel | Extractive QA |
Vision Models (P1)
| Method | Interface | Description |
| --- | --- | --- |
| transformers.imageClassifier(modelId) | ImageClassificationModel | Image classification |
| transformers.zeroShotImageClassifier(modelId) | ZeroShotImageClassificationModel | Zero-shot image classification |
| transformers.captioner(modelId) | ImageCaptionModel | Image captioning |
Vision Models (P2)
| Method | Interface | Description |
| --- | --- | --- |
| transformers.segmenter(modelId) | SegmentationModel | Image segmentation |
| transformers.objectDetector(modelId) | ObjectDetectionModel | Object detection |
| transformers.imageFeatures(modelId) | ImageFeatureModel | Image feature extraction |
| transformers.imageToImage(modelId) | ImageToImageModel | Image transformation / super resolution |
| transformers.ocr(modelId) | OCRModel | OCR (TrOCR) |
| transformers.documentQA(modelId) | DocumentQAModel | Document/Table question answering |
Audio Models (P1 & P2)
| Method | Interface | Description |
| --- | --- | --- |
| transformers.speechToText(modelId) | SpeechToTextModel | Speech-to-text transcription |
| transformers.textToSpeech(modelId) | TextToSpeechModel | Text-to-speech synthesis (P2) |

Recommended Models
Text Classification
Xenova/distilbert-base-uncased-finetuned-sst-2-english - Sentiment analysis
Xenova/twitter-roberta-base-sentiment-latest - Twitter sentiment
Named Entity Recognition
Xenova/bert-base-NER - Standard NER (PER, ORG, LOC, MISC)
Reranking
Xenova/ms-marco-MiniLM-L-6-v2 - Document reranking for RAG
Translation (P2)
Xenova/opus-mt-en-de - English to German
Xenova/opus-mt-en-fr - English to French
Xenova/nllb-200-distilled-600M - 200 languages
Summarization (P2)
Xenova/bart-large-cnn - News summarization
Xenova/distilbart-cnn-12-6 - Fast summarization
Fill-Mask (P2)
Xenova/bert-base-uncased - General purpose
Xenova/roberta-base - Better for some tasks
Question Answering (P2)
Xenova/distilbert-base-cased-distilled-squad - SQuAD trained
Xenova/roberta-base-squad2 - SQuAD 2.0 trained
Speech-to-Text
Xenova/whisper-tiny - Fast, smaller size (~70MB)
Xenova/whisper-small - Better accuracy (~240MB)
Text-to-Speech (P2)
Xenova/speecht5_tts - Natural speech synthesis
Image Classification
Xenova/vit-base-patch16-224 - General image classification
Xenova/clip-vit-base-patch32 - Zero-shot image classification
Image Captioning
Xenova/blip-image-captioning-base - High-quality captions
Image Segmentation (P2)
Xenova/segformer-b0-finetuned-ade-512-512 - Fast segmentation
Xenova/detr-resnet-50-panoptic - Panoptic segmentation
Object Detection (P2)
Xenova/detr-resnet-50 - COCO objects
Xenova/yolos-tiny - Fast detection
Image Features (P2)
Xenova/clip-vit-base-patch32 - Image embeddings
Xenova/dinov2-small - Self-supervised features
OCR (P2)
Xenova/trocr-base-handwritten - Handwritten text
Xenova/trocr-base-printed - Printed text
Document QA (P2)
Xenova/donut-base-finetuned-docvqa - Document QA
Xenova/tapas-base-finetuned-wtq - Table QA
Embeddings
Xenova/all-MiniLM-L6-v2 - Fast, general-purpose (~22MB)
Xenova/paraphrase-multilingual-MiniLM-L12-v2 - 50+ languages
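
Embedding models such as the ones listed above produce vectors that are usually compared with cosine similarity. The following is a minimal sketch that is independent of LocalMode: `cosineSimilarity`, `rankBySimilarity`, and the `Candidate` shape are hypothetical helper names, and the sketch assumes you already hold the vectors as number arrays or `Float32Array`s. How you obtain them from `transformers.embedding(...)` is not covered here.

```typescript
// Hypothetical helpers for comparing embedding vectors; not part of @localmode/core.

// Cosine similarity of two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Candidate {
  id: string;
  vector: ArrayLike<number>;
}

// Score every candidate against the query and sort from most to least similar.
function rankBySimilarity(query: ArrayLike<number>, candidates: readonly Candidate[]) {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score);
}
```

For small collections a linear scan like this is usually enough; larger ones generally call for a vector index instead.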

P2 Feature Examples
Translation
import { translate } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { translatedText } = await translate({
  model: transformers.translator('Xenova/opus-mt-en-de'),
  text: 'Hello world!',
  targetLanguage: 'de',
});
console.log(translatedText);

Summarization
import { summarize } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// longArticle is a string supplied by your app
const { summary } = await summarize({
  model: transformers.summarizer('Xenova/bart-large-cnn'),
  text: longArticle,
  maxLength: 100,
});

Image Segmentation
import { segmentImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { masks } = await segmentImage({
  model: transformers.segmenter('Xenova/segformer-b0-finetuned-ade-512-512'),
  image: imageBlob,
});
for (const mask of masks) {
  console.log(mask.label, mask.score);
}

Object Detection
import { detectObjects } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { objects } = await detectObjects({
  model: transformers.objectDetector('Xenova/detr-resnet-50'),
  image: imageBlob,
  threshold: 0.5,
});
for (const obj of objects) {
  console.log(`${obj.label}: ${obj.box.x}, ${obj.box.y}`);
}

Text-to-Speech
import { synthesizeSpeech } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { audio, sampleRate } = await synthesizeSpeech({
  model: transformers.textToSpeech('Xenova/speecht5_tts'),
  text: 'Hello, how are you?',
});

// Play the result through the Web Audio API as a single (mono) channel
const ctx = new AudioContext();
const buffer = ctx.createBuffer(1, audio.length, sampleRate);
buffer.getChannelData(0).set(audio);
const source = ctx.createBufferSource();
source.buffer = buffer;
source.connect(ctx.destination);
source.start();

OCR
import { extractText } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// documentImage is an image input supplied by your app
const { text, regions } = await extractText({
  model: transformers.ocr('Xenova/trocr-base-printed'),
  image: documentImage,
});
console.log(text);

Question Answering
import { answerQuestion } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { answers } = await answerQuestion({
  model: transformers.questionAnswering('Xenova/distilbert-base-cased-distilled-squad'),
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.',
});
console.log(answers[0].answer);

Advanced Usage
Custom Model Options
const model = transformers.classifier('Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  quantized: true, // use the quantized weights
  revision: 'main', // model revision to load
});

Provider Options
Pass provider-specific options to core functions:
const result = await classify({
  model: transformers.classifier('Xenova/model'),
  text: 'Hello world',
  providerOptions: {
    transformers: {
      // provider-specific options go here
    },
  },
});

Preloading Models
For better UX, preload models before use:
// Create the model instance once, ideally during app startup
const classifier = transformers.classifier(
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);

// Reuse the same instance for every call
const result = await classify({ model: classifier, text: 'Hello' });
const result2 = await classify({ model: classifier, text: 'World' });

Browser Compatibility
| Browser | WebGPU | WASM | Notes |
| --- | --- | --- | --- |
| Chrome 113+ | ✅ | ✅ | Best performance with WebGPU |
| Edge 113+ | ✅ | ✅ | Same as Chrome |
| Firefox | ❌ | ✅ | WASM only |
| Safari 18+ | ✅ | ✅ | WebGPU available |
| iOS Safari | ❌ | ✅ | WASM only |
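
Because WebGPU support varies by browser, an app may want to check for it at runtime rather than rely on version numbers. The sketch below uses the standard `navigator.gpu.requestAdapter()` call; `detectBackend`, `Backend`, and `NavigatorLike` are hypothetical names introduced for illustration. How (or whether) the result is passed to `@localmode/transformers` is not specified in this README, so that step is left out.

```typescript
// Hypothetical runtime check; not part of @localmode/transformers.
type Backend = 'webgpu' | 'wasm';

// Only the part of `navigator` this sketch touches, so it can be exercised outside a browser.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
}

// Resolves to 'webgpu' only when the API exists AND an adapter is actually available.
async function detectBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return 'wasm';
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    // Some environments expose the API but fail when asked for an adapter.
    return 'wasm';
  }
}

// In a browser:
//   const backend = await detectBackend(navigator as NavigatorLike);
```

Checking for an adapter, not just for the presence of `navigator.gpu`, avoids selecting WebGPU on machines where the API is exposed but no usable GPU is found.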

Performance Tips
Use quantized models - Smaller and faster with minimal quality loss
Preload models - Load during app init for instant inference
Use WebGPU when available - 3-5x faster than WASM
Batch operations - Process multiple inputs together
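
One way to apply the batching tip is to group inputs into fixed-size chunks and await each chunk together. This is a minimal sketch under stated assumptions: `mapInBatches` is a hypothetical helper, not a LocalMode API, and this README does not say whether the core functions accept arrays directly, so the helper simply wraps one call per input.

```typescript
// Hypothetical helper; not part of @localmode/core.
// Runs `fn` over `items` in chunks of `batchSize`, preserving input order.
async function mapInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Calls within a batch are started together; the next batch waits for this one.
    const batchResults = await Promise.all(batch.map((item) => fn(item)));
    results.push(...batchResults);
  }
  return results;
}

// With the classifier from the Preloading Models example:
//   const results = await mapInBatches(texts, 8, (text) =>
//     classify({ model: classifier, text }),
//   );
```

Capping the chunk size keeps the number of in-flight calls bounded, which matters in the browser where memory is limited.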

License
MIT