JSPM

@localmode/transformers

1.0.0
  • ESM via JSPM
  • ES Module Entrypoint
  • Export Map
  • Keywords
  • License
  • Repository URL
  • TypeScript Types
  • README
  • Created
  • Published
  • Downloads 24
  • Score
    100M100P100Q42902F
  • License MIT

Transformers.js provider for @localmode - implements all ML model interfaces

Package Exports

  • @localmode/transformers

Readme

@localmode/transformers

HuggingFace Transformers.js provider for LocalMode AI Engine - run ML models locally in the browser.

npm license

Features

  • 🚀 Browser-Native - Run ML models directly in the browser with WebGPU/WASM
  • 🔒 Privacy-First - All processing happens locally, no data leaves the device
  • 📦 Model Caching - Models are cached in IndexedDB for instant subsequent loads
  • Optimized - Uses quantized models for smaller size and faster inference

Installation

# Preferred: pnpm
pnpm install @localmode/transformers @localmode/core @xenova/transformers

# Alternative: npm
npm install @localmode/transformers @localmode/core @xenova/transformers

Quick Start

import { transformers } from '@localmode/transformers';
import {
  classify,
  extractEntities,
  transcribe,
  classifyImage,
  captionImage,
} from '@localmode/core';

// Text Classification
const sentiment = await classify({
  model: transformers.classifier('Xenova/distilbert-base-uncased-finetuned-sst-2-english'),
  text: 'I love this product!',
});
console.log(sentiment.label); // 'POSITIVE'

// Named Entity Recognition
const entities = await extractEntities({
  model: transformers.ner('Xenova/bert-base-NER'),
  text: 'John works at Microsoft in Seattle',
});
console.log(entities.entities);
// [{ entity: 'John', type: 'PER', ... }, { entity: 'Microsoft', type: 'ORG', ... }, ...]

// Speech-to-Text
const transcription = await transcribe({
  model: transformers.speechToText('Xenova/whisper-tiny'),
  audio: audioBlob,
  returnTimestamps: true,
});
console.log(transcription.text);

// Image Classification
const classification = await classifyImage({
  model: transformers.imageClassifier('Xenova/vit-base-patch16-224'),
  image: imageBlob,
});
console.log(classification.predictions);

// Image Captioning
const caption = await captionImage({
  model: transformers.captioner('Xenova/blip-image-captioning-base'),
  image: imageBlob,
});
console.log(caption.caption);

Available Model Types

Text/NLP Models (P1)

Method Interface Description
transformers.classifier(modelId) ClassificationModel Text classification
transformers.zeroShotClassifier(modelId) ZeroShotClassificationModel Zero-shot text classification
transformers.ner(modelId) NERModel Named Entity Recognition
transformers.reranker(modelId) RerankerModel Document reranking
transformers.embedding(modelId) EmbeddingModel Text embeddings

Text/NLP Models (P2)

Method Interface Description
transformers.translator(modelId) TranslationModel Text translation
transformers.summarizer(modelId) SummarizationModel Text summarization
transformers.fillMask(modelId) FillMaskModel Masked token prediction
transformers.questionAnswering(modelId) QuestionAnsweringModel Extractive QA

Vision Models (P1)

Method Interface Description
transformers.imageClassifier(modelId) ImageClassificationModel Image classification
transformers.zeroShotImageClassifier(modelId) ZeroShotImageClassificationModel Zero-shot image classification
transformers.captioner(modelId) ImageCaptionModel Image captioning

Vision Models (P2)

Method Interface Description
transformers.segmenter(modelId) SegmentationModel Image segmentation
transformers.objectDetector(modelId) ObjectDetectionModel Object detection
transformers.imageFeatures(modelId) ImageFeatureModel Image feature extraction
transformers.imageToImage(modelId) ImageToImageModel Image transformation / super resolution
transformers.ocr(modelId) OCRModel OCR (TrOCR)
transformers.documentQA(modelId) DocumentQAModel Document/Table question answering

Audio Models (P1 & P2)

Method Interface Description
transformers.speechToText(modelId) SpeechToTextModel Speech-to-text transcription
transformers.textToSpeech(modelId) TextToSpeechModel Text-to-speech synthesis (P2)

Text Classification

  • Xenova/distilbert-base-uncased-finetuned-sst-2-english - Sentiment analysis
  • Xenova/twitter-roberta-base-sentiment-latest - Twitter sentiment

Named Entity Recognition

  • Xenova/bert-base-NER - Standard NER (PER, ORG, LOC, MISC)

Reranking

  • Xenova/ms-marco-MiniLM-L-6-v2 - Document reranking for RAG

Translation (P2)

  • Xenova/opus-mt-en-de - English to German
  • Xenova/opus-mt-en-fr - English to French
  • Xenova/nllb-200-distilled-600M - 200 languages

Summarization (P2)

  • Xenova/bart-large-cnn - News summarization
  • Xenova/distilbart-cnn-12-6 - Fast summarization

Fill-Mask (P2)

  • Xenova/bert-base-uncased - General purpose
  • Xenova/roberta-base - Better for some tasks

Question Answering (P2)

  • Xenova/distilbert-base-cased-distilled-squad - SQuAD trained
  • Xenova/roberta-base-squad2 - SQuAD 2.0 trained

Speech-to-Text

  • Xenova/whisper-tiny - Fast, smaller size (~70MB)
  • Xenova/whisper-small - Better accuracy (~240MB)

Text-to-Speech (P2)

  • Xenova/speecht5_tts - Natural speech synthesis

Image Classification

  • Xenova/vit-base-patch16-224 - General image classification
  • Xenova/clip-vit-base-patch32 - Zero-shot image classification

Image Captioning

  • Xenova/blip-image-captioning-base - High-quality captions

Image Segmentation (P2)

  • Xenova/segformer-b0-finetuned-ade-512-512 - Fast segmentation
  • Xenova/detr-resnet-50-panoptic - Panoptic segmentation

Object Detection (P2)

  • Xenova/detr-resnet-50 - COCO objects
  • Xenova/yolos-tiny - Fast detection

Image Features (P2)

  • Xenova/clip-vit-base-patch32 - Image embeddings
  • Xenova/dinov2-small - Self-supervised features

OCR (P2)

  • Xenova/trocr-base-handwritten - Handwritten text
  • Xenova/trocr-base-printed - Printed text

Document QA (P2)

  • Xenova/donut-base-finetuned-docvqa - Document QA
  • Xenova/tapas-base-finetuned-wtq - Table QA

Embeddings

  • Xenova/all-MiniLM-L6-v2 - Fast, general-purpose (~22MB)
  • Xenova/paraphrase-multilingual-MiniLM-L12-v2 - 50+ languages

P2 Feature Examples

Translation

import { translate } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { translatedText } = await translate({
  model: transformers.translator('Xenova/opus-mt-en-de'),
  text: 'Hello world!',
  targetLanguage: 'de',
});
console.log(translatedText); // "Hallo Welt!"

Summarization

import { summarize } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { summary } = await summarize({
  model: transformers.summarizer('Xenova/bart-large-cnn'),
  text: longArticle,
  maxLength: 100,
});

Image Segmentation

import { segmentImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { masks } = await segmentImage({
  model: transformers.segmenter('Xenova/segformer-b0-finetuned-ade-512-512'),
  image: imageBlob,
});

for (const mask of masks) {
  console.log(mask.label, mask.score);
}

Object Detection

import { detectObjects } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { objects } = await detectObjects({
  model: transformers.objectDetector('Xenova/detr-resnet-50'),
  image: imageBlob,
  threshold: 0.5,
});

for (const obj of objects) {
  console.log(`${obj.label}: ${obj.box.x},${obj.box.y}`);
}

Text-to-Speech

import { synthesizeSpeech } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { audio, sampleRate } = await synthesizeSpeech({
  model: transformers.textToSpeech('Xenova/speecht5_tts'),
  text: 'Hello, how are you?',
});

// Play with Web Audio API
const ctx = new AudioContext();
const buffer = ctx.createBuffer(1, audio.length, sampleRate);
buffer.getChannelData(0).set(audio);
const source = ctx.createBufferSource();
source.buffer = buffer;
source.connect(ctx.destination);
source.start();

OCR

import { extractText } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { text, regions } = await extractText({
  model: transformers.ocr('Xenova/trocr-base-printed'),
  image: documentImage,
});
console.log(text);

Question Answering

import { answerQuestion } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const { answers } = await answerQuestion({
  model: transformers.questionAnswering('Xenova/distilbert-base-cased-distilled-squad'),
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.',
});
console.log(answers[0].answer); // "Paris"

Advanced Usage

Custom Model Options

const model = transformers.classifier('Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  quantized: true, // Use quantized model (smaller, faster)
  revision: 'main', // Model revision
});

Provider Options

Pass provider-specific options to core functions:

const result = await classify({
  model: transformers.classifier('Xenova/model'),
  text: 'Hello world',
  providerOptions: {
    transformers: {
      // Any Transformers.js specific options
    },
  },
});

Preloading Models

For better UX, preload models before use:

// Models are automatically cached after first load
const classifier = transformers.classifier(
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);

// First call downloads the model
const result = await classify({ model: classifier, text: 'Hello' });

// Subsequent calls are instant (loaded from cache)
const result2 = await classify({ model: classifier, text: 'World' });

Browser Compatibility

Browser WebGPU WASM Notes
Chrome 113+ Best performance with WebGPU
Edge 113+ Same as Chrome
Firefox WASM only
Safari 18+ WebGPU available
iOS Safari WASM only

Performance Tips

  1. Use quantized models - Smaller and faster with minimal quality loss
  2. Preload models - Load during app init for instant inference
  3. Use WebGPU when available - 3-5x faster than WASM
  4. Batch operations - Process multiple inputs together

License

MIT