Offline, in-browser hotword detection powered by EfficientWord-Net (ResNet-50 ArcFace). Works as a standalone app or npm library.

mellon

Offline, fully in-browser hotword / wake-word detection powered by EfficientWord-Net (ResNet-50 ArcFace).

  • 100% offline — ONNX inference runs in the browser via WebAssembly; no server, no cloud.
  • Speaker-independent — the model generalises across voices out of the box.
  • Custom words — enroll any phrase with ≥ 3 audio samples.
  • TypeScript-ready — ships with full .d.ts declarations.

Table of contents

  1. Browser requirements
  2. Installation
  3. Quick start
  4. Asset setup
  5. API reference
  6. Enrolling custom words
  7. Server / bundler configuration
  8. Browser support

Browser requirements

mellon uses ONNX Runtime's multi-threaded WebAssembly backend, which requires SharedArrayBuffer. This in turn requires the page to be served with the following HTTP headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

See Server / bundler configuration for ready-to-use snippets.

Additionally:

  • The page must be served over HTTPS (or localhost).
  • Microphone permission is requested when start() is called.
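
The requirements above can be checked at runtime before the detector is constructed. The following is a minimal sketch, not part of mellon's API: `checkBrowserRequirements` is a hypothetical helper that relies only on the standard `crossOriginIsolated`, `isSecureContext`, `SharedArrayBuffer`, and `navigator.mediaDevices` globals.

```javascript
// Hypothetical helper (not part of mellon): returns a list of human-readable
// problems for the given global scope; an empty list means all checks passed.
function checkBrowserRequirements(scope = globalThis) {
  const problems = []

  // Current browsers expose SharedArrayBuffer only on cross-origin isolated pages.
  if (typeof scope.SharedArrayBuffer !== 'function' || scope.crossOriginIsolated !== true) {
    problems.push('Page is not cross-origin isolated (check the COOP/COEP headers).')
  }

  // Microphone access needs a secure context: HTTPS or localhost.
  if (scope.isSecureContext !== true) {
    problems.push('Page is not a secure context (serve over HTTPS or localhost).')
  }

  // Microphone capture goes through navigator.mediaDevices.getUserMedia.
  const mediaDevices = scope.navigator && scope.navigator.mediaDevices
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== 'function') {
    problems.push('navigator.mediaDevices.getUserMedia is unavailable.')
  }

  return problems
}
```

An app could call `checkBrowserRequirements()` on page load and only call start() when the returned list is empty, showing the messages to the user otherwise.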

Installation

npm install mellon

No asset setup required — the ONNX model and WASM runtime are loaded automatically from the jsDelivr CDN, pinned to the installed package version.


Quick start

import { Mellon } from 'mellon'

const stt = new Mellon({
  refs: [
    'https://example.com/hello_ref.json',
    'https://example.com/stop_ref.json',
  ],
})

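// Register listeners before starting so that no early event is missed
stt.addEventListener('match', (e) => {
  console.log(`Detected "${e.detail.name}" (${(e.detail.confidence * 100).toFixed(1)}%)`)
})
stt.addEventListener('error', (e) => console.error(e.detail.error))

await stt.start()   // fetches refs, then opens the mic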

Refs are fetched automatically during start(). You can enroll your own words — see Enrolling custom words.
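
Because each match event carries the word name and a confidence in `e.detail`, a common next step is dispatching to per-word actions. The sketch below assumes the event shape documented under Events; `createMatchRouter` is a hypothetical helper, not part of mellon, and the word names `hello` and `stop` in the usage comment are assumed to be the `word_name` values of the two refs in the quick start.

```javascript
// Hypothetical helper (not part of mellon): maps detected word names to callbacks.
// Returns a listener that reports whether a callback was invoked.
function createMatchRouter(handlers, minConfidence = 0) {
  return function onMatch(event) {
    const { name, confidence } = event.detail
    // Only look at the caller's own keys, never inherited ones like "toString".
    const hasHandler = Object.prototype.hasOwnProperty.call(handlers, name)
    if (!hasHandler || typeof handlers[name] !== 'function') return false
    // Optional extra cutoff on top of the detector's own threshold.
    if (confidence < minConfidence) return false
    handlers[name](event.detail)
    return true
  }
}

// Usage with the detector from the quick start (assumed word names):
//   stt.addEventListener('match', createMatchRouter({
//     hello: () => console.log('Hi!'),
//     stop:  () => stt.stop(),
//   }))
```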


Asset setup (offline / intranet only)

By default, the WASM runtime and model load from the jsDelivr CDN — no setup needed. For air-gapped or private-network deployments, copy the assets locally and tell the library where to find them:

cp -r node_modules/mellon/dist/assets public/mellon-assets/

Then pass the asset path to the constructor:

new Mellon({
  assetsPath: '/mellon-assets/',   // trailing slash required
})

For Vite projects add the copy step once in your config:

// vite.config.js
import { viteStaticCopy } from 'vite-plugin-static-copy'

export default {
  plugins: [
    viteStaticCopy({
      targets: [
        { src: 'node_modules/mellon/dist/assets/*', dest: 'mellon-assets' }
      ],
    }),
  ],
}

API reference

Mellon (high-level)

The easiest way to use the library. Wraps mic access, AudioWorklet wiring, and detector management into a single class.

class Mellon extends EventTarget {
  constructor(opts?: MellonOptions)
  readonly isInitialized: boolean
  readonly isRunning:     boolean

  init(onProgress?: (pct: number) => void): Promise<void>
  start(words?: string[]): Promise<void>
  stop(): void
  addCustomWord(refData: RefData): void
  enrollWord(wordName: string): EnrollmentSession

  static loadWords(): RefData[]
  static saveWord(refData: RefData): void
  static deleteWord(wordName: string): void
  static importWordFile(file: File): Promise<RefData>
  static exportWord(refData: RefData): void
}

MellonOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| refs | (string \| RefData)[] | [] | Refs to preload; URL strings are fetched during init() |
| words | string[] | [] | Subset of loaded refs to activate (defaults to all loaded refs) |
| threshold | number | 0.65 | Detection threshold (0–1) |
| relaxationMs | number | 2000 | Min ms between match events |
| inferenceGapMs | number | 300 | Min ms between inference runs |
| wasmBasePath | string | | Base URL for ORT WASM (trailing /) |
| modelUrl | string | | URL to model.onnx |
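
To make the interaction between threshold and relaxationMs concrete, the sketch below shows one plausible reading of the two options: a score produces a match only if it reaches the threshold and at least relaxationMs has elapsed since the previous match. This is an illustration only, not mellon's actual implementation. `createMatchGate` is a hypothetical helper, and details such as the inclusive comparison and a single global cooldown (rather than one per word) are assumptions.

```javascript
// Hypothetical illustration (not mellon's internals): decides whether a
// confidence score observed at time `nowMs` should produce a match event.
function createMatchGate({ threshold = 0.65, relaxationMs = 2000 } = {}) {
  let lastMatchAt = -Infinity // no match emitted yet

  return function shouldEmit(confidence, nowMs) {
    if (confidence < threshold) return false               // below the detection threshold
    if (nowMs - lastMatchAt < relaxationMs) return false   // still inside the cooldown window
    lastMatchAt = nowMs
    return true
  }
}
```

Under this reading, raising threshold trades missed detections for fewer false positives, while relaxationMs keeps one spoken word from firing several events in a row.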

Events

| Event | Detail type | Fired when |
| --- | --- | --- |
| ready | | init() completes |
| match | { name, confidence, timestamp } | A word is detected |
| error | { error: Error } | Model load or mic access fails |

EnrollmentSession

Records audio samples from the mic (or uploaded files) and generates reference embeddings for a new custom word.

class EnrollmentSession extends EventTarget {
  constructor(wordName: string)

  readonly wordName:    string
  readonly sampleCount: number
  readonly samples:     { audioBuffer: Float32Array; name: string }[]

  recordSample():            Promise<number>   // → 1-based sample index
  addAudioFile(file: File):  Promise<number>   // → 1-based sample index
  removeSample(idx: number): void
  clearSamples():            void
  generateRef():             Promise<RefData>  // requires ≥ 3 samples
}

Events

| Event | Detail |
| --- | --- |
| recording-start | |
| sample-added | { count: number; name: string } |
| samples-changed | { count: number } |
| generating | { total: number } |
| progress | { done: number; total: number } |

Persistence

Static methods on the Mellon class for persisting and sharing custom word references.

Mellon.loadWords(): RefData[]                      // load from localStorage
Mellon.saveWord(refData: RefData): void             // save to localStorage
Mellon.deleteWord(wordName: string): void           // delete from localStorage
Mellon.importWordFile(file: File): Promise<RefData> // parse an uploaded JSON file
Mellon.exportWord(refData: RefData): void           // download as JSON file

RefData shape

interface RefData {
  word_name:  string           // e.g. 'hello'
  model_type: 'resnet_50_arc'
  embeddings: number[][]       // N × 256 vectors
}

Compatible with the EfficientWord-Net _ref.json format — you can import reference files generated by the Python toolkit directly.
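
When reference files come from outside the app (a download or a file produced by another tool), it can be useful to check the shape before handing the object to the detector. The sketch below is a hypothetical `isRefData` guard that follows the interface above. It is not part of mellon, and requiring a non-empty name and at least one embedding are assumptions of this example.

```javascript
// Embedding width taken from the RefData interface (N × 256 vectors).
const EMBEDDING_SIZE = 256

// Hypothetical guard (not part of mellon): true if `value` matches the RefData shape.
function isRefData(value) {
  if (value === null || typeof value !== 'object') return false
  if (typeof value.word_name !== 'string' || value.word_name.length === 0) return false
  if (value.model_type !== 'resnet_50_arc') return false
  if (!Array.isArray(value.embeddings) || value.embeddings.length === 0) return false

  // Every embedding must be a 256-element array of finite numbers.
  return value.embeddings.every(
    (vector) =>
      Array.isArray(vector) &&
      vector.length === EMBEDDING_SIZE &&
      vector.every((x) => typeof x === 'number' && Number.isFinite(x)),
  )
}
```

A caller might parse the JSON, run `isRefData(parsed)`, and only then pass the object to addCustomWord() or Mellon.saveWord().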


Enrolling custom words

import { Mellon } from 'mellon'

const stt = new Mellon()
await stt.init()

// 1. Create an enrollment session
const session = stt.enrollWord('hey computer')

session.addEventListener('recording-start', () => console.log('Recording…'))
session.addEventListener('sample-added', e => console.log(`Sample ${e.detail.count} recorded`))

// 2. Record at least 3 samples (1.5 s each)
await session.recordSample()
await session.recordSample()
await session.recordSample()

// 3. Generate reference embeddings
session.addEventListener('progress', e => console.log(`Embedding ${e.detail.done}/${e.detail.total}`))
const ref = await session.generateRef()

// 4a. Use immediately in the running detector
stt.addCustomWord(ref)

// 4b. Persist for future sessions
Mellon.saveWord(ref)

You can also enroll from pre-recorded audio files:

const file = document.querySelector('input[type=file]').files[0]
await session.addAudioFile(file)
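
The three explicit recordSample() calls can also be wrapped in a loop driven by the session's sampleCount. The sketch below is a hypothetical `collectSamples` helper, not part of mellon. It assumes that recordSample() rejects on failure and that sampleCount reflects each successfully recorded sample, as the class declaration suggests.

```javascript
// Hypothetical helper (not part of mellon): records until the session holds
// `target` samples, reporting each new sample through an optional callback.
async function collectSamples(session, target = 3, onSample = () => {}) {
  // Bound the number of attempts so a session that never grows cannot loop forever.
  const maxAttempts = target * 2

  for (let attempt = 0; attempt < maxAttempts && session.sampleCount < target; attempt++) {
    const index = await session.recordSample() // resolves to the 1-based sample index
    onSample(index, target)
  }

  if (session.sampleCount < target) {
    throw new Error(`Only ${session.sampleCount} of ${target} samples were recorded`)
  }
  return session.sampleCount
}

// Usage with the session from the example above:
//   await collectSamples(session, 3, (i, n) => console.log(`Sample ${i}/${n}`))
//   const ref = await session.generateRef()
```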

Server / bundler configuration

SharedArrayBuffer (required by multi-threaded WASM) is only available when the page is served with:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Vite dev server

Already configured in the demo app's vite.config.js. For your own project:

// vite.config.js
const crossOriginIsolation = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
}

export default {
  server:  { headers: crossOriginIsolation },
  preview: { headers: crossOriginIsolation },
}

Express

app.use((req, res, next) => {
  res.setHeader('Cross-Origin-Opener-Policy',  'same-origin')
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp')
  next()
})

Nginx

add_header Cross-Origin-Opener-Policy  "same-origin";
add_header Cross-Origin-Embedder-Policy "require-corp";

Netlify (public/_headers)

/*
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp

Browser support

| Browser | Supported | Notes |
| --- | --- | --- |
| Chrome / Edge 89+ | Yes | Full support |
| Firefox 79+ | Yes | Full support |
| Safari 15.2+ | Yes | SharedArrayBuffer re-enabled with COOP/COEP |
| Safari < 15.2 | No | SharedArrayBuffer not available |
| iOS Safari 15.2+ | Yes | Works over HTTPS |
| Node.js | No | Browser-only (AudioContext, getUserMedia) |

License

MIT