mellon
Offline, fully in-browser hotword / wake-word detection powered by EfficientWord-Net (ResNet-50 ArcFace). Works as a zero-dependency npm library or as a standalone PWA.
- 100% offline: ONNX inference runs in the browser via WebAssembly; no server, no cloud.
- Speaker-independent: the model generalises across voices out of the box.
- Custom words: enroll any phrase with ≥ 3 audio samples; no retraining.
- TypeScript-ready: ships with full .d.ts declarations.
- Tiny API surface: one class, zero config.
Table of contents
- Browser requirements
- Installation
- Quick start
- Asset setup
- API reference
- Enrolling custom words
- Server / bundler configuration
- Browser support
Browser requirements
mellon uses ONNX Runtime's multi-threaded WebAssembly backend, which requires SharedArrayBuffer. This in turn requires the page to be served with the following HTTP headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
See Server / bundler configuration for ready-to-use snippets.
Additionally:
- The page must be served over HTTPS (or localhost).
- Microphone permission is requested when start() is called.
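When detection fails to start, a frequent cause is that one of the two headers listed above is missing or has a different value. The sketch below checks a set of response headers against those values. `missingIsolationHeaders` is a hypothetical helper, not part of mellon, and it takes a plain object of header names and values so it can run outside the browser.

```javascript
// Hypothetical helper (not part of mellon): reports which of the two
// required cross-origin isolation headers are absent or have another value.
const REQUIRED_HEADERS = {
  'cross-origin-opener-policy': 'same-origin',
  'cross-origin-embedder-policy': 'require-corp',
}

function missingIsolationHeaders(headers) {
  // Header names are case-insensitive, so normalise them before comparing.
  const normalised = {}
  for (const [name, value] of Object.entries(headers)) {
    normalised[name.toLowerCase()] = String(value).trim().toLowerCase()
  }
  // Keep only the required names whose value does not match.
  return Object.keys(REQUIRED_HEADERS).filter(
    (name) => normalised[name] !== REQUIRED_HEADERS[name],
  )
}
```

In the browser itself, `window.crossOriginIsolated` reports the end result: it is `true` only when both headers took effect.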
Installation
npm install mellon
No asset setup required: the ONNX model and WASM runtime are loaded automatically from the jsDelivr CDN, pinned to the installed package version.
Quick start
import { Mellon } from 'mellon'
const stt = new Mellon({
  refs: [
    'https://example.com/hello_ref.json',
    'https://example.com/stop_ref.json',
  ],
})
await stt.start() // fetches refs, then opens the mic
stt.addEventListener('match', (e) => {
  console.log(`Detected "${e.detail.name}" (${(e.detail.confidence * 100).toFixed(1)}%)`)
})
Refs are fetched automatically during start(). You can enroll your own words; see Enrolling custom words.
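A match event only reports which word was heard; mapping words to behaviour is left to the application. One way to do that is a lookup table keyed by `e.detail.name`. The sketch below assumes only the documented event shape (`name`, `confidence`); `makeMatchHandler`, its `minConfidence` parameter, and the action names are hypothetical.

```javascript
// Hypothetical helper: routes a 'match' event to a per-word action.
// `actions` maps a word name to a callback; `minConfidence` is an optional
// application-level floor applied on top of the detector's own threshold.
function makeMatchHandler(actions, minConfidence = 0) {
  return function onMatch(e) {
    const { name, confidence } = e.detail
    // Only look at the object's own keys, so names like "constructor" are safe.
    const known = Object.prototype.hasOwnProperty.call(actions, name)
    if (!known || typeof actions[name] !== 'function') return false
    if (confidence < minConfidence) return false
    actions[name](e.detail)
    return true // the event was handled
  }
}
```

It could then be registered with `stt.addEventListener('match', makeMatchHandler({ stop: () => stt.stop() }))`, assuming the ref's word name is `stop`.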
Asset setup (offline / intranet only)
By default, the WASM runtime and model load from the jsDelivr CDN — no setup needed. For air-gapped or private-network deployments, copy the assets locally and tell the library where to find them:
mkdir -p public/mellon-assets
cp -r node_modules/mellon/dist/wasm public/mellon-assets/wasm
cp node_modules/mellon/dist/models/model.onnx public/mellon-assets/model.onnx
Then pass the paths to the constructor:
new Mellon({
  wasmBasePath: '/mellon-assets/wasm/', // trailing slash required
  modelUrl: '/mellon-assets/model.onnx',
})
For Vite projects, add the copy step once in your config:
// vite.config.js
import { viteStaticCopy } from 'vite-plugin-static-copy'
export default {
  plugins: [
    viteStaticCopy({
      targets: [
        { src: 'node_modules/mellon/dist/wasm/*', dest: 'mellon-assets/wasm' },
        { src: 'node_modules/mellon/dist/models/model.onnx', dest: 'mellon-assets' },
      ],
    }),
  ],
}
API reference
Mellon (high-level)
The easiest way to use the library. Wraps mic access, AudioWorklet wiring, and detector management into a single class.
class Mellon extends EventTarget {
  constructor(opts?: MellonOptions)
  readonly isInitialized: boolean
  readonly isRunning: boolean
  init(onProgress?: (pct: number) => void): Promise<void>
  start(words?: string[]): Promise<void>
  stop(): void
  addCustomWord(refData: RefData): void
  enrollWord(wordName: string): EnrollmentSession
  static loadWords(): RefData[]
  static saveWord(refData: RefData): void
  static deleteWord(wordName: string): void
  static importWordFile(file: File): Promise<RefData>
  static exportWord(refData: RefData): void
}
MellonOptions
| Option | Type | Default | Description |
|---|---|---|---|
| refs | (string \| RefData)[] | [] | Refs to preload; URL strings are fetched during init() |
| words | string[] | [] | Subset of loaded refs to activate (defaults to all loaded refs) |
| threshold | number | 0.65 | Detection threshold (0–1) |
| relaxationMs | number | 2000 | Minimum ms between match events |
| inferenceGapMs | number | 300 | Minimum ms between inference runs |
| wasmBasePath | string | - | Base URL for the ORT WASM files (trailing / required) |
| modelUrl | string | - | URL to model.onnx |
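
Two of these options are easy to get subtly wrong: `wasmBasePath` needs a trailing slash and `threshold` must lie between 0 and 1. A small pre-flight check can catch both before the constructor is called. `normalizeOptions` is a hypothetical helper, not part of the library, and whether mellon validates these values itself is not stated in this README.

```javascript
// Hypothetical pre-flight helper (not part of mellon).
// Returns a copy of the options with wasmBasePath normalised and
// threshold range-checked; the input object is left untouched.
function normalizeOptions(opts = {}) {
  const out = { ...opts }
  // The README requires a trailing slash on the WASM base path.
  if (typeof out.wasmBasePath === 'string' && !out.wasmBasePath.endsWith('/')) {
    out.wasmBasePath += '/'
  }
  // The detection threshold is documented as a value between 0 and 1.
  if (out.threshold !== undefined) {
    const t = out.threshold
    if (typeof t !== 'number' || Number.isNaN(t) || t < 0 || t > 1) {
      throw new RangeError(`threshold must be a number between 0 and 1, got ${t}`)
    }
  }
  return out
}
```

The result can be passed straight to the constructor, for example `new Mellon(normalizeOptions({ wasmBasePath: '/mellon-assets/wasm', modelUrl: '/mellon-assets/model.onnx' }))`.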
Events
| Event | Detail type | Fired when |
|---|---|---|
| ready | - | init() completes |
| match | { name, confidence, timestamp } | A word is detected |
| error | { error: Error } | Model load or mic access fails |
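
The `relaxationMs` option sets a minimum gap between match events, so one spoken word does not fire repeatedly. The sketch below illustrates that behaviour with a simple time gate. It is an assumption about the semantics (one gate shared by all words, at most one match per window), not the library's actual implementation, and `createRelaxationGate` is a hypothetical name.

```javascript
// Illustrative sketch only: a time gate that lets one event through and then
// suppresses further events until `relaxationMs` milliseconds have elapsed.
function createRelaxationGate(relaxationMs = 2000) {
  let lastAccepted = -Infinity // no event accepted yet
  return function shouldEmit(timestampMs) {
    if (timestampMs - lastAccepted < relaxationMs) return false // still relaxing
    lastAccepted = timestampMs
    return true
  }
}
```

Timestamps are passed in explicitly so the gate can be exercised without a real clock; in an application they might come from `performance.now()` or from the event's `timestamp` field.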
EnrollmentSession
Records audio samples from the mic (or uploaded files) and generates reference embeddings for a new custom word.
class EnrollmentSession extends EventTarget {
  constructor(wordName: string)
  readonly wordName: string
  readonly sampleCount: number
  readonly samples: { audioBuffer: Float32Array; name: string }[]
  recordSample(): Promise<number> // → 1-based sample index
  addAudioFile(file: File): Promise<number> // → 1-based sample index
  removeSample(idx: number): void
  clearSamples(): void
  generateRef(): Promise<RefData> // requires ≥ 3 samples
}
Events
| Event | Detail |
|---|---|
| recording-start | - |
| sample-added | { count: number; name: string } |
| samples-changed | { count: number } |
| generating | { total: number } |
| progress | { done: number; total: number } |
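
Recording a fixed number of samples and then generating the reference amounts to a loop over the session methods. The sketch below relies only on `recordSample()`, `sampleCount`, and `generateRef()` as documented; `collectSamples` is a hypothetical helper, and the session is passed in so the function can be exercised with a stand-in object.

```javascript
// Hypothetical helper: records until the session holds `target` samples,
// then asks the session for the reference embeddings.
async function collectSamples(session, target = 3) {
  // generateRef() is documented as requiring at least 3 samples.
  if (target < 3) throw new RangeError('generateRef() requires at least 3 samples')
  while (session.sampleCount < target) {
    await session.recordSample() // resolves once one sample has been captured
  }
  return session.generateRef()
}
```

With a real session this would be called as `const ref = await collectSamples(stt.enrollWord('hey computer'), 5)`.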
Persistence
Static methods on the Mellon class for persisting and sharing custom word references.
Mellon.loadWords(): RefData[] // load from localStorage
Mellon.saveWord(refData: RefData): void // save to localStorage
Mellon.deleteWord(wordName: string): void // delete from localStorage
Mellon.importWordFile(file: File): Promise<RefData> // parse an uploaded JSON file
Mellon.exportWord(refData: RefData): void // download as JSON file
RefData shape
interface RefData {
  word_name: string // e.g. 'hello'
  model_type: 'resnet_50_arc'
  embeddings: number[][] // N × 256 vectors
}
Compatible with the EfficientWord-Net _ref.json format: you can import reference files generated by the Python toolkit directly.
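Because reference files can come from uploads or from the Python toolkit, it can be worth checking their shape before handing them to `addCustomWord()`. The sketch below validates the three fields of the interface above. `isRefData` is a hypothetical helper, and the 256-dimension check is taken from the comment in the interface rather than from the library's own validation, if it has any.

```javascript
// Hypothetical validator for the RefData shape described in this README.
const EMBEDDING_DIM = 256 // from the "N × 256 vectors" comment

function isRefData(value) {
  if (value === null || typeof value !== 'object') return false
  if (typeof value.word_name !== 'string' || value.word_name.length === 0) return false
  if (value.model_type !== 'resnet_50_arc') return false
  if (!Array.isArray(value.embeddings) || value.embeddings.length === 0) return false
  // Every embedding must be a 256-element array of finite numbers.
  return value.embeddings.every(
    (vec) =>
      Array.isArray(vec) &&
      vec.length === EMBEDDING_DIM &&
      vec.every((x) => typeof x === 'number' && Number.isFinite(x)),
  )
}
```

A typical use would be to run it on the parsed JSON of a `_ref.json` file and reject the file when it returns `false`.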
Enrolling custom words
import { Mellon } from 'mellon'
const stt = new Mellon()
await stt.init()
// 1. Create an enrollment session
const session = stt.enrollWord('hey computer')
session.addEventListener('recording-start', () => console.log('Recording…'))
session.addEventListener('sample-added', e => console.log(`Sample ${e.detail.count} recorded`))
// 2. Record at least 3 samples (1.5 s each)
await session.recordSample()
await session.recordSample()
await session.recordSample()
// 3. Generate reference embeddings
session.addEventListener('progress', e => console.log(`Embedding ${e.detail.done}/${e.detail.total}`))
const ref = await session.generateRef()
// 4a. Use immediately in the running detector
stt.addCustomWord(ref)
// 4b. Persist for future sessions
Mellon.saveWord(ref)
You can also enroll from pre-recorded audio files:
const file = document.querySelector('input[type=file]').files[0]
await session.addAudioFile(file)
Server / bundler configuration
SharedArrayBuffer (required by multi-threaded WASM) is only available when the page is served with:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
Vite dev server
Already configured in the demo app's vite.config.js. For your own project:
// vite.config.js
export default {
  server: { headers: { 'Cross-Origin-Opener-Policy': 'same-origin', 'Cross-Origin-Embedder-Policy': 'require-corp' } },
  preview: { headers: { 'Cross-Origin-Opener-Policy': 'same-origin', 'Cross-Origin-Embedder-Policy': 'require-corp' } },
}
Express
app.use((req, res, next) => {
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin')
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp')
  next()
})
Nginx
add_header Cross-Origin-Opener-Policy "same-origin";
add_header Cross-Origin-Embedder-Policy "require-corp";
Netlify (public/_headers)
/*
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp
Browser support
| Browser | Supported | Notes |
|---|---|---|
| Chrome / Edge 89+ | ✅ | Full support |
| Firefox 79+ | ✅ | Full support |
| Safari 15.2+ | ✅ | SharedArrayBuffer re-enabled with COOP/COEP |
| Safari < 15.2 | ❌ | SharedArrayBuffer not available |
| iOS Safari 15.2+ | ✅ | Works over HTTPS |
| Node.js | ❌ | Browser-only (AudioContext, getUserMedia) |
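
The table reduces to a handful of runtime capabilities, which an application can probe before constructing a detector. The sketch below is a hypothetical `missingFeatures` helper; the list of features it checks (SharedArrayBuffer, WebAssembly, AudioContext, getUserMedia) is inferred from this README rather than taken from the library, and the environment is passed in so the function can be tested with stand-in objects.

```javascript
// Hypothetical capability probe (not part of mellon).
// Returns the names of the features that are unavailable in `env`.
function missingFeatures(env = globalThis) {
  const missing = []
  if (typeof env.SharedArrayBuffer !== 'function') missing.push('SharedArrayBuffer')
  if (typeof env.WebAssembly !== 'object' || env.WebAssembly === null) missing.push('WebAssembly')
  if (typeof env.AudioContext !== 'function') missing.push('AudioContext')
  // getUserMedia lives on navigator.mediaDevices and needs a secure context.
  const media = env.navigator && env.navigator.mediaDevices
  if (!media || typeof media.getUserMedia !== 'function') missing.push('getUserMedia')
  return missing
}
```

An empty array means every probed feature is present; anything else can be shown to the user instead of calling `start()`.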
License
MIT