@volley/recognition-client-sdk

TypeScript SDK for real-time speech recognition via WebSocket.

Installation

npm install @volley/recognition-client-sdk

Quick Start

import {
  createClientWithBuilder,
  RecognitionProvider,
  DeepgramModel,
  STAGES
} from '@volley/recognition-client-sdk';

// Create client with builder pattern (recommended)
const client = createClientWithBuilder(builder =>
  builder
    .stage(STAGES.STAGING)  // ✨ Simple environment selection using enum
    .provider(RecognitionProvider.DEEPGRAM)
    .model(DeepgramModel.NOVA_2)
    .onTranscript(result => {
      console.log('Final:', result.finalTranscript);
      console.log('Interim:', result.pendingTranscript);
    })
    .onError(error => console.error(error))
);

// Stream audio
await client.connect();
client.sendAudio(pcm16AudioChunk);  // Call repeatedly with audio chunks
await client.stopRecording();       // Wait for final transcript

// Check the actual URL being used
console.log('Connected to:', client.getUrl());

Alternative: Direct Client Creation

import {
  RealTimeTwoWayWebSocketRecognitionClient,
  RecognitionProvider,
  DeepgramModel,
  Language,
  STAGES
} from '@volley/recognition-client-sdk';

const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,  // ✨ Recommended: Use STAGES enum for type safety
  asrRequestConfig: {
    provider: RecognitionProvider.DEEPGRAM,
    model: DeepgramModel.NOVA_2,
    language: Language.ENGLISH_US
  },
  onTranscript: (result) => console.log(result),
  onError: (error) => console.error(error)
});

// Check the actual URL being used
console.log('Connected to:', client.getUrl());

Configuration

Environment Selection

Recommended: Use the stage parameter with the STAGES enum for automatic environment configuration:

import {
  RecognitionProvider,
  DeepgramModel,
  Language,
  STAGES
} from '@volley/recognition-client-sdk';

builder
  .stage(STAGES.STAGING)  // STAGES.LOCAL | STAGES.DEV | STAGES.STAGING | STAGES.PRODUCTION
  .provider(RecognitionProvider.DEEPGRAM)  // DEEPGRAM, GOOGLE
  .model(DeepgramModel.NOVA_2)              // Provider-specific model enum
  .language(Language.ENGLISH_US)            // Language enum
  .interimResults(true)                     // Enable partial transcripts

Available Stages and URLs:

Stage         Enum                WebSocket URL
Local         STAGES.LOCAL        ws://localhost:3101/ws/v1/recognize
Development   STAGES.DEV          wss://recognition-service-dev.volley-services.net/ws/v1/recognize
Staging       STAGES.STAGING      wss://recognition-service-staging.volley-services.net/ws/v1/recognize
Production    STAGES.PRODUCTION   wss://recognition-service.volley-services.net/ws/v1/recognize

💡 Using the stage parameter automatically constructs the correct URL for each environment.

Automatic Connection Retry:

The SDK automatically retries failed connections with sensible defaults, so no configuration is needed.

Default behavior (works out of the box):

  • 4 connection attempts (try once, retry 3 times if failed)
  • 200ms delay between retries
  • Handles temporary service unavailability (503)
  • Fast failure (~600ms total on complete failure)
  • Timing: Attempt 1 → FAIL → wait 200ms → Attempt 2 → FAIL → wait 200ms → Attempt 3 → FAIL → wait 200ms → Attempt 4

import {
  RealTimeTwoWayWebSocketRecognitionClient,
  STAGES
} from '@volley/recognition-client-sdk';

// ✅ Automatic retry - no config needed!
const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  // connectionRetry works automatically with defaults
});

Optional: Customize retry behavior (only if needed):

const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  connectionRetry: {
    maxAttempts: 2,  // Fewer attempts (min: 1, max: 5)
    delayMs: 500     // Longer delay between attempts
  }
});

⚠️ Note: Retry only applies to initial connection establishment. If the connection drops during audio streaming, the SDK will not auto-retry (caller must handle this).
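
If your application needs to survive a mid-stream drop, one possible pattern is to reconnect from the onDisconnected handler. This is a sketch using only the documented callbacks; the retry cap and backoff values are assumptions, not SDK behavior, and audio sent while disconnected is lost:

import { isNormalDisconnection } from '@volley/recognition-client-sdk';

let reconnects = 0;
const MAX_RECONNECTS = 3;  // Assumed policy; tune for your app

builder.onDisconnected(async (code) => {
  if (isNormalDisconnection(code) || reconnects >= MAX_RECONNECTS) return;
  reconnects += 1;
  await new Promise(resolve => setTimeout(resolve, 500 * reconnects));  // Simple linear backoff
  await client.connect();  // Re-establish the WebSocket
});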

Advanced: Custom URL for non-standard endpoints:

builder
  .url('wss://custom-endpoint.example.com/ws/v1/recognize')  // Custom WebSocket URL
  .provider(RecognitionProvider.DEEPGRAM)
  // ... rest of config

💡 Note: If both stage and url are provided, url takes precedence.

Event Handlers

builder
  .onTranscript(result => {})    // Handle transcription results
  .onError(error => {})          // Handle errors
  .onConnected(() => {})         // Connection established
  .onDisconnected((code) => {})  // Connection closed
  .onMetadata(meta => {})        // Timing information

Optional Parameters

builder
  .gameContext({                   // Context for better recognition
    gameId: 'session-123',
    prompt: 'Expected responses: yes, no, maybe'
  })
  .userId('user-123')              // User identification
  .platform('web')                 // Platform identifier
  .logger((level, msg, data) => {})  // Custom logging
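
For example, a logger that forwards SDK logs to the console might look like the sketch below. The level values are an assumption; check the SDK's type definitions for the exact signature:

builder.logger((level, msg, data) => {
  // Assumes level is one of 'debug' | 'info' | 'warn' | 'error'
  if (level === 'error') console.error(`[recognition-sdk] ${msg}`, data);
  else if (level === 'warn') console.warn(`[recognition-sdk] ${msg}`, data);
  else console.log(`[recognition-sdk] ${msg}`, data);
});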

API Reference

Client Methods

await client.connect();           // Establish connection
client.sendAudio(chunk);          // Send PCM16 audio
await client.stopRecording();     // End and get final transcript
client.getAudioUtteranceId();     // Get session UUID
client.getUrl();                  // Get actual WebSocket URL being used
client.getState();                // Get current state
client.isConnected();             // Check connection status
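
Putting these together, a typical session looks like the sketch below; captureChunks() is a hypothetical async source of PCM16 chunks, not part of the SDK:

await client.connect();

for await (const chunk of captureChunks()) {  // Hypothetical audio source
  if (!client.isConnected()) break;           // Stop if the socket dropped
  client.sendAudio(chunk);
}

await client.stopRecording();                 // Resolves after the final transcript
console.log('Session:', client.getAudioUtteranceId());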

TranscriptionResult

{
  type: 'Transcription';                   // Message type discriminator
  audioUtteranceId: string;                // Session UUID
  finalTranscript: string;                 // Confirmed text (won't change)
  finalTranscriptConfidence?: number;      // Confidence 0-1 for final transcript
  pendingTranscript?: string;              // In-progress text (may change)
  pendingTranscriptConfidence?: number;    // Confidence 0-1 for pending transcript
  is_finished: boolean;                    // Transcription complete (last message)
  voiceStart?: number;                     // Voice activity start time (ms from stream start)
  voiceDuration?: number;                  // Voice duration (ms)
  voiceEnd?: number;                       // Voice activity end time (ms from stream start)
  startTimestamp?: number;                 // Transcription start timestamp (ms)
  endTimestamp?: number;                   // Transcription end timestamp (ms)
  receivedAtMs?: number;                   // Server receive timestamp (ms since epoch)
  accumulatedAudioTimeMs?: number;         // Total audio duration sent (ms)
}
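
A typical handler renders the stable and in-progress parts separately and reacts to is_finished. A minimal sketch, where render() stands in for your own UI code:

builder.onTranscript(result => {
  // finalTranscript is stable; pendingTranscript may still be revised
  render(result.finalTranscript, result.pendingTranscript ?? '');

  if (result.is_finished) {
    // Last message for this utterance
    console.log('Confidence:', result.finalTranscriptConfidence);
    if (result.voiceStart !== undefined && result.voiceDuration !== undefined) {
      console.log(`Voice at ${result.voiceStart}ms for ${result.voiceDuration}ms`);
    }
  }
});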

Providers

Deepgram

import { RecognitionProvider, DeepgramModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.DEEPGRAM)
  .model(DeepgramModel.NOVA_2);        // NOVA_2, NOVA_3, FLUX_GENERAL_EN

Google Cloud Speech-to-Text

import { RecognitionProvider, GoogleModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.GOOGLE)
  .model(GoogleModel.LATEST_SHORT);    // LATEST_SHORT, LATEST_LONG, TELEPHONY, etc.

Available Google models:

  • LATEST_SHORT - Optimized for short audio (< 1 minute)
  • LATEST_LONG - Optimized for long audio (> 1 minute)
  • TELEPHONY - Optimized for phone audio
  • TELEPHONY_SHORT - Short telephony audio
  • MEDICAL_DICTATION - Medical dictation (premium)
  • MEDICAL_CONVERSATION - Medical conversations (premium)

Audio Format

The SDK expects PCM16 audio:

  • Format: Linear PCM (16-bit signed integers)
  • Sample Rate: 16kHz recommended
  • Channels: Mono

Please reach out to the AI team if there are essential reasons to support other formats.
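
In the browser, Web Audio delivers Float32 samples (often at 44.1 or 48 kHz), so you typically convert, and if necessary resample, before sending audio. Below is a minimal sketch of the sample-format conversion; the capture and resampling plumbing is up to you, and whether sendAudio accepts a raw Int16Array is an assumption to verify against the SDK's types:

// Convert Float32 samples in [-1, 1] to 16-bit signed PCM
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

client.sendAudio(floatTo16BitPCM(micChunk));  // micChunk comes from your capture code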

Error Handling

builder.onError(error => {
  console.error(`Error ${error.code}: ${error.message}`);
});

// Check disconnection type
import { isNormalDisconnection } from '@volley/recognition-client-sdk';

builder.onDisconnected((code, reason) => {
  if (!isNormalDisconnection(code)) {
    console.error('Unexpected disconnect:', code);
  }
});

Troubleshooting

Connection Issues

WebSocket fails to connect

  • Verify the recognition service is running
  • Check the WebSocket URL format: ws:// or wss://
  • Ensure network allows WebSocket connections

Authentication errors

  • Verify audioUtteranceId is provided
  • Check if service requires additional auth headers

Audio Issues

No transcription results

  • Confirm audio format is PCM16, 16kHz, mono
  • Check if audio chunks are being sent (use onAudioSent callback)
  • Verify audio data is not empty or corrupted

Poor transcription quality

  • Try different models (e.g., NOVA_2 vs NOVA_3)
  • Adjust language setting to match audio
  • Ensure audio sample rate matches configuration

Performance Issues

High latency

  • Use smaller audio chunks (e.g., 100ms instead of 500ms; see the sizing helper below)
  • Choose a model optimized for real-time (e.g., Deepgram Nova 2)
  • Check network latency to service
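
For reference, at 16 kHz mono PCM16 each millisecond of audio is 32 bytes (16 samples × 2 bytes), so a 100 ms chunk is 3,200 bytes versus 16,000 bytes at 500 ms. A small helper using the recommended format above:

const SAMPLE_RATE = 16_000;  // Hz, recommended above
const BYTES_PER_SAMPLE = 2;  // PCM16

function chunkBytes(ms: number): number {
  return (SAMPLE_RATE * ms) / 1000 * BYTES_PER_SAMPLE;
}

console.log(chunkBytes(100));  // 3200
console.log(chunkBytes(500));  // 16000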

Memory issues

  • Call disconnect() when done to clean up resources (see the pattern below)
  • Avoid keeping multiple client instances active
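
One way to guarantee cleanup is a try/finally around the session, as sketched below. Note that disconnect() is referenced in this section but not in the method list above, so confirm it against the SDK's type definitions:

try {
  await client.connect();
  // ... sendAudio() loop ...
  await client.stopRecording();
} finally {
  client.disconnect();  // Confirm this method against the SDK's types
}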

Publishing

This package uses automated publishing via semantic-release with npm Trusted Publishers (OIDC).

First-Time Setup (One-time)

After the first manual publish, configure npm Trusted Publishers:

  1. Go to https://www.npmjs.com/package/@volley/recognition-client-sdk/access
  2. Click "Add publisher" → Select "GitHub Actions"
  3. Configure:
    • Organization: Volley-Inc
    • Repository: recognition-service
    • Workflow: sdk-release.yml
    • Environment: Leave empty (not required)

How It Works

  • Automated releases: Push to dev branch triggers semantic-release
  • Version bumping: Based on conventional commits (feat/fix/BREAKING CHANGE; see the examples below)
  • No tokens needed: Uses OIDC authentication with npm
  • Provenance: Automatic supply chain attestation
  • Path filtering: Only releases when SDK or libs change
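
For example, under conventional commits (standard semantic-release mappings; the commit subjects are made up for illustration):

fix(client): handle empty audio chunks   → patch release (1.2.3 → 1.2.4)
feat(builder): add new option            → minor release (1.2.3 → 1.3.0)
feat!: drop Node 16 support              → major release (1.2.3 → 2.0.0)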

Manual publishing, if needed for testing:

cd packages/client-sdk-ts
npm login --scope=@volley
pnpm build
npm publish --provenance --access public

Contributing

This SDK is part of the Recognition Service monorepo. To contribute:

  1. Make changes to SDK or libs
  2. Test locally with pnpm test
  3. Create PR to dev branch with conventional commit messages (feat:, fix:, etc.)
  4. After merge, automated workflow will publish new version to npm

License

Proprietary