# @volley/recognition-client-sdk
TypeScript SDK for real-time speech recognition via WebSocket.
## Installation
```bash
npm install @volley/recognition-client-sdk
```

## Quick Start
```typescript
import {
  createClientWithBuilder,
  RecognitionProvider,
  DeepgramModel,
  STAGES
} from '@volley/recognition-client-sdk';

// Create client with builder pattern (recommended)
const client = createClientWithBuilder(builder =>
  builder
    .stage(STAGES.STAGING) // ✨ Simple environment selection using enum
    .provider(RecognitionProvider.DEEPGRAM)
    .model(DeepgramModel.NOVA_2)
    .onTranscript(result => {
      console.log('Final:', result.finalTranscript);
      console.log('Interim:', result.pendingTranscript);
    })
    .onError(error => console.error(error))
);

// Stream audio
await client.connect();
client.sendAudio(pcm16AudioChunk); // Call repeatedly with audio chunks
await client.stopRecording(); // Wait for final transcript

// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```

### Alternative: Direct Client Creation
```typescript
import {
  RealTimeTwoWayWebSocketRecognitionClient,
  RecognitionProvider,
  DeepgramModel,
  Language,
  STAGES
} from '@volley/recognition-client-sdk';

const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING, // ✨ Recommended: Use STAGES enum for type safety
  asrRequestConfig: {
    provider: RecognitionProvider.DEEPGRAM,
    model: DeepgramModel.NOVA_2,
    language: Language.ENGLISH_US
  },
  onTranscript: (result) => console.log(result),
  onError: (error) => console.error(error)
});

// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```

## Configuration
### Environment Selection
**Recommended:** Use the `stage` parameter with the `STAGES` enum for automatic environment configuration:
```typescript
import {
  RecognitionProvider,
  DeepgramModel,
  Language,
  STAGES
} from '@volley/recognition-client-sdk';

builder
  .stage(STAGES.STAGING) // STAGES.LOCAL | STAGES.DEV | STAGES.STAGING | STAGES.PRODUCTION
  .provider(RecognitionProvider.DEEPGRAM) // DEEPGRAM, GOOGLE
  .model(DeepgramModel.NOVA_2) // Provider-specific model enum
  .language(Language.ENGLISH_US) // Language enum
  .interimResults(true) // Enable partial transcripts
```

**Available Stages and URLs:**
| Stage | Enum | WebSocket URL |
|---|---|---|
| Local | `STAGES.LOCAL` | `ws://localhost:3101/ws/v1/recognize` |
| Development | `STAGES.DEV` | `wss://recognition-service-dev.volley-services.net/ws/v1/recognize` |
| Staging | `STAGES.STAGING` | `wss://recognition-service-staging.volley-services.net/ws/v1/recognize` |
| Production | `STAGES.PRODUCTION` | `wss://recognition-service.volley-services.net/ws/v1/recognize` |
> 💡 Using the `stage` parameter automatically constructs the correct URL for each environment.
**Automatic Connection Retry**
The SDK automatically retries failed connections with sensible defaults - no configuration needed!
**Default behavior** (works out of the box):
- 4 connection attempts (one initial try plus up to 3 retries on failure)
- 200ms delay between retries
- Handles temporary service unavailability (503)
- Fast failure (~600ms total on complete failure)
- Timing: `Attempt 1 → FAIL → wait 200ms → Attempt 2 → FAIL → wait 200ms → Attempt 3 → FAIL → wait 200ms → Attempt 4`
```typescript
import { STAGES } from '@volley/recognition-client-sdk';

// ✅ Automatic retry - no config needed!
const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  // connectionRetry works automatically with defaults
});
```

**Optional:** Customize retry behavior (only if needed):
```typescript
const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  connectionRetry: {
    maxAttempts: 2, // Fewer attempts (min: 1, max: 5)
    delayMs: 500    // Longer delay between attempts
  }
});
```

> ⚠️ Note: Retry only applies to initial connection establishment. If the connection drops during audio streaming, the SDK will not auto-retry (the caller must handle this, as sketched below).
**Advanced:** Custom URL for non-standard endpoints:
```typescript
builder
  .url('wss://custom-endpoint.example.com/ws/v1/recognize') // Custom WebSocket URL
  .provider(RecognitionProvider.DEEPGRAM)
  // ... rest of config
```

> 💡 Note: If both `stage` and `url` are provided, `url` takes precedence.
### Event Handlers
```typescript
builder
  .onTranscript(result => {}) // Handle transcription results
  .onError(error => {}) // Handle errors
  .onConnected(() => {}) // Connection established
  .onDisconnected((code) => {}) // Connection closed
  .onMetadata(meta => {}) // Timing information
```

### Optional Parameters
```typescript
builder
  .gameContext({ // Context for better recognition
    gameId: 'session-123',
    prompt: 'Expected responses: yes, no, maybe'
  })
  .userId('user-123') // User identification
  .platform('web') // Platform identifier
  .logger((level, msg, data) => {}) // Custom logging
```
## API Reference

### Client Methods
```typescript
await client.connect(); // Establish connection
client.sendAudio(chunk); // Send PCM16 audio
await client.stopRecording(); // End and get final transcript
client.getAudioUtteranceId(); // Get session UUID
client.getUrl(); // Get actual WebSocket URL being used
client.getState(); // Get current state
client.isConnected(); // Check connection status
```

### TranscriptionResult

```typescript
{
  type: 'Transcription'; // Message type discriminator
  audioUtteranceId: string; // Session UUID
  finalTranscript: string; // Confirmed text (won't change)
  finalTranscriptConfidence?: number; // Confidence 0-1 for final transcript
  pendingTranscript?: string; // In-progress text (may change)
  pendingTranscriptConfidence?: number; // Confidence 0-1 for pending transcript
  is_finished: boolean; // Transcription complete (last message)
  voiceStart?: number; // Voice activity start time (ms from stream start)
  voiceDuration?: number; // Voice duration (ms)
  voiceEnd?: number; // Voice activity end time (ms from stream start)
  startTimestamp?: number; // Transcription start timestamp (ms)
  endTimestamp?: number; // Transcription end timestamp (ms)
  receivedAtMs?: number; // Server receive timestamp (ms since epoch)
  accumulatedAudioTimeMs?: number; // Total audio duration sent (ms)
}
```
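To show how these fields compose, a transcript handler typically renders `finalTranscript` as settled text and `pendingTranscript` as a live preview, then treats `is_finished` as the end of the utterance. A minimal sketch using only the fields documented above:

```typescript
builder.onTranscript(result => {
  // Settled text never changes; pending text may be revised by later messages.
  const settled = result.finalTranscript;
  const preview = result.pendingTranscript ?? '';
  console.log(`${settled}${preview ? ` [${preview}]` : ''}`);

  if (result.is_finished) {
    // Last message for this utterance: the settled text is the full transcript.
    console.log('Utterance complete:', settled);
  }
});
```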
## Providers

### Deepgram
```typescript
import { RecognitionProvider, DeepgramModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.DEEPGRAM)
  .model(DeepgramModel.NOVA_2); // NOVA_2, NOVA_3, FLUX_GENERAL_EN
```

### Google Cloud Speech-to-Text
```typescript
import { RecognitionProvider, GoogleModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.GOOGLE)
  .model(GoogleModel.LATEST_SHORT); // LATEST_SHORT, LATEST_LONG, TELEPHONY, etc.
```

Available Google models:
- `LATEST_SHORT` - Optimized for short audio (< 1 minute)
- `LATEST_LONG` - Optimized for long audio (> 1 minute)
- `TELEPHONY` - Optimized for phone audio
- `TELEPHONY_SHORT` - Short telephony audio
- `MEDICAL_DICTATION` - Medical dictation (premium)
- `MEDICAL_CONVERSATION` - Medical conversations (premium)
## Audio Format
The SDK expects PCM16 audio:
- Format: Linear PCM (16-bit signed integers)
- Sample Rate: 16kHz recommended
- Channels: Mono

Please reach out to the AI team if there are essential reasons to need other formats.
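If audio is captured via the Web Audio API, samples arrive as `Float32Array` values in [-1, 1] and must be converted before calling `sendAudio`. A minimal sketch of that conversion (the capture pipeline is omitted, and whether `sendAudio` expects the `Int16Array` itself or its underlying buffer should be checked against the SDK types):

```typescript
// Convert Web Audio Float32 samples ([-1, 1]) to 16-bit signed PCM.
function floatTo16BitPCM(float32: Float32Array): Int16Array {
  const pcm16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // Clamp to avoid overflow
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;      // Scale to int16 range
  }
  return pcm16;
}

// e.g. inside an AudioWorklet/ScriptProcessor callback capturing mono 16kHz audio:
// client.sendAudio(floatTo16BitPCM(inputFloat32Samples));
```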
## Error Handling
```typescript
builder.onError(error => {
  console.error(`Error ${error.code}: ${error.message}`);
});

// Check disconnection type
import { isNormalDisconnection } from '@volley/recognition-client-sdk';

builder.onDisconnected((code, reason) => {
  if (!isNormalDisconnection(code)) {
    console.error('Unexpected disconnect:', code);
  }
});
```

## Troubleshooting
### Connection Issues
**WebSocket fails to connect**
- Verify the recognition service is running
- Check the WebSocket URL format: `ws://` or `wss://`
- Ensure network allows WebSocket connections
**Authentication errors**
- Verify `audioUtteranceId` is provided
- Check if service requires additional auth headers
### Audio Issues
**No transcription results**
- Confirm audio format is PCM16, 16kHz, mono
- Check if audio chunks are being sent (use the `onAudioSent` callback)
- Verify audio data is not empty or corrupted
**Poor transcription quality**
- Try different models (e.g., `NOVA_2` vs `NOVA_2_GENERAL`)
- Adjust language setting to match audio
- Ensure audio sample rate matches configuration
### Performance Issues
**High latency**
- Use smaller audio chunks (e.g., 100ms instead of 500ms; see the sizing sketch after this list)
- Choose a model optimized for real-time (e.g., Deepgram Nova 2)
- Check network latency to service
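As a reference for "smaller chunks": at 16kHz mono PCM16, each sample is 2 bytes, so a 100ms chunk is 16,000 × 0.1 × 2 = 3,200 bytes. A small helper to compute this (illustrative only, not part of the SDK):

```typescript
// Bytes per chunk for mono PCM16 audio: sampleRate * seconds * 2 bytes/sample.
function pcm16ChunkBytes(sampleRateHz: number, chunkMs: number): number {
  return Math.round(sampleRateHz * (chunkMs / 1000)) * 2;
}

console.log(pcm16ChunkBytes(16000, 100)); // 3200 bytes per 100ms chunk
console.log(pcm16ChunkBytes(16000, 500)); // 16000 bytes per 500ms chunk
```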
**Memory issues**
- Call `disconnect()` when done to clean up resources
- Avoid keeping multiple client instances active
## Publishing
This package uses automated publishing via semantic-release with npm Trusted Publishers (OIDC).
### First-Time Setup (One-time)
After the first manual publish, configure npm Trusted Publishers:
- Go to https://www.npmjs.com/package/@volley/recognition-client-sdk/access
- Click "Add publisher" → Select "GitHub Actions"
- Configure:
  - Organization: `Volley-Inc`
  - Repository: `recognition-service`
  - Workflow: `sdk-release.yml`
  - Environment: Leave empty (not required)
### How It Works
- Automated releases: Push to `dev` branch triggers semantic-release
- Version bumping: Based on conventional commits (feat/fix/BREAKING CHANGE)
- No tokens needed: Uses OIDC authentication with npm
- Provenance: Automatic supply chain attestation
- Path filtering: Only releases when SDK or libs change
### Manual Publishing (Not Recommended)
If needed for testing:
```bash
cd packages/client-sdk-ts
npm login --scope=@volley
pnpm build
npm publish --provenance --access public
```

## Contributing
This SDK is part of the Recognition Service monorepo. To contribute:
- Make changes to SDK or libs
- Test locally with `pnpm test`
- Create PR to `dev` branch with conventional commit messages (`feat:`, `fix:`, etc.)
- After merge, automated workflow will publish new version to npm
## License
Proprietary