# @volley/recognition-client-sdk
TypeScript SDK for real-time speech recognition via WebSocket.
## Installation
```bash
npm install @volley/recognition-client-sdk
```

## Quick Start
```typescript
import {
  createClientWithBuilder,
  RecognitionProvider,
  DeepgramModel,
  STAGES
} from '@volley/recognition-client-sdk';

// Create client with builder pattern (recommended)
const client = createClientWithBuilder(builder =>
  builder
    .stage(STAGES.STAGING) // ✨ Simple environment selection using enum
    .provider(RecognitionProvider.DEEPGRAM)
    .model(DeepgramModel.NOVA_2)
    .onTranscript(result => {
      console.log('Final:', result.finalTranscript);
      console.log('Interim:', result.pendingTranscript);
    })
    .onError(error => console.error(error))
);

// Stream audio
await client.connect();
client.sendAudio(pcm16AudioChunk); // Call repeatedly with audio chunks
await client.stopRecording(); // Wait for final transcript

// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```

### Alternative: Direct Client Creation
```typescript
import {
  RealTimeTwoWayWebSocketRecognitionClient,
  RecognitionProvider,
  DeepgramModel,
  Language,
  STAGES
} from '@volley/recognition-client-sdk';

const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING, // ✨ Recommended: Use STAGES enum for type safety
  asrRequestConfig: {
    provider: RecognitionProvider.DEEPGRAM,
    model: DeepgramModel.NOVA_2,
    language: Language.ENGLISH_US
  },
  onTranscript: (result) => console.log(result),
  onError: (error) => console.error(error)
});

// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```

## Configuration
### Environment Selection
**Recommended:** Use the `stage` parameter with the `STAGES` enum for automatic environment configuration:
```typescript
import {
  RecognitionProvider,
  DeepgramModel,
  Language,
  STAGES
} from '@volley/recognition-client-sdk';

builder
  .stage(STAGES.STAGING) // STAGES.LOCAL | STAGES.DEV | STAGES.STAGING | STAGES.PRODUCTION
  .provider(RecognitionProvider.DEEPGRAM) // DEEPGRAM, GOOGLE
  .model(DeepgramModel.NOVA_2) // Provider-specific model enum
  .language(Language.ENGLISH_US) // Language enum
  .interimResults(true) // Enable partial transcripts
```

**Available Stages and URLs:**
| Stage | Enum | WebSocket URL |
|---|---|---|
| Local | `STAGES.LOCAL` | `ws://localhost:3101/ws/v1/recognize` |
| Development | `STAGES.DEV` | `wss://recognition-service-dev.volley-services.net/ws/v1/recognize` |
| Staging | `STAGES.STAGING` | `wss://recognition-service-staging.volley-services.net/ws/v1/recognize` |
| Production | `STAGES.PRODUCTION` | `wss://recognition-service.volley-services.net/ws/v1/recognize` |
> 💡 Using the `stage` parameter automatically constructs the correct URL for each environment.
**Automatic Connection Retry**
The SDK automatically retries failed connections with sensible defaults - no configuration needed!
**Default behavior** (works out of the box):
- 4 connection attempts (one initial try plus up to 3 retries on failure)
- 200ms delay between retries
- Handles temporary service unavailability (503)
- Fast failure (~600ms total on complete failure)
- Timing: `Attempt 1 → FAIL → wait 200ms → Attempt 2 → FAIL → wait 200ms → Attempt 3 → FAIL → wait 200ms → Attempt 4`
```typescript
import { STAGES } from '@volley/recognition-client-sdk';

// ✅ Automatic retry - no config needed!
const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  // connectionRetry works automatically with defaults
});
```

**Optional:** Customize retry behavior (only if needed):
```typescript
const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  connectionRetry: {
    maxAttempts: 2, // Fewer attempts (min: 1, max: 5)
    delayMs: 500    // Longer delay between attempts
  }
});
```

> ⚠️ Note: Retry only applies to initial connection establishment. If the connection drops during audio streaming, the SDK will not auto-retry (the caller must handle this, as sketched below).
**Advanced:** Custom URL for non-standard endpoints:
```typescript
builder
  .url('wss://custom-endpoint.example.com/ws/v1/recognize') // Custom WebSocket URL
  .provider(RecognitionProvider.DEEPGRAM)
  // ... rest of config
```

> 💡 Note: If both `stage` and `url` are provided, `url` takes precedence.
### Event Handlers
```typescript
builder
  .onTranscript(result => {}) // Handle transcription results
  .onError(error => {}) // Handle errors
  .onConnected(() => {}) // Connection established
  .onDisconnected((code) => {}) // Connection closed
  .onMetadata(meta => {}) // Timing information
```

### Optional Parameters
```typescript
builder
  .gameContext({ // Context for better recognition
    gameId: 'session-123',
    prompt: 'Expected responses: yes, no, maybe'
  })
  .userId('user-123') // User identification
  .platform('web') // Platform identifier
  .logger((level, msg, data) => {}) // Custom logging
```
## API Reference

### Client Methods
```typescript
await client.connect(); // Establish connection
client.sendAudio(chunk); // Send PCM16 audio
await client.stopRecording(); // End and get final transcript
client.getAudioUtteranceId(); // Get session UUID
client.getUrl(); // Get actual WebSocket URL being used
client.getState(); // Get current state
client.isConnected(); // Check connection status
```

### TranscriptionResult

```typescript
{
  type: 'Transcription'; // Message type discriminator
  audioUtteranceId: string; // Session UUID
  finalTranscript: string; // Confirmed text (won't change)
  finalTranscriptConfidence?: number; // Confidence 0-1 for final transcript
  pendingTranscript?: string; // In-progress text (may change)
  pendingTranscriptConfidence?: number; // Confidence 0-1 for pending transcript
  is_finished: boolean; // Transcription complete (last message)
  voiceStart?: number; // Voice activity start time (ms from stream start)
  voiceDuration?: number; // Voice duration (ms)
  voiceEnd?: number; // Voice activity end time (ms from stream start)
  startTimestamp?: number; // Transcription start timestamp (ms)
  endTimestamp?: number; // Transcription end timestamp (ms)
  receivedAtMs?: number; // Server receive timestamp (ms since epoch)
  accumulatedAudioTimeMs?: number; // Total audio duration sent (ms)
}
```
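To show how these fields compose, a transcript handler typically renders `finalTranscript` as settled text and `pendingTranscript` as a live preview, then treats `is_finished` as the end of the utterance. A minimal sketch using only the fields documented above:

```typescript
builder.onTranscript(result => {
  // Settled text never changes; pending text may be revised by later messages.
  const settled = result.finalTranscript;
  const preview = result.pendingTranscript ?? '';
  console.log(`${settled}${preview ? ` [${preview}]` : ''}`);

  if (result.is_finished) {
    // Last message for this utterance: the settled text is the full transcript.
    console.log('Utterance complete:', settled);
  }
});
```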
## Providers

### Deepgram
```typescript
import { RecognitionProvider, DeepgramModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.DEEPGRAM)
  .model(DeepgramModel.NOVA_2); // NOVA_2, NOVA_3, FLUX_GENERAL_EN
```

### Google Cloud Speech-to-Text
```typescript
import { RecognitionProvider, GoogleModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.GOOGLE)
  .model(GoogleModel.LATEST_SHORT); // LATEST_SHORT, LATEST_LONG, TELEPHONY, etc.
```

Available Google models:
- `LATEST_SHORT` - Optimized for short audio (< 1 minute)
- `LATEST_LONG` - Optimized for long audio (> 1 minute)
- `TELEPHONY` - Optimized for phone audio
- `TELEPHONY_SHORT` - Short telephony audio
- `MEDICAL_DICTATION` - Medical dictation (premium)
- `MEDICAL_CONVERSATION` - Medical conversations (premium)
## Audio Format
The SDK expects PCM16 audio:
- Format: Linear PCM (16-bit signed integers)
- Sample Rate: 16kHz recommended
- Channels: Mono

Please reach out to the AI team if there are essential reasons to need other formats.
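If audio is captured via the Web Audio API, samples arrive as `Float32Array` values in [-1, 1] and must be converted before calling `sendAudio`. A minimal sketch of that conversion (the capture pipeline is omitted, and whether `sendAudio` expects the `Int16Array` itself or its underlying buffer should be checked against the SDK types):

```typescript
// Convert Web Audio Float32 samples ([-1, 1]) to 16-bit signed PCM.
function floatTo16BitPCM(float32: Float32Array): Int16Array {
  const pcm16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // Clamp to avoid overflow
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;      // Scale to int16 range
  }
  return pcm16;
}

// e.g. inside an AudioWorklet/ScriptProcessor callback capturing mono 16kHz audio:
// client.sendAudio(floatTo16BitPCM(inputFloat32Samples));
```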
## Error Handling
```typescript
builder.onError(error => {
  console.error(`Error ${error.code}: ${error.message}`);
});

// Check disconnection type
import { isNormalDisconnection } from '@volley/recognition-client-sdk';

builder.onDisconnected((code, reason) => {
  if (!isNormalDisconnection(code)) {
    console.error('Unexpected disconnect:', code);
  }
});
```

## Troubleshooting
### Connection Issues
**WebSocket fails to connect**
- Verify the recognition service is running
- Check the WebSocket URL format: `ws://` or `wss://`
- Ensure network allows WebSocket connections
**Authentication errors**
- Verify `audioUtteranceId` is provided
- Check if service requires additional auth headers
### Audio Issues
**No transcription results**
- Confirm audio format is PCM16, 16kHz, mono
- Check if audio chunks are being sent (use the `onAudioSent` callback)
- Verify audio data is not empty or corrupted
**Poor transcription quality**
- Try different models (e.g., `NOVA_2` vs `NOVA_2_GENERAL`)
- Adjust language setting to match audio
- Ensure audio sample rate matches configuration
### Performance Issues
**High latency**
- Use smaller audio chunks (e.g., 100ms instead of 500ms; see the sizing sketch after this list)
- Choose a model optimized for real-time (e.g., Deepgram Nova 2)
- Check network latency to service
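As a reference for "smaller chunks": at 16kHz mono PCM16, each sample is 2 bytes, so a 100ms chunk is 16,000 × 0.1 × 2 = 3,200 bytes. A small helper to compute this (illustrative only, not part of the SDK):

```typescript
// Bytes per chunk for mono PCM16 audio: sampleRate * seconds * 2 bytes/sample.
function pcm16ChunkBytes(sampleRateHz: number, chunkMs: number): number {
  return Math.round(sampleRateHz * (chunkMs / 1000)) * 2;
}

console.log(pcm16ChunkBytes(16000, 100)); // 3200 bytes per 100ms chunk
console.log(pcm16ChunkBytes(16000, 500)); // 16000 bytes per 500ms chunk
```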
**Memory issues**
- Call `disconnect()` when done to clean up resources
- Avoid keeping multiple client instances active
## Publishing
This package uses automated publishing via semantic-release with npm Trusted Publishers (OIDC).
### First-Time Setup (One-time)
After the first manual publish, configure npm Trusted Publishers:
- Go to https://www.npmjs.com/package/@volley/recognition-client-sdk/access
- Click "Add publisher" → Select "GitHub Actions"
- Configure:
  - Organization: `Volley-Inc`
  - Repository: `recognition-service`
  - Workflow: `sdk-release.yml`
  - Environment: Leave empty (not required)
### How It Works
- Automated releases: Push to `dev` branch triggers semantic-release
- Version bumping: Based on conventional commits (feat/fix/BREAKING CHANGE)
- No tokens needed: Uses OIDC authentication with npm
- Provenance: Automatic supply chain attestation
- Path filtering: Only releases when SDK or libs change
### Manual Publishing (Not Recommended)
If needed for testing:
```bash
cd packages/client-sdk-ts
npm login --scope=@volley
pnpm build
npm publish --provenance --access public
```

## Contributing
This SDK is part of the Recognition Service monorepo. To contribute:
- Make changes to SDK or libs
- Test locally with `pnpm test`
- Create PR to `dev` branch with conventional commit messages (`feat:`, `fix:`, etc.)
- After merge, automated workflow will publish new version to npm
## License
Proprietary