@ariaflowagents/realtime-audio

1.0.0

    Realtime audio pipeline for AriaFlow — multi-provider speech-to-speech and orchestration.

    Package Exports

    • @ariaflowagents/realtime-audio

    Readme

    @ariaflowagents/realtime-audio

    Realtime audio pipeline for AriaFlow — the multi-provider foundation for speech-to-speech voice agents and their orchestration. Ships provider clients for Google Gemini Live and OpenAI Realtime today, a provider-agnostic RealtimeAudioClient interface other providers plug into, and a VoiceEngine / CallWorker pair that bridges any audio transport (WebSocket, LiveKit, etc.) to the chosen provider while handling tools, session state, and event logging via AriaFlow's Foundation primitives. (Renamed from @ariaflowagents/gemini-native-audio at v0.10.0; the historical "Gemini Live native audio" docs below reflect the original Gemini-specific slice and remain accurate for that provider.)

    What This Does

    Unlike traditional voice pipelines (STT → LLM → TTS), Gemini Live accepts raw audio input and produces raw audio output in a single model call. This package wraps that capability for AriaFlow agents:

    • VoiceEngine — Call acceptor. Accepts incoming audio connections and creates per-call workers.
    • CallWorker — Per-call lifecycle manager. Bridges your audio transport (WebSocket, LiveKit, etc.) to a Gemini Live session. Handles tool calls, session state, and event logging using AriaFlow's Foundation primitives.
    • GeminiLiveSession — Thin wrapper around @google/genai ai.live.connect(). Manages the WebSocket connection to Gemini, audio encoding (base64 PCM ↔ Uint8Array), tool dispatch, and session resumption.
    • toolSetToGeminiDeclarations — Converts AriaFlow/AI SDK tool definitions (Zod schemas) to Gemini's FunctionDeclaration format.
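
To make the last bullet concrete, here is a rough sketch of the kind of mapping such a converter performs. It is not the package's implementation. It assumes the Zod schema has already been serialized to a plain JSON Schema object, and the names `JsonSchema`, `GeminiSchema`, `GeminiDeclaration`, `toGeminiSchema`, and `toGeminiDeclaration` are hypothetical. The uppercase type names follow the schema format Gemini uses for function declarations.

```typescript
// Hypothetical, simplified JSON Schema subset (roughly what a Zod schema serializes to).
interface JsonSchema {
  type: 'object' | 'string' | 'number' | 'integer' | 'boolean' | 'array';
  description?: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
}

// Shape modeled on Gemini's function-declaration schema (uppercase type names).
interface GeminiSchema {
  type: 'OBJECT' | 'STRING' | 'NUMBER' | 'INTEGER' | 'BOOLEAN' | 'ARRAY';
  description?: string;
  properties?: Record<string, GeminiSchema>;
  required?: string[];
  items?: GeminiSchema;
}

interface GeminiDeclaration {
  name: string;
  description: string;
  parameters: GeminiSchema;
}

// Recursively copy the schema, upper-casing each type name.
function toGeminiSchema(schema: JsonSchema): GeminiSchema {
  const out: GeminiSchema = { type: schema.type.toUpperCase() as GeminiSchema['type'] };
  if (schema.description) out.description = schema.description;
  if (schema.required) out.required = schema.required.slice();
  if (schema.items) out.items = toGeminiSchema(schema.items);
  if (schema.properties) {
    const props: Record<string, GeminiSchema> = {};
    const source = schema.properties;
    Object.keys(source).forEach((key) => {
      props[key] = toGeminiSchema(source[key] as JsonSchema);
    });
    out.properties = props;
  }
  return out;
}

function toGeminiDeclaration(
  name: string,
  description: string,
  parameters: JsonSchema,
): GeminiDeclaration {
  return { name, description, parameters: toGeminiSchema(parameters) };
}

// Illustrative tool; the name and fields are made up for this example.
const bookingDeclaration = toGeminiDeclaration(
  'bookAppointment',
  'Schedule a patient appointment.',
  {
    type: 'object',
    properties: {
      date: { type: 'string', description: 'ISO 8601 date' },
      durationMinutes: { type: 'integer' },
    },
    required: ['date'],
  },
);
```

A real converter also has to deal with enums, nullable fields, and nested unions, which this sketch leaves out.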

    Architecture

    ┌─────────────┐     ┌─────────────┐     ┌────────────────────┐
    │   Client     │────>│ CallWorker  │────>│ GeminiLiveSession  │
    │  (WebSocket) │     │             │     │                    │
    │              │<────│  audio +    │<────│  Gemini Live API   │
    │  audio in/out│     │  tool calls │     │  (native audio)    │
    └─────────────┘     └─────────────┘     └────────────────────┘
                              │
                              ├── ToolExecutor (runs AriaFlow tools)
                              ├── ConversationState (persists transcripts)
                              └── ConversationEventLog (records events)
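
The audio path in the diagram can be sketched as a small two-way bridge. This is only an illustration of the data flow, not the real CallWorker: `TransportLike` mirrors the TransportSession interface documented in this README, while `SessionLike` and `bridge` are hypothetical stand-ins that ignore tool calls, state, and event logging.

```typescript
// Mirrors the TransportSession interface documented in this README.
interface TransportLike {
  sendAudio(data: Uint8Array): void;
  onAudio(handler: (data: Uint8Array) => void): void;
  onClose(handler: () => void): void;
  close(): void;
}

// Hypothetical stand-in for a provider session; the real session API may differ.
interface SessionLike {
  sendAudio(data: Uint8Array): void;
  onAudio(handler: (data: Uint8Array) => void): void;
  close(): void;
}

// Wire the two sides together: audio flows in both directions,
// and a transport disconnect tears down the provider session.
function bridge(transport: TransportLike, session: SessionLike): void {
  transport.onAudio((chunk) => session.sendAudio(chunk)); // caller -> model
  session.onAudio((chunk) => transport.sendAudio(chunk)); // model -> caller
  transport.onClose(() => session.close());               // hang-up ends the session
}
```

Keeping the bridge free of provider details is what lets the same worker sit in front of different transports and different realtime providers.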

    Usage

    import { VoiceEngine } from '@ariaflowagents/realtime-audio';
    import { createFoundation } from '@ariaflowagents/core/foundation';
    
    const foundation = createFoundation({ /* ... */ });
    
    const engine = new VoiceEngine({
      foundation,
      agents: [
        {
          id: 'receptionist',
          name: 'Hospital Receptionist',
          prompt: 'You are a hospital receptionist. Help patients schedule appointments.',
          voice: 'Charon', // Gemini voice preset
          tools: { /* AriaFlow tools */ },
        },
      ],
      defaultAgentId: 'receptionist',
      gemini: {
        apiKey: process.env.GOOGLE_API_KEY!,
        model: 'gemini-2.5-flash-native-audio-preview', // default
      },
    });
    
    // Accept a call from any audio transport
    const worker = await engine.acceptCall({
      callId: crypto.randomUUID(),
      transport: myWebSocketTransport, // implements TransportSession
    });
    
    await worker.start();

    TransportSession Interface

    Implement this to connect any audio source/sink:

    interface TransportSession {
      sendAudio(data: Uint8Array): void;       // Send audio to client
      onAudio(handler: (data: Uint8Array) => void): void;  // Receive audio from client
      onClose(handler: () => void): void;      // Handle disconnect
      close(): void;                           // Close the transport
    }
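
As an illustration, the interface above could be implemented by a thin adapter around a socket. The sketch below is an assumption-laden example, not code from this package: `SocketLike` is a hypothetical minimal socket shape (binary frames delivered as `Uint8Array` through callbacks), so a real WebSocket or LiveKit connection would need its own event mapping.

```typescript
interface TransportSession {
  sendAudio(data: Uint8Array): void;
  onAudio(handler: (data: Uint8Array) => void): void;
  onClose(handler: () => void): void;
  close(): void;
}

// Hypothetical minimal socket shape used only for this sketch.
interface SocketLike {
  send(data: Uint8Array): void;
  close(): void;
  onmessage: ((data: Uint8Array) => void) | null;
  onclose: (() => void) | null;
}

class SocketTransport implements TransportSession {
  private socket: SocketLike;
  private audioHandlers: Array<(data: Uint8Array) => void> = [];
  private closeHandlers: Array<() => void> = [];
  private closed = false;

  constructor(socket: SocketLike) {
    this.socket = socket;
    // Fan incoming frames out to every registered audio handler.
    socket.onmessage = (data) => this.audioHandlers.forEach((h) => h(data));
    socket.onclose = () => this.handleClose();
  }

  sendAudio(data: Uint8Array): void {
    if (!this.closed) this.socket.send(data); // drop audio after disconnect
  }

  onAudio(handler: (data: Uint8Array) => void): void {
    this.audioHandlers.push(handler);
  }

  onClose(handler: () => void): void {
    this.closeHandlers.push(handler);
  }

  close(): void {
    if (this.closed) return;
    this.socket.close();
    this.handleClose();
  }

  // Runs close handlers exactly once, whichever side closed first.
  private handleClose(): void {
    if (this.closed) return;
    this.closed = true;
    this.closeHandlers.forEach((h) => h());
  }
}
```

Guarding `handleClose` with a flag keeps close handlers from firing twice when a local `close()` also triggers the socket's own close callback.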

    Events

    GeminiLiveSession emits RealtimeEvents:

    Event            Description
    audio            Raw PCM audio from Gemini (send to client)
    transcript       Text transcript (user or assistant)
    tool-call        Gemini wants to call a tool
    tool-result      Tool execution result
    turn-complete    Model finished speaking
    interrupted      User interrupted the model
    session-resumed  Session resumption handle updated
    error            Error from Gemini
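
One way to consume these events is a switch over a discriminated union. The event names below come from the list above, but the payload fields (`data`, `role`, `text`, `handle`, `message`, and so on) are assumptions made for this sketch, as are `SketchEvent`, `PlaybackState`, and `applyEvent`; the package's actual event type may look different.

```typescript
// Event names are from this README; payload fields are assumed for illustration.
type SketchEvent =
  | { type: 'audio'; data: Uint8Array }
  | { type: 'transcript'; role: 'user' | 'assistant'; text: string }
  | { type: 'tool-call'; name: string; args: unknown }
  | { type: 'tool-result'; name: string; result: unknown }
  | { type: 'turn-complete' }
  | { type: 'interrupted' }
  | { type: 'session-resumed'; handle: string }
  | { type: 'error'; message: string };

// Hypothetical client-side state updated by incoming events.
interface PlaybackState {
  queue: Uint8Array[];     // audio chunks waiting to be played
  transcript: string[];    // running text log
  resumeHandle?: string;   // latest session resumption handle
  lastError?: string;
}

function applyEvent(state: PlaybackState, event: SketchEvent): void {
  switch (event.type) {
    case 'audio':
      state.queue.push(event.data);
      break;
    case 'transcript':
      state.transcript.push(`${event.role}: ${event.text}`);
      break;
    case 'interrupted':
      // Drop unplayed audio so the model stops talking over the user.
      state.queue.length = 0;
      break;
    case 'session-resumed':
      state.resumeHandle = event.handle;
      break;
    case 'error':
      state.lastError = event.message;
      break;
    case 'tool-call':
    case 'tool-result':
    case 'turn-complete':
      // Not handled in this sketch.
      break;
  }
}
```

Clearing the queue on `interrupted` is a design choice of this sketch: audio that was generated but not yet played is discarded as soon as the user starts speaking.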

    Key Details

    • Audio format: 16-bit PCM at 24 kHz
    • Default model: gemini-2.5-flash-native-audio-preview
    • Session resumption: Automatic — GeminiLiveSession tracks resumption handles
    • Tool execution: Uses AriaFlow's ToolExecutor with timeout support
    • State persistence: Transcripts are saved to the session via ConversationState
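
The audio format has simple arithmetic behind it: 16-bit samples take 2 bytes each, so at 24 kHz one second of audio is 48,000 bytes per channel. The helpers below sketch that calculation together with a base64 round trip for PCM chunks. They assume mono audio, which this README does not state, and all function and constant names are hypothetical rather than part of the package.

```typescript
const SAMPLE_RATE_HZ = 24000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const CHANNELS = 1;         // assumption: mono audio

// Playback duration of a raw PCM chunk, in milliseconds.
function chunkDurationMs(chunk: Uint8Array): number {
  const samples = chunk.byteLength / (BYTES_PER_SAMPLE * CHANNELS);
  return (samples * 1000) / SAMPLE_RATE_HZ;
}

// Number of bytes needed to hold `ms` milliseconds of audio.
function bytesForDuration(ms: number): number {
  const samples = Math.round((ms * SAMPLE_RATE_HZ) / 1000);
  return samples * BYTES_PER_SAMPLE * CHANNELS;
}

// Encode raw PCM bytes as base64 using the standard btoa global.
function pcmToBase64(pcm: Uint8Array): string {
  let binary = '';
  pcm.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

// Decode a base64 string back into raw PCM bytes.
function base64ToPcm(encoded: string): Uint8Array {
  const binary = atob(encoded);
  const out = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    out[i] = binary.charCodeAt(i);
  }
  return out;
}
```

With these assumptions a 960-byte chunk holds 480 samples, or 20 ms of audio, which is a common frame size for realtime streaming.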

    Peer Dependencies

    • @ariaflowagents/core — Foundation primitives (ToolExecutor, ConversationState, etc.)
    • ai (v6+) — Vercel AI SDK
    • zod — Schema definitions for tools