@voctiv/agent-sdk

TypeScript SDK for scripts executed by the ScriptEngine scripting runtime.

The package exports the defineScript() identity helper and the public ScriptEngine runtime types for voice channels, SIP calls, ASR, TTS, VAD, Smart Turn, LLM, dialog context, logging, and Voctiv legacy platform compatibility APIs.

The SDK itself does not open SIP calls, run ASR/TTS, or talk to platform services; it describes the objects injected into your script by ScriptEngine.

Installation

npm install @voctiv/agent-sdk rxjs

rxjs is a peer dependency because the runtime API exposes observables for ASR, SIP, channel events, queues, and LLM streams.

Basic Script

Scripts export a function created with defineScript(). The runtime loads the module and calls it with { channel, logger, context, platform }.

import { defineScript } from '@voctiv/agent-sdk';
import { filter, map, merge } from 'rxjs';

const TTS_QUEUE = 1;

export default defineScript(async ({ channel, logger, context }) => {
  channel.sip.answer(); // No-op on WS/headless channels.
  channel.sendMessage({ event: 'status', payload: 'ready' });

  const asr = await channel.createAsr({
    language: context.language || 'ru-RU',
    vad: { preSpeechFrames: 20, postSpeechFrames: 4 },
    smartTurn: { enabled: true, triggerFrames: 3, confirmMs: 50 },
  });

  // Barge-in: user speech stops only the agent TTS queue.
  merge(asr.speechStart$, asr.partial$.pipe(filter((p) => !!p.text?.trim()))).subscribe(() => {
    channel.audio.stop(TTS_QUEUE);
  });

  await channel.audio.say('Hi! Say anything and I will answer briefly.', {
    queue: TTS_QUEUE,
    alias: 'greeting',
  });

  asr.result$.pipe(filter((text) => !!text.trim())).subscribe((userText) => {
    logger.log('User said', { userText });

    const reply$ = channel.llm
      .stream(`Reply briefly to the user: "${userText}"`, {
        agentUuid: context.agentUuid,
        dialogUuid: context.dialogUuid,
      })
      .pipe(map((chunk) => chunk.content));

    void channel.audio.say(reply$, {
      queue: TTS_QUEUE,
      alias: 'reply',
      ttsStrategy: 'sentence',
    });
  });

  return new Promise((resolve) => {
    channel.events.terminated$.subscribe(() => {
      asr.destroy();
      resolve({ output: { dialogUuid: context.dialogUuid } });
    });
  });
});

What The SDK Contains

defineScript(fn) marks the default export as the script entry point. It returns the same function and exists to give TypeScript the correct ScriptContext shape.

ScriptContext is the top-level object passed to a script:

  • channel is the media channel for SIP, WS, ASR, TTS, audio playback, LLM, and structured data messages.
  • logger writes structured script logs and can stream logs to a debug endpoint.
  • context contains dialog identity, caller/called numbers, language, flags, params, entry point, persisted env, and runtime budget.
  • platform exposes platform operations: NLU, dialog state, outbound call scheduling, messaging, and phrase records.

MediaChannel is the main real-time API:

  • channel.type is "sip" for telephony and "ws" for WebSocket/script-manager sessions. Headless sessions currently expose a synthetic "ws" channel; check context.headless to detect them.
  • channel.params is the merged runtime parameter map. Treat unknown keys as host-specific.
  • channel.createAsr() creates an ASR handle.
  • channel.audio controls TTS, raw playback, pre-synthesis, and mixer queues.
  • channel.sip controls SIP state, pre-answer media, DTMF, hold/mute/hangup, outbound calls, and bridging.
  • channel.llm talks to the Omni LLM backend.
  • channel.events exposes speech, interrupt, termination, WS data message, and media error observables.
  • channel.textInput injects synthetic ASR results for tests and debug clients.

SIP And Pre-Answer Media

SIP sessions expose call state through channel.sip.state, state$, progress$, early$, and answered$.

The important states are:

  • ringing: INVITE is in progress, but no media is available yet.
  • early: RTP is ready before the final 200 OK answer. ASR, TTS, playback, and DTMF work in this state.
  • active: final 200 OK has been received or sent.
  • terminated: the call ended and no more audio is possible.
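The state list above maps to a simple capability table. This is a documentation sketch in plain TypeScript — the capabilities map is illustrative and not an SDK export:

```typescript
type SipState = 'ringing' | 'early' | 'active' | 'terminated';

// Which capabilities each state supports, per the list above.
const capabilities: Record<SipState, { media: boolean; dtmf: boolean }> = {
  ringing:    { media: false, dtmf: false }, // INVITE in progress, no media yet
  early:      { media: true,  dtmf: true  }, // RTP ready before final 200 OK
  active:     { media: true,  dtmf: true  }, // final 200 OK sent or received
  terminated: { media: false, dtmf: false }, // call ended, no more audio
};
```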

How Pre-Answer Works

Pre-answer means the SIP media path is open before the call is finally answered with 200 OK. In this state the caller can already hear TTS, the script can already receive audio for ASR, and DTMF can be exchanged.

Use pre-answer when you need to do something before committing the call to the final answer state:

  • play an informational greeting or disclaimer;
  • collect a short value with ASR, such as account number or menu choice;
  • detect and navigate an IVR that speaks before answering;
  • delay answer() until the script is ready to transfer, bridge, or continue.

For inbound calls, the script controls this explicitly:

  1. Call channel.sip.sendProgress() to send 183 Session Progress with SDP.
  2. Wait for channel.sip.waitForEarly() if your next logic step needs media to be ready.
  3. Use channel.audio.say(), channel.audio.play(), channel.createAsr(), or channel.sip.sendDtmf() normally.
  4. Call channel.sip.answer() when you want to send the final 200 OK.

For outbound calls, pre-answer is controlled by the remote side. If the remote endpoint sends 183 Session Progress with SDP, ScriptEngine moves the call to early. If it answers directly, waitForEarly() resolves when the call becomes active.

early is a media-ready state, not a final answer state. answer() is still the explicit transition that sends final 200 OK for inbound calls. External billing behavior depends on the carrier.

Outbound Pre-Answer

For outbound calls, early media starts when the remote side sends a provisional response with SDP, usually 183 Session Progress. This is useful for IVRs that speak before answering.

const bLeg = await channel.sip.makeCall({
  sipUri: 'sip:+12025551234@trunk.example.com',
});

await bLeg.sip.waitForEarly();

const asr = await bLeg.createAsr({ language: 'en-US' });
asr.result$.subscribe((text) => {
  if (/press one/i.test(text)) {
    bLeg.sip.sendDtmf('1');
  }
});

waitForEarly() resolves when the call reaches either early or active. If a carrier skips early media and answers directly, it resolves on the final answer.
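The resolution rule can be modeled in a few lines of plain TypeScript. This is a sketch of the documented behavior, not the SDK implementation:

```typescript
type SipState = 'ringing' | 'early' | 'active' | 'terminated';

// Model of waitForEarly(): given the sequence of states the call moves
// through, it settles on the first media-ready state — early when the
// carrier sends 183 with SDP, or active when it answers directly.
function firstMediaReadyState(sequence: SipState[]): SipState | undefined {
  return sequence.find((s) => s === 'early' || s === 'active');
}
```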

Inbound Pre-Answer

For inbound calls, call channel.sip.sendProgress() to send 183 Session Progress with SDP. This enters early state and enables full-duplex audio before the final answer.

import { firstValueFrom } from 'rxjs';

channel.sip.sendProgress();
await channel.sip.waitForEarly();

const asr = await channel.createAsr({ language: 'en-US' });
await channel.audio.say('Please say your account number.');

const account = await firstValueFrom(asr.result$);

channel.sip.answer();
await channel.audio.say(`Thank you. Looking up account ${account}.`);

The API does not mark the call as answered until answer() sends final 200 OK. External billing still depends on carrier policy.

Audio Auto-Wait

On SIP channels, channel.audio.say() and channel.audio.play() automatically wait until RTP is ready (early or active). You only need explicit waitForEarly() / waitForAnswer() when your script logic depends on the state transition.

If the call terminates before media becomes available, deferred audio resolves as a no-op.

SIP Controls

channel.sip also supports:

  • answer() for inbound final answer.
  • sendDtmf(digit, duration?) for IVR navigation.
  • sendInfo(contentType, body) for SIP INFO messages.
  • hold() / unhold() for SIP hold.
  • mute() / unmute() for local outgoing audio suppression.
  • hangup() to terminate the call.
  • makeCall() to create an outbound SIP B-leg from the main SIP channel.
  • bridge(other) to cross-connect two SIP channels.

makeCall() and bridge() are only supported by the main SIP channel. Worker-isolated, WS, and headless channels do not create nested SIP legs.

ASR, VAD, And Smart Turn

Create ASR with channel.createAsr(config?).

const asr = await channel.createAsr({
  vendor: 'yandex',
  name: 'main-yandex-key',
  language: 'ru-RU',
  vad: {
    positiveThreshold: 0.55,
    negativeThreshold: 0.35,
    preSpeechFrames: 12,
    postSpeechFrames: 12,
  },
  smartTurn: {
    enabled: true,
    silenceTimeoutMs: 1200,
  },
});

AsrHandle exposes:

  • result$: finalized utterances.
  • partial$: streaming partial hypotheses.
  • speechStart$ / speechEnd$: VAD speech boundaries.
  • interrupt$: barge-in / interrupt events where the host supports them.
  • vadProbability$: normalized VAD probability when available.
  • error$: runtime errors from the ASR provider (see Error Handling).
  • pause() / resume() to stop or resume forwarding new audio frames.
  • finalize() to force the current utterance to flush.
  • destroy() to close connector streams and subscriptions.

SIP sessions use the call-level telephony VAD when it is available. WS sessions create one VAD/SmartTurn instance for the socket session on the first createAsr() call. Headless sessions return an inert ASR handle with empty observables.

If ASR connector creation fails, SIP/WS return a degraded handle. VAD observables still mirror the channel where possible, but no real STT results are emitted. The creation failure is reported on channel.events.error$.

ASR Credentials And Vendors

AsrConfig.vendor is an engine hint, for example "yandex", "deepgram", "azure", "elevenlabs", or "neuro_v3", resolved by the host vendor alias mapping.

Direct ASR Vendor Parameters

Pass vendor-native credentials and settings directly through AsrConfig.data. These values are forwarded to the connector as-is and override any defaults or platform-resolved credentials.

const asr = await channel.createAsr({
  vendor: 'azure',
  language: 'ru-RU',
  data: {
    subscription_key: 'your-azure-key',
    region: 'swedencentral',
  },
});

Each vendor connector accepts its native parameter names:

Vendor Accepted data keys
Azure subscription_key or api_key, region
Yandex api_key or token, folder_id
ElevenLabs api_key (or xi_api_key), model
Deepgram api_key
Google email, private_key, project_id
Whisper url, rate, toFloat

All vendors also accept the env-style names (AZURE_SPEECH_KEY, ELEVENLABS_API_KEY, etc.) for backwards compatibility, but vendor-native names are checked first and are preferred.
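The lookup order can be illustrated with a small helper. The function name and the reduced fallback chain are illustrative, not part of the SDK; the Azure key names come from the table above:

```typescript
type AsrData = Record<string, string | undefined>;

// Sketch of the documented precedence for the Azure connector:
// vendor-native names are checked first, env-style names last.
function resolveAzureKey(data: AsrData): string | undefined {
  return (
    data.subscription_key ??
    data.api_key ??
    data.AZURE_SPEECH_KEY // backwards-compatible env-style fallback
  );
}
```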

Voctiv Platform ASR Key Selection

In Voctiv legacy compatibility mode, ASR credentials can also be selected by logic-executor key_storage.name:

const asr = await channel.createAsr({
  name: 'main-asr-key',
  language: 'ru-RU',
});

The runtime looks in channel.params.authentication_data.legacyAsrKeysByName[name] for the current dialog agent and company. If name is omitted, channel.params.defaultAsrName may be used.

When both name (platform key) and explicit data are provided, data values win — they are applied last and override anything resolved from the platform.
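Assuming a shallow merge, that precedence is equivalent to spreading data last. This is a sketch of the precedence rule only; the real resolver may differ in details:

```typescript
type Credentials = Record<string, unknown>;

// Explicit data values are applied last, so they override anything
// resolved from the platform key storage.
function mergeAsrCredentials(
  platformResolved: Credentials,
  data: Credentials,
): Credentials {
  return { ...platformResolved, ...data };
}
```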

TTS, Playback, And Mixer Queues

channel.audio.say(textOrObservable, options?) synthesizes text and plays it through the mixer.

await channel.audio.say('Please wait while I check that.', {
  queue: 0,
  alias: 'main-response',
  ttsVendor: 'elevenlabs',
  ttsStrategy: 'sentence',
  ttsConfig: {
    api_key: 'sk_your-key',
    voice_id: 'bBLRWT6MSWBFAm76ZWXY',
    model_id: 'eleven_turbo_v2_5',
    base_url: 'https://api.eu.residency.elevenlabs.io',
    output_format: 'pcm_16000',
  },
});

Use full vendor names for ttsVendor. Dedicated TTS vendors include "elevenlabs", "google", and "voctiv". The default TTS path can also accept compatible aliases such as "azure" or "neuro_v3", depending on how ScriptEngine is configured.

Vendor-native parameter names (api_key, voice_id, model_id, base_url) are passed directly to the connector and override any platform defaults. See TTS Credentials And Vendor Parameters for the full list of accepted keys per vendor.

channel.audio.play(source, options?) plays raw audio from a URL/path or a LegacyPhraseRecord.

await channel.audio.play('/opt/prompts/welcome.wav', {
  queue: 1,
  alias: 'welcome-earcon',
});

channel.audio.presay(text, options?) pre-synthesizes TTS into the host TTS cache. If the cache is not available, the runtime logs a warning and resolves without throwing.

channel.audio.preload(source) decodes a raw audio source through the audio player. It does not synthesize TTS and does not populate the TTS cache used by presay().

TTS Strategies

ttsStrategy controls how text is chunked:

  • sentence: split on sentence boundaries and synthesize each sentence. This is the default.
  • streaming: send chunks incrementally for streaming-capable vendors.
  • full: accumulate the whole input and synthesize it as one segment after the input completes.

When using an Observable<string> input, WS clients also receive text progress events for streamed chunks.
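For intuition, the sentence strategy behaves roughly like a boundary split. The exact splitting rules are host-defined; this regex is an assumption for illustration:

```typescript
// Naive sentence chunker: split after ., !, or ? followed by whitespace.
// Each resulting chunk would be synthesized as its own TTS segment.
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}
```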

Mixer Queues

The mixer has queues 0 through 4. Use separate queues for main speech, earcons, hold music, or background audio.

const music = channel.audio.queue(2);
music.volume = 0.25;

await channel.audio.play('/opt/audio/hold.wav', {
  queue: 2,
  alias: 'hold-music',
  loop: true,
});

channel.audio.stop(2);

PlayOptions.volume changes the whole queue volume, not just one item. stop(queue) clears a queue and aborts in-flight sentence TTS for that queue. stopAll() clears every queue.

For sentence-split TTS, queue item aliases are suffixed as alias-0, alias-1, and so on. Raw play() and direct streaming TTS use the alias exactly.
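The suffixing rule can be expressed as a one-liner (illustrative helper, not an SDK export):

```typescript
// For a sentence-split utterance with N sentences, the queue items are
// named alias-0 .. alias-(N-1); raw play() uses the alias unchanged.
function sentenceAliases(alias: string, sentenceCount: number): string[] {
  return Array.from({ length: sentenceCount }, (_, i) => `${alias}-${i}`);
}
```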

TTS Credentials And Vendor Parameters

Direct TTS Vendor Parameters

Pass vendor-native credentials and settings directly through PlayOptions.ttsConfig. These values are forwarded to the TTS connector as-is and override any defaults or platform-resolved credentials.

await channel.audio.say('Hello!', {
  ttsVendor: 'elevenlabs',
  ttsStrategy: 'streaming',
  ttsConfig: {
    api_key: 'sk_your-elevenlabs-key',
    voice_id: 'bBLRWT6MSWBFAm76ZWXY',
    model_id: 'eleven_turbo_v2_5',
    base_url: 'https://api.eu.residency.elevenlabs.io',
  },
});

Each TTS vendor connector accepts its native parameter names:

Vendor Accepted ttsConfig keys
ElevenLabs api_key (or xi_api_key), voice_id, model_id (or model), base_url, output_format, language_code, voice_settings_stability, voice_settings_similarity_boost, voice_settings_style, voice_settings_speed
Voctiv url, voice_id, language, emotion, speaking_rate, chunk_schedule

All vendors also accept the env-style names (ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID, etc.) for backwards compatibility, but vendor-native names are checked first and are preferred.

Voctiv Platform TTS Key Selection

In Voctiv legacy compatibility mode, TTS credentials can also be selected by PlayOptions.name or ttsConfig.name.

await channel.audio.say('Здравствуйте!', {
  name: 'main-tts-key',
  ttsConfig: {
    voice: 'alena',
  },
});

The runtime looks in channel.params.authentication_data.legacyTtsKeysByName[name]. If name is omitted, channel.params.defaultTtsName may be used.

When both name (platform key) and explicit ttsConfig values are provided, ttsConfig values win — they are applied last and override anything resolved from the platform.

cache enables TTS result caching for say() and presay():

  • cache: true — read/write TTS file cache only (Redis + filesystem + DB).
  • cache: { phraseName, flag?, language? } — TTS cache plus persist into Voctiv platform record_phrase / record_phrase_file so platform.getRecords() can retrieve the audio later.

// Cache only (no platform persist):
await channel.audio.say('Hello!', { cache: true });

// Cache + persist to platform phrase storage:
await channel.audio.say('Welcome back.', {
  cache: {
    phraseName: 'welcome_back',
    flag: context.flag,
    language: context.language,
  },
});

const records = await platform.getRecords?.({
  phraseName: 'welcome_back',
  flag: context.flag,
  language: context.language,
});

if (records?.[0]) {
  await channel.audio.play(records[0]);
}

This requires legacy compatibility mode, a trusted LE agent id/UUID, TTS cache, and LEGACY_V3_RECORD_PHRASE_ROOT.

Error Handling

ASR and TTS errors are propagated to the script. Unhandled errors are always logged server-side, but scripts can catch them to react: fall back to a different vendor, notify the caller, or abort the dialog.

TTS Errors — Promise Rejection

say() and play() reject their promises when TTS/playback fails:

try {
  await channel.audio.say('Hello!', {
    ttsVendor: 'elevenlabs',
    ttsConfig: { api_key: 'invalid-key', voice_id: 'abc' },
  });
} catch (err) {
  logger.error('TTS failed', { error: String(err) });
  await channel.audio.say('Fallback message.'); // try default TTS
}

ASR Errors — error$ Observable

Runtime ASR errors (gRPC disconnect, auth failure, quota exceeded) are emitted on AsrHandle.error$:

const asr = await channel.createAsr({
  vendor: 'yandex',
  data: { api_key: 'my-key' },
});

asr.error$.subscribe((err) => {
  logger.error('ASR provider error', {
    message: err.message,
    code: err.code,
    vendor: err.vendor,
  });
});

A degraded handle (returned when connector creation itself failed) has an inert error$ that never emits — the creation failure is reported on channel.events.error$ instead.

Channel Error Stream

channel.events.error$ is a unified stream of all media errors — both ASR and TTS:

channel.events.error$.subscribe((err) => {
  logger.warn(`[${err.source}] ${err.message}`, {
    code: err.code,
    vendor: err.vendor,
  });
});

MediaError fields:

Field Type Description
source 'asr' | 'tts' | 'sip' | 'channel' Which subsystem produced the error.
message string Human-readable description.
code number? HTTP status, gRPC status, or WebSocket close code.
vendor string? Vendor identifier, e.g. "yandex", "elevenlabs", "azure".
details unknown? Arbitrary provider-specific payload.

Subscribing to error$ is optional. Old scripts that do not subscribe are not affected — the observables simply go unobserved.
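In TypeScript terms, the table above corresponds to an interface like the following. The interface mirrors the documented fields; the isMediaError guard is an illustrative addition, not an SDK export:

```typescript
interface MediaError {
  source: 'asr' | 'tts' | 'sip' | 'channel'; // subsystem that produced the error
  message: string;                           // human-readable description
  code?: number;                             // HTTP / gRPC / WS close code
  vendor?: string;                           // e.g. "yandex", "elevenlabs"
  details?: unknown;                         // provider-specific payload
}

// Runtime guard for values arriving over loosely typed transports.
function isMediaError(value: unknown): value is MediaError {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.message === 'string' &&
    ['asr', 'tts', 'sip', 'channel'].includes(v.source as string)
  );
}
```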

LLM API

channel.llm talks to the Omni LLM backend.

const answer = await channel.llm.ask('Summarize the user request', {
  role: 'assistant',
  hidden: true,
  agentUuid: context.agentUuid,
});

await channel.audio.say(answer);

For streaming:

const stream$ = channel.llm.stream('Answer briefly', {
  role: 'assistant',
});

await channel.audio.say(
  stream$.pipe(map((chunk) => chunk.content)),
  { ttsStrategy: 'streaming' },
);

channel.llm.extract(options?) runs structured extraction via Omni. makePersistentStream(options?) opens a long-lived Socket.IO stream and lets you send multiple turns without reconnecting.

Platform API

platform exposes Voctiv platform operations.

platform.nlu.extract(utterance, options?) calls NLU v3 /infer. The runtime sends phrase, context, and agent_id. If options.context is omitted, current dialog params are serialized and used as NLU context.

const result = await platform.nlu.extract('I want to reschedule', {
  intents: ['reschedule', 'cancel'],
  entities: ['date', 'time'],
  use_synonyms: true,
});

Platform APIs require context.legacyV3Compat === true. This includes NLU, outbound calls, dialog writes, messaging sends, and phrase records.

Dialog State

platform.dialog.entryPoint = 'on_recall';
platform.dialog.result = 'done';

Setters update the local value immediately and trigger an asynchronous write to the platform DB. They are not awaitable and should not be relied on as transactional writes.

Platform-Scheduled Calls

Use platform.call(msisdn, options?) when the script needs to ask the platform to place an outbound SIP call. This is different from channel.sip.makeCall(): makeCall() creates a B-leg immediately inside the current live SIP session, while platform.call() schedules a separate platform-managed call that may happen now or later.

The destination number should be E.164 formatted.

await platform.call('+12025551234');
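A minimal E.164 shape check before scheduling can catch badly formatted numbers early. This assumes a plus sign followed by 8 to 15 digits with no leading zero; it is a sanity check, not a carrier-grade validator:

```typescript
// Rough E.164 shape: '+', a non-zero first digit, 7-14 further digits.
function isE164(msisdn: string): boolean {
  return /^\+[1-9]\d{7,14}$/.test(msisdn);
}
```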

By default, the platform schedules the call for immediate processing. Use date to schedule it for the future:

await platform.call('+12025551234', {
  date: new Date(Date.now() + 15 * 60_000),
  entryPoint: 'on_callback',
});

entryPoint is passed to the script when the scheduled call starts. Use it to route the callback into a specific branch:

export default defineScript(async ({ context, channel }) => {
  if (context.entryPoint === 'on_callback') {
    channel.sip.answer();
    await channel.audio.say('Hello, this is your scheduled callback.');
    return;
  }

  await channel.audio.say('I will call you back in fifteen minutes.');
});

Use dateEnd to define the latest time when the call is still useful. If the platform cannot place the call before that deadline, it can skip the attempt.

await platform.call('+12025551234', {
  date: new Date('2026-04-29T10:00:00Z'),
  dateEnd: new Date('2026-04-29T10:30:00Z'),
  entryPoint: 'on_reminder',
});

Retries are controlled with recallCount and recallDelay:

await platform.call('+12025551234', {
  entryPoint: 'on_follow_up',
  recallCount: 3,
  recallDelay: 300,
});

This means the platform may retry up to three times, waiting about five minutes between attempts.
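The resulting schedule can be computed as below. This sketch assumes recallDelay is in seconds, attempts are evenly spaced, and there is no platform-side jitter:

```typescript
// Timestamps of up to recallCount retry attempts after a failed call at
// `start`, each recallDelaySec seconds after the previous one.
function retryTimes(
  start: Date,
  recallCount: number,
  recallDelaySec: number,
): Date[] {
  return Array.from(
    { length: recallCount },
    (_, i) => new Date(start.getTime() + (i + 1) * recallDelaySec * 1000),
  );
}
```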

Other scheduling options:

  • priority: higher-priority calls can be processed earlier by the dialer.
  • timezone: timezone offset used by the platform when interpreting scheduled dates.
  • onSuccessCall: entry point to use after a successful call.
  • onFailedCall: entry point to use after failed attempts.
  • protoAdditional: extra protocol-level parameters, such as SIP headers expected by your telephony setup.

Messaging

await platform.messaging.send({
  src: 'bot',
  destination: '+12025551234',
  text: 'Your appointment is confirmed.',
});

Outbound messages are transported through legacy Redis streams. platform.messaging.message$ currently replays the inbound message that started a headless messaging script; it is not a live subscription to all future Redis messages.

Offline / Headless Logic

Offline, or headless, sessions run a script without a live SIP call, WebSocket audio stream, RTP pipeline, ASR, or TTS playback. They are used for platform-driven background logic, queued dialog processing, and messaging events.

The script entry point is still the same defineScript() handler. Detect this mode with context.headless:

export default defineScript(async ({ channel, context, logger, platform }) => {
  if (!context.headless) {
    channel.sip.answer();
    await channel.audio.say('Hello.');
    return;
  }

  logger.log('Running offline logic', {
    dialogUuid: context.dialogUuid,
    entryPoint: context.entryPoint,
  });

  // Offline logic usually works with text, params, env, NLU, LLM, and platform APIs.
});

Headless sessions can be started by host integrations such as:

  • a platform dialog queue worker that loads pending dialogs;
  • an inbound messaging worker, usually with context.entryPoint === 'on_message_api_received';
  • an HTTP/API request that asks ScriptEngine to run a script without media.

What Works In Headless

These APIs are available and are the intended tools for offline scripts:

  • context.dialogParams, context.initialData, context.dialogEntity, and context.callEntity for platform data.
  • context.env$ for persisted per-dialog state.
  • platform.nlu.extract() for text NLU when legacy platform compatibility is enabled.
  • platform.messaging.send() for outbound messages through the configured platform messaging transport.
  • platform.call() for scheduling outbound platform-managed calls.
  • platform.dialog.entryPoint and platform.dialog.result for updating dialog routing and outcome.
  • channel.llm.ask(), channel.llm.stream(), and channel.llm.extract() for Omni LLM operations.
  • logger for structured logs.

Audio and telephony APIs are intentionally inert:

  • channel.audio.say(), play(), preload(), and presay() do not play audio and only log warnings.
  • channel.createAsr() returns an inert handle with empty observables.
  • channel.textInput does not simulate ASR in headless mode.
  • channel.sip.state behaves as an already-active synthetic channel, but real SIP actions such as nested calls and bridging are not available.

Use headless mode for text and platform workflows. Use SIP or WS sessions when the script needs real audio, ASR, TTS, DTMF, pre-answer media, or bridging.

Inbound Messaging

When a headless script is triggered by an inbound message, the runtime exposes the message as context.inboundMessage and also replays it on platform.messaging.message$.

export default defineScript(async ({ context, platform, logger }) => {
  const inbound = context.inboundMessage;
  const payload = inbound?.payload ?? {};

  const text =
    typeof payload.text === 'string'
      ? payload.text
      : typeof payload.message === 'string'
        ? payload.message
        : '';

  logger.log('Inbound message received', {
    src: inbound?.src,
    dst: inbound?.dst,
    channelType: inbound?.channelType,
    text,
  });

  if (!text.trim()) {
    return { output: { reason: 'empty_message' } };
  }

  const nlu = await platform.nlu.extract(text, {
    intents: ['support_request', 'callback_request'],
  });

  await platform.messaging.send({
    src: inbound?.dst ?? 'bot',
    destination: inbound?.src ?? '',
    text: 'Thanks, I received your message.',
  });

  return {
    output: {
      text,
      nlu,
    },
  };
});

context.inboundMessage.payload is the raw transport payload. Different messaging providers may use different field names (text, message, body, content, etc.), so production scripts should normalize the text they need.

Persisting Offline State

Use context.env$ to keep state between offline runs for the same dialog:

const env = context.env$?.getValue() ?? {};
const messageCount = Number(env.messageCount ?? 0) + 1;

context.env$?.next({
  ...env,
  messageCount,
  lastMessageAt: new Date().toISOString(),
});

Do not return env from the script. ScriptEngine snapshots context.env$ after completion and persists it according to the host integration.

Combining Voice And Offline In One Script

One script can support both live calls and offline messages by branching on context.headless:

export default defineScript(async ({ channel, context, platform }) => {
  if (context.headless) {
    const text = String(context.inboundMessage?.payload?.text ?? '');

    if (text.includes('call me')) {
      await platform.call(context.msisdn, {
        entryPoint: 'on_callback',
      });
    }

    return { output: { handledOffline: true } };
  }

  channel.sip.answer();
  await channel.audio.say('How can I help you?');
});

Dialog Context And Persisted Env

context includes identity, telephony fields, params, routing metadata, and runtime helpers.

Important fields:

  • context.dialogUuid: current dialog UUID.
  • context.callerId / context.msisdn: caller identity.
  • context.destinationNumber: called number.
  • context.language / context.lang: language selected for the run.
  • context.flag: business flag.
  • context.initialData: shallow snapshot of params at script start.
  • context.dialogParams: live param map for the run.
  • context.entryPoint: current routing entry point.
  • context.headless: true for offline/queue/messaging sessions without a real media channel.
  • context.runTime: async execution budget helper.
  • context.env$: persisted dialog environment as an RxJS BehaviorSubject.

Use env$ for persisted script state:

const current = context.env$?.getValue() ?? {};
context.env$?.next({
  ...current,
  lastIntent: 'reschedule',
});

Do not return env from the script. The runtime snapshots context.env$ after completion and attaches it to the persisted result.

Logging And Debugging

Use logger.log(), warn(), error(), and debug() for structured logs.

logger.log('ASR result received', { text });
logger.warn('Low confidence intent', { confidence });

logger.enableDebug(endpoint) streams logs from the current script instance to a remote debug endpoint. logger.breakpoint(label, snapshot?) pauses only when an active debug session is connected; otherwise it resolves immediately.

WS And Headless Behavior

WS channels behave like active media channels:

  • channel.sip.state is effectively active.
  • sendDtmf() emits dtmf-send to the WS client.
  • sendMessage() emits a structured data event.
  • ASR reads socket audio frames or synthetic text input.

Headless channels are for offline, queue, or messaging sessions:

  • Audio methods are no-ops that log warnings.
  • SIP methods are mostly no-ops.
  • createAsr() returns an inert handle.
  • LLM, NLU, messaging, platform calls, dialog state, and env$ still work.

Use context.headless to branch when a script must behave differently without a real media channel. See Offline / Headless Logic for details and examples.

Text Input For Tests

channel.textInput injects synthetic ASR output into a live ASR handle.

const asr = await channel.createAsr();

channel.textInput.pushPartial(asr.id, 'hello', false);
channel.textInput.pushResult(asr.id, 'hello world');

This is mainly for WS debug clients and automated tests. Unknown ASR ids are ignored.

Package Notes

The package is published as CommonJS with TypeScript declarations in dist.

Build locally with:

npm run build

The package exports only the public SDK entry point:

import { defineScript, type MediaChannel, type AsrHandle, type MediaError } from '@voctiv/agent-sdk';