JSPM

subtitle-forge

0.1.0
  • License MIT

Generate and translate timestamped subtitles from timed transcripts while preserving cue timing.

Package Exports

  • subtitle-forge
  • subtitle-forge/languages
  • subtitle-forge/library
  • subtitle-forge/llm
  • subtitle-forge/package.json
  • subtitle-forge/srt
  • subtitle-forge/transcript

Readme

subtitle-forge

Side-effect-free subtitle generation and translation for AI agents, cloud functions, and background workers.

subtitle-forge implements this reusable pipeline:

timed transcript -> subtitle cues -> SRT/WebVTT -> translated subtitles with preserved timing

It does not submit ASR jobs, read or write files, run FFmpeg, mux videos, or manage local output folders. Those responsibilities belong to your app, CLI, or worker. This package only accepts JSON/string inputs and returns JSON/string outputs.

For Chinese documentation, see README.zh-CN.md.

Install

npm install subtitle-forge

Requirements:

  • Node.js 22 or newer.
  • ESM import syntax.
  • fetch available globally, or pass a custom fetch implementation to LlmTranslator.

When To Use This Package

Use it when you already have one of these inputs:

  • A provider-neutral TimedTranscript with word-level timestamps.
  • A Speechmatics json-v2 transcript.
  • Existing SRT text that should be translated while preserving cue indexes and timestamps.

Do not use it as a full video pipeline. It intentionally does not extract audio, call ASR providers, burn subtitles, or upload media files.

AI Agent Usage Contract

If you are an AI coding agent, follow this contract:

  1. Convert ASR output into TimedTranscript:
    • words must contain { text, start, end } in seconds.
    • start and end must be finite numbers.
    • Preserve speaker when available.
    • Set delimiterBefore: "" on the first word of a sentence/segment if the ASR output already models spacing.
    • Set eos: true when a word ends a sentence; this helps cue splitting.
  2. Call buildSourceSubtitles() when only source SRT/VTT is needed.
  3. Call translateTimedTranscript() when you need source subtitles and translated subtitles from timed words.
  4. Call translateSrtText() when you already have SRT and only need translation.
  5. Store translation.items externally for resumable jobs, then pass it back as existingItems.
  6. Never ask the LLM to change cue indexes or timestamps. This package preserves timing by replacing text by cue index.
  7. If TimedTranscript.words is empty, fail early or run ASR/alignment first. Plain text without timestamps cannot produce reliable SRT timing.
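
Steps 1 and 7 of the contract can be sketched as a small adapter. This is a minimal illustration, not part of the package: the AsrToken shape, the toTimedTranscript name, the "hypothetical-asr" provider label, and the punctuation-based eos heuristic are all assumptions. The TimedWord and TimedTranscript types are local copies of the shapes documented under Data Types so the block stands alone.

```typescript
// Local copies of the documented shapes, so this sketch is self-contained.
type TimedWord = {
  text: string;
  start: number;
  end: number;
  speaker?: string;
  delimiterBefore?: string;
  eos?: boolean;
};

type TimedTranscript = {
  provider?: string;
  language?: string;
  words: TimedWord[];
};

// Hypothetical ASR token shape (an assumption, not any real provider's format).
type AsrToken = {
  word: string;
  startMs: number;
  endMs: number;
  speakerId?: string;
};

// Assumed heuristic: a token ending in sentence punctuation closes a sentence.
const SENTENCE_END = /[.!?。！？]$/;

function toTimedTranscript(tokens: AsrToken[], language: string): TimedTranscript {
  if (tokens.length === 0) {
    // Contract step 7: fail early instead of inventing timing.
    throw new Error("No timed words: run ASR or alignment first.");
  }
  const words = tokens.map((token, i): TimedWord => {
    // Convert milliseconds to seconds, as the contract requires.
    const start = token.startMs / 1000;
    const end = token.endMs / 1000;
    if (!Number.isFinite(start) || !Number.isFinite(end)) {
      throw new Error(`Word ${i} has a non-finite timestamp.`);
    }
    const word: TimedWord = { text: token.word, start, end };
    if (token.speakerId !== undefined) word.speaker = token.speakerId;
    if (SENTENCE_END.test(token.word)) word.eos = true;
    return word;
  });
  return { provider: "hypothetical-asr", language, words };
}
```

If your ASR output already carries sentence boundaries or spacing information, prefer those over the punctuation heuristic and set delimiterBefore accordingly.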

Quick Start: Timed Transcript To Translated SRT

import { translateTimedTranscript } from "subtitle-forge";

const result = await translateTimedTranscript({
  transcript: {
    provider: "your-asr",
    language: "en",
    words: [
      { text: "Hello", start: 0.0, end: 0.2, delimiterBefore: "" },
      { text: "world.", start: 0.25, end: 0.7, eos: true },
      { text: "Let's", start: 1.4, end: 1.7, delimiterBefore: "" },
      { text: "begin.", start: 1.75, end: 2.1, eos: true },
    ],
  },
  llm: {
    apiKey: process.env.LLM_API_KEY,
    baseUrl: process.env.LLM_BASE_URL ?? "https://api.openai.com/v1",
    model: process.env.LLM_MODEL,
  },
  sourceLanguage: "English",
  targetLanguage: "Simplified Chinese",
  subtitleFormats: ["srt", "vtt"],
});

console.log(result.source.srt);
console.log(result.translation.srt);
console.log(result.translation.vtt);
console.log(result.translation.items);

Quick Start: Existing SRT To Translated SRT

import { translateSrtText } from "subtitle-forge";

const result = await translateSrtText({
  srt: `1
00:00:00,000 --> 00:00:01,000
Hello, world.
`,
  llm: {
    apiKey: process.env.LLM_API_KEY,
    model: process.env.LLM_MODEL,
  },
  sourceLanguage: "English",
  targetLanguage: "Simplified Chinese",
});

console.log(result.srt);

Resumable Translation

translateTimedTranscript() and translateSrtText() both accept existingItems and onProgress.

import { translateSrtText, type TranslationItem } from "subtitle-forge";

// jobId, srt, llm, and the load/save helpers below are supplied by your application.
const existingItems: TranslationItem[] = await loadCheckpointFromDatabase(jobId);

const result = await translateSrtText({
  srt,
  llm,
  targetLanguage: "Japanese",
  existingItems,
  async onProgress(items, progress) {
    await saveCheckpointToDatabase(jobId, {
      completed: progress.completed,
      total: progress.total,
      items,
    });
  },
});

await saveFinalSubtitle(jobId, result.srt);

The checkpoint format is simply an array of:

type TranslationItem = {
  index: number;
  text: string;
};

The package ignores checkpoint items that do not match the current cue indexes.
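
To estimate how much work a resumed job still has, an application can apply the same matching rule on its side. The sketch below is an illustration of the behavior described above, not the package's code; planResume is a hypothetical helper, and treating blank text as "not translated" is an assumption.

```typescript
type TranslationItem = { index: number; text: string };

// Keeps checkpoint items whose index matches a current cue and lists the
// cue indexes that still need translation.
function planResume(cueIndexes: number[], checkpoint: TranslationItem[]) {
  const current = new Set(cueIndexes);
  const reusable = new Map<number, TranslationItem>();
  for (const item of checkpoint) {
    // Assumption: blank text does not count as a finished translation.
    if (current.has(item.index) && item.text.trim() !== "") {
      reusable.set(item.index, item); // a later duplicate overwrites an earlier one
    }
  }
  const pending = cueIndexes.filter((index) => !reusable.has(index));
  return { reusable: Array.from(reusable.values()), pending };
}
```

For example, with cue indexes 1 to 3 and a checkpoint holding items for indexes 1 and 9, only the item for index 1 is reusable and indexes 2 and 3 remain pending.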

Custom Translator

You can avoid the built-in OpenAI-compatible client by injecting a translator. This is useful for tests, internal gateways, queues, or non-OpenAI providers.

import {
  translateSrtText,
  type SubtitleCueTranslator,
} from "subtitle-forge";

const translator: SubtitleCueTranslator = {
  async translateCues({ cues }) {
    return cues.map((cue) => ({
      index: cue.index,
      text: `translated: ${cue.text}`,
    }));
  },
};

const result = await translateSrtText({
  srt,
  translator,
});

A custom translator must return one non-empty TranslationItem for every input cue index that is not already covered by existingItems.
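
When writing tests for a custom translator, it can help to check this rule before handing results to the package. The following is a minimal sketch under that reading of the rule; findMissingIndexes is a hypothetical helper and is not exported by subtitle-forge.

```typescript
type TranslationItem = { index: number; text: string };

// Returns the requested cue indexes that the translator result does not
// cover with non-empty text.
function findMissingIndexes(
  requestedIndexes: number[],
  returned: TranslationItem[],
): number[] {
  const covered = new Set<number>();
  for (const item of returned) {
    if (item.text.trim().length > 0) covered.add(item.index);
  }
  return requestedIndexes.filter((index) => !covered.has(index));
}
```

An empty result means every requested cue received a non-empty translation.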

Cloud Function Pattern

import { translateTimedTranscript } from "subtitle-forge";

export async function handleSubtitleJob(request: {
  jobId: string;
  transcript: unknown;
  existingItems?: Array<{ index: number; text: string }>;
}) {
  const result = await translateTimedTranscript({
    transcript: request.transcript as any,
    llm: {
      apiKey: process.env.LLM_API_KEY,
      baseUrl: process.env.LLM_BASE_URL,
      model: process.env.LLM_MODEL,
    },
    sourceLanguage: "auto",
    targetLanguage: "Simplified Chinese",
    existingItems: request.existingItems,
    subtitleFormats: ["srt", "vtt"],
    async onProgress(items, progress) {
      await saveJobState(request.jobId, { ...progress, items });
    },
  });

  await saveObject(`${request.jobId}/source.srt`, result.source.srt);
  await saveObject(`${request.jobId}/translation.zh.srt`, result.translation.srt);

  return {
    sourceCueCount: result.source.cues.length,
    translatedCueCount: result.translation.items.length,
  };
}

Public API

translateTimedTranscript(options)

Builds source subtitles from a timed transcript, translates the generated source SRT, and returns both source and translated subtitle artifacts.

function translateTimedTranscript(
  options: TranslateTimedTranscriptOptions,
): Promise<TranslateTimedTranscriptResult>;

Important options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| transcript | TimedTranscript | Required | Provider-neutral timed words. |
| translator | SubtitleCueTranslator | Optional | Custom translation provider. |
| llm | LlmTranslatorOptions | Optional | Built-in OpenAI-compatible translator config. Required when translator is absent. |
| sourceLanguage | string | "auto" | Source language name or code for translation prompt. |
| targetLanguage | string | "Simplified Chinese" | Target language name or code. |
| cueOptions | CueOptions | See below | Controls timed-word to cue segmentation. |
| subtitleOptions | SubtitleTextOptions | { maxLineLength: 37, maxLines: 2 } | Controls wrapping when rendering SRT/VTT. |
| subtitleFormats | SubtitleFormat[] | ["srt"] | Include "vtt" to return WebVTT. |
| batchSize | number | 30 | Subtitle cues per LLM request. |
| contextWindow | number | 8 | Nearby cues sent before/after each batch as context. |
| existingItems | TranslationItem[] | [] | Checkpoint items to skip already translated cue indexes. |
| onProgress | callback | Optional | Called after each translated batch. |

Return shape:

type TranslateTimedTranscriptResult = {
  source: {
    transcriptText: string;
    cues: SubtitleSegment[];
    srt: string;
    vtt?: string;
  };
  translation: {
    sourceCues: SrtCue[];
    items: TranslationItem[];
    cues: SrtCue[];
    srt: string;
    vtt?: string;
  };
};

buildSourceSubtitles(options)

Builds source-language subtitles only. No LLM calls.

function buildSourceSubtitles(options: {
  transcript: TimedTranscript;
  cueOptions?: CueOptions;
  subtitleOptions?: SubtitleTextOptions;
  subtitleFormats?: SubtitleFormat[];
}): SourceSubtitleResult;

translateSrtText(options)

Translates existing SRT text while preserving cue indexes and timestamps.

function translateSrtText(
  options: TranslateSrtTextOptions,
): Promise<TranslatedSubtitleResult>;

Use this when your ASR or subtitle editor already produced SRT.

LlmTranslator

OpenAI-compatible Chat Completions translator.

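import { LlmTranslator } from "subtitle-forge/llm";

const translator = new LlmTranslator({
  apiKey: process.env.LLM_API_KEY,
  baseUrl: "https://api.openai.com/v1",
  model: "your-model",
  temperature: 0.2,
  thinking: "disabled",
  reasoningEffort: "low",
  fetch: customFetch, // optional: your own fetch implementation; omit to use the global fetch
});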

It calls:

POST {baseUrl}/chat/completions

Expected response shape:

{
  "choices": [
    {
      "message": {
        "content": "{\"items\":[{\"index\":1,\"text\":\"...\"}]}"
      }
    }
  ]
}
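
If you proxy or mock the endpoint, a strict reader for this shape is useful in tests. The sketch below assumes only the response shape shown above; parseCompletion is a hypothetical name, and unlike the built-in translator it makes no attempt to repair malformed JSON.

```typescript
type TranslationItem = { index: number; text: string };

// Extracts translation items from an OpenAI-compatible chat completion body.
function parseCompletion(body: unknown): TranslationItem[] {
  const content = (
    body as { choices?: Array<{ message?: { content?: unknown } }> } | null
  )?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("Response has no choices[0].message.content string.");
  }
  // JSON.parse throws on malformed JSON; this sketch does not repair it.
  const parsed: unknown = JSON.parse(content);
  const items = (parsed as { items?: unknown } | null)?.items;
  if (!Array.isArray(items)) {
    throw new Error("Content JSON has no items array.");
  }
  return items.map((item: unknown): TranslationItem => {
    const { index, text } = (item ?? {}) as { index?: unknown; text?: unknown };
    if (typeof index !== "number" || !Number.isInteger(index) || typeof text !== "string") {
      throw new Error("Each item needs an integer index and a string text.");
    }
    return { index, text };
  });
}
```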

The built-in translator:

  • Sends only cue index and text; it does not send timestamps.
  • Sends context_before and context_after for continuity.
  • Requires valid returned indexes.
  • Repairs a few common malformed JSON issues.
  • Splits a failed batch into smaller batches and retries.
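
The batching and context behavior can be pictured with a short sketch. This is an illustration under assumptions, not the package's internals: buildBatches and the Batch shape are hypothetical, and the batchSize and contextWindow parameters only mirror the options of the same names documented above.

```typescript
type Cue = { index: number; text: string };
type Batch = { contextBefore: Cue[]; cues: Cue[]; contextAfter: Cue[] };

// Groups cues into batches and attaches up to `contextWindow` neighboring
// cues on each side. Context cues are for continuity only; they are not
// meant to be translated as part of that batch.
function buildBatches(cues: Cue[], batchSize: number, contextWindow: number): Batch[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer.");
  }
  const window = Math.max(0, Math.floor(contextWindow));
  const batches: Batch[] = [];
  for (let start = 0; start < cues.length; start += batchSize) {
    const end = Math.min(start + batchSize, cues.length);
    batches.push({
      contextBefore: cues.slice(Math.max(0, start - window), start),
      cues: cues.slice(start, end),
      contextAfter: cues.slice(end, end + window),
    });
  }
  return batches;
}
```

Because only index and text travel with each cue, timestamps never reach the model and cannot be altered by it.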

Data Types

TimedTranscript

type TimedTranscript = {
  provider?: string;
  language?: string;
  text?: string;
  words: TimedWord[];
  raw?: unknown;
};

TimedWord

type TimedWord = {
  text: string;
  start: number;
  end: number;
  speaker?: string;
  delimiterBefore?: string;
  eos?: boolean;
};

Rules:

  • start and end are seconds.
  • text can include punctuation.
  • delimiterBefore defaults to a space when words are joined.
  • Use delimiterBefore: "" to avoid inserting a space before a word.
  • eos means end of sentence and helps cue splitting.
  • A speaker change forces a cue boundary when two neighboring words both have a speaker value and the values differ.
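
The delimiterBefore rules can be sketched as a plain join. This is a minimal illustration, not the package's timedTranscriptToPlainText; joinWords is a hypothetical name, and trimming the leading delimiter of the first word is an assumption.

```typescript
type TimedWord = {
  text: string;
  start: number;
  end: number;
  delimiterBefore?: string;
};

// Joins words using each word's delimiterBefore, defaulting to one space.
function joinWords(words: TimedWord[]): string {
  return words
    .map((word) => (word.delimiterBefore ?? " ") + word.text)
    .join("")
    .trim(); // assumption: drop the delimiter in front of the first word
}
```

With punctuation emitted as separate tokens, marking them with delimiterBefore: "" turns the tokens Hello / , / world / ! into "Hello, world!" instead of "Hello , world !".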

CueOptions

type CueOptions = {
  maxDuration?: number;     // default 4.2 seconds
  targetDuration?: number;  // default 2.8 seconds
  maxChars?: number;        // default 54
  maxWords?: number;        // default 12
  pauseThreshold?: number;  // default 0.55 seconds
  minDuration?: number;     // default 0.45 seconds
  startPadding?: number;    // default 0.08 seconds
  endPadding?: number;      // default 0.16 seconds
  nextCueGap?: number;      // default 0.05 seconds
};
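
To build intuition for how some of these thresholds interact, here is a deliberately simplified grouping sketch. It is not the package's segmentation algorithm: splitIntoGroups and the Limits type are hypothetical, only maxDuration, maxWords, and pauseThreshold are modeled, and breaking after every eos word is an assumption (the package only says eos helps cue splitting).

```typescript
type TimedWord = {
  text: string;
  start: number;
  end: number;
  speaker?: string;
  eos?: boolean;
};

type Limits = { maxDuration: number; maxWords: number; pauseThreshold: number };

// Starts a new group at sentence ends, speaker changes, long pauses, or when
// adding the next word would exceed the word or duration limit.
function splitIntoGroups(words: TimedWord[], limits: Limits): TimedWord[][] {
  const groups: TimedWord[][] = [];
  let current: TimedWord[] = [];
  for (const word of words) {
    const first = current[0];
    const previous = current[current.length - 1];
    if (first !== undefined && previous !== undefined) {
      const pause = word.start - previous.end;
      const speakerChanged =
        previous.speaker !== undefined &&
        word.speaker !== undefined &&
        previous.speaker !== word.speaker;
      const mustBreak =
        previous.eos === true || // assumption: always break after a sentence end
        speakerChanged ||
        pause >= limits.pauseThreshold ||
        current.length >= limits.maxWords ||
        word.end - first.start > limits.maxDuration;
      if (mustBreak) {
        groups.push(current);
        current = [];
      }
    }
    current.push(word);
  }
  if (current.length > 0) groups.push(current);
  return groups;
}
```

Padding, minimum duration, character limits, and the gap to the next cue are left out here; rely on timedWordsToCues for real output.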

SubtitleSegment

type SubtitleSegment = {
  index?: number;
  start_time: number;
  end_time: number;
  content: string;
  speaker?: string;
};

SrtCue

type SrtCue = {
  index: number;
  start: string; // HH:MM:SS,mmm
  end: string;   // HH:MM:SS,mmm
  text: string;
};
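
The HH:MM:SS,mmm strings relate to second-based timestamps as in the sketch below. The package exports formatSrtTimestamp for this purpose; toSrtTimestamp here is a hypothetical stand-in written only to show the arithmetic, and rounding to the nearest millisecond is an assumption.

```typescript
// Formats a non-negative number of seconds as an SRT timestamp (HH:MM:SS,mmm).
function toSrtTimestamp(seconds: number): string {
  if (!Number.isFinite(seconds) || seconds < 0) {
    throw new Error("seconds must be a finite, non-negative number.");
  }
  // Work in integer milliseconds to avoid floating-point drift.
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (value: number, width: number): string =>
    String(value).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}
```

WebVTT timestamps use a period instead of a comma before the milliseconds, so the same arithmetic applies with a different separator.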

TranslationItem

type TranslationItem = {
  index: number;
  text: string;
};

Lower-Level Utilities

import {
  cleanSubtitleText,
  cuesToSrt,
  cuesToVtt,
  formatSrtTimestamp,
  formatVttTimestamp,
  parseSrt,
  parseSubtitleFormats,
  replaceCueText,
  segmentsToSrt,
  segmentsToVtt,
  timedTranscriptToPlainText,
  timedTranscriptToWordCues,
  timedWordsToCues,
  wrapSubtitleText,
} from "subtitle-forge";

Common uses:

  • timedWordsToCues(words, cueOptions) converts word timestamps to subtitle segments.
  • segmentsToSrt(segments, subtitleOptions) renders local subtitle segments as SRT.
  • parseSrt(srt) parses SRT into cues.
  • replaceCueText(cues, items) preserves timing and replaces only cue text.
  • cuesToVtt(cues, subtitleOptions) renders WebVTT.
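
The idea behind replacing text by cue index can be sketched in a few lines. This is an illustration of the principle, not the source of replaceCueText; replaceTextByIndex is a hypothetical name, and leaving cues without a matching item unchanged is an assumption.

```typescript
type SrtCue = { index: number; start: string; end: string; text: string };
type TranslationItem = { index: number; text: string };

// Returns new cues whose text comes from the matching item; index, start,
// and end are copied untouched, so timing cannot drift.
function replaceTextByIndex(cues: SrtCue[], items: TranslationItem[]): SrtCue[] {
  const textByIndex = new Map<number, string>();
  for (const item of items) textByIndex.set(item.index, item.text);
  return cues.map((cue) => {
    const text = textByIndex.get(cue.index);
    // Assumption: a cue with no matching item keeps its original text.
    return text === undefined ? cue : { ...cue, text };
  });
}
```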

Speechmatics Helper

For Speechmatics json-v2, use:

import {
  speechmaticsTranscriptToTimedTranscript,
  transcriptJsonToPlainText,
  transcriptJsonToTimedWords,
} from "subtitle-forge";

const timedTranscript = speechmaticsTranscriptToTimedTranscript(jsonV2);

Other ASR providers should be mapped into TimedTranscript by your app.

Error Behavior

The package throws when:

  • No timed words are available to build subtitles.
  • SRT input contains no parseable cues.
  • Neither translator nor llm is provided for translation.
  • The LLM response omits one or more requested cue indexes.
  • The LLM response cannot be parsed as JSON after basic repair attempts.
  • The OpenAI-compatible endpoint returns a non-2xx response.

Import Paths

import { translateTimedTranscript } from "subtitle-forge";
import { parseSrt } from "subtitle-forge/srt";
import { LlmTranslator } from "subtitle-forge/llm";
import { timedWordsToCues } from "subtitle-forge/transcript";

Prefer the root import unless you specifically want a smaller submodule import.

Publishing Checklist

For maintainers:

npm test
npm pack --dry-run -w subtitle-forge
npm publish -w subtitle-forge --access public

The package sets publishConfig.access to public, which matches the --access public flag used above.