Package Exports
- subtitle-forge
- subtitle-forge/languages
- subtitle-forge/library
- subtitle-forge/llm
- subtitle-forge/package.json
- subtitle-forge/srt
- subtitle-forge/transcript
Readme
subtitle-forge
Side-effect-free subtitle generation and translation for AI agents, cloud functions, and background workers.
subtitle-forge implements this reusable pipeline:
timed transcript -> subtitle cues -> SRT/WebVTT -> translated subtitles with preserved timing
It does not submit ASR jobs, read or write files, run FFmpeg, mux videos, or manage local output folders. Those responsibilities belong to your app, CLI, or worker. This package only accepts JSON/string inputs and returns JSON/string outputs.
For Chinese documentation, see README.zh-CN.md.
Install
npm install subtitle-forge
Requirements:
- Node.js 22 or newer.
- ESM import syntax.
- fetch available globally, or pass a custom fetch implementation to LlmTranslator.
When To Use This Package
Use it when you already have one of these inputs:
- A provider-neutral TimedTranscript with word-level timestamps.
- A Speechmatics json-v2 transcript.
- Existing SRT text that should be translated while preserving cue indexes and timestamps.
Do not use it as a full video pipeline. It intentionally does not extract audio, call ASR providers, burn subtitles, or upload media files.
AI Agent Usage Contract
If you are an AI coding agent, follow this contract:
- Convert ASR output into TimedTranscript:
  - words must contain { text, start, end } in seconds.
  - start and end must be finite numbers.
  - Preserve speaker when available.
  - Set delimiterBefore: "" on the first word of a sentence/segment if the ASR output already models spacing.
  - Set eos: true when a word ends a sentence; this helps cue splitting.
- Call buildSourceSubtitles() when only source SRT/VTT is needed.
- Call translateTimedTranscript() when you need source subtitles and translated subtitles from timed words.
- Call translateSrtText() when you already have SRT and only need translation.
- Store translation.items externally for resumable jobs, then pass it back as existingItems.
- Never ask the LLM to change cue indexes or timestamps. This package preserves timing by replacing text by cue index.
- If TimedTranscript.words is empty, fail early or run ASR/alignment first. Plain text without timestamps cannot produce reliable SRT timing.
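The first step of the contract can be sketched as follows. This is a minimal illustration, not part of the package: the AsrToken shape (millisecond timestamps, speakerId) and the toTimedTranscript helper are hypothetical, the end-of-sentence detection is a simple punctuation heuristic, and the check that end is not before start is an extra assumption beyond the contract. The TimedWord and TimedTranscript types are copied from the Data Types section.

```typescript
// Types copied from the Data Types section of this README.
type TimedWord = {
  text: string;
  start: number;
  end: number;
  speaker?: string;
  delimiterBefore?: string;
  eos?: boolean;
};
type TimedTranscript = { provider?: string; language?: string; words: TimedWord[] };

// Hypothetical ASR token shape (milliseconds); replace with your provider's format.
type AsrToken = { token: string; startMs: number; endMs: number; speakerId?: string };

function toTimedTranscript(tokens: AsrToken[], language: string): TimedTranscript {
  if (tokens.length === 0) {
    // Fail early: plain text without timestamps cannot produce reliable timing.
    throw new Error("No timed words: run ASR/alignment before building subtitles.");
  }
  const words = tokens.map((t, i): TimedWord => {
    const start = t.startMs / 1000; // the contract expects seconds
    const end = t.endMs / 1000;
    if (!Number.isFinite(start) || !Number.isFinite(end) || end < start) {
      throw new Error(`Invalid timing for token ${i}: ${t.token}`);
    }
    const word: TimedWord = { text: t.token, start, end };
    if (t.speakerId !== undefined) word.speaker = t.speakerId; // preserve speaker
    // Heuristic (assumption): sentence-final punctuation marks the end of a sentence.
    if (/[.!?。！？]$/.test(t.token)) word.eos = true;
    return word;
  });
  return { provider: "example-asr", language, words };
}

const transcript = toTimedTranscript(
  [
    { token: "Hello", startMs: 0, endMs: 200, speakerId: "S1" },
    { token: "world.", startMs: 250, endMs: 700, speakerId: "S1" },
  ],
  "en",
);
console.log(transcript.words);
```

The resulting object can then be passed as the transcript option of buildSourceSubtitles() or translateTimedTranscript().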
Quick Start: Timed Transcript To Translated SRT
import { translateTimedTranscript } from "subtitle-forge";
const result = await translateTimedTranscript({
  transcript: {
    provider: "your-asr",
    language: "en",
    words: [
      { text: "Hello", start: 0.0, end: 0.2, delimiterBefore: "" },
      { text: "world.", start: 0.25, end: 0.7, eos: true },
      { text: "Let's", start: 1.4, end: 1.7, delimiterBefore: "" },
      { text: "begin.", start: 1.75, end: 2.1, eos: true },
    ],
  },
  llm: {
    apiKey: process.env.LLM_API_KEY,
    baseUrl: process.env.LLM_BASE_URL ?? "https://api.openai.com/v1",
    model: process.env.LLM_MODEL,
  },
  sourceLanguage: "English",
  targetLanguage: "Simplified Chinese",
  subtitleFormats: ["srt", "vtt"],
});
console.log(result.source.srt);
console.log(result.translation.srt);
console.log(result.translation.vtt);
console.log(result.translation.items);
Quick Start: Existing SRT To Translated SRT
import { translateSrtText } from "subtitle-forge";
const result = await translateSrtText({
  srt: `1
00:00:00,000 --> 00:00:01,000
Hello, world.
`,
  llm: {
    apiKey: process.env.LLM_API_KEY,
    model: process.env.LLM_MODEL,
  },
  sourceLanguage: "English",
  targetLanguage: "Simplified Chinese",
});
console.log(result.srt);
Resumable Translation
translateTimedTranscript() and translateSrtText() both accept existingItems and onProgress.
import { translateSrtText, type TranslationItem } from "subtitle-forge";
const existingItems: TranslationItem[] = await loadCheckpointFromDatabase(jobId);
const result = await translateSrtText({
  srt,
  llm,
  targetLanguage: "Japanese",
  existingItems,
  async onProgress(items, progress) {
    await saveCheckpointToDatabase(jobId, {
      completed: progress.completed,
      total: progress.total,
      items,
    });
  },
});
await saveFinalSubtitle(jobId, result.srt);
The checkpoint format is simply an array of:
type TranslationItem = {
  index: number;
  text: string;
};
The package ignores checkpoint items that do not match the current cue indexes.
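To illustrate what ignoring non-matching checkpoint items means in practice, here is a small sketch of reconciling a stored checkpoint against the current cue indexes. The reconcileCheckpoint helper is hypothetical and is not exported by the package; treating blank text as untranslated is an assumption, and the package's internal logic may differ.

```typescript
type TranslationItem = { index: number; text: string };

// Keep only checkpoint items whose index exists in the current cue list,
// and report which cue indexes still need translation.
function reconcileCheckpoint(
  cueIndexes: number[],
  existingItems: TranslationItem[],
): { reusable: TranslationItem[]; pending: number[] } {
  const current = new Set(cueIndexes);
  const byIndex = new Map<number, TranslationItem>();
  for (const item of existingItems) {
    // Assumption: blank text is treated as "not translated yet".
    if (current.has(item.index) && item.text.trim() !== "") {
      byIndex.set(item.index, item); // a later duplicate overwrites an earlier one
    }
  }
  const reusable: TranslationItem[] = [];
  const pending: number[] = [];
  for (const index of cueIndexes) {
    const item = byIndex.get(index);
    if (item) reusable.push(item);
    else pending.push(index);
  }
  return { reusable, pending };
}

const state = reconcileCheckpoint(
  [1, 2, 3],
  [
    { index: 1, text: "こんにちは" },
    { index: 7, text: "stale item from an older subtitle file" },
  ],
);
console.log(state.reusable.map((item) => item.index), state.pending);
```

A worker could log the pending list before resuming a job to estimate how much translation work remains.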
Custom Translator
You can avoid the built-in OpenAI-compatible client by injecting a translator. This is useful for tests, internal gateways, queues, or non-OpenAI providers.
import {
  translateSrtText,
  type SubtitleCueTranslator,
} from "subtitle-forge";
const translator: SubtitleCueTranslator = {
  async translateCues({ cues }) {
    return cues.map((cue) => ({
      index: cue.index,
      text: `translated: ${cue.text}`,
    }));
  },
};
const result = await translateSrtText({
  srt,
  translator,
});
A custom translator must return one non-empty TranslationItem for every input cue index that is not already covered by existingItems.
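When testing a custom translator, it can help to check this coverage rule yourself before handing results back. The sketch below is an assumption-labeled test helper, not a package export: assertFullCoverage is a hypothetical name, and the package performs its own validation independently.

```typescript
type TranslationItem = { index: number; text: string };

// Throws if any requested cue index has no non-empty translation,
// counting indexes already covered by existingItems as satisfied.
function assertFullCoverage(
  requested: number[],
  returned: TranslationItem[],
  existingItems: TranslationItem[] = [],
): void {
  const covered = new Set<number>();
  for (const item of [...existingItems, ...returned]) {
    if (item.text.trim() !== "") covered.add(item.index);
  }
  const missing = requested.filter((index) => !covered.has(index));
  if (missing.length > 0) {
    throw new Error(`Translator output is missing cue indexes: ${missing.join(", ")}`);
  }
}

// Passes: index 3 is covered by the checkpoint, 1 and 2 by the translator.
assertFullCoverage(
  [1, 2, 3],
  [
    { index: 1, text: "translated: Hello" },
    { index: 2, text: "translated: world" },
  ],
  [{ index: 3, text: "translated: again" }],
);
```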
Cloud Function Pattern
import { translateTimedTranscript } from "subtitle-forge";
export async function handleSubtitleJob(request: {
  jobId: string;
  transcript: unknown;
  existingItems?: Array<{ index: number; text: string }>;
}) {
  const result = await translateTimedTranscript({
    transcript: request.transcript as any,
    llm: {
      apiKey: process.env.LLM_API_KEY,
      baseUrl: process.env.LLM_BASE_URL,
      model: process.env.LLM_MODEL,
    },
    sourceLanguage: "auto",
    targetLanguage: "Simplified Chinese",
    existingItems: request.existingItems,
    subtitleFormats: ["srt", "vtt"],
    async onProgress(items, progress) {
      await saveJobState(request.jobId, { ...progress, items });
    },
  });
  await saveObject(`${request.jobId}/source.srt`, result.source.srt);
  await saveObject(`${request.jobId}/translation.zh.srt`, result.translation.srt);
  return {
    sourceCueCount: result.source.cues.length,
    translatedCueCount: result.translation.items.length,
  };
}
Public API
translateTimedTranscript(options)
Builds source subtitles from a timed transcript, translates the generated source SRT, and returns both source and translated subtitle artifacts.
function translateTimedTranscript(
  options: TranslateTimedTranscriptOptions,
): Promise<TranslateTimedTranscriptResult>;
Important options:
| Option | Type | Default | Description |
|---|---|---|---|
| transcript | TimedTranscript | Required | Provider-neutral timed words. |
| translator | SubtitleCueTranslator | Optional | Custom translation provider. |
| llm | LlmTranslatorOptions | Optional | Built-in OpenAI-compatible translator config. Required when translator is absent. |
| sourceLanguage | string | "auto" | Source language name or code for translation prompt. |
| targetLanguage | string | "Simplified Chinese" | Target language name or code. |
| cueOptions | CueOptions | See below | Controls timed-word to cue segmentation. |
| subtitleOptions | SubtitleTextOptions | { maxLineLength: 37, maxLines: 2 } | Controls wrapping when rendering SRT/VTT. |
| subtitleFormats | SubtitleFormat[] | ["srt"] | Include "vtt" to return WebVTT. |
| batchSize | number | 30 | Subtitle cues per LLM request. |
| contextWindow | number | 8 | Nearby cues sent before/after each batch as context. |
| existingItems | TranslationItem[] | [] | Checkpoint items to skip already translated cue indexes. |
| onProgress | callback | Optional | Called after each translated batch. |
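To make the interaction between batchSize and contextWindow concrete, the following sketch plans batches over a cue list using the defaults from the table. It is an illustration under stated assumptions: planBatches and the BatchPlan shape are hypothetical names, and the package's internal batching may differ in detail (for example, in how it treats cues already covered by existingItems).

```typescript
type Cue = { index: number; text: string };
type BatchPlan = { contextBefore: Cue[]; batch: Cue[]; contextAfter: Cue[] };

// Split cues into batches of `batchSize`, attaching up to `contextWindow`
// neighbouring cues on each side as read-only context.
function planBatches(cues: Cue[], batchSize = 30, contextWindow = 8): BatchPlan[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const context = Math.max(0, Math.floor(contextWindow));
  const plans: BatchPlan[] = [];
  for (let start = 0; start < cues.length; start += batchSize) {
    const end = Math.min(start + batchSize, cues.length);
    plans.push({
      contextBefore: cues.slice(Math.max(0, start - context), start),
      batch: cues.slice(start, end),
      contextAfter: cues.slice(end, end + context),
    });
  }
  return plans;
}

const cues: Cue[] = Array.from({ length: 70 }, (_, i) => ({
  index: i + 1,
  text: `cue ${i + 1}`,
}));
const plans = planBatches(cues);
// With 70 cues and the defaults, the (before, batch, after) sizes are
// (0, 30, 8), (8, 30, 8), and (8, 10, 0).
console.log(plans.map((p) => [p.contextBefore.length, p.batch.length, p.contextAfter.length]));
```

Only the cues in batch are expected back from the translator; the context cues exist to keep terminology and sentence flow consistent across batch boundaries.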
Return shape:
type TranslateTimedTranscriptResult = {
  source: {
    transcriptText: string;
    cues: SubtitleSegment[];
    srt: string;
    vtt?: string;
  };
  translation: {
    sourceCues: SrtCue[];
    items: TranslationItem[];
    cues: SrtCue[];
    srt: string;
    vtt?: string;
  };
};
buildSourceSubtitles(options)
Builds source-language subtitles only. No LLM calls.
function buildSourceSubtitles(options: {
  transcript: TimedTranscript;
  cueOptions?: CueOptions;
  subtitleOptions?: SubtitleTextOptions;
  subtitleFormats?: SubtitleFormat[];
}): SourceSubtitleResult;
translateSrtText(options)
Translates existing SRT text while preserving cue indexes and timestamps.
function translateSrtText(
  options: TranslateSrtTextOptions,
): Promise<TranslatedSubtitleResult>;
Use this when your ASR or subtitle editor already produced SRT.
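The idea of preserving cue indexes and timestamps can be shown without any LLM call. The sketch below parses well-formed SRT, swaps in translated text by cue index, and serializes the result. All three helpers (parseSimpleSrt, replaceTextByIndex, toSrt) are simplified stand-ins written for this example; they are not the package's parseSrt, replaceCueText, or cuesToSrt, which likely handle more edge cases and apply line wrapping.

```typescript
type SrtCue = { index: number; start: string; end: string; text: string };
type TranslationItem = { index: number; text: string };

// Minimal parser for well-formed SRT; unparseable blocks are skipped.
function parseSimpleSrt(srt: string): SrtCue[] {
  const cues: SrtCue[] = [];
  for (const block of srt.replace(/\r\n/g, "\n").trim().split(/\n{2,}/)) {
    const [indexLine, timeLine, ...textLines] = block.split("\n");
    const [start, end] = (timeLine ?? "").split(" --> ");
    const index = Number(indexLine);
    if (!start || !end || !Number.isInteger(index)) continue;
    cues.push({ index, start, end, text: textLines.join("\n") });
  }
  return cues;
}

// Timestamps are copied untouched; only the text changes, matched by index.
function replaceTextByIndex(cues: SrtCue[], items: TranslationItem[]): SrtCue[] {
  const byIndex = new Map(items.map((item) => [item.index, item.text] as const));
  return cues.map((cue) => ({ ...cue, text: byIndex.get(cue.index) ?? cue.text }));
}

function toSrt(cues: SrtCue[]): string {
  return cues.map((c) => `${c.index}\n${c.start} --> ${c.end}\n${c.text}\n`).join("\n");
}

const source =
  "1\n00:00:00,000 --> 00:00:01,000\nHello, world.\n\n" +
  "2\n00:00:01,200 --> 00:00:02,000\nLet's begin.\n";
const translated = replaceTextByIndex(parseSimpleSrt(source), [
  { index: 1, text: "你好，世界。" },
  { index: 2, text: "我们开始吧。" },
]);
console.log(toSrt(translated));
```

Because the translator only ever sees index and text, a response that reorders items or adds stray timestamps cannot shift the subtitle timing.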
LlmTranslator
OpenAI-compatible Chat Completions translator.
const translator = new LlmTranslator({
  apiKey: process.env.LLM_API_KEY,
  baseUrl: "https://api.openai.com/v1",
  model: "your-model",
  temperature: 0.2,
  thinking: "disabled",
  reasoningEffort: "low",
  fetch: customFetch,
});
It calls:
POST {baseUrl}/chat/completions
Expected response shape:
{
  "choices": [
    {
      "message": {
        "content": "{\"items\":[{\"index\":1,\"text\":\"...\"}]}"
      }
    }
  ]
}
The built-in translator:
- Sends only cue index and text; it does not send timestamps.
- Sends context_before and context_after for continuity.
- Requires valid returned indexes.
- Repairs a few common malformed JSON issues.
- Splits a failed batch into smaller batches and retries.
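The last behavior, splitting a failed batch and retrying, can be sketched as a simple recursive halving strategy. This is an illustration only: the package's actual split sizes and retry limits are not documented here, translateWithSplit is a hypothetical name, and the translator is synchronous to keep the example short (a real translator would be async).

```typescript
type Cue = { index: number; text: string };
type TranslationItem = { index: number; text: string };
type TranslateBatch = (batch: Cue[]) => TranslationItem[];

// Try the whole batch first; on failure, halve it and retry each half.
function translateWithSplit(batch: Cue[], translate: TranslateBatch): TranslationItem[] {
  try {
    return translate(batch);
  } catch (error) {
    if (batch.length <= 1) throw error; // a single cue cannot be split further
    const middle = Math.ceil(batch.length / 2);
    return [
      ...translateWithSplit(batch.slice(0, middle), translate),
      ...translateWithSplit(batch.slice(middle), translate),
    ];
  }
}

// Fake translator that fails on batches larger than two cues,
// standing in for a truncated or malformed LLM response.
let calls = 0;
const flaky: TranslateBatch = (batch) => {
  calls += 1;
  if (batch.length > 2) throw new Error("response truncated");
  return batch.map((cue) => ({ index: cue.index, text: cue.text.toUpperCase() }));
};

const batchCues: Cue[] = [1, 2, 3, 4, 5].map((index) => ({ index, text: `cue ${index}` }));
const items = translateWithSplit(batchCues, flaky);
console.log(items.map((item) => item.index), `calls: ${calls}`);
```

In this run the five-cue batch fails, its first half of three cues fails again, and the remaining pieces of two, one, and two cues succeed, so the translator is called five times in total.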
Data Types
TimedTranscript
type TimedTranscript = {
  provider?: string;
  language?: string;
  text?: string;
  words: TimedWord[];
  raw?: unknown;
};
TimedWord
type TimedWord = {
  text: string;
  start: number;
  end: number;
  speaker?: string;
  delimiterBefore?: string;
  eos?: boolean;
};
Rules:
- start and end are seconds.
- text can include punctuation.
- delimiterBefore defaults to a space when words are joined.
- Use delimiterBefore: "" to avoid inserting a space before a word.
- eos means end of sentence and helps cue splitting.
- Speaker changes force cue boundaries when both neighboring words have different speaker values.
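The delimiterBefore rules can be illustrated with a small joining function. This is a sketch rather than the package's implementation: joinWords is a hypothetical helper, and dropping the delimiter in front of the very first word is an assumption.

```typescript
type TimedWord = {
  text: string;
  start: number;
  end: number;
  speaker?: string;
  delimiterBefore?: string;
  eos?: boolean;
};

// Join words with each word's delimiterBefore, defaulting to a single space.
// Assumption: the first word never gets a leading delimiter.
function joinWords(words: TimedWord[]): string {
  return words
    .map((word, i) => (i === 0 ? "" : word.delimiterBefore ?? " ") + word.text)
    .join("");
}

// An ASR that emits "Don" and "'t" as separate tokens needs an empty delimiter.
const english = joinWords([
  { text: "Don", start: 0.0, end: 0.2 },
  { text: "'t", start: 0.2, end: 0.3, delimiterBefore: "" },
  { text: "stop.", start: 0.35, end: 0.8, eos: true },
]);

// Languages written without spaces use an empty delimiter on every word.
const chinese = joinWords([
  { text: "你好", start: 0.0, end: 0.4 },
  { text: "世界", start: 0.4, end: 0.8, delimiterBefore: "" },
  { text: "。", start: 0.8, end: 0.8, delimiterBefore: "", eos: true },
]);

console.log(english); // Don't stop.
console.log(chinese); // 你好世界。
```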
CueOptions
type CueOptions = {
  maxDuration?: number; // default 4.2 seconds
  targetDuration?: number; // default 2.8 seconds
  maxChars?: number; // default 54
  maxWords?: number; // default 12
  pauseThreshold?: number; // default 0.55 seconds
  minDuration?: number; // default 0.45 seconds
  startPadding?: number; // default 0.08 seconds
  endPadding?: number; // default 0.16 seconds
  nextCueGap?: number; // default 0.05 seconds
};
SubtitleSegment
type SubtitleSegment = {
  index?: number;
  start_time: number;
  end_time: number;
  content: string;
  speaker?: string;
};
SrtCue
type SrtCue = {
  index: number;
  start: string; // HH:MM:SS,mmm
  end: string; // HH:MM:SS,mmm
  text: string;
};
TranslationItem
type TranslationItem = {
  index: number;
  text: string;
};
Lower-Level Utilities
import {
  cleanSubtitleText,
  cuesToSrt,
  cuesToVtt,
  formatSrtTimestamp,
  formatVttTimestamp,
  parseSrt,
  parseSubtitleFormats,
  replaceCueText,
  segmentsToSrt,
  segmentsToVtt,
  timedTranscriptToPlainText,
  timedTranscriptToWordCues,
  timedWordsToCues,
  wrapSubtitleText,
} from "subtitle-forge";
Common uses:
- timedWordsToCues(words, cueOptions) converts word timestamps to subtitle segments.
- segmentsToSrt(segments, subtitleOptions) renders local subtitle segments as SRT.
- parseSrt(srt) parses SRT into cues.
- replaceCueText(cues, items) preserves timing and replaces only cue text.
- cuesToVtt(cues, subtitleOptions) renders WebVTT.
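For intuition about what converting word timestamps to subtitle segments involves, here is a deliberately simplified splitter. It is not the package's timedWordsToCues: it only looks at sentence ends, pauses, speaker changes, and a character limit (using the documented pauseThreshold and maxChars defaults), and it ignores padding, minimum and maximum durations, and word counts. The name splitIntoSegments and the always-split-on-eos rule are assumptions made for this example.

```typescript
type TimedWord = {
  text: string;
  start: number;
  end: number;
  speaker?: string;
  delimiterBefore?: string;
  eos?: boolean;
};
type Segment = { index: number; start_time: number; end_time: number; content: string };

function splitIntoSegments(words: TimedWord[], pauseThreshold = 0.55, maxChars = 54): Segment[] {
  const segments: Segment[] = [];
  let current: TimedWord[] = [];

  // Close the cue being built and start a new one.
  const flush = (): void => {
    const first = current[0];
    const last = current[current.length - 1];
    if (!first || !last) return;
    const content = current
      .map((w, i) => (i === 0 ? "" : w.delimiterBefore ?? " ") + w.text)
      .join("");
    segments.push({ index: segments.length + 1, start_time: first.start, end_time: last.end, content });
    current = [];
  };

  for (const word of words) {
    const prev = current[current.length - 1];
    if (prev) {
      const pause = word.start - prev.end >= pauseThreshold;
      const speakerChange =
        prev.speaker !== undefined && word.speaker !== undefined && prev.speaker !== word.speaker;
      const length = current.reduce((n, w) => n + w.text.length + 1, 0) + word.text.length;
      if (prev.eos === true || pause || speakerChange || length > maxChars) flush();
    }
    current.push(word);
  }
  flush();
  return segments;
}

const segments = splitIntoSegments([
  { text: "Hello", start: 0.0, end: 0.2 },
  { text: "world.", start: 0.25, end: 0.7, eos: true },
  { text: "Let's", start: 1.4, end: 1.7 },
  { text: "begin.", start: 1.75, end: 2.1, eos: true },
]);
console.log(segments);
```

With the four words from the quick start, this produces two segments: "Hello world." from 0 to 0.7 seconds and "Let's begin." from 1.4 to 2.1 seconds.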
Speechmatics Helper
For Speechmatics json-v2, use:
import {
  speechmaticsTranscriptToTimedTranscript,
  transcriptJsonToPlainText,
  transcriptJsonToTimedWords,
} from "subtitle-forge";
const timedTranscript = speechmaticsTranscriptToTimedTranscript(jsonV2);
Other ASR providers should be mapped into TimedTranscript by your app.
Error Behavior
The package throws when:
- No timed words are available to build subtitles.
- SRT input contains no parseable cues.
- Neither translator nor llm is provided for translation.
- The LLM response omits one or more requested cue indexes.
- The LLM response cannot be parsed as JSON after basic repair attempts.
- The OpenAI-compatible endpoint returns a non-2xx response.
Import Paths
import { translateTimedTranscript } from "subtitle-forge";
import { parseSrt } from "subtitle-forge/srt";
import { LlmTranslator } from "subtitle-forge/llm";
import { timedWordsToCues } from "subtitle-forge/transcript";
Prefer the root import unless you specifically want a smaller submodule import.
Publishing Checklist
For maintainers:
npm test
npm pack --dry-run -w subtitle-forge
npm publish -w subtitle-forge --access public
The package is scoped and sets publishConfig.access to public.