Package Exports
- @vexaai/transcript-rendering
Readme
Transcript Rendering
Why
Real-time transcript WebSocket streams produce overlapping, out-of-order, duplicate segments. Multiple speakers talk simultaneously, ASR engines emit draft-then-confirmed rewrites, and network jitter delivers segments out of order. Without a processing pipeline, rendering this raw data produces garbled, duplicated text.
What
This library transforms raw TranscriptSegment[] streams into clean, speaker-grouped SegmentGroup[] output ready for rendering.
Data Flow
WebSocket / REST segments
│
▼
upsertSegments() merge into Map, handle draft→confirmed
│
▼
sortSegments() order by absolute_start_time
│
▼
deduplicateSegments() remove overlaps, expansions, tail-repeats (per-speaker)
│
▼
groupSegments() consecutive same-speaker segments → SegmentGroup[]
│
▼
SegmentGroup[] ready to renderExports
| Export | Signature | Description |
|---|---|---|
upsertSegments |
(existing: Map<string, T>, incoming: T[]) => Map<string, T> |
Merge incoming segments into a map; handles draft→confirmed transitions |
sortSegments |
(segments: T[]) => T[] |
Sort segments by absolute_start_time (ISO string comparison) |
deduplicateSegments |
(segments: T[]) => T[] |
Speaker-aware dedup: adjacent duplicates, containment, expansion, tail-repeats |
groupSegments |
(segments: T[], options?: GroupingOptions) => SegmentGroup<T>[] |
Group consecutive same-key segments; splits at maxCharsPerGroup boundaries |
parseUTCTimestamp |
(timestamp: string) => Date |
Parse ISO timestamps as UTC (appends Z when no timezone suffix) |
TranscriptSegment |
type | Input segment interface |
SegmentGroup |
type | Output grouped segments |
GroupingOptions |
type | Grouping configuration |
TranscriptSegment Fields
| Field | Type | Description |
|---|---|---|
text |
string |
Segment text content |
speaker |
string? |
Speaker name or identifier |
absolute_start_time |
string |
ISO timestamp of segment start |
absolute_end_time |
string |
ISO timestamp of segment end |
completed |
boolean? |
Whether the segment is finalized (vs. draft) |
segment_id |
string? |
Stable identity (e.g., speakerA:3) |
start_time |
number? |
Relative start time in seconds |
end_time |
number? |
Relative end time in seconds |
updated_at |
string? |
ISO timestamp of last update |
GroupingOptions
| Option | Type | Default | Description |
|---|---|---|---|
getGroupKey |
(segment: TranscriptSegment) => string |
Groups by speaker |
Returns the grouping key for a segment |
maxCharsPerGroup |
number |
512 |
Maximum characters per group before splitting at segment boundaries |
How
Install & Build
cd packages/transcript-rendering
npm install
npm run build # Build with tsup (ESM + CJS)
npm test # Run tests with vitest
npm run typecheck # Type-check without emittingUsage
import {
upsertSegments,
sortSegments,
deduplicateSegments,
groupSegments,
type TranscriptSegment,
} from '@vexaai/transcript-rendering';
// Maintain a segment map across WebSocket messages
const segments = new Map<string, TranscriptSegment>();
ws.on('message', (data) => {
const incoming: TranscriptSegment[] = JSON.parse(data);
// Full pipeline: upsert → sort → dedup → group
upsertSegments(segments, incoming);
const sorted = sortSegments([...segments.values()]);
const deduped = deduplicateSegments(sorted);
const groups = groupSegments(deduped);
// Each group has: key (speaker), combinedText, startTime, endTime, segments[]
render(groups);
});Package
Published as @vexaai/transcript-rendering. Dual ESM/CJS output via tsup. Apache-2.0 license.