@journalia/sdk
The Journalia Partner SDK embeds real-time medical transcription and AI-powered clinical note generation into your application.
Quick Start
import { JournaliaClient } from '@journalia/sdk';
const client = new JournaliaClient({
getToken: async () => {
// This is YOUR endpoint. Implement it however makes sense for your stack.
// It must return { token }, a JWT from the Journalia /auth endpoint.
const res = await fetch('/api/journalia/token', { method: 'POST' });
return await res.json();
},
language: { primary: 'nb-NO' },
});
await client.startTranscription();
// Feed audio chunks, receive transcript events
const chunk = await client.getTranscriptionChunk(audioBlob);
if (chunk) {
renderTranscript(chunk);
}
// When the consultation is done
const result = await client.stopTranscription({
noteType: 'soap',
notesContext: 'Female, 45 years. Referred for persistent headaches.',
});
// Render sections (e.g. Subjective, Objective, Assessment, Plan)
result.sections.forEach((s) => console.log(`${s.title}: ${s.content}`));
client.destroy();
Installation
npm install @journalia/sdk
Prerequisites
You'll need your Journalia partner credentials:
| Credential | Description |
|---|---|
| clientId | Your partner identifier |
| clientSecret | Your partner secret. Keep this on your backend only. |
You'll also choose a scopeKey for session scoping (see Session Scoping).
Integration Guide
Step 1: Backend Token Endpoint
The SDK needs a JWT to authenticate with Journalia. Your backend obtains this by calling Journalia's /auth endpoint with your credentials, then returns the token to your frontend.
Never expose clientSecret to the browser.
// Express example. Adapt to your framework.
import express from 'express';
const app = express();
app.use(express.json());
app.post('/api/journalia/token', async (req, res) => {
const response = await fetch('https://api.journalia.no/auth', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
clientId: process.env.JOURNALIA_CLIENT_ID,
clientSecret: process.env.JOURNALIA_CLIENT_SECRET,
scopeKey: process.env.JOURNALIA_SCOPE_KEY,
}),
});
const data = await response.json();
if (!response.ok) {
res.status(response.status).json(data);
return;
}
res.json(data);
});
Session Scoping
The scopeKey you pass in the /auth request controls session isolation. It is your own identifier, and you decide what it means:
- Set it to a user ID to scope sessions per clinician. Each clinician's sessions are isolated from each other.
- Set it to a clinic or department ID to scope sessions per location. Any clinician within that scope can access sessions created under it.
- Set it to a fixed string to share a single scope across all sessions (not recommended for production).
Sessions created under one scopeKey cannot be accessed from another. Choose a scoping strategy that matches your application's access control model. This is important: if you use a predictable or shared scopeKey, any client with a valid token for that scope could access session data within it.
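As an illustration of the first two strategies, the sketch below derives the scopeKey on the backend from the already-authenticated user instead of a fixed environment variable. The AuthenticatedUser shape, the ScopeStrategy names, and the `clinician:` / `clinic:` prefixes are hypothetical; only the { clientId, clientSecret, scopeKey } body comes from this document.

```typescript
// Hypothetical shape of the user your own auth middleware has already verified.
interface AuthenticatedUser {
  id: string;
  clinicId: string;
}

// Hypothetical names for the two production strategies described above.
type ScopeStrategy = 'per-clinician' | 'per-clinic';

// Derive the scope key server-side; never accept it from the browser.
function scopeKeyFor(user: AuthenticatedUser, strategy: ScopeStrategy): string {
  return strategy === 'per-clinician'
    ? `clinician:${user.id}`
    : `clinic:${user.clinicId}`;
}

// Build the JSON body for the /auth request shown in Step 1.
function buildAuthBody(
  user: AuthenticatedUser,
  strategy: ScopeStrategy,
  credentials: { clientId: string; clientSecret: string },
): { clientId: string; clientSecret: string; scopeKey: string } {
  return { ...credentials, scopeKey: scopeKeyFor(user, strategy) };
}
```

Because the key is computed from the server-side session, a client cannot request a token for another clinician's or clinic's scope.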
Step 2: Initialize the Client
The SDK constructor takes a getToken function that it calls whenever it needs a JWT (on session start and when the current token nears expiry). How you implement this function is entirely up to you. It just needs to return { token } from your backend.
Here is one simple example using fetch:
import { JournaliaClient } from '@journalia/sdk';
const client = new JournaliaClient({
getToken: async () => {
const res = await fetch('/api/journalia/token', { method: 'POST' });
if (!res.ok) throw new Error('Token request failed');
return await res.json();
},
language: { primary: 'nb-NO' },
});
If your stack uses a different HTTP client, RPC framework, or server-rendered token injection, use whatever approach fits. The SDK does not care how the token is obtained, only that getToken returns a Promise<{ token: string }>.
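Whatever transport you use, it can help to validate the payload before handing it to the SDK. The helper below is a minimal sketch, not part of the SDK: parseTokenResponse is a hypothetical name, and it assumes your backend forwards the /auth response body unchanged.

```typescript
// Hypothetical helper: narrows an unknown backend payload to { token }.
// Extra fields such as expiresIn are dropped so getToken returns exactly
// the shape the SDK asks for.
function parseTokenResponse(body: unknown): { token: string } {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Token response is not an object');
  }
  const token = (body as { token?: unknown }).token;
  if (typeof token !== 'string' || token.length === 0) {
    throw new Error('Token response has no token string');
  }
  return { token };
}

// Possible use inside getToken (shown as a comment to keep the sketch
// free of network calls):
//   getToken: async () => {
//     const res = await fetch('/api/journalia/token', { method: 'POST' });
//     if (!res.ok) throw new Error('Token request failed');
//     return parseTokenResponse(await res.json());
//   }
```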
Step 3: Start Transcription
await client.startTranscription();
This authenticates, initializes a session, and opens the transcription WebSocket. You can check the current state at any time with client.getState(), which returns a SessionState string. After calling startTranscription(), the state progresses: idle -> starting -> ready.
You can optionally pass event callbacks to observe WebSocket reconnection:
await client.startTranscription({
onReconnecting: () => console.log('WebSocket reconnecting…'),
onReconnected: (info) => console.log(`Reconnected, replayed ${info.replayedChunks} chunks`),
onReconnectFailed: (error) => console.error('Reconnect failed:', error),
});
Step 4: Capture and Stream Audio
The SDK accepts raw PCM audio as Blob objects via getTranscriptionChunk(). Your application is responsible for capturing audio and producing these blobs. The audio requirements are:
| Parameter | Requirement |
|---|---|
| Format | Raw PCM (not compressed formats like WebM or Opus) |
| Encoding | Float32 or 16-bit signed little-endian |
| Sample rate | Any standard rate. The SDK resamples internally. 16 kHz or 48 kHz are typical. |
| Channels | Mono |
| Chunk size | ~50ms of audio per chunk is a good default |
How you produce these chunks depends on your stack. The SDK does not impose any specific capture mechanism.
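If your capture pipeline yields Float32 samples and you would rather send 16-bit PCM, the conversion and the chunk-size arithmetic can be sketched as follows. Both function names are illustrative and not part of the SDK; the 50 ms default mirrors the table above.

```typescript
// Number of samples in one chunk of the given duration.
function samplesPerChunk(sampleRate: number, chunkMs = 50): number {
  return Math.round((sampleRate * chunkMs) / 1000);
}

// Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM.
// Out-of-range samples are clamped rather than wrapped.
function floatTo16BitPcm(samples: Float32Array): ArrayBuffer {
  const out = new ArrayBuffer(samples.length * 2);
  const view = new DataView(out);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  });
  return out;
}
```

For example, samplesPerChunk(48000) gives 2400 and samplesPerChunk(16000) gives 800. The resulting ArrayBuffer can be wrapped in a Blob in the same way as the Float32 buffer in the AudioWorklet example.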
JavaScript example using AudioWorklet:
If you're running in a browser with JavaScript, AudioWorklet combined with getUserMedia is the standard approach for raw PCM capture. Here is a working example:
// audio-worklet-processor.js
// Register this as an AudioWorklet module.
class PcmProcessor extends AudioWorkletProcessor {
constructor() {
super();
this.buffer = [];
this.bufferSize = 0;
}
process(inputs) {
const input = inputs[0][0];
if (!input) return true;
this.buffer.push(new Float32Array(input));
this.bufferSize += input.length;
if (this.bufferSize >= 2400) {
// ~50ms at 48kHz
const merged = new Float32Array(this.bufferSize);
let offset = 0;
for (const chunk of this.buffer) {
merged.set(chunk, offset);
offset += chunk.length;
}
this.port.postMessage(merged.buffer, [merged.buffer]);
this.buffer = [];
this.bufferSize = 0;
}
return true;
}
}
registerProcessor('pcm-processor', PcmProcessor);
// In your application code
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioCtx = new AudioContext();
await audioCtx.audioWorklet.addModule('/audio-worklet-processor.js');
const source = audioCtx.createMediaStreamSource(stream);
const workletNode = new AudioWorkletNode(audioCtx, 'pcm-processor');
source.connect(workletNode);
workletNode.port.onmessage = async (event) => {
const blob = new Blob([event.data], { type: 'audio/pcm' });
const transcript = await client.getTranscriptionChunk(blob, audioCtx.sampleRate);
if (transcript) {
renderTranscript(transcript);
}
};
Note: Do not use MediaRecorder for audio capture. It produces compressed formats (WebM/Opus) that the SDK does not accept.
Step 5: Display the Transcript
getTranscriptionChunk() returns a TranscriptionChunk or null (when the provider needs more audio before emitting):
interface TranscriptionChunk {
type: 'partial' | 'final';
/** Concatenated text of all tokens in this chunk. */
text: string;
/** Per-word breakdown with speaker, language, and timing. */
tokens: TranscriptionToken[];
}
interface TranscriptionToken {
text: string;
speaker?: string;
/**
* BCP 47 code identifying the language of this token. Only populated
* when the client is constructed with `experimental.multilingual: true`
* and the active provider emits per-token language identification.
* Always undefined otherwise.
*/
language?: string;
startTime: number;
endTime: number;
}
- partial: In-progress text that will be revised as more audio arrives. Display in a muted style (gray, italic, etc.). Replace the previous partial when a new one arrives.
- final: Committed, stable text. Append to your transcript display.
Each TranscriptionToken is typically one word. By default, sessions are single-locale: the SDK transcribes assuming the speaker stays in language.primary, and language is always undefined on tokens. If you have opted into experimental.multilingual and the active provider supports per-token language identification, each token's language field tells you which language was detected. Multilingual support is experimental and provider-dependent: treat a multilingual session as probably-multilingual but possibly single-locale.
Both partial and final chunks are always returned from getTranscriptionChunk(). Callers that only want committed text can filter on chunk.type === 'final'.
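The append-finals, replace-partials rule can be sketched as a small reducer. The types are local copies of the shapes above, and TranscriptState, applyChunk, and transcriptText are hypothetical names. The sketch also assumes a final chunk supersedes whatever partial was pending, which this document does not state explicitly.

```typescript
interface Token {
  text: string;
  speaker?: string;
  startTime: number;
  endTime: number;
}

interface Chunk {
  type: 'partial' | 'final';
  text: string;
  tokens: Token[];
}

// Hypothetical UI state: committed tokens plus the current in-progress run.
interface TranscriptState {
  finals: Token[];
  partials: Token[];
}

const emptyTranscript: TranscriptState = { finals: [], partials: [] };

function applyChunk(state: TranscriptState, chunk: Chunk | null): TranscriptState {
  if (chunk === null) return state; // provider needs more audio; nothing changes
  if (chunk.type === 'final') {
    // Assumption: a final commits the text the pending partial was previewing.
    return { finals: [...state.finals, ...chunk.tokens], partials: [] };
  }
  // A new partial replaces the previous partial wholesale.
  return { finals: state.finals, partials: chunk.tokens };
}

function transcriptText(state: TranscriptState): string {
  return [...state.finals, ...state.partials].map((t) => t.text).join('').trimStart();
}
```

Feed every return value of getTranscriptionChunk() through applyChunk, including null, and render from the resulting state.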
Rendering partials without flicker
The SDK emits a fresh partial chunk several times per second. A naive implementation that re-renders the whole transcript on every chunk will visibly flicker as words shift and re-color. The approach the Journalia web app uses, distilled to its essentials:
1. Group tokens into rows by speaker change and silence. Open a new row whenever the speaker label changes, when a final token ends with sentence-closing punctuation (., !, ?), or when there's a long silence gap (the web app uses 7.5 seconds) between the previous final and the next token. Within a row, append finals and replace the trailing run of partials wholesale on each chunk.
2. Render the same DOM element for a token across partial → final. Don't unmount and remount when a token graduates from partial to final. Toggle a CSS class on the existing element instead. The web app uses text-muted-foreground for partials and text-foreground for finals, with a 200ms color transition. No opacity, no movement, no fade.
3. Use stable keys derived from token timing, not array index or text. A key like ${row.startTime}-${token.startTime} is stable across partial revisions of the same word, so React (or your framework) keeps the same node and only updates its class. Index-based keys reshuffle every time a partial gets longer.
4. Trim leading whitespace per row. Tokens carry their leading space (e.g. ' world'). Strip the first token's leading whitespace when rendering a row, or you'll get a stray space at the start of every bubble.
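The row-grouping rule from step 1 could look roughly like the sketch below. It is not the web app's code: RowToken, TranscriptRowData, and groupIntoRows are hypothetical names, and the isFinal flag is assumed to be tracked by your own state layer.

```typescript
// Local copy of the token shape, plus a hypothetical isFinal flag.
interface RowToken {
  text: string;
  speaker?: string;
  startTime: number;
  endTime: number;
  isFinal: boolean;
}

interface TranscriptRowData {
  speaker?: string;
  startTime: number; // start time of the row's first token, usable in keys
  tokens: RowToken[];
}

const SILENCE_GAP_SECONDS = 7.5; // value quoted above for the web app
const SENTENCE_END = /[.!?]\s*$/;

function groupIntoRows(
  tokens: RowToken[],
  gapSeconds: number = SILENCE_GAP_SECONDS,
): TranscriptRowData[] {
  const rows: TranscriptRowData[] = [];
  let previous: RowToken | undefined;
  for (const token of tokens) {
    const current = rows[rows.length - 1];
    const speakerChanged = previous !== undefined && token.speaker !== previous.speaker;
    const sentenceClosed =
      previous !== undefined && previous.isFinal && SENTENCE_END.test(previous.text);
    const longSilence =
      previous !== undefined &&
      previous.isFinal &&
      token.startTime - previous.endTime > gapSeconds;
    if (!current || speakerChanged || sentenceClosed || longSilence) {
      rows.push({ speaker: token.speaker, startTime: token.startTime, tokens: [token] });
    } else {
      current.tokens.push(token);
    }
    previous = token;
  }
  return rows;
}
```

Recomputing rows from the full token list on each chunk keeps the function pure. With timing-based keys, the rendered nodes stay stable even though the row array is rebuilt.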
A minimal React example:
// Assumption: your state layer tags each token with isFinal once it has arrived in a final chunk.
type RowToken = TranscriptionToken & { isFinal: boolean };
function TranscriptRow({ row }: { row: { speaker?: string; tokens: RowToken[] } }) {
return (
<p>
{row.tokens.map((token, i) => {
const text = i === 0 ? token.text.trimStart() : token.text;
const key = `${row.tokens[0].startTime}-${token.startTime}`;
const isFinal = token.isFinal;
return (
<span
key={key}
className={isFinal ? 'text-foreground' : 'text-muted-foreground'}
style={{ transition: 'color 200ms' }}
>
{text}
</span>
);
})}
</p>
);
}
The web app layers a per-character reveal animation and gesture-aware autoscroll on top of this. Both are pure polish and not required for stable rendering.
Step 6: Stop and Generate Notes
// Fetch available note types (can be done earlier, e.g., on page load)
const noteTypes = await client.getAllNoteTypes();
// Stop recording and generate notes
const result = await client.stopTranscription({
noteType: 'soap',
notesContext: 'Female, 45 years. History of migraines.',
});
notesContext is optional free text included alongside the transcript when generating notes: chief complaint, patient background, referral reason, etc.
Omit noteType to stop transcription without generating a note (transcription-only workflow).
This call may take 10-30 seconds. The client state will be 'processing' during this time. Show a loading indicator.
Step 7: Display Results
// Render each note section
for (const section of result.sections) {
console.log(`${section.title}: ${section.content}`);
}
// Full transcript is also available
console.log(result.transcript);
The sections returned depend on the note type. For example, a SOAP note type returns sections with keys like subjective, objective, assessment, and plan. Other note types will have different sections.
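Since section keys vary by note type, code that needs one specific section should look it up by key and handle its absence. A minimal sketch, using local copies of the result types and two hypothetical helpers (sectionByKey, formatNote):

```typescript
interface NoteSection {
  key: string;
  title: string;
  content: string;
}

interface ConsultationResult {
  transcript: string;
  sections: NoteSection[];
}

// Returns undefined when the note type has no such section.
function sectionByKey(result: ConsultationResult, key: string): NoteSection | undefined {
  return result.sections.find((section) => section.key === key);
}

// Flattens all sections into plain text, e.g. for copying into a record field.
function formatNote(result: ConsultationResult): string {
  return result.sections
    .map((section) => `${section.title}\n${section.content}`)
    .join('\n\n');
}
```

formatNote returns an empty string for a transcription-only result, because sections is then an empty array.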
After displaying, clean up:
client.destroy();
For a new consultation, create a fresh JournaliaClient instance.
API Reference
Constructor
new JournaliaClient(config)
Creates a new SDK client instance. Does not start a session.
interface JournaliaClientConfig {
/**
* Async function the SDK calls to obtain auth credentials.
* Called on session start and on token refresh.
* Must return { token } from your backend's token endpoint.
*/
getToken: () => Promise<{ token: string }>;
/**
* Language configuration. `primary` is the BCP 47 code for the
* consultation's main language and is used for note generation.
* `additional` is an experimental, best-effort list of secondary
* language hints, used only when `experimental.multilingual` is enabled.
*/
language: {
primary: string;
additional?: string[];
};
/**
* Experimental opt-ins. See JournaliaClientConfig under Types for details.
*/
experimental?: {
multilingual?: boolean;
};
/**
* Base URL for the Journalia partner API.
* @default 'https://api.app.journalia.no'
*/
apiUrl?: string;
}
Methods
startTranscription(events?)
Authenticates, initializes a session, and opens the transcription WebSocket.
| Signature | await client.startTranscription(events?: TranscriptionEvents): Promise<void> |
| State transitions | idle -> starting -> ready |
| Throws | AUTH_FAILED, INIT_FAILED, WEBSOCKET_FAILED, SESSION_ALREADY_ACTIVE |
Optional events:
interface TranscriptionEvents {
onReconnecting?: () => void;
onReconnected?: (info: { replayedChunks: number; gapDurationMs: number }) => void;
onReconnectFailed?: (error: Error) => void;
}
getTranscriptionChunk(chunk, sampleRate?)
Sends an audio chunk and returns the next transcript event, or null if the provider needs more audio.
| Signature | await client.getTranscriptionChunk(chunk: Blob, sampleRate?: number): Promise<TranscriptionChunk | null> |
| Parameters | chunk: Blob containing raw PCM audio, optional sampleRate for resampling |
| State transition | ready -> recording (on first call) |
| Throws | NO_ACTIVE_SESSION |
The relationship between audio chunks and transcript events is not 1:1. The provider may need several audio chunks before emitting a transcript event. Receiving null is normal.
getAllNoteTypes()
Returns available note types for your partner account. Use this to populate a picker in your UI.
| Signature | await client.getAllNoteTypes(): Promise<NoteType[]> |
Example response:
[
{
"id": "soap",
"name": "SOAP Note",
"category": "consultation",
"languages": ["nb"],
"professionalDomains": null
},
{
"id": "free-text",
"name": "Free-text Summary",
"category": "consultation",
"languages": ["nb"],
"professionalDomains": null
}
]
stopTranscription(options?)
Stops recording and optionally generates a note from the transcript.
| Signature | await client.stopTranscription(options?: StopOptions): Promise<ConsultationResult> |
| State transitions | recording -> processing -> completed |
| Duration | 10-30 seconds depending on transcript length |
| Throws | NO_ACTIVE_SESSION, INVALID_NOTE_TYPE, GENERATION_FAILED, GENERATION_TIMEOUT, AUTH_EXPIRED |
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| noteType | string | No | An id from getAllNoteTypes(). Omit to skip note generation. |
| notesContext | string | No | Free-text clinician context (chief complaint, background, referral reason, etc.) |
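One way to avoid INVALID_NOTE_TYPE is to build the options object from the list returned by getAllNoteTypes(). The sketch below uses local copies of the types. noteTypesForLanguage and buildStopOptions are hypothetical helpers, and matching languages by primary subtag ('nb-NO' against 'nb') is an assumption about how the languages field relates to language.primary.

```typescript
interface NoteType {
  id: string;
  name: string;
  category: string;
  languages: string[] | null;
  professionalDomains: string[] | null;
}

interface StopOptions {
  noteType?: string;
  notesContext?: string;
}

// 'nb-NO' -> 'nb'
function primarySubtag(code: string): string {
  const [base = ''] = code.toLowerCase().split('-');
  return base;
}

// Assumption: a null languages list means the note type supports every language.
function noteTypesForLanguage(noteTypes: NoteType[], language: string): NoteType[] {
  const wanted = primarySubtag(language);
  return noteTypes.filter(
    (nt) => nt.languages === null || nt.languages.some((l) => primarySubtag(l) === wanted),
  );
}

// Validates the selected id locally and drops empty context text.
function buildStopOptions(
  noteTypes: NoteType[],
  selectedId: string | undefined,
  notesContext: string,
): StopOptions {
  const options: StopOptions = {};
  if (selectedId !== undefined) {
    if (!noteTypes.some((nt) => nt.id === selectedId)) {
      throw new Error(`Unknown note type: ${selectedId}`);
    }
    options.noteType = selectedId;
  }
  const trimmed = notesContext.trim();
  if (trimmed.length > 0) options.notesContext = trimmed;
  return options;
}
```

Passing undefined as selectedId yields options without noteType, which matches the transcription-only workflow.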
getState()
Returns the current session state.
| Signature | client.getState(): SessionState |
State machine:
idle -> starting -> ready -> recording -> processing -> completed
\ \-> error
\-> error
| State | Meaning |
|---|---|
| idle | No active session |
| starting | Auth + init + WebSocket connection in progress |
| ready | Session active, WebSocket open, ready for audio |
| recording | Audio chunks are being sent |
| processing | stopTranscription() called, waiting for notes |
| completed | Notes received |
| error | Unrecoverable error |
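For a state indicator, the table above translates naturally into a lookup plus a couple of predicates. This is a sketch with hypothetical names (STATE_LABELS, canSendAudio, isBusy); the label text is illustrative only.

```typescript
type SessionState =
  | 'idle'
  | 'starting'
  | 'ready'
  | 'recording'
  | 'processing'
  | 'completed'
  | 'error';

// Record<SessionState, ...> makes the compiler flag any state left unmapped.
const STATE_LABELS: Record<SessionState, string> = {
  idle: 'Not started',
  starting: 'Connecting…',
  ready: 'Ready for audio',
  recording: 'Recording',
  processing: 'Generating notes…',
  completed: 'Done',
  error: 'Something went wrong',
};

// Audio is accepted once the session is ready; the first chunk moves it to recording.
function canSendAudio(state: SessionState): boolean {
  return state === 'ready' || state === 'recording';
}

// States where a spinner or loading indicator is appropriate.
function isBusy(state: SessionState): boolean {
  return state === 'starting' || state === 'processing';
}
```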
destroy()
Tears down the client. Closes the WebSocket, discards state. The instance cannot be reused. Create a new JournaliaClient for the next consultation.
| Signature | client.destroy(): void |
Types
TokenResponse
interface TokenResponse {
/** The session JWT. */
token: string;
}
TranscriptionChunk
interface TranscriptionChunk {
/** 'partial' chunks are in-progress and will be revised.
* 'final' chunks are committed and stable. */
type: 'partial' | 'final';
/** Concatenated text of all tokens in this chunk. */
text: string;
/** Per-word breakdown with speaker, language, and timing. */
tokens: TranscriptionToken[];
}
interface TranscriptionToken {
/** The token text (typically one word, with leading whitespace where appropriate). */
text: string;
/** Speaker label (`'1'`, `'2'`, …) when diarization is enabled. */
speaker?: string;
/** Identified language as a BCP 47 code. Only set when `experimental.multilingual` is enabled and the provider emits per-token language identification. */
language?: string;
/** Token timestamps, in seconds, relative to session start. */
startTime: number;
endTime: number;
}
NoteType
interface NoteType {
/** Identifier passed to stopTranscription(). */
id: string;
/** Human-readable name. Suitable for display in a picker. */
name: string;
/** Note type category (e.g. "consultation"). */
category: string;
/** Supported language codes, or null for all languages. */
languages: string[] | null;
/** Target professional domains, or null for all. */
professionalDomains: string[] | null;
}
NoteSection
interface NoteSection {
/** Machine identifier (e.g. 'subjective', 'objective'). */
key: string;
/** Display name (e.g. 'Subjective'). */
title: string;
/** The generated text for this section. */
content: string;
}
ConsultationResult
interface ConsultationResult {
/** The accumulated transcript from the session. */
transcript: string;
/** Generated note sections. Empty array if no note was requested. */
sections: NoteSection[];
}
StopOptions
interface StopOptions {
/** The note type to generate. Must be an id from getAllNoteTypes().
* Omit to stop transcription without generating a note. */
noteType?: string;
/** Free-text clinician context included alongside the transcript. */
notesContext?: string;
}
JournaliaClientConfig
interface JournaliaClientConfig {
/**
* Async function that returns auth credentials from your backend.
* Must return { token }.
*/
getToken: () => Promise<{ token: string }>;
/**
* Language configuration. `primary` is the BCP 47 code for the
* consultation's main language and is used for note generation.
* `additional` is a list of secondary language hints used only when
* `experimental.multilingual` is enabled; otherwise it is silently
* ignored.
*/
language: {
primary: string;
additional?: string[];
};
/**
* Experimental opt-ins. Features here are not part of the stable
* SDK contract and may degrade or disappear during provider failover.
*/
experimental?: {
/**
* When true, the session attempts multilingual recognition and
* populates `TranscriptionToken.language` from the provider's
* per-token language identification. Default: false (single-locale).
*/
multilingual?: boolean;
};
/**
* Base URL for the Journalia partner API.
* @default 'https://api.app.journalia.no'
*/
apiUrl?: string;
}
TranscriptionEvents
interface TranscriptionEvents {
/** Called when the WebSocket drops and reconnection begins. */
onReconnecting?: () => void;
/** Called after successful reconnection. */
onReconnected?: (info: { replayedChunks: number; gapDurationMs: number }) => void;
/** Called when reconnection fails after max attempts. */
onReconnectFailed?: (error: Error) => void;
}
SessionState
type SessionState =
| 'idle'
| 'starting'
| 'ready'
| 'recording'
| 'processing'
| 'completed'
| 'error';
Error Codes
| Code | Severity | Description |
|---|---|---|
| AUTH_FAILED | Fatal | Invalid credentials from getToken() |
| AUTH_EXPIRED | Recoverable | Token expired. SDK re-calls getToken() automatically. |
| INIT_FAILED | Fatal | Session initialization rejected |
| WEBSOCKET_FAILED | Fatal | Could not establish transcription connection |
| WEBSOCKET_DISCONNECTED | Recoverable | Connection dropped. SDK reconnects automatically. |
| NO_ACTIVE_SESSION | Usage | Method called outside an active session |
| SESSION_ALREADY_ACTIVE | Usage | startTranscription() called while a session is active |
| INVALID_NOTE_TYPE | Usage | Unrecognized noteType |
| GENERATION_FAILED | Fatal | Server-side note generation error |
| GENERATION_TIMEOUT | Fatal | Timed out waiting for notes (default: 120s) |
| INVALID_AUDIO | Recoverable | Audio chunk could not be processed |
Severity levels:
- Fatal: The operation failed. Your application should handle the error and inform the user.
- Recoverable: The SDK handles this internally (token refresh, WebSocket reconnection). No action needed from your code.
- Usage: Programming error. Check that methods are called in the correct order.
WebSocket reconnection is transparent. The SDK replays buffered audio so no transcript is lost. Your application continues sending audio normally during a reconnection.
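If you want to route errors by severity in your own code, the table can be mirrored in a lookup. This is a sketch only: the SDK is not documented here as exposing severity on the error object, so the mapping is copied by hand from the table, and severityOf and shouldShowToUser are hypothetical helpers keyed on the string code.

```typescript
type Severity = 'fatal' | 'recoverable' | 'usage';

// Hand-copied from the Error Codes table above.
const SEVERITY_BY_CODE: Record<string, Severity> = {
  AUTH_FAILED: 'fatal',
  AUTH_EXPIRED: 'recoverable',
  INIT_FAILED: 'fatal',
  WEBSOCKET_FAILED: 'fatal',
  WEBSOCKET_DISCONNECTED: 'recoverable',
  NO_ACTIVE_SESSION: 'usage',
  SESSION_ALREADY_ACTIVE: 'usage',
  INVALID_NOTE_TYPE: 'usage',
  GENERATION_FAILED: 'fatal',
  GENERATION_TIMEOUT: 'fatal',
  INVALID_AUDIO: 'recoverable',
};

function severityOf(code: string): Severity | undefined {
  return SEVERITY_BY_CODE[code];
}

// Fatal errors, and codes this table does not know about, are surfaced to the
// user. Usage errors are better logged for developers.
function shouldShowToUser(code: string): boolean {
  const severity = severityOf(code);
  return severity === undefined || severity === 'fatal';
}
```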
Catching Errors
All SDK errors are instances of JournaliaError with a typed code property:
import { JournaliaClient, JournaliaError, ErrorCode } from '@journalia/sdk';
try {
await client.startTranscription();
} catch (err) {
if (err instanceof JournaliaError) {
switch (err.code) {
case ErrorCode.AUTH_FAILED:
console.error('Invalid credentials:', err.message);
break;
case ErrorCode.WEBSOCKET_FAILED:
console.error('Connection failed:', err.message);
break;
default:
console.error(`[${err.code}]`, err.message);
}
}
}
Auth Contract
Your backend calls this to obtain a session token:
POST https://api.journalia.no/auth
Content-Type: application/json
{
"clientId": "partner_abc",
"clientSecret": "sk_live_...",
"scopeKey": "ctx_clinic_oslo_01"
}
-> 200 OK
{
"token": "eyJhbG...",
"expiresIn": 3600
}
| Field | Type | Description |
|---|---|---|
| clientId | string | Partner identifier |
| clientSecret | string | Partner secret. Never expose to the browser. |
| scopeKey | string | Your session scoping key. See Session Scoping. |
During development, use the sandbox credentials provided during onboarding. The sandbox environment uses the same API and auth flow as production.
Audio Format
| Format | Support | Notes |
|---|---|---|
| PCM Float32, 48 kHz | Recommended | Native AudioWorklet output. SDK resamples internally. |
| PCM 16-bit LE, 16 kHz | Recommended | Lowest latency, no resampling needed. |
| PCM 16-bit LE, 48 kHz | Supported | SDK resamples to 16 kHz. |
| WebM/Opus | Not supported | Use raw PCM capture, not MediaRecorder. |
LLM Prompts for Implementation
If your team uses LLMs to scaffold code, these prompts work well. Paste the relevant prompt along with the API Reference and Types sections from this document as context.
Backend token endpoint
Build a backend endpoint at POST /api/journalia/token. Use [your framework].
Behavior:
- Read JOURNALIA_CLIENT_ID, JOURNALIA_CLIENT_SECRET, and JOURNALIA_SCOPE_KEY from environment variables
- Call POST https://api.journalia.no/auth with a JSON body: { clientId, clientSecret, scopeKey }
- On success, the upstream response is { token: string, expiresIn: number }. Forward it to the caller as-is.
- On upstream error, forward the status code and body.
- On network failure, return a 502 with a JSON error.
Constraints:
- The clientSecret must never be exposed to the browser. This endpoint is the only place it should appear.
- No authentication on this endpoint itself is needed for now (it will sit behind your existing auth middleware in production).
- Keep it lean and simple.
Frontend SDK integration
Build a frontend integration using the @journalia/sdk package. Use [your framework].
The SDK client is constructed with:
- A getToken async function that fetches a { token } object from our backend (POST /api/journalia/token). This is our own endpoint, not a Journalia endpoint.
- A language object with primary (BCP 47 code for the consultation's main language, used for note generation). Sessions are single-locale by default. For experimental, best-effort multilingual recognition, set experimental.multilingual: true on the client and optionally pass language.additional as extra candidate hints, e.g. { primary: 'nb-NO', additional: ['en'] }.
The integration needs these UI elements:
- A "Start" button that calls client.startTranscription() and begins audio capture
- A "Stop" button that calls client.stopTranscription({ noteType, notesContext }) and stops audio capture
- A state indicator showing the current SessionState from client.getState(). States are: idle, starting, ready, recording, processing, completed, error. Poll getState() on a short interval or after each SDK call.
- A live transcript panel. getTranscriptionChunk() returns { type: 'partial' | 'final', text, tokens } or null. Each token carries text, startTime, endTime, and optional speaker. (token.language is only present when the client is constructed with experimental.multilingual: true and the active provider emits per-token language identification, so treat it as optional and provider-dependent.) Partial chunks are in-progress and get replaced. Final chunks are stable and get appended. Style partials differently (gray or italic).
- A note type dropdown populated from await client.getAllNoteTypes() which returns { id, name, category, languages, professionalDomains }[]. Call this on mount or on session start.
- A text field for notesContext (optional clinician context like chief complaint or patient background)
- A loading indicator while state is 'processing' (note generation takes 10-30 seconds)
- A results view that iterates over result.sections (each with key, title, content), plus result.transcript in a collapsible section
- A "New consultation" action that calls client.destroy() and creates a fresh client instance
- Error handling: catch errors from SDK methods, display the error code and message to the user
Here is the full SDK API: [paste API Reference and Types sections]
Audio capture module
Build a reusable audio capture module for the browser that produces raw PCM audio chunks.
Requirements:
- Use navigator.mediaDevices.getUserMedia({ audio: true }) for microphone access
- Use AudioWorklet for capture (NOT MediaRecorder, which produces compressed WebM/Opus that our SDK cannot accept)
- In the AudioWorklet processor: accumulate Float32 samples into a buffer, and when the buffer reaches 50ms of audio (2400 samples at 48kHz), merge the buffer into a single Float32Array and post it to the main thread via this.port.postMessage(merged.buffer, [merged.buffer]) (transferable)
- On the main thread: wrap each received ArrayBuffer in a new Blob([event.data], { type: 'audio/pcm' }) and pass it to a provided callback
The module should expose:
- start(): requests mic permission, creates AudioContext, registers the worklet, connects the pipeline, starts calling the callback with audio blobs
- stop(): disconnects the worklet, stops all media tracks, closes the AudioContext
- A way to check the current state (idle, capturing, error)
- Graceful handling of permission denial (don't throw, surface the error)
The consuming code looks like this:
const chunk: TranscriptionChunk | null = await client.getTranscriptionChunk(blob);
Here is the audio format table: [paste Audio Format section]