@cortex/sdk — TypeScript

The official TypeScript / JavaScript client for Cortex — InfiniteMonkeys' secure LLM gateway. Chat, vision, embeddings, speech, RAG, research — fully typed, streaming-ready, works in Node 18+ and modern browsers.

npm install @nfinitmonkeys/cortex-sdk

import { Cortex } from "@nfinitmonkeys/cortex-sdk";

const cortex = new Cortex({ apiKey: "sk-cortex-..." });
const r = await cortex.chat("Hello, world!");
console.log(r.text);

That's it. Read on for every feature.

Setup
Chat
Streaming chat
Response style presets
JSON / structured output
Embeddings
Speech-to-text
Text-to-speech
Document extraction (Iris)
OCR + form templates
Deep Research
RAG collections
Error handling
Checking service health
Configuration

Setup

Install from npm:

npm install @nfinitmonkeys/cortex-sdk

import { Cortex } from "@nfinitmonkeys/cortex-sdk";

// Picks up CORTEX_API_KEY from the environment (Node)
const cortex = new Cortex();

// Or pass it:
const cortex2 = new Cortex({ apiKey: "sk-cortex-..." });

Works in Node 18+, Bun, Deno, and modern browsers (Edge, Chrome, Safari). In the browser you'll need a server-side proxy unless your key is a public one — never ship a user-scoped key to the client.

Migrating from v1.x

v2 rewrote the client around a flatter, friendlier API. Your old import keeps working — CortexClient is now an alias for Cortex — but the method shape changed. One-time rewrite, no polyfills.

| v1 (resource-group style) | v2 (flat style) |
| --- | --- |
| `client.chat.completions.create({ model: "default", messages: [...] })` | `cortex.chat("Hi")` |
| `.create({ ..., stream: true })` → iterate raw SSE chunks | `for await (const c of cortex.chatStream("Hi")) { ... }` (yields just the content) |
| `client.embeddings.create({ input, model })` | `await cortex.embed(input)` |
| `client.audio.transcriptions.create({ file })` | `await cortex.transcribe(audio)` |
| `client.audio.speech.create({ input, voice })` | `await cortex.speak("text", { voice: "james" })` |
| `client.iris.extract({ file })` | `await cortex.extract(pdf)` |

New in v2:

Style presets ({ style: "concise" } / "markdown" / etc.)
Typed errors — CortexAuthError, CortexRateLimitError, CortexUpstreamError, …
ChatResponse.parseJson<T>() — strips markdown fences from LLM JSON
Auto-retry on 429/5xx with Retry-After honoring
RAG Collections (cortex.collections.ask(...))
Deep Research (cortex.research.wait(...))
Static Cortex.status() — no API key needed
AbortSignal support on every method

No timeline to remove the CortexClient alias — it stays forever.
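
The list above says parseJson<T>() strips markdown fences from LLM JSON. To illustrate the idea only, here is a minimal sketch of fence stripping. The names parseFencedJson and FENCE are hypothetical and this is not the SDK's implementation, which may handle more cases.

```typescript
// Three backticks, built indirectly so this snippet is easy to embed in markdown.
const FENCE = "`".repeat(3);

// Hypothetical helper (not part of the SDK): removes one surrounding code
// fence, with or without a language tag such as "json", then parses the body.
function parseFencedJson<T>(raw: string): T {
  let body = raw.trim();
  const fenced =
    body.length >= 2 * FENCE.length &&
    body.startsWith(FENCE) &&
    body.endsWith(FENCE);
  if (fenced) {
    // Everything up to the first newline is the opening fence line.
    const firstNewline = body.indexOf("\n");
    body =
      firstNewline === -1
        ? ""
        : body.slice(firstNewline + 1, body.length - FENCE.length);
  }
  // JSON.parse throws a SyntaxError if the remaining text is not valid JSON.
  return JSON.parse(body.trim()) as T;
}

const reply = FENCE + 'json\n{"name":"Ada","age":36}\n' + FENCE;
const person = parseFencedJson<{ name: string; age: number }>(reply);
console.log(person.name); // prints "Ada"
```

Unfenced input passes straight through to JSON.parse, so the same helper works whether or not the model wrapped its answer.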

Chat

Simplest case:

const r = await cortex.chat("What is a vector database?");
console.log(r.text);

Multi-turn:

const r = await cortex.chat([
  { role: "system", content: "You are a concise assistant." },
  { role: "user", content: "Name three NoSQL databases." },
]);

Per-call pool routing:

const r = await cortex.chat(
  "Extract names from: Alice, Bob, Carol",
  { pool: "cortex-extract" }
);

Streaming chat

for await (const chunk of cortex.chatStream("Write a limerick about otters")) {
  process.stdout.write(chunk);
}

Works with AbortSignal to cancel mid-stream:

const ctrl = new AbortController();
setTimeout(() => ctrl.abort(), 5_000);

for await (const chunk of cortex.chatStream("Long story please", { signal: ctrl.signal })) {
  process.stdout.write(chunk);
}
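
If you want the full text as well as the live chunks, you can accumulate them as they arrive. The sketch below assumes the stream yields plain strings, as the examples above suggest. collectStream and fakeStream are hypothetical names, and fakeStream stands in for cortex.chatStream(...) so the snippet runs without a network call.

```typescript
// Hypothetical helper: concatenates every chunk from an async iterable of strings.
async function collectStream(chunks: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
  }
  return text;
}

// Stand-in for cortex.chatStream(...); yields two chunks and finishes.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Hello, ";
  yield "world!";
}

collectStream(fakeStream()).then((full) => console.log(full)); // prints "Hello, world!"
```

With the real client this would presumably be called as collectStream(cortex.chatStream("...")).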

Response style presets

Pick a style instead of writing a system prompt every time:

await cortex.chat("Summarise RAG.",      { style: "concise" });
await cortex.chat("Show a Python list",  { style: "code-only" });
await cortex.chat("Compare Redis & Mongo",{ style: "markdown" });
await cortex.chat("Deploy nginx",        { style: "technical" });
await cortex.chat("Pick a name",         { style: "chat" });

Combine with your own system prompt:

await cortex.chat("Find the bug", {
  style: "technical",
  system: "You are a staff Python engineer reviewing a PR.",
});

JSON / structured output

Schema-enforced — no retries, no regex parsing:

const r = await cortex.chat("John Doe, 42, lives in Boston.", {
  responseFormat: {
    type: "json_schema",
    json_schema: {
      name: "person",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age:  { type: "integer" },
          city: { type: "string" },
        },
        required: ["name", "age", "city"],
      },
    },
  },
});

const data = JSON.parse(r.text);   // always valid
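
JSON.parse returns any, so the compiler cannot see the schema guarantee. One way to get a typed value is a small runtime guard. This is a sketch: Person, isPerson, and sampleText are hypothetical names, and sampleText stands in for r.text.

```typescript
// Mirrors the "person" schema from the request above.
interface Person {
  name: string;
  age: number;
  city: string;
}

// Hypothetical type guard: narrows an unknown value to Person at runtime.
function isPerson(value: unknown): value is Person {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const age = v["age"];
  return (
    typeof v["name"] === "string" &&
    typeof age === "number" &&
    Number.isInteger(age) &&
    typeof v["city"] === "string"
  );
}

// Stand-in for r.text.
const sampleText = '{"name":"John Doe","age":42,"city":"Boston"}';
const parsed: unknown = JSON.parse(sampleText);
if (isPerson(parsed)) {
  console.log(`${parsed.name}, ${parsed.age}, ${parsed.city}`); // prints "John Doe, 42, Boston"
}
```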

Embeddings

const v = await cortex.embed("cortex is a gateway");   // number[] (1024 dimensions)
const vs = await cortex.embed(["a", "b", "c"]);        // number[][]
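
A common next step is comparing two embeddings. The sketch below computes cosine similarity on plain number[] vectors. cosineSimilarity is a hypothetical helper, not an SDK method, and the short vectors stand in for real embeddings.

```typescript
// Hypothetical helper: cosine similarity of two equal-length vectors.
// Returns a value in [-1, 1], or 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // prints 1
console.log(cosineSimilarity([1, 0], [0, 1])); // prints 0
```

With the client, the inputs would be the vectors returned by cortex.embed(...).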

Speech-to-text (transcription)

Pass a Blob, File, ArrayBuffer, or Uint8Array:

import { readFileSync } from "node:fs";
const audio = readFileSync("meeting.wav");
const t = await cortex.transcribe(audio, { filename: "meeting.wav" });
console.log(t.text);

With speaker diarization:

const t = await cortex.transcribe(audio, { filename: "meeting.wav", diarize: true });

In the browser with an <input type="file">:

const input = document.querySelector<HTMLInputElement>("#audio")!;
const file = input.files![0];
const t = await cortex.transcribe(file, { filename: file.name });

Text-to-speech

import { writeFileSync } from "node:fs";

const audio = await cortex.speak("Welcome to Cortex.");   // ArrayBuffer
writeFileSync("hello.wav", new Uint8Array(audio));

Expressive (auto-inserts laughs, sighs, pauses):

const audio = await cortex.speak(
  "Wow! That's amazing. I'm so glad you came.",
  { expressive: true }
);

Voice selection:

await cortex.speak("Hello", { voice: "james" });

Document extraction (Iris)

import { readFileSync } from "node:fs";
const pdf = readFileSync("invoice.pdf");
const inv = await cortex.extract(pdf, { filename: "invoice.pdf", type: "invoice" });
console.log(inv.result);

Custom schema:

const medical = await cortex.extract(pdf, {
  filename: "discharge.pdf",
  schema: {
    patient_name: "string",
    diagnosis_codes: "string[]",
    discharge_date: "date",
  },
});

Submit a correction (training signal):

await cortex.correctExtraction(inv.id, [
  { field_name: "total", original_value: "127.43", corrected_value: "1274.30" },
]);

OCR + form templates

Raw OCR:

const ocr = await cortex.ocr(imgBuffer, { filename: "scan.png" });
console.log(ocr.text);

Structured fields via a template (~200 ms):

const fields = await cortex.ocr(claimPdf, { filename: "claim.pdf", template: "cms1500" });
console.log(fields.fields?.patient_name);

Built-in templates: cms1500, ub04, superbill, eob.

Deep Research

const job = await cortex.research.submit(
  "What do you know about Acme Medical Group?",
  { type: "company_enrichment", depth: "quick" }
);

// Block until done (polls automatically)
const result = await cortex.research.wait(job.job_id, { timeoutMs: 600_000 });
console.log(result.result);
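
For intuition about what "polls automatically" involves, here is a generic poll-until-done loop with a timeout. This is a sketch under assumptions, not the SDK's code: pollUntil, its options, and the interval are hypothetical, and the SDK's real polling schedule is not documented here.

```typescript
// Hypothetical generic poller: calls `check` until it returns a value other
// than undefined, sleeping `intervalMs` between calls. Rejects once the next
// sleep would run past the deadline.
async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  opts: { intervalMs: number; timeoutMs: number },
): Promise<T> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const value = await check();
    if (value !== undefined) return value;
    if (Date.now() + opts.intervalMs > deadline) {
      throw new Error(`timed out after ${opts.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}

// Stand-in for a job that finishes on the third status check.
let checks = 0;
pollUntil<string>(
  async () => (++checks >= 3 ? "done" : undefined),
  { intervalMs: 10, timeoutMs: 1_000 },
).then((result) => console.log(result)); // prints "done"
```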

RAG collections

await cortex.collections.create("company-kb");
await cortex.collections.upload("company-kb", handbookBytes, { filename: "handbook.pdf" });

const a = await cortex.collections.ask("company-kb", "What's our PTO policy?");
console.log(a.answer);
for (const s of a.sources) {
  console.log(`  · ${s.filename} (${Math.round(Number(s.score) * 100)}% match)`);
}

Just search:

const hits = await cortex.collections.search("company-kb", "parental leave", { topK: 3 });

Error handling

Every error extends CortexError:

import { Cortex, CortexAuthError, CortexRateLimitError, CortexError } from "@nfinitmonkeys/cortex-sdk";

try {
  await cortex.chat("hello");
} catch (err) {
  if (err instanceof CortexAuthError) { /* key invalid or revoked */ }
  else if (err instanceof CortexRateLimitError) { /* back off */ }
  else if (err instanceof CortexError) {
    console.error(`Cortex failed: ${err.statusCode} ${err.detail}`);
  } else {
    throw err;
  }
}

429 / 502 / 503 / 504 are retried automatically with exponential backoff (default 2 retries) — you only see them as errors when retries are exhausted.
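
To make the retry behavior concrete, here is a sketch of how a delay could be chosen between attempts. The retryable status codes come from the sentence above; everything else (isRetryableStatus, retryDelayMs, the 500 ms base, the 30 s cap) is an illustrative assumption, not the SDK's actual settings. The sketch only understands the numeric seconds form of Retry-After, not the HTTP-date form.

```typescript
// Status codes the README lists as retried automatically.
const RETRYABLE_STATUSES = new Set<number>([429, 502, 503, 504]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Hypothetical helper: wait before retry number `attempt` (0-based).
// baseMs and capMs are illustrative defaults, not the SDK's values.
function retryDelayMs(
  attempt: number,
  retryAfterHeader?: string | null,
  baseMs = 500,
  capMs = 30_000,
): number {
  // A numeric Retry-After is a number of seconds; honor it when present.
  if (retryAfterHeader != null && retryAfterHeader.trim() !== "") {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, capMs);
    }
  }
  // Otherwise double the delay on each attempt, up to the cap.
  return Math.min(baseMs * 2 ** attempt, capMs);
}

console.log(retryDelayMs(0));      // prints 500
console.log(retryDelayMs(1));      // prints 1000
console.log(retryDelayMs(0, "3")); // prints 3000
```

Production retry loops often add random jitter to these delays so that many clients do not retry in lockstep.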

Checking service health

No API key needed:

import { Cortex } from "@nfinitmonkeys/cortex-sdk";

const status = await Cortex.status();
console.log(`Cortex is ${status.overall}`);
for (const pool of status.pools) {
  console.log(`  ${pool.pool}: ${pool.status}`);
}

Or just open status.nfinitmonkeys.com in a browser.

Configuration

| Option | Default | Description |
| --- | --- | --- |
| `apiKey` | `process.env.CORTEX_API_KEY` | Your API key |
| `baseUrl` | `https://cortexapi.nfinitmonkeys.com` | Override for self-hosted |
| `timeoutMs` | `120_000` | Default request timeout |
| `retries` | `2` | Retries on 429/5xx |
| `defaultPool` | `undefined` | Default `X-Cortex-Pool` header |
| `fetch` | `globalThis.fetch` | Custom fetch (proxy/testing) |

const cortex = new Cortex({
  timeoutMs: 300_000,
  retries: 5,
  defaultPool: "cortex-extract",
});

Bundle size

ESM: ~19 KB (minified, unzipped)
CJS: ~21 KB
Zero runtime dependencies — relies on native fetch, FormData, Blob, AbortController

Support

Docs: this README plus JSDoc on every method
Status: status.nfinitmonkeys.com
Issues: file on GitHub

Happy building 🦧
