Summoned AI Gateway
The open-source AI gateway built for India — and the world.
Route to 28+ LLM providers through one OpenAI-compatible API.
Quickstart · Providers · SDK · Features · Console · Contributing
A lightweight, open-source AI gateway that sits between your app and every LLM provider. Bring your own provider API keys — Summoned adds intelligent routing, automatic failover, response caching, cost governance, guardrails, and a full self-hosted console on top.
No code changes in your app. Drop-in replacement for the OpenAI API.
Zero infra to start. Just set ADMIN_API_KEY + one provider key and run.
- 28 providers, one API — OpenAI, Anthropic, Google, AWS Bedrock, Azure, Groq, Mistral, DeepSeek, Together, Fireworks, Cohere, Cerebras, Perplexity, xAI, OpenRouter, HuggingFace, DeepInfra, Hyperbolic, SambaNova, Novita, Moonshot, Z.AI, Nvidia NIM, Ollama, vLLM, Voyage + India-first Sarvam and Yotta.
- Zero infra required — runs completely stateless. Add Redis for caching + rate limits. Add Postgres for audit history.
- India-native — Sarvam AI, Yotta Labs, AWS Bedrock ap-south-1 (DPDP-compliant defaults), INR cost tracking.
- Full console, self-hosted, free — dashboard, live logs, playground, cost analytics. No cloud subscription needed.
- Guardrails in the free tier — PII blocking, content filters, regex rules. Competitors lock this behind enterprise.
- Virtual keys — store provider credentials encrypted (AES-256-GCM). Callers reference a `vk_...` ID; raw keys never leave the server.
- Daily token budgets — hard caps per API key. Critical for agents. Free.
- Official TypeScript SDK — `@summoned/ai` on npm.
What can you do?
- Route to 28 providers through one endpoint — Supported Providers
- Zero downtime when a provider goes down — Automatic Failover & Circuit Breakers
- Route to the cheapest or fastest model automatically — Intelligent Routing
- Stop runaway agent loops before they drain your budget — Daily Token Budgets
- Cache repeated queries — Response Caching
- Block PII, profanity, injection attempts — Guardrails
- See cost (USD + INR), latency, and token usage in real time — Observability
- Encrypt and store provider keys on the gateway — Virtual Keys
- Works with OpenAI SDK, LangChain, LlamaIndex, CrewAI, Vercel AI SDK — Framework Support
- Add any OpenAI-compatible provider in 5 lines — Custom Providers
> [!TIP]
> Starring this repo helps more developers discover the gateway 🙏
> ⭐ Star us on GitHub — it takes 2 seconds and means a lot.
Console

The gateway ships with a built-in web console at /console — no separate app, no extra services.
| Page | What you get |
|---|---|
| Dashboard | Requests, success rate, latency percentiles (p50/p95/p99), token volume, cost in USD + INR |
| Live Logs | Real-time WebSocket stream of every request — filter by status or provider, click to expand |
| API Keys | Create, list, and revoke sk-smnd-... keys from the browser |
| Virtual Keys | Store provider credentials encrypted (AES-256-GCM) — callers use a vk_... ID |
| Providers | Health status, circuit breaker state, avg latency per provider |
| Playground | Send test completions through managed / virtual / BYOK auth modes — see cost, latency, cache status live |
Access control: the console and its API (`/console/api/*`) are protected by `ADMIN_API_KEY`. On first visit you'll be prompted for the key; it's stored in `localStorage` and sent on every request. Clicking "Sign out" clears it.
Quickstart
1. Start the gateway — pick one, up in 15 seconds
🚀 npx (zero install)
```bash
ADMIN_API_KEY=$(openssl rand -hex 32) \
OPENAI_API_KEY=sk-... \
npx @summoned/gateway
```
Gateway → http://localhost:4000 · Console → http://localhost:4000/console
No clone, no Docker, no Bun — just Node 18+.
🐳 Docker (for production)
```bash
docker run -p 4000:4000 \
  -e ADMIN_API_KEY=$(openssl rand -hex 32) \
  -e OPENAI_API_KEY=sk-... \
  ghcr.io/summoned-tech/summoned-ai-gateway:latest
```
🛠 Full stack with Postgres + Redis (persistent logs + managed keys)
```bash
git clone https://github.com/summoned-tech/summoned-ai-gateway.git
cd summoned-ai-gateway
cp .env.example .env
# Edit .env — set ADMIN_API_KEY + POSTGRES_URL + REDIS_URL + provider keys
docker compose up -d
```
Or for local dev with hot reload:
```bash
make setup   # deps + Postgres + Redis + migrations + console build
make dev     # gateway with hot reload via Bun
```
What works without Postgres / Redis:
| Feature | No Postgres | No Redis | Both absent |
|---|---|---|---|
| Chat completions | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ |
| Guardrails | ✅ | ✅ | ✅ |
| Fallback / circuit breaker | ✅ | ✅ | ✅ |
| Response caching | ✅ | in-memory | in-memory |
| Rate limiting | ✅ | in-memory | in-memory |
| Managed API keys | ❌ | ✅ | ❌ |
| Request history / analytics console | ❌ | ✅ | ❌ |
| Virtual key encryption | ❌ | ✅ | ❌ |
2. Make your first request
Option A — pass your provider key directly (no gateway key needed):
```bash
curl http://localhost:4200/v1/chat/completions \
  -H "x-provider-key: sk-YOUR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role":"user","content":"Hello!"}]}'
```
Option B — use a gateway-managed key (recommended for teams):
```bash
# Create a key
curl -X POST http://localhost:4200/v1/keys \
  -H "x-admin-key: YOUR_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-app", "tenantId": "team-a"}'

# Use it
curl http://localhost:4200/v1/chat/completions \
  -H "Authorization: Bearer sk-smnd-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role":"user","content":"Hello!"}]}'
```
3. Add gateway features
Control retries, fallbacks, caching, routing, and guardrails per request via the x-summoned-config header (or the SDK's config field):
```python
import json, base64
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4200/v1",
    api_key="sk-smnd-...",
)

config = {
    "retry": {"attempts": 3, "backoff": "exponential"},
    "fallback": ["anthropic/claude-haiku-4", "groq/llama-3.3-70b-versatile"],
    "cache": True,
    "routing": "cost",  # cheapest provider first
    "guardrails": {
        "input": [{"type": "pii", "deny": True}]
    },
}

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize this contract"}],
    extra_headers={
        "x-summoned-config": base64.b64encode(json.dumps(config).encode()).decode()
    },
)
```
Works with any OpenAI-compatible library — LangChain, LlamaIndex, CrewAI, Autogen, Vercel AI SDK and more.
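For Node callers not using the SDK, the same header can be assembled by hand — a sketch mirroring the Python example above; the config fields shown are taken from this README's example, not an exhaustive schema.

```typescript
// Build the x-summoned-config header value (base64-encoded JSON) in
// TypeScript, mirroring the Python example above. Sketch only — the field
// names follow the README's example config.
const gatewayConfig = {
  retry: { attempts: 3, backoff: "exponential" },
  fallback: ["anthropic/claude-haiku-4", "groq/llama-3.3-70b-versatile"],
  cache: true,
  routing: "cost",
};

const configHeader = Buffer.from(JSON.stringify(gatewayConfig)).toString("base64");
// Pass it with any OpenAI-compatible client as an extra header:
// { "x-summoned-config": configHeader }
console.log(configHeader);
```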
Use from your code
The official TypeScript SDK ships on npm. It's a thin typed wrapper around the OpenAI-compatible surface, plus first-class support for the config object (retry / fallback / cache / guardrails / virtual keys).
```bash
npm install @summoned/ai
```
```typescript
import { Summoned } from "@summoned/ai"

const client = new Summoned({
  apiKey: "sk-smnd-...",
  baseURL: "http://localhost:4200",
})

const res = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
  config: {
    cache: true,
    fallback: ["anthropic/claude-sonnet-4-20250514", "groq/llama-3.3-70b-versatile"],
    routing: "cost",
  },
})

console.log(res.choices[0].message.content)
console.log(res.summoned) // { provider, cost, latency_ms, ... }
```
Streaming:
```typescript
for await (const chunk of await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Write a haiku" }],
  stream: true,
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "")
}
```
Prefer the OpenAI SDK? Point it at the gateway:
```typescript
import OpenAI from "openai"

const openai = new OpenAI({
  baseURL: "http://localhost:4200/v1",
  apiKey: "sk-smnd-...",
})
```
Package: `@summoned/ai` · ESM-only · Node 18+ · ~13 KB tarball.
Supported Providers
28 providers out of the box. Every provider listed here is enabled by setting its env var — unset ones are simply skipped at startup, so you only pay for the surface you use.
| Provider | Model format | Example | Env var |
|---|---|---|---|
| OpenAI | `openai/<model>` | `openai/gpt-4o` | `OPENAI_API_KEY` |
| Anthropic | `anthropic/<model>` | `anthropic/claude-sonnet-4-20250514` | `ANTHROPIC_API_KEY` |
| Google Gemini | `google/<model>` | `google/gemini-2.0-flash` | `GOOGLE_API_KEY` |
| AWS Bedrock | `bedrock/<model>` | `bedrock/amazon.nova-pro-v1:0` | AWS creds / `AWS_BEDROCK_API_KEY` |
| Azure OpenAI | `azure/<deployment>` | `azure/gpt-4o` | `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` |
| Groq | `groq/<model>` | `groq/llama-3.3-70b-versatile` | `GROQ_API_KEY` |
| Mistral AI | `mistral/<model>` | `mistral/mistral-large-latest` | `MISTRAL_API_KEY` |
| Together AI | `together/<model>` | `together/meta-llama/Llama-3.3-70B-Instruct-Turbo` | `TOGETHER_API_KEY` |
| DeepSeek | `deepseek/<model>` | `deepseek/deepseek-chat` | `DEEPSEEK_API_KEY` |
| Fireworks AI | `fireworks/<model>` | `fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct` | `FIREWORKS_API_KEY` |
| Cohere | `cohere/<model>` | `cohere/command-r-plus` | `COHERE_API_KEY` |
| Cerebras | `cerebras/<model>` | `cerebras/llama3.1-70b` | `CEREBRAS_API_KEY` |
| Perplexity | `perplexity/<model>` | `perplexity/llama-3.1-sonar-large-128k-online` | `PERPLEXITY_API_KEY` |
| xAI (Grok) | `xai/<model>` | `xai/grok-3` | `XAI_API_KEY` |
| OpenRouter | `openrouter/<upstream>/<model>` | `openrouter/openai/gpt-4o` | `OPENROUTER_API_KEY` |
| HuggingFace | `huggingface/<model>` | `huggingface/meta-llama/Llama-3.3-70B-Instruct` | `HUGGINGFACE_API_KEY` |
| DeepInfra | `deepinfra/<model>` | `deepinfra/meta-llama/Meta-Llama-3.1-70B-Instruct` | `DEEPINFRA_API_KEY` |
| Hyperbolic | `hyperbolic/<model>` | `hyperbolic/deepseek-ai/DeepSeek-V3` | `HYPERBOLIC_API_KEY` |
| SambaNova | `sambanova/<model>` | `sambanova/Meta-Llama-3.1-405B-Instruct` | `SAMBANOVA_API_KEY` |
| Novita AI | `novita/<model>` | `novita/meta-llama/llama-3.1-70b-instruct` | `NOVITA_API_KEY` |
| Moonshot (Kimi) | `moonshot/<model>` | `moonshot/moonshot-v1-128k` | `MOONSHOT_API_KEY` |
| Z.AI (GLM) | `zai/<model>` | `zai/glm-4.5` | `ZAI_API_KEY` |
| Nvidia NIM | `nvidia/<model>` | `nvidia/meta/llama-3.1-405b-instruct` | `NVIDIA_API_KEY` |
| Ollama | `ollama/<model>` | `ollama/llama3.2` | `OLLAMA_BASE_URL` (local, no key) |
| vLLM | `vllm/<model>` | `vllm/meta-llama/Llama-3-70B-Instruct` | `VLLM_BASE_URL` (+ optional `VLLM_API_KEY`) |
| Voyage AI | `voyage/<model>` | `voyage/voyage-3-large` | `VOYAGE_API_KEY` (embeddings / rerank) |
| Sarvam AI 🇮🇳 | `sarvam/<model>` | `sarvam/sarvam-2b-v0.5` | `SARVAM_API_KEY` |
| Yotta Labs 🇮🇳 | `yotta/<model>` | `yotta/yotta-mini` | `YOTTA_API_KEY` |
Pure proxy — no static model catalog. Any model the upstream provider accepts works immediately, zero config changes when new models launch.
Any OpenAI-compatible provider — use `CUSTOM_PROVIDERS` to add any private endpoint via JSON config. No code changes needed.
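As an illustration of that JSON shape — the field names (`id`, `name`, `baseUrl`, `apiKey`) come from this README's configuration table, while the endpoint URL and key below are placeholders:

```typescript
// Hypothetical CUSTOM_PROVIDERS value and a typed parse of it. Field names
// (id, name, baseUrl, apiKey) come from this README; the URL and key are
// placeholders, not a real endpoint.
interface CustomProvider {
  id: string;
  name: string;
  baseUrl: string;
  apiKey: string;
}

// In .env this would appear as: CUSTOM_PROVIDERS='[{"id":"my-llm",...}]'
const raw =
  '[{"id":"my-llm","name":"My LLM","baseUrl":"https://llm.internal.example/v1","apiKey":"sk-placeholder"}]';

const providers: CustomProvider[] = JSON.parse(raw);
console.log(providers.map((p) => p.id)); // → [ "my-llm" ]
```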
Works with any framework
```typescript
// @summoned/ai (official)
import { Summoned } from "@summoned/ai"
const client = new Summoned({ apiKey: "sk-smnd-...", baseURL: "http://localhost:4200" })
```
```typescript
// OpenAI SDK
import OpenAI from "openai"
const openai = new OpenAI({ baseURL: "http://localhost:4200/v1", apiKey: "sk-smnd-..." })
```
```typescript
// Vercel AI SDK
import { createOpenAI } from "@ai-sdk/openai"
const openai = createOpenAI({ baseURL: "http://localhost:4200/v1", apiKey: "sk-smnd-..." })
```
```python
# LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(base_url="http://localhost:4200/v1", api_key="sk-smnd-...")

# LlamaIndex
from llama_index.llms.openai import OpenAI
llm = OpenAI(base_url="http://localhost:4200/v1", api_key="sk-smnd-...")

# CrewAI, Autogen — just set OPENAI_BASE_URL=http://localhost:4200/v1
```
Core Features
Reliability
| Feature | How it works |
|---|---|
| Automatic retries | Exponential or linear backoff. Configurable attempts per request. |
| Fallback models | Specify alternate provider/model slugs. Gateway tries them in order on failure. |
| Circuit breaker | Per-provider. Opens after 5 consecutive failures, retries after 30s in HALF_OPEN state. |
| Request timeouts | Per-request timeout with automatic cancellation and graceful SSE termination. |
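The circuit-breaker behavior in the table can be sketched as follows. The thresholds (5 consecutive failures, 30 s cooldown, HALF_OPEN probe) mirror the README; the implementation is illustrative, not the gateway's actual code.

```typescript
// Minimal per-provider circuit breaker: opens after 5 consecutive failures,
// allows one probe after 30 s (HALF_OPEN), closes again on success.
type State = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: State = "CLOSED";

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  canRequest(now = Date.now()): boolean {
    if (this.state === "OPEN" && now - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN"; // cooldown elapsed — allow one probe request
    }
    return this.state !== "OPEN";
  }

  onSuccess() {
    this.failures = 0;
    this.state = "CLOSED";
  }

  onFailure(now = Date.now()) {
    this.failures++;
    if (this.state === "HALF_OPEN" || this.failures >= this.threshold) {
      this.state = "OPEN"; // trip the breaker
      this.openedAt = now;
    }
  }
}

const cb = new CircuitBreaker();
for (let i = 0; i < 5; i++) cb.onFailure();
console.log(cb.canRequest()); // false — breaker is open
console.log(cb.canRequest(Date.now() + 31_000)); // true — half-open probe allowed
```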
Intelligent Routing
| Strategy | How it works |
|---|---|
| `"routing": "cost"` | Sorts the model chain by input token price — cheapest first. |
| `"routing": "latency"` | Sorts by observed exponential moving average (EMA) latency per provider (stored in Redis). |
| `"routing": "default"` | Uses the order you specified in `fallback_models`. |
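The cost strategy amounts to a price sort over the fallback chain — a sketch with made-up per-million-token prices (the real gateway's pricing data is not reproduced here):

```typescript
// Illustrative "routing": "cost" — sort the fallback chain by input token
// price, cheapest first. The prices below are invented for the example.
const chain = [
  { model: "openai/gpt-4o", inputPricePerMTok: 2.5 },
  { model: "groq/llama-3.3-70b-versatile", inputPricePerMTok: 0.59 },
  { model: "anthropic/claude-haiku-4", inputPricePerMTok: 0.8 },
];

const byCost = [...chain].sort((a, b) => a.inputPricePerMTok - b.inputPricePerMTok);
console.log(byCost.map((m) => m.model));
// cheapest first: groq, then anthropic, then openai
```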
Cost Governance
| Feature | How it works |
|---|---|
| Daily token budget (TPD) | Hard cap on inputTokens + outputTokens per API key per day. Enforced atomically in Redis. Returns 429 BUDGET_EXCEEDED when exceeded. Auto-resets at midnight. |
| Per-key rate limiting | Requests per minute (RPM) sliding window per sk-smnd-... key. IP-based for BYOK callers. |
| Cost tracking | Per-request cost in USD and INR. In response headers, live logs, and dashboard. Unknown-model costs are flagged priceUnknown: true rather than silently reported as $0. |
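The daily token budget boils down to an atomic per-key, per-day counter with a hard cap. A sketch follows, with an in-memory `Map` standing in for Redis; the `429 BUDGET_EXCEEDED` response and midnight reset are described above, while the key layout here is invented.

```typescript
// Daily token budget (TPD) check: one counter per API key per calendar day.
// A Map stands in for Redis; the real gateway enforces this atomically.
const usage = new Map<string, number>();

function consume(apiKey: string, tokens: number, dailyCap: number): boolean {
  const day = new Date().toISOString().slice(0, 10); // new key each day ≈ midnight reset
  const k = `tpd:${apiKey}:${day}`; // hypothetical key layout
  const used = (usage.get(k) ?? 0) + tokens;
  if (used > dailyCap) return false; // caller would return 429 BUDGET_EXCEEDED
  usage.set(k, used);
  return true;
}

console.log(consume("sk-smnd-demo", 900, 1000)); // true — 900 of 1000 used
console.log(consume("sk-smnd-demo", 200, 1000)); // false — would exceed the cap
```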
Security
| Feature | How it works |
|---|---|
| API key auth | SHA-256 hashed sk-smnd-... keys. Redis-cached for fast lookups. |
| Virtual key encryption | Provider credentials stored with AES-256-GCM via HKDF. Callers reference vk_... ID. Cache invalidated immediately on revoke. |
| Guardrails | Block PII (email, phone, SSN, Aadhaar, credit card), blocked words, regex, length — on input and output. |
| Timing-safe auth | Admin + API key comparison is constant-time. Timing attack resistant. |
| Body size limit | Requests over 4 MB rejected with 413. |
| Security headers | X-Content-Type-Options, X-Frame-Options, Referrer-Policy, HSTS (in production). |
| Admin brute-force protection | 20 req/min per IP on admin + console endpoints. |
| Console lockdown | /console/api/* requires x-admin-key on every request; CORS scoped to /v1 and /health only. |
| BYOK mode | Pass provider key via x-provider-key header. No gateway key required. IP-rate-limited. |
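Constant-time comparison, as in the timing-safe auth row above, can be done with Node's `crypto.timingSafeEqual` — a sketch, not the gateway's actual code:

```typescript
// Timing-safe key comparison using Node's crypto.timingSafeEqual.
// timingSafeEqual requires equal-length buffers, so length is checked first
// (leaking length alone is generally acceptable for API keys).
import { timingSafeEqual } from "node:crypto";

function safeCompare(a: string, b: string): boolean {
  const ab = Buffer.from(a);
  const bb = Buffer.from(b);
  if (ab.length !== bb.length) return false;
  return timingSafeEqual(ab, bb);
}

console.log(safeCompare("sk-smnd-abc", "sk-smnd-abc")); // true
console.log(safeCompare("sk-smnd-abc", "sk-smnd-xyz")); // false
```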
Performance
| Feature | How it works |
|---|---|
| Response caching | Redis-backed (in-memory fallback). Cache key = SHA-256 of (model + messages + params). Identical requests served instantly. |
| Full streaming | SSE streaming across every provider, with fallback-before-first-chunk and clean [DONE] termination on error. |
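The cache key described above — SHA-256 over model + messages + params — can be sketched like this; the exact field ordering and serialization in the real gateway are unknown, so this is illustrative only:

```typescript
// Illustrative cache key: SHA-256 of the serialized (model, messages, params)
// tuple, so byte-identical requests map to the same cache entry.
import { createHash } from "node:crypto";

function cacheKey(
  model: string,
  messages: unknown[],
  params: Record<string, unknown> = {},
): string {
  const payload = JSON.stringify({ model, messages, params });
  return createHash("sha256").update(payload).digest("hex");
}

const k1 = cacheKey("openai/gpt-4o-mini", [{ role: "user", content: "Hello!" }]);
const k2 = cacheKey("openai/gpt-4o-mini", [{ role: "user", content: "Hello!" }]);
console.log(k1 === k2); // true — identical requests share a key
```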
Observability
| Feature | How it works |
|---|---|
| Live log stream | WebSocket stream. Every request logged with provider, model, latency, cost, status. |
| Prometheus metrics | /metrics endpoint (admin-protected). Scrape with Grafana, Datadog, Prometheus. |
| OpenTelemetry | Distributed traces exported to any OTLP backend (Jaeger, Grafana Tempo, Honeycomb). |
| Response headers | X-Summoned-Provider, X-Summoned-Cost-USD, X-Summoned-Latency-Ms, X-Summoned-Cache, X-Daily-Remaining on every response. |
API Reference
| Endpoint | Method | Auth | Description |
|---|---|---|---|
| `/v1/chat/completions` | POST | Bearer or `x-provider-key` | OpenAI-compatible completion (streaming + tools) |
| `/v1/embeddings` | POST | Bearer | Text embeddings |
| `/v1/models` | GET | — | List registered providers |
| `/v1/keys` | POST / GET / DELETE | `x-admin-key` | API key management |
| `/admin/virtual-keys` | POST / GET / DELETE | `x-admin-key` | Virtual key management |
| `/admin/logs` | GET | `x-admin-key` | Request logs (buffer or DB) |
| `/admin/stats` | GET | `x-admin-key` | Aggregated statistics |
| `/admin/providers` | GET | `x-admin-key` | Provider health + circuit breaker state |
| `/console/api/*` | * | `x-admin-key` | Admin API used by the web console (same surface as `/admin`) |
| `/metrics` | GET | `x-admin-key` | Prometheus metrics |
| `/ws/logs` | WebSocket | `?key=ADMIN_KEY` | Real-time log streaming |
| `/health` | GET | — | Liveness check |
| `/health/ready` | GET | — | Readiness (Postgres + Redis) |
| `/console` | GET | `ADMIN_API_KEY` (browser prompt) | Built-in web console |
Configuration
See .env.example for the full reference.
Core
| Variable | Required | Default | Description |
|---|---|---|---|
| `ADMIN_API_KEY` | Yes | — | Master admin key (min 32 chars). Generate with `openssl rand -hex 32` |
| `VIRTUAL_KEY_SECRET` | Recommended | Falls back to admin key | Encryption key for virtual keys. Generate with `openssl rand -hex 32` |
| `GATEWAY_PORT` | No | `4000` | Port to listen on (`.env.example` sets `4200`) |
| `GATEWAY_REQUIRE_AUTH` | No | `true` | Set `false` for trusted private networks |
| `PUBLIC_RPM_LIMIT` | No | `60` | RPM cap for BYOK / unauthenticated callers |
| `POSTGRES_URL` | Optional | — | Enables managed keys, virtual keys, audit history |
| `REDIS_URL` | Optional | — | Enables cache, rate limits, latency EMA |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | No | — | OpenTelemetry trace endpoint |
| `USD_INR_RATE` | No | `85` | Exchange rate for INR cost display |
Provider credentials
Set only the ones you plan to use — unset providers are skipped at startup.
| Variable | Provider |
|---|---|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GOOGLE_API_KEY` | Google Gemini |
| `GROQ_API_KEY` | Groq |
| `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` | Azure OpenAI |
| `AWS_ACCESS_KEY_ID` / `AWS_BEDROCK_API_KEY` + `AWS_REGION` | AWS Bedrock |
| `MISTRAL_API_KEY` | Mistral AI |
| `TOGETHER_API_KEY` | Together AI |
| `DEEPSEEK_API_KEY` | DeepSeek |
| `FIREWORKS_API_KEY` | Fireworks AI |
| `COHERE_API_KEY` | Cohere |
| `CEREBRAS_API_KEY` | Cerebras |
| `PERPLEXITY_API_KEY` | Perplexity |
| `XAI_API_KEY` | xAI / Grok |
| `OPENROUTER_API_KEY` | OpenRouter |
| `HUGGINGFACE_API_KEY` | HuggingFace |
| `DEEPINFRA_API_KEY` | DeepInfra |
| `HYPERBOLIC_API_KEY` | Hyperbolic |
| `SAMBANOVA_API_KEY` | SambaNova |
| `NOVITA_API_KEY` | Novita AI |
| `MOONSHOT_API_KEY` | Moonshot (Kimi) |
| `ZAI_API_KEY` | Z.AI (Zhipu / GLM) |
| `NVIDIA_API_KEY` | Nvidia NIM |
| `OLLAMA_BASE_URL` | Ollama (local) |
| `VLLM_BASE_URL` (+ optional `VLLM_API_KEY`) | vLLM (self-hosted) |
| `VOYAGE_API_KEY` | Voyage AI |
| `SARVAM_API_KEY` 🇮🇳 | Sarvam AI |
| `YOTTA_API_KEY` 🇮🇳 | Yotta Labs |
| `CUSTOM_PROVIDERS` | JSON array `[{id,name,baseUrl,apiKey}]` for any OpenAI-compatible endpoint |
Adding a New Provider
Takes ~10 lines of code and ~5 minutes if the provider speaks the OpenAI API format:
```typescript
// src/providers/your-provider.ts
import { createOpenAICompatProvider } from "./openai-compat"

export function createYourProvider(apiKey: string) {
  return createOpenAICompatProvider({
    id: "yourprovider",
    name: "Your Provider",
    apiKey,
    baseURL: "https://api.yourprovider.com/v1",
  })
}
```
Then add the env var in `src/lib/env.ts`, register it in `src/index.ts`, and optionally add pricing in `src/lib/models/your-provider.ts`. See CONTRIBUTING.md for a full walkthrough.
Development
```bash
make setup        # Full setup: deps + Postgres + Redis + migrations + console
make dev          # Gateway with hot reload
make dev-console  # Console Vite dev server
make check-types  # TypeScript type check
make migrate      # Run DB migrations
make create-key   # Quick-create an API key for testing
make help         # All commands
```
Run the test suite:
```bash
bun test tests   # ~45 tests across pricing, guardrails, fallback, config, circuit breaker
```
SDKs
| Language | Package | Source |
|---|---|---|
| TypeScript / JavaScript | `@summoned/ai` | summoned-sdk-ts |
| Python | `summoned-ai` (PyPI) | summoned-sdk-python |
Contributing
The easiest way to contribute is to add a new LLM provider — it's ~10 lines of code and ~5 minutes. See CONTRIBUTING.md.
Bug report? Open an issue → Feature request? Start a discussion →
License
MIT — free to use, fork, modify, and self-host.
Built by Summoned Tech
Made with ♥ for developers who care about production AI infra