Summoned AI Gateway
The open-source AI gateway built for India — and the world.
Route to 28+ LLM providers through one OpenAI-compatible API.
Quickstart · Providers · SDK · Features · Console · Contributing
A lightweight, open-source AI gateway that sits between your app and every LLM provider. Bring your own provider API keys — Summoned adds intelligent routing, automatic failover, response caching, cost governance, guardrails, and a full self-hosted console on top.
No code changes in your app. Drop-in replacement for the OpenAI API.
Zero infra to start. Just set ADMIN_API_KEY + one provider key and run.
- 28 providers, one API — OpenAI, Anthropic, Google, AWS Bedrock, Azure, Groq, Mistral, DeepSeek, Together, Fireworks, Cohere, Cerebras, Perplexity, xAI, OpenRouter, HuggingFace, DeepInfra, Hyperbolic, SambaNova, Novita, Moonshot, Z.AI, Nvidia NIM, Ollama, vLLM, Voyage + India-first Sarvam and Yotta.
- Zero infra required — runs completely stateless. Add Redis for caching + rate limits. Add Postgres for audit history.
- India-native — Sarvam AI, Yotta Labs, AWS Bedrock ap-south-1 (DPDP-compliant defaults), INR cost tracking.
- Full console, self-hosted, free — dashboard, live logs, playground, cost analytics. No cloud subscription needed.
- Guardrails in the free tier — PII blocking, content filters, regex rules. Competitors lock this behind enterprise.
- Virtual keys — store provider credentials encrypted (AES-256-GCM). Callers reference a `vk_...` ID; raw keys never leave the server.
- Daily token budgets — hard caps per API key. Critical for agents. Free.
- Official TypeScript SDK — `@summoned/ai` on npm.
What can you do?
- Route to 28 providers through one endpoint — Supported Providers
- Zero downtime when a provider goes down — Automatic Failover & Circuit Breakers
- Route to the cheapest or fastest model automatically — Intelligent Routing
- Stop runaway agent loops before they drain your budget — Daily Token Budgets
- Cache repeated queries — Response Caching
- Block PII, profanity, injection attempts — Guardrails
- See cost (USD + INR), latency, and token usage in real time — Observability
- Encrypt and store provider keys on the gateway — Virtual Keys
- Works with OpenAI SDK, LangChain, LlamaIndex, CrewAI, Vercel AI SDK — Framework Support
- Add any OpenAI-compatible provider in 5 lines — Custom Providers
> [!TIP]
> Starring this repo helps more developers discover the gateway 🙏
> ⭐ Star us on GitHub — it takes 2 seconds and means a lot.
Console

The gateway ships with a built-in web console at /console — no separate app, no extra services.
| Page | What you get |
|---|---|
| Dashboard | Requests, success rate, latency percentiles (p50/p95/p99), token volume, cost in USD + INR |
| Live Logs | Real-time WebSocket stream of every request — filter by status or provider, click to expand |
| API Keys | Create, list, and revoke sk-smnd-... keys from the browser |
| Virtual Keys | Store provider credentials encrypted (AES-256-GCM) — callers use a vk_... ID |
| Providers | Health status, circuit breaker state, avg latency per provider |
| Playground | Send test completions through managed / virtual / BYOK auth modes — see cost, latency, cache status live |
Access control: the console and its API (`/console/api/*`) are protected by `ADMIN_API_KEY`. On first visit you'll be prompted for the key; it's stored in `localStorage` and sent on every request. Clicking "Sign out" clears it.
Quickstart
1. Start the gateway — pick one, up in 15 seconds
🚀 npx (zero install)
```bash
ADMIN_API_KEY=$(openssl rand -hex 32) \
OPENAI_API_KEY=sk-... \
npx @summoned/gateway
```
Gateway → http://localhost:4000 · Console → http://localhost:4000/console
No clone, no Docker, no Bun — just Node 18+.
🐳 Docker (for production)
```bash
docker run -p 4000:4000 \
  -e ADMIN_API_KEY=$(openssl rand -hex 32) \
  -e OPENAI_API_KEY=sk-... \
  ghcr.io/summoned-tech/summoned-ai-gateway:latest
```
🛠 Full stack with Postgres + Redis (persistent logs + managed keys)
```bash
git clone https://github.com/summoned-tech/summoned-ai-gateway.git
cd summoned-ai-gateway
cp .env.example .env
# Edit .env — set ADMIN_API_KEY + POSTGRES_URL + REDIS_URL + provider keys
docker compose up -d
```
Or for local dev with hot reload:
```bash
make setup   # deps + Postgres + Redis + migrations + console build
make dev     # gateway with hot reload via Bun
```
What works without Postgres / Redis:
| Feature | No Postgres | No Redis | Both absent |
|---|---|---|---|
| Chat completions | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ |
| Guardrails | ✅ | ✅ | ✅ |
| Fallback / circuit breaker | ✅ | ✅ | ✅ |
| Response caching | ✅ | in-memory | in-memory |
| Rate limiting | ✅ | in-memory | in-memory |
| Managed API keys | ❌ | ✅ | ❌ |
| Request history / analytics console | ❌ | ✅ | ❌ |
| Virtual key encryption | ❌ | ✅ | ❌ |
2. Make your first request
Option A — pass your provider key directly (no gateway key needed):
```bash
curl http://localhost:4200/v1/chat/completions \
  -H "x-provider-key: sk-YOUR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role":"user","content":"Hello!"}]}'
```
Option B — use a gateway-managed key (recommended for teams):
```bash
# Create a key
curl -X POST http://localhost:4200/v1/keys \
  -H "x-admin-key: YOUR_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-app", "tenantId": "team-a"}'

# Use it
curl http://localhost:4200/v1/chat/completions \
  -H "Authorization: Bearer sk-smnd-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role":"user","content":"Hello!"}]}'
```
3. Add gateway features
Control retries, fallbacks, caching, routing, and guardrails per request via the x-summoned-config header (or the SDK's config field):
```python
import json, base64
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4200/v1",
    api_key="sk-smnd-...",
)

config = {
    "retry": {"attempts": 3, "backoff": "exponential"},
    "fallback": ["anthropic/claude-haiku-4", "groq/llama-3.3-70b-versatile"],
    "cache": True,
    "routing": "cost",  # cheapest provider first
    "guardrails": {
        "input": [{"type": "pii", "deny": True}]
    },
}

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize this contract"}],
    extra_headers={
        "x-summoned-config": base64.b64encode(json.dumps(config).encode()).decode()
    },
)
```
Works with any OpenAI-compatible library — LangChain, LlamaIndex, CrewAI, Autogen, Vercel AI SDK and more.
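For Node callers not using the SDK, the same header can be assembled by hand — a sketch mirroring the Python example above; the config fields shown are taken from this README's example, not an exhaustive schema.

```typescript
// Build the x-summoned-config header value (base64-encoded JSON) in
// TypeScript, mirroring the Python example above. Sketch only — the field
// names follow the README's example config.
const gatewayConfig = {
  retry: { attempts: 3, backoff: "exponential" },
  fallback: ["anthropic/claude-haiku-4", "groq/llama-3.3-70b-versatile"],
  cache: true,
  routing: "cost",
};

const configHeader = Buffer.from(JSON.stringify(gatewayConfig)).toString("base64");
// Pass it with any OpenAI-compatible client as an extra header:
// { "x-summoned-config": configHeader }
console.log(configHeader);
```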
Use from your code
The official TypeScript SDK ships on npm. It's a thin typed wrapper around the OpenAI-compatible surface, plus first-class support for the config object (retry / fallback / cache / guardrails / virtual keys).
```bash
npm install @summoned/ai
```
```typescript
import { Summoned } from "@summoned/ai"

const client = new Summoned({
  apiKey: "sk-smnd-...",
  baseURL: "http://localhost:4200",
})

const res = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
  config: {
    cache: true,
    fallback: ["anthropic/claude-sonnet-4-20250514", "groq/llama-3.3-70b-versatile"],
    routing: "cost",
  },
})

console.log(res.choices[0].message.content)
console.log(res.summoned) // { provider, cost, latency_ms, ... }
```
Streaming:
```typescript
for await (const chunk of await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Write a haiku" }],
  stream: true,
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "")
}
```
Prefer the OpenAI SDK? Point it at the gateway:
```typescript
import OpenAI from "openai"

const openai = new OpenAI({
  baseURL: "http://localhost:4200/v1",
  apiKey: "sk-smnd-...",
})
```
Package: `@summoned/ai` · ESM-only · Node 18+ · ~13 KB tarball.
Supported Providers
28 providers out of the box. Every provider listed here is enabled by setting its env var — unset ones are simply skipped at startup, so you only pay for the surface you use.
| Provider | Model format | Example | Env var |
|---|---|---|---|
| OpenAI | `openai/<model>` | `openai/gpt-4o` | `OPENAI_API_KEY` |
| Anthropic | `anthropic/<model>` | `anthropic/claude-sonnet-4-20250514` | `ANTHROPIC_API_KEY` |
| Google Gemini | `google/<model>` | `google/gemini-2.0-flash` | `GOOGLE_API_KEY` |
| AWS Bedrock | `bedrock/<model>` | `bedrock/amazon.nova-pro-v1:0` | AWS creds / `AWS_BEDROCK_API_KEY` |
| Azure OpenAI | `azure/<deployment>` | `azure/gpt-4o` | `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` |
| Groq | `groq/<model>` | `groq/llama-3.3-70b-versatile` | `GROQ_API_KEY` |
| Mistral AI | `mistral/<model>` | `mistral/mistral-large-latest` | `MISTRAL_API_KEY` |
| Together AI | `together/<model>` | `together/meta-llama/Llama-3.3-70B-Instruct-Turbo` | `TOGETHER_API_KEY` |
| DeepSeek | `deepseek/<model>` | `deepseek/deepseek-chat` | `DEEPSEEK_API_KEY` |
| Fireworks AI | `fireworks/<model>` | `fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct` | `FIREWORKS_API_KEY` |
| Cohere | `cohere/<model>` | `cohere/command-r-plus` | `COHERE_API_KEY` |
| Cerebras | `cerebras/<model>` | `cerebras/llama3.1-70b` | `CEREBRAS_API_KEY` |
| Perplexity | `perplexity/<model>` | `perplexity/llama-3.1-sonar-large-128k-online` | `PERPLEXITY_API_KEY` |
| xAI (Grok) | `xai/<model>` | `xai/grok-3` | `XAI_API_KEY` |
| OpenRouter | `openrouter/<upstream>/<model>` | `openrouter/openai/gpt-4o` | `OPENROUTER_API_KEY` |
| HuggingFace | `huggingface/<model>` | `huggingface/meta-llama/Llama-3.3-70B-Instruct` | `HUGGINGFACE_API_KEY` |
| DeepInfra | `deepinfra/<model>` | `deepinfra/meta-llama/Meta-Llama-3.1-70B-Instruct` | `DEEPINFRA_API_KEY` |
| Hyperbolic | `hyperbolic/<model>` | `hyperbolic/deepseek-ai/DeepSeek-V3` | `HYPERBOLIC_API_KEY` |
| SambaNova | `sambanova/<model>` | `sambanova/Meta-Llama-3.1-405B-Instruct` | `SAMBANOVA_API_KEY` |
| Novita AI | `novita/<model>` | `novita/meta-llama/llama-3.1-70b-instruct` | `NOVITA_API_KEY` |
| Moonshot (Kimi) | `moonshot/<model>` | `moonshot/moonshot-v1-128k` | `MOONSHOT_API_KEY` |
| Z.AI (GLM) | `zai/<model>` | `zai/glm-4.5` | `ZAI_API_KEY` |
| Nvidia NIM | `nvidia/<model>` | `nvidia/meta/llama-3.1-405b-instruct` | `NVIDIA_API_KEY` |
| Ollama | `ollama/<model>` | `ollama/llama3.2` | `OLLAMA_BASE_URL` (local, no key) |
| vLLM | `vllm/<model>` | `vllm/meta-llama/Llama-3-70B-Instruct` | `VLLM_BASE_URL` (+ optional `VLLM_API_KEY`) |
| Voyage AI | `voyage/<model>` | `voyage/voyage-3-large` | `VOYAGE_API_KEY` (embeddings / rerank) |
| Sarvam AI 🇮🇳 | `sarvam/<model>` | `sarvam/sarvam-2b-v0.5` | `SARVAM_API_KEY` |
| Yotta Labs 🇮🇳 | `yotta/<model>` | `yotta/yotta-mini` | `YOTTA_API_KEY` |
Pure proxy — no static model catalog. Any model the upstream provider accepts works immediately, zero config changes when new models launch.
Any OpenAI-compatible provider — use `CUSTOM_PROVIDERS` to add any private endpoint via JSON config. No code changes needed.
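As an illustration of that JSON shape — the field names (`id`, `name`, `baseUrl`, `apiKey`) come from this README's configuration table, while the endpoint URL and key below are placeholders:

```typescript
// Hypothetical CUSTOM_PROVIDERS value and a typed parse of it. Field names
// (id, name, baseUrl, apiKey) come from this README; the URL and key are
// placeholders, not a real endpoint.
interface CustomProvider {
  id: string;
  name: string;
  baseUrl: string;
  apiKey: string;
}

// In .env this would appear as: CUSTOM_PROVIDERS='[{"id":"my-llm",...}]'
const raw =
  '[{"id":"my-llm","name":"My LLM","baseUrl":"https://llm.internal.example/v1","apiKey":"sk-placeholder"}]';

const providers: CustomProvider[] = JSON.parse(raw);
console.log(providers.map((p) => p.id)); // → [ "my-llm" ]
```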
Works with any framework
```typescript
// @summoned/ai (official)
import { Summoned } from "@summoned/ai"
const client = new Summoned({ apiKey: "sk-smnd-...", baseURL: "http://localhost:4200" })
```
```typescript
// OpenAI SDK
import OpenAI from "openai"
const openai = new OpenAI({ baseURL: "http://localhost:4200/v1", apiKey: "sk-smnd-..." })
```
```typescript
// Vercel AI SDK
import { createOpenAI } from "@ai-sdk/openai"
const openai = createOpenAI({ baseURL: "http://localhost:4200/v1", apiKey: "sk-smnd-..." })
```
```python
# LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(base_url="http://localhost:4200/v1", api_key="sk-smnd-...")

# LlamaIndex
from llama_index.llms.openai import OpenAI
llm = OpenAI(base_url="http://localhost:4200/v1", api_key="sk-smnd-...")

# CrewAI, Autogen — just set OPENAI_BASE_URL=http://localhost:4200/v1
```
Core Features
Reliability
| Feature | How it works |
|---|---|
| Automatic retries | Exponential or linear backoff. Configurable attempts per request. |
| Fallback models | Specify alternate provider/model slugs. Gateway tries them in order on failure. |
| Circuit breaker | Per-provider. Opens after 5 consecutive failures, retries after 30s in HALF_OPEN state. |
| Request timeouts | Per-request timeout with automatic cancellation and graceful SSE termination. |
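The circuit-breaker behavior in the table can be sketched as follows. The thresholds (5 consecutive failures, 30 s cooldown, HALF_OPEN probe) mirror the README; the implementation is illustrative, not the gateway's actual code.

```typescript
// Minimal per-provider circuit breaker: opens after 5 consecutive failures,
// allows one probe after 30 s (HALF_OPEN), closes again on success.
type State = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: State = "CLOSED";

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  canRequest(now = Date.now()): boolean {
    if (this.state === "OPEN" && now - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN"; // cooldown elapsed — allow one probe request
    }
    return this.state !== "OPEN";
  }

  onSuccess() {
    this.failures = 0;
    this.state = "CLOSED";
  }

  onFailure(now = Date.now()) {
    this.failures++;
    if (this.state === "HALF_OPEN" || this.failures >= this.threshold) {
      this.state = "OPEN"; // trip the breaker
      this.openedAt = now;
    }
  }
}

const cb = new CircuitBreaker();
for (let i = 0; i < 5; i++) cb.onFailure();
console.log(cb.canRequest()); // false — breaker is open
console.log(cb.canRequest(Date.now() + 31_000)); // true — half-open probe allowed
```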
Intelligent Routing
| Strategy | How it works |
|---|---|
| `"routing": "cost"` | Sorts the model chain by input token price — cheapest first. |
| `"routing": "latency"` | Sorts by observed exponential moving average (EMA) latency per provider (stored in Redis). |
| `"routing": "default"` | Uses the order you specified in `fallback_models`. |
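The cost strategy amounts to a price sort over the fallback chain — a sketch with made-up per-million-token prices (the real gateway's pricing data is not reproduced here):

```typescript
// Illustrative "routing": "cost" — sort the fallback chain by input token
// price, cheapest first. The prices below are invented for the example.
const chain = [
  { model: "openai/gpt-4o", inputPricePerMTok: 2.5 },
  { model: "groq/llama-3.3-70b-versatile", inputPricePerMTok: 0.59 },
  { model: "anthropic/claude-haiku-4", inputPricePerMTok: 0.8 },
];

const byCost = [...chain].sort((a, b) => a.inputPricePerMTok - b.inputPricePerMTok);
console.log(byCost.map((m) => m.model));
// cheapest first: groq, then anthropic, then openai
```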
Cost Governance
| Feature | How it works |
|---|---|
| Daily token budget (TPD) | Hard cap on inputTokens + outputTokens per API key per day. Enforced atomically in Redis. Returns 429 BUDGET_EXCEEDED when exceeded. Auto-resets at midnight. |
| Per-key rate limiting | Requests per minute (RPM) sliding window per sk-smnd-... key. IP-based for BYOK callers. |
| Cost tracking | Per-request cost in USD and INR. In response headers, live logs, and dashboard. Unknown-model costs are flagged priceUnknown: true rather than silently reported as $0. |
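The daily token budget boils down to an atomic per-key, per-day counter with a hard cap. A sketch follows, with an in-memory `Map` standing in for Redis; the `429 BUDGET_EXCEEDED` response and midnight reset are described above, while the key layout here is invented.

```typescript
// Daily token budget (TPD) check: one counter per API key per calendar day.
// A Map stands in for Redis; the real gateway enforces this atomically.
const usage = new Map<string, number>();

function consume(apiKey: string, tokens: number, dailyCap: number): boolean {
  const day = new Date().toISOString().slice(0, 10); // new key each day ≈ midnight reset
  const k = `tpd:${apiKey}:${day}`; // hypothetical key layout
  const used = (usage.get(k) ?? 0) + tokens;
  if (used > dailyCap) return false; // caller would return 429 BUDGET_EXCEEDED
  usage.set(k, used);
  return true;
}

console.log(consume("sk-smnd-demo", 900, 1000)); // true — 900 of 1000 used
console.log(consume("sk-smnd-demo", 200, 1000)); // false — would exceed the cap
```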
Security
| Feature | How it works |
|---|---|
| API key auth | SHA-256 hashed sk-smnd-... keys. Redis-cached for fast lookups. |
| Virtual key encryption | Provider credentials stored with AES-256-GCM via HKDF. Callers reference vk_... ID. Cache invalidated immediately on revoke. |
| Guardrails | Block PII (email, phone, SSN, Aadhaar, credit card), blocked words, regex, length — on input and output. |
| Timing-safe auth | Admin + API key comparison is constant-time. Timing attack resistant. |
| Body size limit | Requests over 4 MB rejected with 413. |
| Security headers | X-Content-Type-Options, X-Frame-Options, Referrer-Policy, HSTS (in production). |
| Admin brute-force protection | 20 req/min per IP on admin + console endpoints. |
| Console lockdown | /console/api/* requires x-admin-key on every request; CORS scoped to /v1 and /health only. |
| BYOK mode | Pass provider key via x-provider-key header. No gateway key required. IP-rate-limited. |
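Constant-time comparison, as in the timing-safe auth row above, can be done with Node's `crypto.timingSafeEqual` — a sketch, not the gateway's actual code:

```typescript
// Timing-safe key comparison using Node's crypto.timingSafeEqual.
// timingSafeEqual requires equal-length buffers, so length is checked first
// (leaking length alone is generally acceptable for API keys).
import { timingSafeEqual } from "node:crypto";

function safeCompare(a: string, b: string): boolean {
  const ab = Buffer.from(a);
  const bb = Buffer.from(b);
  if (ab.length !== bb.length) return false;
  return timingSafeEqual(ab, bb);
}

console.log(safeCompare("sk-smnd-abc", "sk-smnd-abc")); // true
console.log(safeCompare("sk-smnd-abc", "sk-smnd-xyz")); // false
```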
Performance
| Feature | How it works |
|---|---|
| Response caching | Redis-backed (in-memory fallback). Cache key = SHA-256 of (model + messages + params). Identical requests served instantly. |
| Full streaming | SSE streaming across every provider, with fallback-before-first-chunk and clean [DONE] termination on error. |
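The cache key described above — SHA-256 over model + messages + params — can be sketched like this; the exact field ordering and serialization in the real gateway are unknown, so this is illustrative only:

```typescript
// Illustrative cache key: SHA-256 of the serialized (model, messages, params)
// tuple, so byte-identical requests map to the same cache entry.
import { createHash } from "node:crypto";

function cacheKey(
  model: string,
  messages: unknown[],
  params: Record<string, unknown> = {},
): string {
  const payload = JSON.stringify({ model, messages, params });
  return createHash("sha256").update(payload).digest("hex");
}

const k1 = cacheKey("openai/gpt-4o-mini", [{ role: "user", content: "Hello!" }]);
const k2 = cacheKey("openai/gpt-4o-mini", [{ role: "user", content: "Hello!" }]);
console.log(k1 === k2); // true — identical requests share a key
```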
Observability
| Feature | How it works |
|---|---|
| Live log stream | WebSocket stream. Every request logged with provider, model, latency, cost, status. |
| Prometheus metrics | /metrics endpoint (admin-protected). Scrape with Grafana, Datadog, Prometheus. |
| OpenTelemetry | Distributed traces exported to any OTLP backend (Jaeger, Grafana Tempo, Honeycomb). |
| Response headers | X-Summoned-Provider, X-Summoned-Cost-USD, X-Summoned-Latency-Ms, X-Summoned-Cache, X-Daily-Remaining on every response. |
API Reference
| Endpoint | Method | Auth | Description |
|---|---|---|---|
| `/v1/chat/completions` | POST | Bearer or `x-provider-key` | OpenAI-compatible completion (streaming + tools) |
| `/v1/embeddings` | POST | Bearer | Text embeddings |
| `/v1/models` | GET | — | List registered providers |
| `/v1/keys` | POST / GET / DELETE | `x-admin-key` | API key management |
| `/admin/virtual-keys` | POST / GET / DELETE | `x-admin-key` | Virtual key management |
| `/admin/logs` | GET | `x-admin-key` | Request logs (buffer or DB) |
| `/admin/stats` | GET | `x-admin-key` | Aggregated statistics |
| `/admin/providers` | GET | `x-admin-key` | Provider health + circuit breaker state |
| `/console/api/*` | * | `x-admin-key` | Admin API used by the web console (same surface as `/admin`) |
| `/metrics` | GET | `x-admin-key` | Prometheus metrics |
| `/ws/logs` | WebSocket | `?key=ADMIN_KEY` | Real-time log streaming |
| `/health` | GET | — | Liveness check |
| `/health/ready` | GET | — | Readiness (Postgres + Redis) |
| `/console` | GET | `ADMIN_API_KEY` (browser prompt) | Built-in web console |
Configuration
See .env.example for the full reference.
Core
| Variable | Required | Default | Description |
|---|---|---|---|
| `ADMIN_API_KEY` | Yes | — | Master admin key (min 32 chars). Generate with `openssl rand -hex 32` |
| `VIRTUAL_KEY_SECRET` | Recommended | Falls back to admin key | Encryption key for virtual keys. Generate with `openssl rand -hex 32` |
| `GATEWAY_PORT` | No | `4000` | Port to listen on (`.env.example` sets `4200`) |
| `GATEWAY_REQUIRE_AUTH` | No | `true` | Set `false` for trusted private networks |
| `PUBLIC_RPM_LIMIT` | No | `60` | RPM cap for BYOK / unauthenticated callers |
| `POSTGRES_URL` | Optional | — | Enables managed keys, virtual keys, audit history |
| `REDIS_URL` | Optional | — | Enables cache, rate limits, latency EMA |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | No | — | OpenTelemetry trace endpoint |
| `USD_INR_RATE` | No | `85` | Exchange rate for INR cost display |
Provider credentials
Set only the ones you plan to use — unset providers are skipped at startup.
| Variable | Provider |
|---|---|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GOOGLE_API_KEY` | Google Gemini |
| `GROQ_API_KEY` | Groq |
| `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` | Azure OpenAI |
| `AWS_ACCESS_KEY_ID` / `AWS_BEDROCK_API_KEY` + `AWS_REGION` | AWS Bedrock |
| `MISTRAL_API_KEY` | Mistral AI |
| `TOGETHER_API_KEY` | Together AI |
| `DEEPSEEK_API_KEY` | DeepSeek |
| `FIREWORKS_API_KEY` | Fireworks AI |
| `COHERE_API_KEY` | Cohere |
| `CEREBRAS_API_KEY` | Cerebras |
| `PERPLEXITY_API_KEY` | Perplexity |
| `XAI_API_KEY` | xAI / Grok |
| `OPENROUTER_API_KEY` | OpenRouter |
| `HUGGINGFACE_API_KEY` | HuggingFace |
| `DEEPINFRA_API_KEY` | DeepInfra |
| `HYPERBOLIC_API_KEY` | Hyperbolic |
| `SAMBANOVA_API_KEY` | SambaNova |
| `NOVITA_API_KEY` | Novita AI |
| `MOONSHOT_API_KEY` | Moonshot (Kimi) |
| `ZAI_API_KEY` | Z.AI (Zhipu / GLM) |
| `NVIDIA_API_KEY` | Nvidia NIM |
| `OLLAMA_BASE_URL` | Ollama (local) |
| `VLLM_BASE_URL` (+ optional `VLLM_API_KEY`) | vLLM (self-hosted) |
| `VOYAGE_API_KEY` | Voyage AI |
| `SARVAM_API_KEY` 🇮🇳 | Sarvam AI |
| `YOTTA_API_KEY` 🇮🇳 | Yotta Labs |
| `CUSTOM_PROVIDERS` | JSON array `[{id,name,baseUrl,apiKey}]` for any OpenAI-compatible endpoint |
Adding a New Provider
Takes ~10 lines of code and ~5 minutes if the provider speaks the OpenAI API format:
```typescript
// src/providers/your-provider.ts
import { createOpenAICompatProvider } from "./openai-compat"

export function createYourProvider(apiKey: string) {
  return createOpenAICompatProvider({
    id: "yourprovider",
    name: "Your Provider",
    apiKey,
    baseURL: "https://api.yourprovider.com/v1",
  })
}
```
Then add the env var in `src/lib/env.ts`, register it in `src/index.ts`, and optionally add pricing in `src/lib/models/your-provider.ts`. See CONTRIBUTING.md for a full walkthrough.
Development
```bash
make setup        # Full setup: deps + Postgres + Redis + migrations + console
make dev          # Gateway with hot reload
make dev-console  # Console Vite dev server
make check-types  # TypeScript type check
make migrate      # Run DB migrations
make create-key   # Quick-create an API key for testing
make help         # All commands
```
Run the test suite:
```bash
bun test tests   # ~45 tests across pricing, guardrails, fallback, config, circuit breaker
```
SDKs
| Language | Package | Source |
|---|---|---|
| TypeScript / JavaScript | `@summoned/ai` | summoned-sdk-ts |
| Python | `summoned-ai` (PyPI) | summoned-sdk-python |
Contributing
The easiest way to contribute is to add a new LLM provider — it's ~10 lines of code and ~5 minutes. See CONTRIBUTING.md.
Bug report? Open an issue → Feature request? Start a discussion →
License
MIT — free to use, fork, modify, and self-host.
Built by Summoned Tech
Made with ♥ for developers who care about production AI infra