JSPM

@summoned/gateway 0.1.0 · MIT License · 18 downloads

Open-source OpenAI-compatible AI gateway. 28 providers, routing, fallback, caching, guardrails, budgets, and a built-in console. Run in 15 seconds: npx @summoned/gateway

Package Exports

  • @summoned/gateway
  • @summoned/gateway/dist/cli.mjs

This package does not declare an exports field, so the exports above were automatically detected and optimized by JSPM. If a package subpath is missing, consider opening an issue against the original package (@summoned/gateway) asking it to add an "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

Summoned AI Gateway

The open-source AI gateway built for India — and the world.

Route to 28+ LLM providers through one OpenAI-compatible API.

Quickstart · Providers · SDK · Features · Console · Contributing

MIT License · npm: @summoned/gateway · npm: @summoned/ai · Node 18+ · OpenAI Compatible · India native


A lightweight, open-source AI gateway that sits between your app and every LLM provider. Bring your own provider API keys — Summoned adds intelligent routing, automatic failover, response caching, cost governance, guardrails, and a full self-hosted console on top.

No code changes in your app. Drop-in replacement for the OpenAI API.

Zero infra to start. Just set ADMIN_API_KEY + one provider key and run.

  • 28 providers, one API — OpenAI, Anthropic, Google, AWS Bedrock, Azure, Groq, Mistral, DeepSeek, Together, Fireworks, Cohere, Cerebras, Perplexity, xAI, OpenRouter, HuggingFace, DeepInfra, Hyperbolic, SambaNova, Novita, Moonshot, Z.AI, Nvidia NIM, Ollama, vLLM, Voyage + India-first Sarvam and Yotta.
  • Zero infra required — runs completely stateless. Add Redis for caching + rate limits. Add Postgres for audit history.
  • India-native — Sarvam AI, Yotta Labs, AWS Bedrock ap-south-1 (DPDP-compliant defaults), INR cost tracking.
  • Full console, self-hosted, free — dashboard, live logs, playground, cost analytics. No cloud subscription needed.
  • Guardrails in the free tier — PII blocking, content filters, regex rules. Competitors lock this behind enterprise.
  • Virtual keys — store provider credentials encrypted (AES-256-GCM). Callers reference a vk_... id; raw keys never leave the server.
  • Daily token budgets — hard caps per API key. Critical for agents. Free.
  • Official TypeScript SDK: @summoned/ai on npm.

What can you do?

[!TIP] Starring this repo helps more developers discover the gateway 🙏

Star us on GitHub — it takes 2 seconds and means a lot.


Console

Console demo

The gateway ships with a built-in web console at /console — no separate app, no extra services.

| Page | What you get |
| --- | --- |
| Dashboard | Requests, success rate, latency percentiles (p50/p95/p99), token volume, cost in USD + INR |
| Live Logs | Real-time WebSocket stream of every request — filter by status or provider, click to expand |
| API Keys | Create, list, and revoke sk-smnd-... keys from the browser |
| Virtual Keys | Store provider credentials encrypted (AES-256-GCM) — callers use a vk_... ID |
| Providers | Health status, circuit breaker state, avg latency per provider |
| Playground | Send test completions through managed / virtual / BYOK auth modes — see cost, latency, cache status live |

Access control: the console and its API (/console/api/*) are protected by ADMIN_API_KEY. On first visit you'll be prompted for the key; it's stored in localStorage and sent on every request. Clicking "Sign out" clears it.


Quickstart — pick one, gateway up in 15 seconds

🚀 npx (zero install)

ADMIN_API_KEY=$(openssl rand -hex 32) \
OPENAI_API_KEY=sk-... \
npx @summoned/gateway

Gateway → http://localhost:4000 · Console → http://localhost:4000/console

No clone, no Docker, no Bun — just Node 18+.

🐳 Docker (for production)

docker run -p 4000:4000 \
  -e ADMIN_API_KEY=$(openssl rand -hex 32) \
  -e OPENAI_API_KEY=sk-... \
  ghcr.io/summoned-tech/summoned-ai-gateway:latest

🛠 Full stack with Postgres + Redis (persistent logs + managed keys)

git clone https://github.com/summoned-tech/summoned-ai-gateway.git
cd summoned-ai-gateway

cp .env.example .env
# Edit .env — set ADMIN_API_KEY + POSTGRES_URL + REDIS_URL + provider keys

docker compose up -d

Or for local dev with hot reload:

make setup   # deps + Postgres + Redis + migrations + console build
make dev     # gateway with hot reload via Bun

What works without Postgres / Redis:

| Feature | No Postgres | No Redis | Both absent |
| --- | --- | --- | --- |
| Chat completions | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ |
| Guardrails | ✅ | ✅ | ✅ |
| Fallback / circuit breaker | ✅ | ✅ | ✅ |
| Response caching | ✅ | in-memory | in-memory |
| Rate limiting | ✅ | in-memory | in-memory |
| Managed API keys | ❌ | ✅ | ❌ |
| Request history / analytics | console buffer only | ✅ | console buffer only |
| Virtual key encryption | ❌ | ✅ | ❌ |

2. Make your first request

The examples below assume the gateway is listening on port 4200 (the .env.example override); with the plain npx quickstart the default port is 4000.

Option A — pass your provider key directly (no gateway key needed):

curl http://localhost:4200/v1/chat/completions \
  -H "x-provider-key: sk-YOUR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role":"user","content":"Hello!"}]}'

Option B — use a gateway-managed key (recommended for teams):

# Create a key
curl -X POST http://localhost:4200/v1/keys \
  -H "x-admin-key: YOUR_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-app", "tenantId": "team-a"}'

# Use it
curl http://localhost:4200/v1/chat/completions \
  -H "Authorization: Bearer sk-smnd-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role":"user","content":"Hello!"}]}'

3. Add gateway features

Control retries, fallbacks, caching, routing, and guardrails per request via the x-summoned-config header (or the SDK's config field):

import json, base64
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4200/v1",
    api_key="sk-smnd-...",
)

config = {
    "retry":    { "attempts": 3, "backoff": "exponential" },
    "fallback": ["anthropic/claude-haiku-4", "groq/llama-3.3-70b-versatile"],
    "cache":    True,
    "routing":  "cost",   # cheapest provider first
    "guardrails": {
        "input": [{ "type": "pii", "deny": True }]
    }
}

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize this contract"}],
    extra_headers={
        "x-summoned-config": base64.b64encode(json.dumps(config).encode()).decode()
    }
)

Works with any OpenAI-compatible library — LangChain, LlamaIndex, CrewAI, Autogen, Vercel AI SDK and more.


Use from your code

The official TypeScript SDK ships on npm. It's a thin typed wrapper around the OpenAI-compatible surface, plus first-class support for the config object (retry / fallback / cache / guardrails / virtual keys).

npm install @summoned/ai
import { Summoned } from "@summoned/ai"

const client = new Summoned({
  apiKey: "sk-smnd-...",
  baseURL: "http://localhost:4200",
})

const res = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
  config: {
    cache: true,
    fallback: ["anthropic/claude-sonnet-4-20250514", "groq/llama-3.3-70b-versatile"],
    routing: "cost",
  },
})

console.log(res.choices[0].message.content)
console.log(res.summoned) // { provider, cost, latency_ms, ... }

Streaming:

for await (const chunk of await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Write a haiku" }],
  stream: true,
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "")
}

Prefer the OpenAI SDK? Point it at the gateway:

import OpenAI from "openai"
const openai = new OpenAI({
  baseURL: "http://localhost:4200/v1",
  apiKey: "sk-smnd-...",
})

Package: @summoned/ai · ESM-only · Node 18+ · ~13 KB tarball.


Supported Providers

28 providers out of the box. Every provider listed here is enabled by setting its env var — unset ones are simply skipped at startup, so you only pay for the surface you use.

| Provider | Model format | Example | Env var |
| --- | --- | --- | --- |
| OpenAI | openai/<model> | openai/gpt-4o | OPENAI_API_KEY |
| Anthropic | anthropic/<model> | anthropic/claude-sonnet-4-20250514 | ANTHROPIC_API_KEY |
| Google Gemini | google/<model> | google/gemini-2.0-flash | GOOGLE_API_KEY |
| AWS Bedrock | bedrock/<model> | bedrock/amazon.nova-pro-v1:0 | AWS creds / AWS_BEDROCK_API_KEY |
| Azure OpenAI | azure/<deployment> | azure/gpt-4o | AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT |
| Groq | groq/<model> | groq/llama-3.3-70b-versatile | GROQ_API_KEY |
| Mistral AI | mistral/<model> | mistral/mistral-large-latest | MISTRAL_API_KEY |
| Together AI | together/<model> | together/meta-llama/Llama-3.3-70B-Instruct-Turbo | TOGETHER_API_KEY |
| DeepSeek | deepseek/<model> | deepseek/deepseek-chat | DEEPSEEK_API_KEY |
| Fireworks AI | fireworks/<model> | fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct | FIREWORKS_API_KEY |
| Cohere | cohere/<model> | cohere/command-r-plus | COHERE_API_KEY |
| Cerebras | cerebras/<model> | cerebras/llama3.1-70b | CEREBRAS_API_KEY |
| Perplexity | perplexity/<model> | perplexity/llama-3.1-sonar-large-128k-online | PERPLEXITY_API_KEY |
| xAI (Grok) | xai/<model> | xai/grok-3 | XAI_API_KEY |
| OpenRouter | openrouter/<upstream>/<model> | openrouter/openai/gpt-4o | OPENROUTER_API_KEY |
| HuggingFace | huggingface/<model> | huggingface/meta-llama/Llama-3.3-70B-Instruct | HUGGINGFACE_API_KEY |
| DeepInfra | deepinfra/<model> | deepinfra/meta-llama/Meta-Llama-3.1-70B-Instruct | DEEPINFRA_API_KEY |
| Hyperbolic | hyperbolic/<model> | hyperbolic/deepseek-ai/DeepSeek-V3 | HYPERBOLIC_API_KEY |
| SambaNova | sambanova/<model> | sambanova/Meta-Llama-3.1-405B-Instruct | SAMBANOVA_API_KEY |
| Novita AI | novita/<model> | novita/meta-llama/llama-3.1-70b-instruct | NOVITA_API_KEY |
| Moonshot (Kimi) | moonshot/<model> | moonshot/moonshot-v1-128k | MOONSHOT_API_KEY |
| Z.AI (GLM) | zai/<model> | zai/glm-4.5 | ZAI_API_KEY |
| Nvidia NIM | nvidia/<model> | nvidia/meta/llama-3.1-405b-instruct | NVIDIA_API_KEY |
| Ollama | ollama/<model> | ollama/llama3.2 | OLLAMA_BASE_URL (local, no key) |
| vLLM | vllm/<model> | vllm/meta-llama/Llama-3-70B-Instruct | VLLM_BASE_URL (+ optional VLLM_API_KEY) |
| Voyage AI | voyage/<model> | voyage/voyage-3-large | VOYAGE_API_KEY (embeddings / rerank) |
| Sarvam AI 🇮🇳 | sarvam/<model> | sarvam/sarvam-2b-v0.5 | SARVAM_API_KEY |
| Yotta Labs 🇮🇳 | yotta/<model> | yotta/yotta-mini | YOTTA_API_KEY |

Pure proxy — no static model catalog. Any model the upstream provider accepts works immediately, zero config changes when new models launch.

Any OpenAI-compatible provider — use CUSTOM_PROVIDERS to add any private endpoint in JSON config. No code changes needed.
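As a sketch, the CUSTOM_PROVIDERS value is a JSON array of {id, name, baseUrl, apiKey} objects (the schema listed in the configuration reference); the endpoint and values below are placeholders, not a real deployment:

```shell
# Hypothetical private OpenAI-compatible endpoint; id, name, URL, and key are all placeholders.
CUSTOM_PROVIDERS='[{"id":"internal","name":"Internal LLM","baseUrl":"https://llm.internal.example/v1","apiKey":"sk-internal-placeholder"}]'
```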


Works with any framework

// @summoned/ai (official)
import { Summoned } from "@summoned/ai"
const client = new Summoned({ apiKey: "sk-smnd-...", baseURL: "http://localhost:4200" })

// OpenAI SDK
import OpenAI from "openai"
const openai = new OpenAI({ baseURL: "http://localhost:4200/v1", apiKey: "sk-smnd-..." })

// Vercel AI SDK
import { createOpenAI } from "@ai-sdk/openai"
const openai = createOpenAI({ baseURL: "http://localhost:4200/v1", apiKey: "sk-smnd-..." })

# LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(base_url="http://localhost:4200/v1", api_key="sk-smnd-...")

# LlamaIndex
from llama_index.llms.openai import OpenAI
llm = OpenAI(base_url="http://localhost:4200/v1", api_key="sk-smnd-...")

# CrewAI, Autogen — just set OPENAI_BASE_URL=http://localhost:4200/v1

Core Features

Reliability

| Feature | How it works |
| --- | --- |
| Automatic retries | Exponential or linear backoff. Configurable attempts per request. |
| Fallback models | Specify alternate provider/model slugs. Gateway tries them in order on failure. |
| Circuit breaker | Per-provider. Opens after 5 consecutive failures, retries after 30s in HALF_OPEN state. |
| Request timeouts | Per-request timeout with automatic cancellation and graceful SSE termination. |

Intelligent Routing

| Strategy | How it works |
| --- | --- |
| "routing": "cost" | Sorts model chain by input token price — cheapest first. |
| "routing": "latency" | Sorts by observed Exponential Moving Average latency per provider (stored in Redis). |
| "routing": "default" | Uses the order you specified in fallback_models. |

Cost Governance

| Feature | How it works |
| --- | --- |
| Daily token budget (TPD) | Hard cap on inputTokens + outputTokens per API key per day. Enforced atomically in Redis. Returns 429 BUDGET_EXCEEDED when exceeded. Auto-resets at midnight. |
| Per-key rate limiting | Requests per minute (RPM) sliding window per sk-smnd-... key. IP-based for BYOK callers. |
| Cost tracking | Per-request cost in USD and INR. In response headers, live logs, and dashboard. Unknown-model costs are flagged priceUnknown: true rather than silently reported as $0. |
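The daily token budget check reduces to a per-key, per-day counter. Here is a minimal in-memory sketch; the real gateway enforces this atomically in Redis, and the function shape below is invented for illustration:

```typescript
// In-memory sketch of the daily token budget (TPD) check described above.
// A Map stands in for Redis; the day bucket makes the counter auto-reset at midnight (UTC here).
const usage = new Map<string, number>();

function recordTokens(apiKey: string, inputTokens: number, outputTokens: number, limit: number): boolean {
  const day = new Date().toISOString().slice(0, 10);
  const bucket = `${apiKey}:${day}`;
  const used = (usage.get(bucket) ?? 0) + inputTokens + outputTokens;
  if (used > limit) return false; // caller should respond 429 BUDGET_EXCEEDED
  usage.set(bucket, used);
  return true;
}
```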

Security

| Feature | How it works |
| --- | --- |
| API key auth | SHA-256 hashed sk-smnd-... keys. Redis-cached for fast lookups. |
| Virtual key encryption | Provider credentials stored with AES-256-GCM via HKDF. Callers reference vk_... ID. Cache invalidated immediately on revoke. |
| Guardrails | Block PII (email, phone, SSN, Aadhaar, credit card), blocked words, regex, length — on input and output. |
| Timing-safe auth | Admin + API key comparison is constant-time. Timing attack resistant. |
| Body size limit | Requests over 4 MB rejected with 413. |
| Security headers | X-Content-Type-Options, X-Frame-Options, Referrer-Policy, HSTS (in production). |
| Admin brute-force protection | 20 req/min per IP on admin + console endpoints. |
| Console lockdown | /console/api/* requires x-admin-key on every request; CORS scoped to /v1 and /health only. |
| BYOK mode | Pass provider key via x-provider-key header. No gateway key required. IP-rate-limited. |
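An input guardrail such as { "type": "pii", "deny": true } from the quickstart config boils down to pattern checks over the request text. The sketch below covers only a naive email matcher; the gateway's actual patterns (phone, SSN, Aadhaar, credit card, blocked words, custom regex) are broader, and the function shape is invented for illustration:

```typescript
// Toy input guardrail: deny requests containing an email-like PII token.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/;

function checkInput(text: string): { allowed: boolean; reason?: string } {
  if (EMAIL.test(text)) return { allowed: false, reason: "pii:email" };
  return { allowed: true };
}
```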

Performance

| Feature | How it works |
| --- | --- |
| Response caching | Redis-backed (in-memory fallback). Cache key = SHA-256 of (model + messages + params). Identical requests served instantly. |
| Full streaming | SSE streaming across every provider, with fallback-before-first-chunk and clean [DONE] termination on error. |
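The cache-key scheme in the table (SHA-256 over model + messages + params) can be reproduced in a few lines. Note that the exact serialization the gateway uses is not specified here, so the JSON.stringify canonicalization below is an assumption:

```typescript
import { createHash } from "node:crypto";

// Illustrative cache key: SHA-256 over (model + messages + params), as described above.
// The gateway's real serialization may differ; this shows why identical requests hit the cache.
function cacheKey(model: string, messages: object[], params: object): string {
  return createHash("sha256").update(JSON.stringify({ model, messages, params })).digest("hex");
}
```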

Observability

| Feature | How it works |
| --- | --- |
| Live log stream | WebSocket stream. Every request logged with provider, model, latency, cost, status. |
| Prometheus metrics | /metrics endpoint (admin-protected). Scrape with Grafana, Datadog, Prometheus. |
| OpenTelemetry | Distributed traces exported to any OTLP backend (Jaeger, Grafana Tempo, Honeycomb). |
| Response headers | X-Summoned-Provider, X-Summoned-Cost-USD, X-Summoned-Latency-Ms, X-Summoned-Cache, X-Daily-Remaining on every response. |

API Reference

| Endpoint | Method | Auth | Description |
| --- | --- | --- | --- |
| /v1/chat/completions | POST | Bearer or x-provider-key | OpenAI-compatible completion (streaming + tools) |
| /v1/embeddings | POST | Bearer | Text embeddings |
| /v1/models | GET | | List registered providers |
| /v1/keys | POST / GET / DELETE | x-admin-key | API key management |
| /admin/virtual-keys | POST / GET / DELETE | x-admin-key | Virtual key management |
| /admin/logs | GET | x-admin-key | Request logs (buffer or DB) |
| /admin/stats | GET | x-admin-key | Aggregated statistics |
| /admin/providers | GET | x-admin-key | Provider health + circuit breaker state |
| /console/api/* | * | x-admin-key | Admin API used by the web console (same surface as /admin) |
| /metrics | GET | x-admin-key | Prometheus metrics |
| /ws/logs | WebSocket | ?key=ADMIN_KEY | Real-time log streaming |
| /health | GET | | Liveness check |
| /health/ready | GET | | Readiness (Postgres + Redis) |
| /console | GET | ADMIN_API_KEY (in browser prompt) | Built-in web console |

Configuration

See .env.example for the full reference.

Core

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| ADMIN_API_KEY | Yes | | Master admin key (min 32 chars). openssl rand -hex 32 |
| VIRTUAL_KEY_SECRET | Recommended | Falls back to admin key | Encryption key for virtual keys. openssl rand -hex 32 |
| GATEWAY_PORT | No | 4000 | Port to listen on (.env.example sets 4200) |
| GATEWAY_REQUIRE_AUTH | No | true | Set false for trusted private networks |
| PUBLIC_RPM_LIMIT | No | 60 | RPM cap for BYOK / unauthenticated callers |
| POSTGRES_URL | Optional | | Enables managed keys, virtual keys, audit history |
| REDIS_URL | Optional | | Enables cache, rate limits, latency EMA |
| OTEL_EXPORTER_OTLP_ENDPOINT | No | | OpenTelemetry trace endpoint |
| USD_INR_RATE | No | 85 | Exchange rate for INR cost display |
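For a zero-infra start, a minimal .env needs only the master key and at least one provider key; a sketch using only variables from the tables in this section (all values are placeholders):

```shell
# Minimal stateless setup; replace every value before use.
ADMIN_API_KEY=replace-with-output-of-openssl-rand-hex-32
OPENAI_API_KEY=sk-your-openai-key
GATEWAY_PORT=4200
```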

Provider credentials

Set only the ones you plan to use — unset providers are skipped at startup.

| Variable | Provider |
| --- | --- |
| OPENAI_API_KEY | OpenAI |
| ANTHROPIC_API_KEY | Anthropic |
| GOOGLE_API_KEY | Google Gemini |
| GROQ_API_KEY | Groq |
| AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT | Azure OpenAI |
| AWS_ACCESS_KEY_ID / AWS_BEDROCK_API_KEY + AWS_REGION | AWS Bedrock |
| MISTRAL_API_KEY | Mistral AI |
| TOGETHER_API_KEY | Together AI |
| DEEPSEEK_API_KEY | DeepSeek |
| FIREWORKS_API_KEY | Fireworks AI |
| COHERE_API_KEY | Cohere |
| CEREBRAS_API_KEY | Cerebras |
| PERPLEXITY_API_KEY | Perplexity |
| XAI_API_KEY | xAI / Grok |
| OPENROUTER_API_KEY | OpenRouter |
| HUGGINGFACE_API_KEY | HuggingFace |
| DEEPINFRA_API_KEY | DeepInfra |
| HYPERBOLIC_API_KEY | Hyperbolic |
| SAMBANOVA_API_KEY | SambaNova |
| NOVITA_API_KEY | Novita AI |
| MOONSHOT_API_KEY | Moonshot (Kimi) |
| ZAI_API_KEY | Z.AI (Zhipu / GLM) |
| NVIDIA_API_KEY | Nvidia NIM |
| OLLAMA_BASE_URL | Ollama (local) |
| VLLM_BASE_URL (+ optional VLLM_API_KEY) | vLLM (self-hosted) |
| VOYAGE_API_KEY | Voyage AI |
| SARVAM_API_KEY | 🇮🇳 Sarvam AI |
| YOTTA_API_KEY | 🇮🇳 Yotta Labs |
| CUSTOM_PROVIDERS | JSON array [{id,name,baseUrl,apiKey}] for any OpenAI-compatible endpoint |

Adding a New Provider

Takes ~10 lines of code and ~5 minutes if the provider speaks the OpenAI API format:

// src/providers/your-provider.ts
import { createOpenAICompatProvider } from "./openai-compat"

export function createYourProvider(apiKey: string) {
  return createOpenAICompatProvider({
    id: "yourprovider",
    name: "Your Provider",
    apiKey,
    baseURL: "https://api.yourprovider.com/v1",
  })
}

Then add the env var in src/lib/env.ts, register it in src/index.ts, and optionally add pricing in src/lib/models/your-provider.ts. See CONTRIBUTING.md for a full walkthrough.


Development

make setup          # Full setup: deps + Postgres + Redis + migrations + console
make dev            # Gateway with hot reload
make dev-console    # Console Vite dev server
make check-types    # TypeScript type check
make migrate        # Run DB migrations
make create-key     # Quick-create an API key for testing
make help           # All commands

Run the test suite:

bun test tests      # ~45 tests across pricing, guardrails, fallback, config, circuit breaker

SDKs

| Language | Package | Source |
| --- | --- | --- |
| TypeScript / JavaScript | @summoned/ai | summoned-sdk-ts |
| Python | summoned-ai (PyPI) | summoned-sdk-python |

Contributing

The easiest way to contribute is to add a new LLM provider — it's ~10 lines of code and ~5 minutes. See CONTRIBUTING.md.

Bug report? Open an issue → Feature request? Start a discussion →


License

MIT — free to use, fork, modify, and self-host.


Built by Summoned Tech
Made with ♥ for developers who care about production AI infra