Package Exports
- agentfootprint
- agentfootprint/explain
- agentfootprint/instructions
- agentfootprint/observe
- agentfootprint/providers
- agentfootprint/resilience
- agentfootprint/security
- agentfootprint/stream
Readme
agentfootprint
The explainable agent framework
Most agent frameworks give you execution. agentfootprint gives you connected evidence — grounded, auditable, LLM-readable. The LLM can explain its own decisions. You can verify it wasn't hallucinating.
npm install agentfootprint
Import what you need — each capability is a subpath:
import { Agent, defineTool } from 'agentfootprint'; // Build agents
import { mock, anthropic } from 'agentfootprint/providers'; // Connect providers
import { defineInstruction } from 'agentfootprint/instructions'; // Smart behavior
import { agentObservability } from 'agentfootprint/observe'; // Monitor execution
import { withRetry } from 'agentfootprint/resilience'; // Reliability
import { gatedTools } from 'agentfootprint/security'; // Tool safety
import { ExplainRecorder } from 'agentfootprint/explain'; // Grounding analysis
import { SSEFormatter } from 'agentfootprint/stream'; // Real-time events
Start Simple, Compose Up
Five concepts. Each adds one capability. No upfront graph DSL — start with a function call and grow.
import { Agent, defineTool, mock } from 'agentfootprint';
const agent = Agent.create({ provider: mock([...]) })
.system('You are a research assistant.')
.tool(searchTool)
.build();
const result = await agent.run('Find AI trends');
console.log(result.content);
console.log(agent.getNarrative()); // connected execution trace
| Concept | What it adds | Use case |
|---|---|---|
| LLMCall | Single LLM invocation | Summarization, classification |
| Agent | + Tool use loop (ReAct) | Research, code generation |
| RAG | + Retrieval | Q&A over documents |
| FlowChart | + Sequential pipeline | Approval flows, ETL |
| Swarm | + LLM-driven routing | Customer support, triage |
All five share one interface: .build() → .run(), .getNarrative(), .getSnapshot().
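The shared surface can be sketched as a TypeScript interface. The method names come from this README; the exact signatures are assumptions, not the package's published types.

```typescript
// Sketch of the surface all five concepts share, per the README.
// Method names are from the docs above; signatures are assumptions.
interface Runnable {
  run(input: string): Promise<{ content: string }>;
  getNarrative(): string[];
  getSnapshot(): unknown;
}

// A stub implementation, just to show the contract in use.
const stub: Runnable = {
  async run(input) {
    return { content: `echo: ${input}` };
  },
  getNarrative() {
    return ['[Seed] Initialized agent state'];
  },
  getSnapshot() {
    return {};
  },
};
```

Anything built against this shape works the same whether the runner underneath is an LLMCall or a Swarm.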
Adapter-Swap Testing
Write tests with mock(). Deploy with anthropic(). Same code. $0 test runs.
// test — deterministic, free, instant
const provider = mock([{ content: 'Paris.' }]);
// production — swap one line
const provider = createProvider(anthropic('claude-sonnet-4-20250514'));
// Same agent. Same tools. Same flowchart.
const agent = Agent.create({ provider }).system('Geography expert.').tool(searchTool).build();
Works with Anthropic, OpenAI, Bedrock, Ollama. No lock-in.
Instructions — Conditional Context Injection
One concept. Three LLM API positions. Define a rule once — it injects into system prompt, tools, AND tool-result recency window. Driven by accumulated state.
import { defineInstruction, Agent, AgentPattern } from 'agentfootprint';
const refund = defineInstruction({
id: 'refund-handling',
activeWhen: (d) => d.orderStatus === 'denied',
prompt: 'Handle denied orders with empathy. Follow refund policy.',
tools: [processRefund],
onToolResult: [{ id: 'empathy', text: 'Do NOT promise reversal.' }],
});
const agent = Agent.create({ provider })
.tool(lookupOrder)
.instruction(refund)
.decision({ orderStatus: null })
.pattern(AgentPattern.Dynamic) // re-evaluate each iteration
.build();
Tool results update the decision scope via decide(). Next iteration, different instructions activate. Progressive tool authorization, context-aware prompts, state-driven behavior — all declarative.
See Instructions Guide.
LLM Narrative — Connected Evidence
Not disconnected spans. Not logs. Connected entries with key, value, stageId — collected during the single traversal pass. The LLM can read its own trace and answer follow-up questions.
const result = await agent.run('Check order ORD-1003');
// Human-readable narrative
agent.getNarrative();
// [
// "[Seed] Initialized agent state",
// "[CallLLM] claude-sonnet-4 (127in / 45out)",
// "[ExecuteToolCalls] lookup_order({orderId: 'ORD-1003'})",
// " Tool results: {status: 'denied', amount: 5000}",
// "[CallLLM] claude-sonnet-4 (312in / 89out)",
// "[Finalize] Your order was denied..."
// ]
// Structured entries for programmatic access
agent.getNarrativeEntries();
// Each entry: { type, text, key, rawValue, stageId, subflowId }
Grounding Analysis
Compare what tools returned vs what the LLM said. Hallucination detection without a separate eval pipeline.
import { ExplainRecorder } from 'agentfootprint/explain';
const explain = new ExplainRecorder();
const agent = Agent.create({ provider }).tool(orderTool).recorder(explain).build();
await agent.run('Check order status');
const report = explain.explain();
report.sources; // what tools returned (ground truth)
report.claims; // what the LLM said (to verify)
report.decisions; // what tool calls the LLM madeDynamic ReAct
All three slots (system prompt, tools, messages) re-evaluate each iteration. Instructions re-evaluate against updated decision scope. Progressive tool authorization:
Turn 1: basic tools → LLM calls verify_identity → decision.verified = true
Turn 2: InstructionsToLLM re-evaluates → admin tools unlocked → refund tools available
Turn 3: LLM sees admin tools → can process refund
The LLM's capabilities change based on what happened — not what you hardcoded.
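The unlock mechanic can be sketched standalone. This is a hypothetical model of the per-iteration re-evaluation described above, not the library's internals:

```typescript
// Hypothetical sketch of per-iteration re-evaluation (not the
// library's code): tools attached to an instruction become visible
// only when its predicate passes against the current decision scope.
type Decision = Record<string, unknown>;

interface InstructionSketch {
  id: string;
  activeWhen: (d: Decision) => boolean;
  tools: string[];
}

function visibleTools(instructions: InstructionSketch[], decision: Decision): string[] {
  return instructions
    .filter((i) => i.activeWhen(decision))
    .flatMap((i) => i.tools);
}

const adminUnlock: InstructionSketch = {
  id: 'admin-unlock',
  activeWhen: (d) => d.verified === true,
  tools: ['process_refund'],
};

visibleTools([adminUnlock], {});                 // turn 1: []
visibleTools([adminUnlock], { verified: true }); // turn 2: ['process_refund']
```

Because the filter runs every turn, a tool result that flips `verified` changes what the next LLM call can see.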
Pausable — Human-in-the-Loop
Long-running agent pauses, serializes state to JSON, resumes hours later on a different server.
import { Agent, askHuman } from 'agentfootprint';
const agent = Agent.create({ provider })
.tool(askHuman()) // special tool that pauses execution
.build();
const result = await agent.run('Process my refund');
if (result.paused) {
// Store checkpoint in Redis/Postgres/anywhere
const checkpoint = result.pauseData;
// ... hours later, different server ...
const final = await agent.resume(humanResponse);
}
Streaming Lifecycle Events
9-event discriminated union. Build any UX — CLI, web, mobile. Tool lifecycle fires even without streaming mode.
await agent.run('Check order', {
onEvent: (event) => {
switch (event.type) {
case 'token': process.stdout.write(event.content); break;
case 'tool_start': console.log(`Running ${event.toolName}...`); break;
case 'tool_end': console.log(`Done (${event.latencyMs}ms)`); break;
case 'llm_end': console.log(`[${event.model}, ${event.latencyMs}ms]`); break;
}
},
});
Events: turn_start · llm_start · thinking · token · llm_end · tool_start · tool_end · turn_end · error
SSE for web backends: res.write(SSEFormatter.format(event))
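On the wire, each event becomes a standard SSE frame. A minimal formatter sketch — the real SSEFormatter's exact output shape is not shown in this README, so treat the field layout as an assumption:

```typescript
// Minimal sketch of SSE framing for stream events. Field layout is
// an assumption; SSEFormatter's actual output may differ.
function toSSE(event: { type: string } & Record<string, unknown>): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

const frame = toSSE({ type: 'token', content: 'Par' });
// "event: token\ndata: {\"type\":\"token\",\"content\":\"Par\"}\n\n"
```

The blank line (double `\n`) terminates each SSE message, which is what lets `res.write(...)` stream events incrementally.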
Recorders — Passive Observation
Observe without shaping behavior. Collect during traversal.
import { TokenRecorder, CostRecorder, QualityRecorder, GuardrailRecorder } from 'agentfootprint';
const tokens = new TokenRecorder();
const cost = new CostRecorder({ pricingTable: { 'claude-sonnet': { input: 3, output: 15 } } });
const agent = Agent.create({ provider })
.recorder(tokens)
.recorder(cost)
.build();
await agent.run('Hello');
tokens.getStats(); // { totalCalls, totalInputTokens, totalOutputTokens, ... }
cost.getTotalCost(); // USD amount
| Recorder | What it tracks |
|---|---|
| TokenRecorder | Input/output tokens per LLM call |
| CostRecorder | USD cost per model |
| ToolUsageRecorder | Tool call counts, latency, errors |
| QualityRecorder | Score each response via custom judge |
| GuardrailRecorder | Flag policy violations via custom check |
| PermissionRecorder | Blocked/denied/allowed tool events |
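Assuming the pricingTable values are USD per million tokens (a common convention, but an assumption — the README doesn't say), the cost arithmetic is:

```typescript
// Cost per call, ASSUMING pricingTable values are USD per million
// tokens — not confirmed by the README.
function callCostUSD(
  inputTokens: number,
  outputTokens: number,
  price: { input: number; output: number },
): number {
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

callCostUSD(127, 45, { input: 3, output: 15 }); // 0.001056 USD
```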
Tool Gating — Defense-in-Depth
The LLM never sees tools it can't use. Can't hallucinate a tool it never saw.
import { gatedTools, PermissionPolicy } from 'agentfootprint';
const policy = PermissionPolicy.fromRoles({
user: ['search', 'calc'],
admin: ['search', 'calc', 'delete-user'],
}, 'user');
const agent = Agent.create({ provider })
.toolProvider(gatedTools(allTools, policy.checker()))
.build();
// Upgrade mid-conversation
policy.setRole('admin');
Two layers: resolve-time filtering (hidden from LLM) + execute-time rejection (hallucinated names caught).
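The two layers can be sketched standalone — a hypothetical model of the behavior, not the package's code:

```typescript
// Hypothetical sketch of defense-in-depth gating, not the package's
// implementation. Layer 1 controls what the LLM can see; layer 2
// rejects anything it tries to call anyway.
type Checker = (toolName: string) => boolean;

// Layer 1: resolve-time — denied tools are never sent to the LLM.
function resolveTools(all: string[], allowed: Checker): string[] {
  return all.filter(allowed);
}

// Layer 2: execute-time — a hallucinated or denied name is rejected.
function executeTool(name: string, all: string[], allowed: Checker): string {
  if (!all.includes(name) || !allowed(name)) {
    throw new Error(`tool '${name}' is not available`);
  }
  return `ran ${name}`;
}

const allow: Checker = (n) => ['search', 'calc'].includes(n);
resolveTools(['search', 'calc', 'delete-user'], allow); // ['search', 'calc']
```

Even if the model invents a tool name layer 1 never showed it, layer 2 refuses to execute it.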
Safety Instructions
defineInstruction({
id: 'compliance',
safety: true, // fail-closed: fires even when predicate throws
prompt: 'GDPR compliance required.',
});
Safety instructions: unsuppressable, fail-closed, sorted last (highest LLM attention position).
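Fail-closed can be sketched as follows — a model of the stated semantics, not the library's code:

```typescript
// Sketch of fail-closed activation per the stated semantics (an
// assumption about the mechanism, not the library's code): if a
// predicate throws, a safety instruction activates anyway.
interface Rule {
  safety?: boolean;
  activeWhen?: (d: unknown) => boolean;
}

function isActive(rule: Rule, decision: unknown): boolean {
  try {
    return rule.activeWhen ? rule.activeWhen(decision) : true;
  } catch {
    return rule.safety === true; // fail closed only for safety rules
  }
}

const compliance: Rule = {
  safety: true,
  activeWhen: () => { throw new Error('scope unavailable'); },
};
isActive(compliance, {}); // true — fires despite the throwing predicate
```

A non-safety rule with a throwing predicate would simply deactivate; marking `safety: true` inverts the failure mode.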
Orchestration
import { withRetry, withFallback, withCircuitBreaker, resilientProvider } from 'agentfootprint';
const reliable = withRetry(agent, { maxRetries: 3 });
const resilient = withFallback(primaryAgent, cheapAgent);
const guarded = withCircuitBreaker(agent, { failureThreshold: 5 });
// Cross-family provider failover: Claude → GPT-4o → local Ollama
const provider = resilientProvider([anthropicAdapter, openaiAdapter, ollamaAdapter]);
26 Samples
test/samples/ — runnable with vitest:
| # | Sample | What it demonstrates |
|---|---|---|
| 01-16 | Core patterns | LLMCall, Agent, RAG, FlowChart, Swarm, recorders, tools, security, errors, multi-modal |
| 17 | Instructions | defineInstruction, decide(), conditional activation, Decision Scope |
| 18 | Streaming Events | AgentStreamEvent lifecycle, tool events, SSE |
| 19 | Security | gatedTools, PermissionPolicy, role-based tool access |
| 20 | Grounding | ExplainRecorder — sources, claims, decisions |
| 21 | SSE Server | Express SSE endpoint with SSEFormatter |
| 22 | Resilience | withRetry, withFallback, provider failover |
| 23 | Memory Stores | redisStore, postgresStore, dynamoStore adapters |
| 24 | Structured Output | outputSchema, Zod auto-convert, zodToJsonSchema |
| 25 | OTel Recorder | OpenTelemetry spans with mock tracer |
| 26 | Explain Recorder | ExplainRecorder: sources, claims, decisions during traversal |
Architecture
src/
├── concepts/ → LLMCall, Agent, RAG, FlowChart, Swarm (builders + runners)
├── lib/ → Instructions, narrative (grounding helpers), loop (buildAgentLoop), slots, call stages
├── adapters/ → LLM adapters (Anthropic, OpenAI, Bedrock, Mock) + protocol (MCP, A2A)
├── providers/ → Prompt/tool/message strategies
├── recorders/ → AgentRecorders (Token, Cost, Quality, Guardrail, Permission)
├── streaming/ → AgentStreamEvent, StreamEmitter, SSEFormatter
├── tools/ → ToolRegistry, defineTool
├── compositions/ → withRetry, withFallback, withCircuitBreaker
└── types/ → All type definitions
Built on footprintjs — the flowchart pattern for backend code.