Agentary JS
A JavaScript SDK for running quantized small language models in the browser using WebGPU and WebAssembly, with built-in support for agentic workflows.
🚀 Features
- Browser-Native: Run small language models directly in the browser without server dependencies
- WebGPU Acceleration: Leverage WebGPU for high-performance inference when available
- Quantized Models: Support for efficient quantized models (Q4, Q8, etc.) for optimal performance
- Streaming Generation: Real-time token streaming with Time to First Byte (TTFB) metrics
- Function Calling: Built-in support for tool/function calling capabilities
- Multi-Model Support: Use different models for chat, tool use, and reasoning
- Agentic Workflows: Create and execute complex multi-step agent workflows with conditional logic
- Tool Integration: Register custom tools for agents to use during workflow execution
📦 Installation
npm install agentary-js
🎯 Quick Start
Basic Text Generation
import { createSession } from 'agentary-js';
// Create a session with a quantized model
const session = await createSession({
models: {
chat: {
name: 'onnx-community/gemma-3-270m-it-ONNX',
quantization: 'q4'
}
},
engine: 'webgpu' // or 'wasm'
});
// Generate text with streaming
let output = '';
for await (const chunk of session.createResponse({
messages: [{ role: 'user', content: 'Hello, how are you today?' }]
})) {
if (chunk.isFirst && chunk.ttfbMs) {
console.log(`Time to first byte: ${chunk.ttfbMs}ms`);
}
if (!chunk.isLast) {
output += chunk.token; // process.stdout is unavailable in browsers, so accumulate instead
}
}
console.log(output);
// Clean up resources
await session.dispose();
Tool Calling
import { createSession } from 'agentary-js';
const session = await createSession({
models: {
chat: {
name: 'onnx-community/gemma-3-270m-it-ONNX',
quantization: 'q4'
},
tool_use: {
name: 'onnx-community/gemma-3-270m-it-ONNX',
quantization: 'q4'
}
}
});
const tools = [
{
type: "function",
function: {
name: "get_weather",
description: "Get current weather for a city",
parameters: {
type: "object",
properties: {
city: { type: "string", description: "City name" }
},
required: ["city"]
}
}
}
];
let reply = '';
for await (const chunk of session.createResponse({
messages: [{ role: 'user', content: 'What is the weather in New York?' }],
tools
})) {
reply += chunk.token; // process.stdout is unavailable in browsers, so accumulate instead
}
console.log(reply);
Agentic Workflows
Create multi-step agent workflows that can think, reason, and take actions autonomously.
import { createAgentSession } from 'agentary-js';
// Create an agent session with specialized models
const agent = await createAgentSession({
models: {
chat: {
name: 'onnx-community/gemma-3-270m-it-ONNX',
quantization: 'q4'
},
tool_use: {
name: 'onnx-community/Qwen2.5-0.5B-Instruct',
quantization: 'q4'
},
default: {
name: 'onnx-community/Qwen2.5-0.5B-Instruct',
quantization: 'q4'
}
},
});
// Define a research workflow
const researchWorkflow = {
id: 'research-assistant',
name: 'Research Assistant Workflow',
systemPrompt: 'You are a helpful research assistant.',
maxIterations: 5,
timeout: 30000,
memoryConfig: {
enablePruning: true
},
steps: [
{
id: 1,
prompt: 'Understand and break down the research topic',
maxTokens: 200,
temperature: 0.7,
generationTask: 'reasoning'
},
{
id: 2,
prompt: 'Search for relevant information using available tools',
toolChoice: ['web_search'],
maxTokens: 300,
generationTask: 'tool_use'
},
{
id: 3,
prompt: 'Analyze the gathered information for insights',
maxTokens: 400,
temperature: 0.8,
generationTask: 'reasoning'
},
{
id: 4,
prompt: 'Provide a comprehensive summary and recommendations',
maxTokens: 500,
generationTask: 'chat'
}
],
tools: [
{
type: 'function',
function: {
name: 'web_search',
description: 'Search the web for information',
parameters: {
type: 'object',
properties: {
query: { type: 'string', description: 'Search query' }
},
required: ['query']
},
implementation: async (query) => {
// Your search implementation
return `Search results for: ${query}`;
}
}
}
]
};
// Execute the workflow
console.log('🤖 Starting research workflow...\n');
for await (const iteration of agent.runWorkflow('Research the benefits of renewable energy', researchWorkflow)) {
if (iteration?.content) {
console.log(`[Step ${iteration.stepId}]: ${iteration?.content}`);
}
if (iteration?.toolCall) {
console.log(` 🔧 Tool: ${iteration.toolCall.name}(${JSON.stringify(iteration.toolCall.args)})`);
if (iteration.toolCall.result) {
console.log(` 📄 Result: ${iteration.toolCall.result}`);
}
}
if (iteration?.error) {
console.log(` ❌ Error: ${iteration.error.message}`);
}
console.log(''); // Empty line for readability
}
await agent.dispose();
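The loop above logs each iteration as it arrives. When a caller only needs the final outcome, the same stream can be folded into a summary object. The following is a minimal sketch, assuming each iteration carries the optional `stepId`, `content`, `toolCall`, and `error` fields used above. `collectWorkflow` and `fakeRun` are hypothetical helpers for illustration and are not part of agentary-js.

```javascript
// Hypothetical helper: folds a stream of workflow iterations into one summary.
// Works with any async iterable whose items look like the iterations above.
async function collectWorkflow(iterations) {
  const summary = { contentByStep: {}, toolCalls: [], errors: [] };
  for await (const iteration of iterations) {
    if (!iteration) continue;
    const stepId = String(iteration.stepId ?? 'unknown');
    if (iteration.content) {
      // Concatenate content when a step yields more than once.
      summary.contentByStep[stepId] = (summary.contentByStep[stepId] ?? '') + iteration.content;
    }
    if (iteration.toolCall) summary.toolCalls.push({ stepId, ...iteration.toolCall });
    if (iteration.error) summary.errors.push({ stepId, message: iteration.error.message });
  }
  return summary;
}

// Stand-in for agent.runWorkflow(...) so the sketch runs without a model.
async function* fakeRun() {
  yield { stepId: '1', content: 'Plan the research.' };
  yield { stepId: '2', toolCall: { name: 'web_search', args: { query: 'solar' }, result: 'ok' } };
  yield { stepId: '3', error: { message: 'timed out' } };
}

collectWorkflow(fakeRun()).then((summary) => console.log(JSON.stringify(summary, null, 2)));
```

In real use, the argument would be the iterable returned by `agent.runWorkflow(prompt, workflow)` in place of `fakeRun()`.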
🏗️ API Reference
createSession(args: CreateSessionArgs): Promise<Session>
Creates a new inference session with the specified configuration.
createAgentSession(args: CreateSessionArgs): Promise<AgentSession>
Creates a new agent session with workflow capabilities, extending the basic session with agentic features.
CreateSessionArgs
Property | Type | Description |
---|---|---|
models | object | Model configuration for different tasks (optional) |
models.default | Model | Default model configuration (optional) |
models.tool_use | Model | Model for tool/function calling |
models.chat | Model | Model for chat/text generation |
models.reasoning | Model | Model for reasoning tasks |
engine | DeviceType | Inference engine - 'auto', 'webgpu', 'wasm', 'webnn' (optional) |
hfToken | string | Hugging Face token for private models (optional) |
ctx | number | Context length override (optional) |
Session
createResponse(args: GenerateArgs, generationTask?: GenerationTask): AsyncIterable<TokenStreamChunk>
Generates text with streaming output using the specified generation task.
GenerateArgs
Property | Type | Description |
---|---|---|
messages | Message[] | Array of conversation messages |
model | Model | Override model for this generation (optional) |
max_new_tokens | number | Maximum number of tokens to generate |
tools | Tool[] | Function calling tools (optional) |
temperature | number | Sampling temperature (0.0-2.0) |
top_p | number | Nucleus sampling parameter |
top_k | number | Top-k sampling parameter |
repetition_penalty | number | Repetition penalty (default: 1.1) |
stop | string[] | Stop sequences |
seed | number | Random seed for reproducible output |
deterministic | boolean | Use deterministic generation |
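The `temperature`, `top_k`, and `top_p` parameters interact: temperature reshapes the token distribution, and the two filters then restrict which tokens remain eligible. The sketch below shows the standard technique on a toy logit vector. It is a generic illustration, not agentary-js internals; `filterDistribution` is a hypothetical name, and the order in which a given runtime applies the filters may differ.

```javascript
// Generic illustration of temperature, top-k, and top-p (nucleus) filtering.
// This is the textbook technique, not the library's implementation.
function filterDistribution(logits, { temperature = 1.0, top_k = 0, top_p = 1.0 } = {}) {
  if (temperature <= 0) throw new Error('temperature must be > 0 in this sketch');

  // Temperature rescales logits before softmax; lower values sharpen the distribution.
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max)); // subtract max for numerical stability
  const total = exps.reduce((a, b) => a + b, 0);
  let candidates = exps
    .map((e, tokenId) => ({ tokenId, p: e / total }))
    .sort((a, b) => b.p - a.p);

  // Top-k keeps only the k most likely tokens (0 disables the filter here).
  if (top_k > 0) candidates = candidates.slice(0, top_k);

  // Top-p keeps the smallest prefix whose cumulative probability reaches top_p.
  let cumulative = 0;
  const kept = [];
  for (const c of candidates) {
    kept.push(c);
    cumulative += c.p;
    if (cumulative >= top_p) break;
  }

  // Renormalize so the surviving probabilities sum to 1.
  const keptTotal = kept.reduce((a, c) => a + c.p, 0);
  return kept.map((c) => ({ tokenId: c.tokenId, p: c.p / keptTotal }));
}

console.log(filterDistribution([2.0, 1.0, 0.1, -1.0], { temperature: 0.7, top_k: 3, top_p: 0.9 }));
```

A sampler would then draw one token from the returned list according to its `p` values.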
GenerationTask
Generation task: 'chat' | 'tool_use' | 'reasoning'
Message
Property | Type | Description |
---|---|---|
role | 'user' \| 'assistant' \| 'system' | Role of the message sender |
content | string | The message content |
Model
Property | Type | Description |
---|---|---|
name | string | Model identifier (e.g., HuggingFace model name) |
quantization | DataType | Model quantization level |
TokenStreamChunk
Property | Type | Description |
---|---|---|
token | string | Generated token text |
tokenId | number | Token ID |
isFirst | boolean | Whether this is the first token |
isLast | boolean | Whether this is the last token |
ttfbMs | number | Time to first byte in milliseconds (optional) |
tokensPerSecond | number | Tokens per second rate (optional) |
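The `ttfbMs` and `tokensPerSecond` fields are reported by the library and are optional. To cross-check them, or to time a stream that omits them, the same figures can be derived by timing the iteration itself. This is a minimal sketch that assumes only that each chunk has a `token` string. `measureStream` and `demoStream` are hypothetical helpers, and the injectable `now` clock exists purely to make the function testable.

```javascript
// Hypothetical wrapper: measures time to first chunk and throughput for any
// async iterable of chunks, independent of the fields the library reports.
async function measureStream(chunks, now = () => performance.now()) {
  const start = now();
  let firstAt = null;
  let count = 0;
  let text = '';
  for await (const chunk of chunks) {
    if (firstAt === null) firstAt = now(); // first chunk marks time to first byte
    count += 1;
    text += chunk.token ?? '';
  }
  const elapsedMs = now() - start;
  return {
    text,
    tokenCount: count,
    ttfbMs: firstAt === null ? null : firstAt - start,
    tokensPerSecond: elapsedMs > 0 ? (count / elapsedMs) * 1000 : null,
  };
}

// Stand-in for session.createResponse(...) so the sketch runs without a model.
async function* demoStream() {
  for (const token of ['Hel', 'lo', '!']) yield { token };
}

measureStream(demoStream()).then((stats) => console.log(stats));
```

Note that this counts every chunk, including a final chunk if the stream emits one with an empty token.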
dispose(): Promise<void>
Cleans up the session and releases all resources.
AgentSession
Extends Session with additional methods for workflow execution and tool management.
runWorkflow(prompt: string, workflow: AgentWorkflow): AsyncIterable<WorkflowIterationResponse>
Executes a multi-step agent workflow with the given prompt and workflow definition.
registerTool(tool: Tool): void
Registers a custom tool for use in workflows and generation.
getRegisteredTools(): Tool[]
Returns all currently registered tools.
WorkflowStep
Property | Type | Description |
---|---|---|
id | string | Unique identifier for the step |
description | string | Short description of the step for persistent agent memory |
prompt | string | The prompt or instruction for this step |
maxTokens | number | Maximum tokens to generate (optional) |
temperature | number | Temperature for generation (optional) |
generationTask | GenerationTask | Type of generation task (optional) |
toolChoice | string[] | Available tools for this step (optional) |
maxAttempts | number | Maximum retry attempts on failure (optional, default: 1) |
WorkflowIterationResponse
Property | Type | Description |
---|---|---|
stepId | string | ID of the workflow step (optional) |
error | WorkflowStepError | Error details if step failed (optional) |
content | string | Generated content (optional) |
toolCall | object | Tool call information (optional) |
toolCall.name | string | Name of the called tool (optional) |
toolCall.args | Record<string, any> | Arguments passed to the tool (optional) |
toolCall.result | string | Tool execution result (optional) |
metadata | Record<string, any> | Additional step metadata (optional) |
WorkflowStepError
Property | Type | Description |
---|---|---|
message | string | Error message describing the step failure |
AgentWorkflow
Property | Type | Description |
---|---|---|
id | string | Unique workflow identifier |
name | string | Human-readable workflow name |
description | string | Workflow description (optional) |
systemPrompt | string | System prompt for the workflow (optional) |
steps | WorkflowStep[] | Array of workflow steps |
context | Record<string, any> | Initial workflow context data (optional) |
tools | Tool[] | Tools available to the workflow |
timeout | number | Workflow timeout in milliseconds (optional) |
maxIterations | number | Maximum number of iterations (optional) |
memoryConfig | MemoryConfig | Memory management configuration (optional) - see Memory System |
MemoryConfig
Configure advanced memory management features for workflows to optimize token usage and performance. See the Memory System section for comprehensive documentation.
Property | Type | Description |
---|---|---|
memory | Memory | Storage strategy implementation (optional, default: SlidingWindowMemory) |
formatter | MemoryFormatter | Message formatter (optional, default: DefaultMemoryFormatter) |
memoryCompressor | MemoryCompressor | Compression strategy (optional, default: none) |
maxTokens | number | Maximum token limit for workflow memory (optional, default: 1024) |
compressionThreshold | number | Fraction (0-1) of maxTokens at which compression is triggered (optional, default: 0.8) |
preserveMessageTypes | string[] | Message types to never compress (optional) |
autoCompress | boolean | Auto-compress when adding messages (optional) |
checkpointInterval | number | Checkpoint frequency for rollback support (optional) |
Tool
Property | Type | Description |
---|---|---|
type | 'function' | Tool type (currently only 'function') |
function | object | Function definition |
function.name | string | Function name |
function.description | string | Function description |
function.parameters | object | JSON Schema for parameters |
function.parameters.type | 'object' | Parameter schema type |
function.parameters.properties | Record<string, any> | Parameter properties |
function.parameters.required | string[] | Required parameter names |
function.implementation | Function | JavaScript implementation (optional) |
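To make the Tool shape concrete, the sketch below shows one way a parsed tool call could be matched to a tool definition, checked against the schema's `required` list, and executed. `dispatchToolCall` is a hypothetical helper, not the library's dispatcher. It also assumes that the implementation receives the whole argument object, which this README does not specify.

```javascript
// Hypothetical dispatcher: looks up a tool by name, checks required
// parameters against its JSON Schema, then calls its implementation.
async function dispatchToolCall(tools, call) {
  const tool = tools.find((t) => t.type === 'function' && t.function.name === call.name);
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);

  const { parameters, implementation } = tool.function;
  const args = call.args ?? {};
  const missing = (parameters.required ?? []).filter((key) => !(key in args));
  if (missing.length > 0) {
    throw new Error(`Missing required parameter(s) for ${call.name}: ${missing.join(', ')}`);
  }
  if (typeof implementation !== 'function') {
    throw new Error(`Tool ${call.name} has no implementation`);
  }
  // Assumption: the implementation receives the whole argument object.
  return implementation(args);
}

const demoTools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string', description: 'City name' } },
      required: ['city'],
    },
    implementation: async ({ city }) => `Sunny in ${city}`,
  },
}];

dispatchToolCall(demoTools, { name: 'get_weather', args: { city: 'New York' } })
  .then((result) => console.log(result));
```

Only the presence of required keys is checked here; full JSON Schema validation would need a dedicated validator.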
Additional Types
EngineKind
Supported inference engines:
type EngineKind = 'auto' | 'webgpu' | 'wasm' | 'webnn';
WorkerInstance
Internal worker instance type representing the Web Worker handling model inference.
InitArgs
Worker initialization arguments including model configuration and runtime settings.
MessageContent
Message content type that can be either a string or structured content with metadata.
WorkflowIterationResponse
Enhanced response type for workflow iterations:
Property | Type | Description |
---|---|---|
stepId | string | ID of the current step (optional) |
error | WorkflowStepError | Error information if step failed (optional) |
content | string | Generated content (optional) |
toolCall | object | Tool execution details (optional) |
metadata | Record<string, any> | Additional metadata (optional) |
Memory System Types
The following types and classes are exported for memory system customization. See the Memory System section for detailed documentation.
Classes:
- MemoryManager: Main memory management class
- SlidingWindowMemory: Sliding window memory implementation
- LLMSummarization: LLM-based memory compression
- DefaultMemoryFormatter: Default message formatter
Types:
- Memory: Memory storage interface
- MemoryFormatter: Message formatting interface
- MemoryCompressor: Memory compression interface
- MemoryMessage: Message with metadata
- MemoryConfig: Memory configuration options
- MemoryMetrics: Memory usage metrics
- RetrievalOptions: Message retrieval options
- CompressionOptions: Compression configuration
- ToolResult: Tool execution result format
- LLMSummarizationConfig: LLM summarization configuration
🌐 Browser Support
- WebGPU: Chrome 113+, Edge 113+, Firefox with WebGPU enabled
- WebAssembly: All modern browsers
- Minimum Requirements: 4GB RAM recommended for small models
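Because WebGPU availability varies by browser, it can be useful to check what the environment exposes before choosing an `engine` value. The sketch below is one way to do that by hand; the library's own `'auto'` setting presumably performs a similar selection, but its logic is not documented here. `pickEngine` is a hypothetical helper.

```javascript
// Hypothetical helper: picks an engine string from what the environment exposes.
// `env` defaults to globalThis so the function can be tested with a fake object.
function pickEngine(env = globalThis) {
  // WebGPU is exposed as navigator.gpu in browsers that support it.
  const hasWebGPU = Boolean(env.navigator && env.navigator.gpu);
  const hasWasm = typeof env.WebAssembly === 'object' && env.WebAssembly !== null;
  if (hasWebGPU) return 'webgpu';
  if (hasWasm) return 'wasm';
  throw new Error('Neither WebGPU nor WebAssembly is available');
}

console.log(pickEngine());
```

The presence of `navigator.gpu` does not guarantee a usable adapter, since `navigator.gpu.requestAdapter()` can still resolve to `null`, so a production check would also await that call.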
🔧 Development
Building from Source
# Clone the repository
git clone https://github.com/agentary-ai/agentary-js.git
cd agentary-js
# Install dependencies
npm install
# Build the library
npm run build
# Watch for changes during development
npm run dev
Project Structure
src/
├── index.ts  # Main library exports
├── core/  # Core session functionality
│   ├── session.ts  # Basic session management
│   ├── agent-session.ts  # Agent workflow session
│   └── index.ts  # Core exports
├── workers/  # Worker management
│   ├── manager.ts  # Worker lifecycle management
│   ├── worker.ts  # Web Worker for model inference
│   └── index.ts  # Worker exports
├── workflow/  # Workflow execution engine
│   ├── executor.ts  # Workflow execution logic
│   ├── step-executor.ts  # Individual step execution
│   ├── result-builder.ts  # Workflow result construction
│   ├── workflow-state.ts  # Workflow state management
│   └── index.ts  # Workflow exports
├── processing/  # Content and tool processing
│   ├── content/
│   │   ├── processor.ts  # Content processing utilities
│   │   └── index.ts
│   ├── tools/
│   │   ├── parser.ts  # Main tool call parser
│   │   ├── parsers/
│   │   │   ├── xml-parser.ts  # XML format tool parser
│   │   │   ├── json-parser.ts  # JSON format tool parser
│   │   │   ├── function-parser.ts  # Function call parser
│   │   │   ├── composite-parser.ts  # Combined parsing strategy
│   │   │   └── index.ts
│   │   └── index.ts
│   └── index.ts  # Processing exports
├── types/  # Type definitions
│   ├── agent-session.ts  # Agent session types
│   ├── session.ts  # Basic session types
│   ├── worker.ts  # Internal worker types
│   └── workflow-state.ts  # Workflow state types
└── utils/  # Utility modules
    ├── logger.ts  # Logging utilities
    ├── logger-config.ts  # Logger configuration
    └── token-counter.ts  # Token counting utilities
Running the Examples
# Build the library
npm run build
# Serve the examples (requires a local server)
cd examples
npx http-server . -c-1
# Open the demo in your browser:
# http://localhost:8080/demo.html
Available Examples
examples/demo.html
Interactive demonstration with two main sections:
🔧 Agent Workflow Tab: Advanced agent workflows with step-by-step execution
- Math Problem Solver workflow with calculator tool
- Demonstrates think → act → respond step pattern
- Real-time step visualization and tool call tracking
- Pre-loaded with sample math problems for testing
💬 Direct Chat Tab: Basic text generation and function calling
- Simple prompt-response interaction
- Optional tool integration with configurable JSON tools
- Pre-configured weather tool example
- Hugging Face token support for private models
- Streaming token generation with TTFB metrics
examples/weather-planner-demo.html
Advanced Weather Activity Planning Demo:
🌦️ Weather-Based Activity Planning: Intelligent activity recommendations based on weather conditions
- Multi-step workflow with geocoding, weather forecasting, and POI search
- Dynamic indoor/outdoor activity selection based on weather conditions
- Budget-aware filtering and distance-based recommendations
- Calendar event generation with time slots
Features:
- Location input with geocoding support
- Date and time window configuration
- Budget preferences (free, cheap, any)
- Activity type preferences
- Real-time workflow execution visualization
- Comprehensive result display with itinerary
🧠 Agent Workflow Patterns
Step Types
Agent workflows support four main step types:
- think: Analysis, reasoning, and planning steps
- act: Action steps that use tools to interact with external systems
- decide: Decision-making steps with conditional logic
- respond: Final response or summary generation
Best Practices
1. Tool Design
- Keep tools focused and single-purpose
- Include comprehensive parameter schemas
- Handle errors gracefully in implementations
- Use descriptive names and descriptions
2. Workflow Structure
- Limit workflows to 3-7 steps for optimal performance
- Use clear, descriptive step IDs and descriptions
- Set reasonable timeouts and iteration limits
- Plan for error scenarios
3. Memory Management
Configure agent memory to optimize performance and token usage. See the Memory System section for comprehensive documentation.
Quick Memory Configuration Examples
import { SlidingWindowMemory, LLMSummarization, DefaultMemoryFormatter } from 'agentary-js';
// For long-running workflows with many steps
const longWorkflowMemory = {
memory: new SlidingWindowMemory(),
memoryCompressor: new LLMSummarization({
maxSummaryTokens: 512,
recentWindowSize: 4
}),
maxTokens: 2048,
compressionThreshold: 0.75
};
// For simple sequential workflows
const simpleWorkflowMemory = {
memory: new SlidingWindowMemory(),
maxTokens: 1024,
compressionThreshold: 0.9 // Less aggressive compression
};
// For workflows with custom formatting
const customFormattedMemory = {
memory: new SlidingWindowMemory(),
formatter: new DefaultMemoryFormatter({
stepInstructionTemplate: '## Step {stepId}\n{prompt}',
includeMetadata: true
}),
maxTokens: 1536
};
Best Practices
- Token Management: Set maxTokens to 20-30% of your model's context window
- Memory Implementation: Use SlidingWindowMemory for most use cases
- Compression: Use LLMSummarization for workflows with >5 steps that reference early context
- Formatting: Customize DefaultMemoryFormatter templates for your domain
- Monitoring: Use memoryManager.getMetrics() to track memory usage
4. Model Selection
- Use reasoning models for analysis and inference
- Use tool_use models for tool-specific workflows
- Consider using specialized models for domain-specific tasks
🧠 Memory System
The memory system provides a flexible, plugin-based architecture for managing agent memory during workflow execution. It allows you to customize how messages are stored, retrieved, formatted, and compressed to optimize performance and token usage.
Architecture
The memory system consists of three main components managed by the MemoryManager:
- Memory: How messages are stored and retrieved (e.g., SlidingWindowMemory)
- Memory Formatter: How messages are formatted for the LLM (e.g., DefaultMemoryFormatter)
- Memory Compressor: How memory is compressed when it grows too large (e.g., LLMSummarization)
MemoryManager
├── Memory (storage & retrieval)
│   ├── SlidingWindowMemory
│   └── Your custom implementation
├── MemoryFormatter (formatting)
│   ├── DefaultMemoryFormatter
│   └── Your custom formatter
└── MemoryCompressor (compression)
    ├── LLMSummarization
    └── Your custom compressor
Quick Start with Memory
Using Default Configuration
The system works out of the box with sensible defaults:
import { createAgentSession } from 'agentary-js';
const agent = await createAgentSession({
models: {
chat: {
name: 'onnx-community/gemma-3-270m-it-ONNX',
quantization: 'q4'
}
}
});
const workflow = {
id: 'my-workflow',
systemPrompt: 'You are a helpful assistant.',
maxIterations: 10,
steps: [/* your steps */],
tools: []
};
for await (const result of agent.runWorkflow('Help me plan my day', workflow)) {
console.log(result);
}
Customizing Memory Configuration
Configure memory with custom strategies:
import {
createAgentSession,
SlidingWindowMemory,
LLMSummarization,
DefaultMemoryFormatter
} from 'agentary-js';
const workflow = {
id: 'my-workflow',
systemPrompt: 'You are a helpful assistant.',
maxIterations: 10,
memoryConfig: {
memory: new SlidingWindowMemory(),
formatter: new DefaultMemoryFormatter({
stepInstructionTemplate: '**Task {stepId}:** {prompt}',
toolResultsTemplate: '**Available Data:**\n{results}'
}),
memoryCompressor: new LLMSummarization({
systemPrompt: 'Create a concise summary focusing on key decisions.',
maxSummaryTokens: 1024
}),
maxTokens: 4096,
compressionThreshold: 0.75 // Compress at 75% capacity
},
steps: [/* your steps */],
tools: []
};
Built-in Memory Implementations
SlidingWindowMemory
Keeps the most recent messages within a token limit. Automatically prunes old messages when approaching the limit.
import { SlidingWindowMemory } from 'agentary-js';
const memory = new SlidingWindowMemory();
Features:
- Automatic pruning based on token limits
- Preserves system and summary messages
- Checkpoint/rollback support
- Fast and efficient
Configuration in workflow:
memoryConfig: {
memory: new SlidingWindowMemory(),
maxTokens: 4096,
compressionThreshold: 0.8
}
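The pruning behavior described above can be pictured as dropping the oldest unprotected messages until the total fits the limit. The following is a simplified sketch of that idea and not the library's implementation. It assumes each message carries `metadata.tokenCount` and an optional `metadata.type`, and the `pruneToWindow` name and its default preserved types are illustrative.

```javascript
// Simplified sliding-window pruning, not the library's implementation.
// Assumes each message carries metadata.tokenCount and an optional metadata.type.
function pruneToWindow(messages, maxTokens, preservedTypes = ['system_instruction', 'summary']) {
  const tokens = (m) => m.metadata?.tokenCount ?? 0;
  const isPreserved = (m) => m.role === 'system' || preservedTypes.includes(m.metadata?.type);

  let total = messages.reduce((sum, m) => sum + tokens(m), 0);
  const dropped = new Set();
  // Walk from oldest to newest, dropping unpreserved messages until the total fits.
  for (const m of messages) {
    if (total <= maxTokens) break;
    if (isPreserved(m)) continue;
    dropped.add(m);
    total -= tokens(m);
  }
  return messages.filter((m) => !dropped.has(m));
}

const history = [
  { role: 'system', content: 'You are helpful.', metadata: { tokenCount: 10 } },
  { role: 'user', content: 'First question', metadata: { tokenCount: 50 } },
  { role: 'assistant', content: 'First answer', metadata: { tokenCount: 50 } },
  { role: 'user', content: 'Second question', metadata: { tokenCount: 50 } },
];

console.log(pruneToWindow(history, 120).map((m) => m.content));
```

If the preserved messages alone exceed `maxTokens`, this sketch returns them anyway; a real implementation would need a policy for that case.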
LLMSummarization
Uses an LLM to intelligently summarize conversation history into a concise format.
import { LLMSummarization } from 'agentary-js';
const compressor = new LLMSummarization({
systemPrompt: 'Summarize the conversation focusing on key facts and decisions.',
userPromptTemplate: 'Summarize:\n{messages}',
temperature: 0.1,
maxSummaryTokens: 512,
recentWindowSize: 4, // Keep last 4 messages unsummarized
minMessagesToSummarize: 6 // Require at least 6 messages
});
Features:
- Intelligent summarization preserving context
- Customizable prompts and templates
- Configurable output length
- Preserves recent messages
Configuration in workflow:
memoryConfig: {
memoryCompressor: new LLMSummarization({
systemPrompt: 'Focus on key decisions and outcomes.',
maxSummaryTokens: 1024,
recentWindowSize: 4
}),
compressionThreshold: 0.8
}
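The `recentWindowSize` and `minMessagesToSummarize` options suggest that history is split into an older part that gets summarized and a recent tail that is kept verbatim. The sketch below shows one plausible reading of those two options. It assumes the minimum applies to the total message count, which this README does not state, and `splitForSummary` is a hypothetical helper.

```javascript
// One plausible reading of recentWindowSize and minMessagesToSummarize.
function splitForSummary(messages, { recentWindowSize = 4, minMessagesToSummarize = 6 } = {}) {
  // Assumption: the minimum applies to the total number of messages.
  if (messages.length < minMessagesToSummarize) {
    return { toSummarize: [], recent: messages };
  }
  // Everything before the cut is summarized; the tail stays untouched.
  const cut = Math.max(0, messages.length - recentWindowSize);
  return { toSummarize: messages.slice(0, cut), recent: messages.slice(cut) };
}

const tenMessages = Array.from({ length: 10 }, (_, i) => ({ role: 'user', content: `message ${i + 1}` }));
const split = splitForSummary(tenMessages);
console.log(split.toSummarize.length, split.recent.length);
```

The summarizer would then replace `toSummarize` with a single summary message and keep `recent` as is.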
DefaultMemoryFormatter
Formats messages and context for LLM consumption with customizable templates.
import { DefaultMemoryFormatter } from 'agentary-js';
const formatter = new DefaultMemoryFormatter({
stepInstructionTemplate: '**Step {stepId}:** {prompt}',
toolResultsTemplate: '**Tool Results:**\n{results}',
systemPromptTemplate: '{basePrompt}\n\n{context}',
includeMetadata: false // Don't include message type labels
});
Configuration in workflow:
memoryConfig: {
formatter: new DefaultMemoryFormatter({
stepInstructionTemplate: '## Task: {stepId}\n{prompt}',
includeMetadata: true
})
}
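The templates above use `{name}` placeholders such as `{stepId}` and `{prompt}`. To illustrate how such a template expands, here is a minimal substitution sketch. `renderTemplate` is a hypothetical helper and may not match how `DefaultMemoryFormatter` handles edge cases such as unknown placeholders.

```javascript
// Hypothetical placeholder substitution; unknown placeholders are left untouched.
function renderTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

console.log(renderTemplate('**Step {stepId}:** {prompt}', {
  stepId: 'analyze',
  prompt: 'List the key risks.',
}));
// prints: **Step analyze:** List the key risks.
```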
Memory Configuration Options
The MemoryConfig interface provides comprehensive configuration:
interface MemoryConfig {
memory?: Memory; // Storage strategy
formatter?: MemoryFormatter; // Message formatter
memoryCompressor?: MemoryCompressor; // Compression strategy
maxTokens?: number; // Max tokens (default: 1024)
compressionThreshold?: number; // 0-1, trigger at % of max (default: 0.8)
preserveMessageTypes?: string[]; // Types to never compress
autoCompress?: boolean; // Auto-compress on add
checkpointInterval?: number; // Checkpoint frequency
}
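As a worked example of how `maxTokens` and `compressionThreshold` combine: the trigger point is their product, so 4096 tokens at a 0.75 threshold triggers compression once memory passes 3072 tokens. The sketch below encodes that arithmetic using the documented defaults. Whether the library compares with `>` or `>=` is an assumption, and both function names are hypothetical.

```javascript
// Token count at which compression would start, using the documented defaults.
function compressionTrigger({ maxTokens = 1024, compressionThreshold = 0.8 } = {}) {
  return maxTokens * compressionThreshold;
}

// Assumption: compression starts once usage strictly exceeds the trigger.
function shouldCompress(estimatedTokens, config = {}) {
  return estimatedTokens > compressionTrigger(config);
}

console.log(compressionTrigger({ maxTokens: 4096, compressionThreshold: 0.75 })); // 3072
console.log(compressionTrigger()); // 819.2 with the defaults
console.log(shouldCompress(3500, { maxTokens: 4096, compressionThreshold: 0.75 })); // true
```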
Creating Custom Implementations
Custom Memory Implementation
import type {
Memory,
MemoryMessage,
MemoryMetrics,
RetrievalOptions
} from 'agentary-js';
class VectorDBMemory implements Memory {
name = 'vector-db';
private db: YourVectorDB;
constructor(connectionString: string) {
this.db = new YourVectorDB(connectionString);
}
async add(messages: MemoryMessage[]): Promise<void> {
// Store messages in vector DB with embeddings
for (const msg of messages) {
const embedding = await this.generateEmbedding(msg.content);
await this.db.insert({
content: msg.content,
role: msg.role,
embedding,
metadata: msg.metadata
});
}
}
async retrieve(options?: RetrievalOptions): Promise<MemoryMessage[]> {
// Retrieve semantically relevant messages
if (options?.relevanceQuery) {
const queryEmbedding = await this.generateEmbedding(options.relevanceQuery);
return await this.db.similaritySearch(queryEmbedding, options.maxTokens);
}
// Or retrieve recent messages
return await this.db.getRecent(options?.maxTokens || 2048);
}
getMetrics(): MemoryMetrics {
return {
messageCount: this.db.count(),
estimatedTokens: this.db.totalTokens(),
compressionCount: 0,
lastCompressionTime: undefined
};
}
clear(): void {
this.db.clear();
}
private async generateEmbedding(text: string): Promise<number[]> {
// Your embedding logic
return [];
}
}
// Use it in your workflow
const workflow = {
memoryConfig: {
memory: new VectorDBMemory('mongodb://localhost:27017'),
maxTokens: 8192
},
// ...
};
Custom Memory Compressor
import type {
MemoryCompressor,
MemoryMessage,
MemoryMetrics,
MemoryConfig
} from 'agentary-js';
class HybridCompressor implements MemoryCompressor {
name = 'hybrid';
async compress(
messages: MemoryMessage[],
targetTokens: number
): Promise<MemoryMessage[]> {
// First, keep high-priority messages
const highPriority = messages.filter(m =>
m.metadata?.priority && m.metadata.priority > 5
);
// Then, summarize the rest if still over budget
const remaining = messages.filter(m => !highPriority.includes(m));
if (this.estimateTokens(remaining) > targetTokens * 0.5) {
const summary = await this.summarize(remaining);
return [...highPriority, summary];
}
return [...highPriority, ...remaining];
}
shouldCompress(metrics: MemoryMetrics, config: MemoryConfig): boolean {
return metrics.estimatedTokens > (config.maxTokens || 2048) * 0.8;
}
private async summarize(messages: MemoryMessage[]): Promise<MemoryMessage> {
// Your summarization logic
return {
role: 'assistant',
content: 'Summary of previous conversation...',
metadata: { type: 'summary', timestamp: Date.now() }
};
}
private estimateTokens(messages: MemoryMessage[]): number {
return messages.reduce((sum, m) => sum + (m.metadata?.tokenCount || 0), 0);
}
}
Custom Formatter
import type { MemoryFormatter, MemoryMessage, ToolResult } from 'agentary-js';
import type { Message } from 'agentary-js';
class MarkdownFormatter implements MemoryFormatter {
formatMessages(messages: MemoryMessage[]): Message[] {
return messages.map(m => ({
role: m.role,
content: this.formatAsMarkdown(m)
}));
}
formatToolResults(results: Record<string, ToolResult>): string {
const entries = Object.values(results);
if (entries.length === 0) return '';
return '## Available Data\n\n' +
entries.map(r => `### ${r.name}\n${r.description}\n\`\`\`json\n${r.result}\n\`\`\``).join('\n\n');
}
formatStepInstruction(stepId: string, prompt: string): string {
return `## Task: ${stepId}\n\n${prompt}`;
}
formatSystemPrompt(basePrompt: string, context?: string): string {
let prompt = `# System Instructions\n\n${basePrompt}`;
if (context) {
prompt += `\n\n${context}`;
}
return prompt;
}
private formatAsMarkdown(message: MemoryMessage): string {
const timestamp = message.metadata?.timestamp
? new Date(message.metadata.timestamp).toISOString()
: '';
const type = message.metadata?.type || message.role;
return `**[${type}]** ${timestamp ? `_${timestamp}_` : ''}\n${message.content}`;
}
}
Using MemoryManager Directly
You can use MemoryManager directly outside of workflows:
import { MemoryManager, SlidingWindowMemory, LLMSummarization } from 'agentary-js';
const memoryManager = new MemoryManager(session, {
memory: new SlidingWindowMemory(),
memoryCompressor: new LLMSummarization(),
maxTokens: 4096,
compressionThreshold: 0.75
});
// Add messages
await memoryManager.addMessages([
{ role: 'user', content: 'Hello!' },
{ role: 'assistant', content: 'Hi! How can I help?' }
]);
// Retrieve messages
const messages = await memoryManager.getMessages();
// Get metrics
const metrics = memoryManager.getMetrics();
console.log(`Messages: ${metrics.messageCount}, Tokens: ${metrics.estimatedTokens}`);
// Create checkpoint
memoryManager.createCheckpoint('before-operation');
// Rollback if needed
memoryManager.rollbackToCheckpoint('before-operation');
// Clear all memory
memoryManager.clear();
Advanced Memory Features
Checkpoints and Rollback
// Create checkpoint before risky operation
memoryManager.createCheckpoint('before-tool-call');
// ... perform operation ...
// Rollback if needed
if (operationFailed) {
memoryManager.rollbackToCheckpoint('before-tool-call');
}
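Conceptually, a checkpoint is a named snapshot of the message list, and rollback restores that snapshot. The class below is a minimal in-memory illustration of the pattern. `CheckpointedLog` is a hypothetical stand-in and does not reflect how `MemoryManager` stores checkpoints internally.

```javascript
// Minimal illustration of named checkpoints over an in-memory message list.
class CheckpointedLog {
  constructor() {
    this.messages = [];
    this.checkpoints = new Map();
  }

  add(message) {
    this.messages.push(message);
  }

  createCheckpoint(id) {
    // Store a shallow copy so later pushes do not alter the snapshot.
    this.checkpoints.set(id, [...this.messages]);
  }

  rollbackToCheckpoint(id) {
    const snapshot = this.checkpoints.get(id);
    if (!snapshot) return false; // unknown checkpoint: leave state untouched
    this.messages = [...snapshot];
    return true;
  }
}

const log = new CheckpointedLog();
log.add({ role: 'user', content: 'Hello!' });
log.createCheckpoint('before-tool-call');
log.add({ role: 'assistant', content: 'Calling a tool that fails...' });
log.rollbackToCheckpoint('before-tool-call');
console.log(log.messages.length); // 1
```

Because the copy is shallow, mutating a message object after the checkpoint would still affect the snapshot; a deep copy avoids that at extra cost.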
Filtered Retrieval
The Memory interface supports filtered retrieval:
// Retrieve only specific message types
const systemMessages = await memory.retrieve({
includeTypes: ['system_instruction', 'summary']
});
// Retrieve messages since a timestamp
const recentMessages = await memory.retrieve({
sinceTimestamp: Date.now() - 3600000 // Last hour
});
// Retrieve with token limit
const limitedMessages = await memory.retrieve({
maxTokens: 1024
});
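For custom Memory implementations, the three options above can be applied as successive filters over a message array. This is a sketch under stated assumptions: messages carry `metadata.type`, `metadata.timestamp`, and `metadata.tokenCount`, and the token limit keeps the newest messages that fit. `filterMessages` is a hypothetical helper, and the built-in implementations may resolve these options differently.

```javascript
// Hypothetical filter applying includeTypes, sinceTimestamp, and maxTokens in turn.
function filterMessages(messages, { includeTypes, sinceTimestamp, maxTokens } = {}) {
  let result = messages.filter((m) => {
    const type = m.metadata?.type ?? m.role;
    if (includeTypes && !includeTypes.includes(type)) return false;
    if (sinceTimestamp !== undefined && (m.metadata?.timestamp ?? 0) < sinceTimestamp) return false;
    return true;
  });

  if (maxTokens !== undefined) {
    // Keep the newest messages that fit in the budget, preserving order.
    const kept = [];
    let used = 0;
    for (let i = result.length - 1; i >= 0; i--) {
      const cost = result[i].metadata?.tokenCount ?? 0;
      if (used + cost > maxTokens) break;
      kept.unshift(result[i]);
      used += cost;
    }
    result = kept;
  }
  return result;
}

const stored = [
  { role: 'system', content: 'Rules', metadata: { type: 'system_instruction', timestamp: 1000, tokenCount: 20 } },
  { role: 'user', content: 'Old question', metadata: { type: 'user', timestamp: 2000, tokenCount: 30 } },
  { role: 'assistant', content: 'Recap', metadata: { type: 'summary', timestamp: 3000, tokenCount: 40 } },
  { role: 'user', content: 'New question', metadata: { type: 'user', timestamp: 4000, tokenCount: 30 } },
];

console.log(filterMessages(stored, { includeTypes: ['system_instruction', 'summary'] }).map((m) => m.content));
console.log(filterMessages(stored, { sinceTimestamp: 2500, maxTokens: 50 }).map((m) => m.content));
```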
Message Metadata
Messages include rich metadata for smarter retrieval:
const message = {
role: 'assistant',
content: 'Important decision: We should proceed with option A.',
metadata: {
timestamp: Date.now(),
stepId: 'decision-step',
priority: 10, // High priority
type: 'assistant',
tokenCount: 15
}
};
Common Memory Patterns
Pattern 1: Simple Chat Agent
const workflow = {
id: 'chat-agent',
memoryConfig: {
memory: new SlidingWindowMemory(),
maxTokens: 2048
},
// ...
};
Pattern 2: Long-Running Agent with Summarization
const workflow = {
id: 'long-running-agent',
memoryConfig: {
memory: new SlidingWindowMemory(),
memoryCompressor: new LLMSummarization({
systemPrompt: 'Summarize focusing on decisions and outcomes.',
maxSummaryTokens: 512,
recentWindowSize: 4
}),
maxTokens: 4096,
compressionThreshold: 0.75
},
// ...
};
Pattern 3: Multi-Step Workflow with Custom Formatting
const workflow = {
id: 'multi-step-workflow',
memoryConfig: {
memory: new SlidingWindowMemory(),
formatter: new DefaultMemoryFormatter({
stepInstructionTemplate: '### Step {stepId}\n{prompt}',
toolResultsTemplate: '## Results\n{results}'
}),
maxTokens: 4096
},
// ...
};
Memory Best Practices
Choose the right memory implementation:
- Use SlidingWindowMemory for most applications
- Use semantic search/vector DB for RAG-style applications
- Use custom implementations for specific requirements
Set appropriate token limits:
- Leave headroom for your prompts and outputs
- Monitor MemoryMetrics to tune limits
- Consider your model's context window
Customize formatters for your domain:
- Use clear, consistent formatting
- Include relevant context in templates
- Test different formats to find what works best
Test compression strategies:
- Ensure summaries preserve critical information
- Balance compression ratio vs. context preservation
- Monitor compression frequency
Use metadata effectively:
- Tag important messages with high priority
- Use timestamps for temporal filtering
- Use custom types for domain-specific filtering
Leverage checkpoints:
- Create checkpoints before risky operations
- Use rollback to recover from errors
- Clean up old checkpoints periodically
Advanced Workflow Features
Step Retry and Error Handling
Steps can automatically retry on failure:
const workflow = {
steps: [
{
id: 'api-call',
description: 'Call external API',
prompt: 'Fetch user data',
generationTask: 'tool_use',
maxAttempts: 3, // Retry up to 3 times on failure
toolChoice: ['fetch_user_data']
}
]
};
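The effect of `maxAttempts` can be pictured as a loop that re-runs the step until it succeeds or the attempts run out. The following is a generic retry sketch and not the library's step executor. `runWithAttempts` and `flakyStep` are hypothetical, and a real executor might add backoff or only retry certain errors.

```javascript
// Generic retry loop: re-runs an async step until it succeeds or attempts run out.
async function runWithAttempts(runStep, maxAttempts = 1) {
  const attempts = Math.max(1, maxAttempts); // always try at least once
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return { attempt, value: await runStep(attempt) };
    } catch (error) {
      lastError = error; // remember the failure and try again if attempts remain
    }
  }
  throw lastError;
}

// Stand-in step that fails on its first two attempts.
async function flakyStep(attempt) {
  if (attempt < 3) throw new Error(`attempt ${attempt} failed`);
  return 'user data';
}

runWithAttempts(flakyStep, 3).then((outcome) => console.log(outcome));
```

With `maxAttempts: 3` the stand-in step succeeds on the third try; with the default of 1 the first error would propagate immediately.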
Memory Management
Optimize token usage with memory configuration:
const workflow = {
memoryConfig: {
enableMessagePruning: true, // Auto-prune old messages
enableMessageSummarization: true, // Summarize conversation history
maxMemoryTokens: 1024 // Set token limit
},
steps: [/* ... */]
};
Workflow Timeout and Validation
Set execution limits:
const workflow = {
timeout: 30000, // 30 second timeout
maxIterations: 10, // Maximum workflow iterations
steps: [/* ... */]
};
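To show what the two limits mean in practice, the sketch below wraps any async iterable and stops it after a number of items or once a deadline has passed. It is a cooperative illustration only: the deadline is checked between items, so it cannot interrupt a step that is already running, and the library may enforce its limits differently. `withLimits` is a hypothetical helper.

```javascript
// Hypothetical wrapper: stops an async iterable after maxIterations items
// or throws once the timeout (in milliseconds) has elapsed between items.
async function* withLimits(iterations, { timeout = Infinity, maxIterations = Infinity } = {}) {
  if (maxIterations <= 0) return;
  const deadline = Date.now() + timeout;
  let count = 0;
  for await (const item of iterations) {
    if (Date.now() > deadline) {
      throw new Error(`Workflow timed out after ${timeout}ms`);
    }
    yield item;
    count += 1;
    if (count >= maxIterations) return; // also closes the source iterator
  }
}

// Stand-in for a long workflow stream.
async function* fiveSteps() {
  for (let n = 1; n <= 5; n++) yield { stepId: String(n) };
}

(async () => {
  for await (const step of withLimits(fiveSteps(), { timeout: 30000, maxIterations: 3 })) {
    console.log(step.stepId);
  }
})();
```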
📡 Lifecycle Events
Agentary JS provides an event system for monitoring and reacting to internal operations in real time. You can subscribe to events for worker initialization, generation progress, tool execution, workflow steps, and more.
Quick Start
import { createSession } from 'agentary-js';
const session = await createSession({
models: {
chat: { name: 'onnx-community/gemma-3-270m-it-ONNX', quantization: 'q4' }
}
});
// Subscribe to all events
session.on('*', (event) => {
console.log(`Event: ${event.type}`, event);
});
// Subscribe to specific event types
session.on('worker:init:complete', (event) => {
console.log(`Model loaded: ${event.modelName} in ${event.duration}ms`);
});
session.on('generation:token', (event) => {
if (event.ttfbMs) {
console.log(`TTFB: ${event.ttfbMs}ms`);
}
process.stdout.write(event.token);
});
// on() returns an unsubscribe function; keep it so you can detach the handler later
const unsubscribe = session.on('generation:complete', (event) => {
console.log(`Generated ${event.totalTokens} tokens in ${event.duration}ms`);
console.log(`Speed: ${event.tokensPerSecond?.toFixed(2)} tokens/sec`);
});
// Later: unsubscribe();
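Because on() returns an unsubscribe function, a one-shot listener can be built on top of it. A minimal sketch, assuming a hypothetical helper named once (the SDK may or may not ship an equivalent):

```javascript
// Hypothetical helper: run `handler` for the first matching event only.
// Relies only on `on()` returning an unsubscribe function, as shown above.
function once(emitter, type, handler) {
  let done = false;
  const unsubscribe = emitter.on(type, (event) => {
    if (done) return; // guard against a second delivery
    done = true;
    unsubscribe();
    handler(event);
  });
  return unsubscribe; // lets the caller cancel before the event ever fires
}

// Usage (assuming `session` from the example above):
// once(session, 'worker:init:complete', (e) => console.log(`Ready: ${e.modelName}`));
```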
Event Categories
Worker Lifecycle Events
Monitor model loading and worker initialization:
// Worker initialization started
session.on('worker:init:start', (event) => {
console.log(`Loading model: ${event.modelName}`);
});
// Worker initialization progress (if supported by the model)
session.on('worker:init:progress', (event) => {
console.log(`Progress: ${event.progress}% - ${event.stage}`);
});
// Worker initialization complete
session.on('worker:init:complete', (event) => {
console.log(`Model ready: ${event.modelName} (${event.duration}ms)`);
});
// Worker disposed
session.on('worker:disposed', (event) => {
console.log(`Worker disposed: ${event.modelName}`);
});
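When several lifecycle handlers belong together (for example, a loading indicator), it can help to register them as a group and detach them with a single call. A sketch, assuming a hypothetical helper named subscribeAll that is not part of agentary-js:

```javascript
// Hypothetical helper: attach several handlers and get back one cleanup function.
function subscribeAll(emitter, handlersByType) {
  const unsubscribers = Object.entries(handlersByType).map(
    ([type, handler]) => emitter.on(type, handler)
  );
  return () => unsubscribers.forEach((unsubscribe) => unsubscribe());
}

// Usage (assuming `session` from the examples above; showSpinner/hideSpinner are your own UI code):
// const stopTracking = subscribeAll(session, {
//   'worker:init:start': (e) => showSpinner(e.modelName),
//   'worker:init:complete': (e) => hideSpinner(e.modelName)
// });
// stopTracking(); // detaches both handlers
```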
Generation Events
Track text generation in real-time:
// Generation started
session.on('generation:start', (event) => {
console.log(`Starting generation with ${event.messageCount} messages`);
});
// Each token generated
session.on('generation:token', (event) => {
if (event.isFirst) {
console.log(`First token in ${event.ttfbMs}ms`);
}
if (!event.isLast) {
process.stdout.write(event.token);
}
});
// Generation complete
session.on('generation:complete', (event) => {
console.log(`\nGenerated ${event.totalTokens} tokens`);
console.log(`Speed: ${event.tokensPerSecond} tok/s`);
});
// Generation error
session.on('generation:error', (event) => {
console.error(`Generation failed: ${event.error}`);
});
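If you need the full response text rather than a live stream, the token events can be buffered until generation completes. A sketch, assuming a hypothetical helper named collectText and only the token and isLast fields shown above:

```javascript
// Hypothetical helper: buffer streamed tokens and hand the full text to `onDone`.
// Mirrors the example above by skipping the event flagged `isLast`.
function collectText(emitter, onDone) {
  let parts = [];
  const offToken = emitter.on('generation:token', (event) => {
    if (!event.isLast && typeof event.token === 'string') {
      parts.push(event.token);
    }
  });
  const offComplete = emitter.on('generation:complete', () => {
    onDone(parts.join(''));
    parts = []; // reset for the next response
  });
  // One cleanup function that detaches both handlers
  return () => {
    offToken();
    offComplete();
  };
}
```

Collecting parts in an array and joining once avoids repeated string concatenation on long responses.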
Tool Events
Monitor tool execution:
// Tool call started
session.on('tool:call:start', (event) => {
console.log(`Calling ${event.toolName}:`, event.args);
});
// Tool call completed
session.on('tool:call:complete', (event) => {
console.log(`${event.toolName} completed in ${event.duration}ms`);
console.log('Result:', event.result);
});
// Tool call failed
session.on('tool:call:error', (event) => {
console.error(`${event.toolName} failed: ${event.error}`);
});
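The tool events above can be folded into per-tool figures such as call counts and failure counts. A sketch, assuming a hypothetical helper named createToolStats (not part of agentary-js) and only the toolName and duration fields shown above:

```javascript
// Hypothetical helper: aggregate tool events into per-tool statistics.
function createToolStats(emitter) {
  const stats = new Map(); // toolName -> { calls, failures, totalMs }
  const entryFor = (toolName) => {
    if (!stats.has(toolName)) {
      stats.set(toolName, { calls: 0, failures: 0, totalMs: 0 });
    }
    return stats.get(toolName);
  };
  const unsubscribers = [
    emitter.on('tool:call:complete', (event) => {
      const entry = entryFor(event.toolName);
      entry.calls += 1;
      entry.totalMs += event.duration ?? 0;
    }),
    emitter.on('tool:call:error', (event) => {
      const entry = entryFor(event.toolName);
      entry.calls += 1;
      entry.failures += 1;
    })
  ];
  return { stats, stop: () => unsubscribers.forEach((off) => off()) };
}

// Usage (assuming `session` from the examples above):
// const { stats, stop } = createToolStats(session);
// ...later: console.log(stats.get('get_weather')); stop();
```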
Workflow Events
Track workflow execution and step progress:
const agent = await createAgentSession({ /* ... */ });
// Workflow started
agent.on('workflow:start', (event) => {
console.log(`Starting workflow: ${event.workflowName}`);
console.log(`Steps: ${event.stepCount}`);
});
// Step started
agent.on('workflow:step:start', (event) => {
console.log(`\n[Step ${event.stepId}] ${event.stepDescription}`);
console.log(`Iteration: ${event.iteration}`);
});
// Step completed
agent.on('workflow:step:complete', (event) => {
const status = event.success ? '✓' : '✗';
console.log(`${status} Step ${event.stepId} (${event.duration}ms)`);
if (event.hasToolCall) {
console.log(' - Tool was called');
}
});
// Step retry
agent.on('workflow:step:retry', (event) => {
console.log(`Retrying step ${event.stepId}`);
console.log(`Attempt ${event.attempt}/${event.maxAttempts}`);
console.log(`Reason: ${event.reason}`);
});
// Workflow complete
agent.on('workflow:complete', (event) => {
console.log(`\nWorkflow complete!`);
console.log(`Completed ${event.totalSteps} steps in ${event.duration}ms`);
});
// Workflow timeout
agent.on('workflow:timeout', (event) => {
console.warn(`Workflow timeout at step ${event.stepId}`);
});
// Workflow error
agent.on('workflow:error', (event) => {
console.error(`Workflow failed: ${event.error}`);
});
Memory Events
Monitor memory operations (when using memory system):
// Memory checkpoint created
agent.on('memory:checkpoint', (event) => {
console.log(`Checkpoint: ${event.checkpointId}`);
console.log(`Messages: ${event.messageCount}, Tokens: ${event.estimatedTokens}`);
});
// Memory rolled back
agent.on('memory:rollback', (event) => {
console.log(`Rolled back to: ${event.checkpointId}`);
});
// Memory compressed
agent.on('memory:compressed', (event) => {
console.log(`Memory compressed: ${event.beforeTokens} → ${event.afterTokens}`);
console.log(`Ratio: ${(event.compressionRatio * 100).toFixed(1)}%`);
});
// Memory pruned
agent.on('memory:pruned', (event) => {
console.log(`Pruned ${event.messagesPruned} messages`);
console.log(`Freed ${event.tokensFreed} tokens`);
});
Event Types Reference
All events include a timestamp field (milliseconds since epoch) and a type field identifying the event.
Worker Events
- worker:init:start - Model initialization started
- worker:init:progress - Initialization progress update
- worker:init:complete - Model ready for inference
- worker:disposed - Worker terminated
Generation Events
- generation:start - Text generation started
- generation:token - Token generated
- generation:complete - Generation finished
- generation:error - Generation failed
Tool Events
- tool:call:start - Tool execution started
- tool:call:complete - Tool execution succeeded
- tool:call:error - Tool execution failed
Workflow Events
- workflow:start - Workflow execution started
- workflow:step:start - Step execution started
- workflow:step:complete - Step execution finished
- workflow:step:retry - Step retry attempt
- workflow:complete - Workflow finished successfully
- workflow:timeout - Workflow exceeded timeout
- workflow:error - Workflow failed
Memory Events
- memory:checkpoint - Memory checkpoint created
- memory:rollback - Memory rolled back to checkpoint
- memory:compressed - Memory compressed
- memory:pruned - Old messages pruned
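Event names in this reference follow a category:action pattern, so a whole category can be observed by combining the * wildcard with a check on the type field. A sketch, assuming hypothetical helpers named eventCategory and onCategory that are not part of agentary-js:

```javascript
// The category is the text before the first colon, e.g. 'workflow' in 'workflow:step:start'.
function eventCategory(type) {
  return type.split(':')[0];
}

// Hypothetical helper: receive every event from one category via the `*` wildcard.
function onCategory(emitter, category, handler) {
  return emitter.on('*', (event) => {
    if (eventCategory(event.type) === category) {
      handler(event);
    }
  });
}

// Usage (assuming `agent` from the examples above):
// const stop = onCategory(agent, 'memory', (e) => console.log(e.type));
```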
Advanced Usage
Filtering Events
// Only listen to workflow events
agent.on('workflow:start', handleWorkflowStart);
agent.on('workflow:complete', handleWorkflowComplete);
agent.on('workflow:error', handleWorkflowError);
// Build a progress UI
agent.on('workflow:step:start', (event) => {
updateProgressBar(event.stepId, event.iteration);
});
agent.on('workflow:step:complete', (event) => {
markStepComplete(event.stepId, event.success);
});
Building Dashboards
const metrics = {
tokensGenerated: 0,
toolCalls: 0,
errors: 0,
avgGenerationTime: []
};
session.on('generation:complete', (event) => {
metrics.tokensGenerated += event.totalTokens;
metrics.avgGenerationTime.push(event.duration);
updateDashboard(metrics);
});
session.on('tool:call:complete', (event) => {
metrics.toolCalls++;
updateDashboard(metrics);
});
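// The documented wildcard is '*', so count errors by checking the event type
session.on('*', (event) => {
if (!event.type.endsWith(':error')) return;
metrics.errors++;
updateDashboard(metrics);
});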
Error Handling
session.on('generation:error', (event) => {
logger.error('Generation failed', {
requestId: event.requestId,
error: event.error
});
// Retry logic, user notification, etc.
});
agent.on('tool:call:error', (event) => {
console.error(`Tool ${event.toolName} failed:`, event.error);
// Fallback logic
});
agent.on('workflow:error', (event) => {
console.error(`Workflow ${event.workflowId} failed at step ${event.stepId}`);
// Cleanup, rollback, or notification
});
Performance Monitoring
const perfMonitor = {
ttfb: [],
throughput: [],
stepDurations: {}
};
session.on('generation:token', (event) => {
if (event.ttfbMs) {
perfMonitor.ttfb.push(event.ttfbMs);
}
});
session.on('generation:complete', (event) => {
if (event.tokensPerSecond) {
perfMonitor.throughput.push(event.tokensPerSecond);
}
});
agent.on('workflow:step:complete', (event) => {
if (!perfMonitor.stepDurations[event.stepId]) {
perfMonitor.stepDurations[event.stepId] = [];
}
perfMonitor.stepDurations[event.stepId].push(event.duration);
});
// Analyze performance
function analyzePerformance() {
// Average of a list; returns 0 when no samples have been recorded yet (avoids NaN)
const mean = (values) => (values.length ? values.reduce((a, b) => a + b, 0) / values.length : 0);
const avgTTFB = mean(perfMonitor.ttfb);
const avgThroughput = mean(perfMonitor.throughput);
console.log(`Average TTFB: ${avgTTFB.toFixed(2)}ms`);
console.log(`Average throughput: ${avgThroughput.toFixed(2)} tok/s`);
}
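Averages can hide slow outliers, so percentiles are often more informative for latency figures such as TTFB. A sketch using the nearest-rank method; percentile is a hypothetical helper and the sample values are made up:

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples, p) {
  if (samples.length === 0) return undefined;
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort on a copy
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

const ttfbSamples = [120, 95, 310, 101, 99]; // example values in ms
console.log(percentile(ttfbSamples, 50)); // 101
console.log(percentile(ttfbSamples, 95)); // 310
```

With the monitor above, the same function could be applied to perfMonitor.ttfb or to a step's duration list.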
TypeScript Support
All event types are fully typed for TypeScript users:
import type {
SessionEvent,
WorkerInitCompleteEvent,
GenerationTokenEvent,
ToolCallStartEvent,
WorkflowStepCompleteEvent
} from 'agentary-js';
// Type-safe event handlers
session.on('worker:init:complete', (event: WorkerInitCompleteEvent) => {
console.log(event.modelName, event.duration); // Autocomplete works!
});
// Handle multiple event types
function handleEvent(event: SessionEvent) {
switch (event.type) {
case 'worker:init:complete':
console.log('Model loaded:', event.modelName);
break;
case 'generation:token':
process.stdout.write(event.token);
break;
case 'tool:call:start':
console.log('Tool call:', event.toolName);
break;
}
}
session.on('*', handleEvent);
Best Practices
- Unsubscribe when done: Always call the returned unsubscribe function to prevent memory leaks
const unsubscribe = session.on('generation:token', handler);
// Later:
unsubscribe();
- Use wildcards sparingly: The * wildcard subscribes to all events, which can be noisy
// Good for debugging
session.on('*', (e) => console.log(e.type));
// Better for production
session.on('generation:error', handleError);
session.on('workflow:error', handleError);
- Handle errors gracefully: Event handlers should not throw errors
session.on('generation:token', (event) => {
try {
processToken(event.token);
} catch (error) {
console.error('Error processing token:', error);
}
});
- Don't block event handlers: Keep event handlers fast and non-blocking
// Bad: Blocking operation
session.on('tool:call:complete', (event) => {
syncExpensiveOperation(event.result); // Blocks event loop
});
// Good: Async operation
session.on('tool:call:complete', async (event) => {
await asyncOperation(event.result); // Non-blocking
});
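The error-handling advice above can be applied once instead of repeating try/catch in every handler. A sketch, assuming a hypothetical wrapper named safeHandler; note that an async handler returns a promise, so its rejections need handling too:

```javascript
// Hypothetical wrapper: keeps a failing handler from throwing into the emitter.
// Covers both synchronous throws and rejected promises from async handlers.
function safeHandler(handler, onError = console.error) {
  return (event) => {
    try {
      const result = handler(event);
      if (result && typeof result.then === 'function') {
        result.then(undefined, onError); // report async failures
      }
    } catch (error) {
      onError(error); // report sync failures
    }
  };
}

// Usage (assuming `session` and `processToken` from the examples above):
// session.on('generation:token', safeHandler((event) => processToken(event.token)));
```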
🔍 Logging & Debugging
Agentary.js includes a comprehensive logging system for debugging and monitoring your AI applications.
Basic Logging Usage
import { logger, LogLevel, isDebuggingMode } from 'agentary-js';
// Use predefined category loggers
logger.session.info('Session created successfully');
logger.worker.debug('Processing generation request', { prompt: 'Hello' });
logger.agent.warn('Step timeout approaching', { stepId: 'step-1' });
// Or use custom categories
logger.info('custom-category', 'Custom message', { data: 'example' });
// Check if debugging mode is enabled
if (isDebuggingMode()) {
logger.worker.debug('Detailed debug info', { workerState: state });
}
Category Loggers
Agentary.js provides predefined category loggers for different parts of the system:
- logger.session - Session lifecycle and management
- logger.worker - Worker communication and model inference
- logger.agent - Agent workflow execution
- logger.workflow - Workflow state and step execution
- logger.tools - Tool parsing and execution
Configuration
Environment-based Configuration
The logger automatically configures itself based on the environment:
- Production: WARN level, no colors, minimal context
- Development: DEBUG level, colors enabled, full context
- Testing: ERROR level only
Manual Configuration
import { createLogger, LogLevel, LogConfigs } from 'agentary-js';
// Use a pre-defined config
const debugLogger = createLogger(LogConfigs.debugging);
// Or create custom configuration (a separate name avoids redeclaring the same const)
const customLogger = createLogger({
level: LogLevel.INFO,
enableColors: true,
enableTimestamps: true,
maxLogHistory: 500,
customFormatters: {
'my-category': (entry) => `🎯 ${entry.message}`
}
});
Browser Configuration
Set log level via URL parameter or localStorage:
// Via URL: http://localhost:3000?logLevel=debug
// Via localStorage:
localStorage.setItem('agentary_log_level', 'debug');
// Enable enhanced debugging mode
import { enableDebuggingMode } from 'agentary-js';
enableDebuggingMode();
Node.js Configuration
Set via environment variable:
AGENTARY_LOG_LEVEL=debug node your-app.js
Log Levels
- DEBUG: Detailed information for debugging (worker init, step execution, etc.)
- INFO: General information (session creation, workflow completion)
- WARN: Warning conditions (timeouts, retries)
- ERROR: Error conditions (failures, exceptions)
- SILENT: No logging
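Levels act as a threshold: a message is emitted only when its level is at or above the configured one. A sketch of that comparison; the numeric ordering and the shouldLog name are assumptions for illustration, and the values agentary-js assigns to LogLevel may differ:

```javascript
// Illustrative ordering only; not taken from the library's LogLevel enum.
const LEVEL_ORDER = { debug: 0, info: 1, warn: 2, error: 3, silent: 4 };

// A message passes when its level is at or above the configured threshold.
function shouldLog(messageLevel, threshold) {
  return LEVEL_ORDER[messageLevel] >= LEVEL_ORDER[threshold];
}

console.log(shouldLog('debug', 'warn'));   // false
console.log(shouldLog('error', 'warn'));   // true
console.log(shouldLog('error', 'silent')); // false
```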
Structured Logging
All logs include structured data for better filtering and analysis:
logger.worker.info('Generation completed', {
model: 'gemma-3-270m',
tokensGenerated: 156,
duration: 1240
}, requestId);
// Outputs: [2024-01-15T10:30:45.123Z] [INFO] [worker] [req:abc123] Generation completed {"model":"gemma-3-270m","tokensGenerated":156,"duration":1240}
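The sample output above has a regular shape: timestamp, level, category, optional request ID, message, then the data as JSON. A sketch of a formatter that reproduces that layout; formatLogLine and its field names are hypothetical, not the library's internal implementation:

```javascript
// Hypothetical formatter reproducing the layout of the sample output above.
function formatLogLine({ timestamp, level, category, requestId, message, data }) {
  const parts = [`[${timestamp.toISOString()}]`, `[${level}]`, `[${category}]`];
  if (requestId) parts.push(`[req:${requestId}]`); // only present when a request ID is given
  parts.push(message);
  if (data !== undefined) parts.push(JSON.stringify(data));
  return parts.join(' ');
}

const line = formatLogLine({
  timestamp: new Date('2024-01-15T10:30:45.123Z'),
  level: 'INFO',
  category: 'worker',
  requestId: 'abc123',
  message: 'Generation completed',
  data: { model: 'gemma-3-270m', tokensGenerated: 156, duration: 1240 }
});
console.log(line);
```

A fixed field order like this keeps log lines easy to filter with simple text tools.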
Debug Features
Export Logs
import { logger } from 'agentary-js';
// Get log history
const logs = logger.getLogHistory();
// Export as text
const logText = logger.exportLogs();
console.log(logText);
// Clear history
logger.clearHistory();
Custom Formatters
import { createLogger } from 'agentary-js';
const logger = createLogger({
customFormatters: {
'performance': (entry) => {
if (entry.data?.duration) {
return `⚡ PERF: ${entry.message} (${entry.data.duration}ms)`;
}
return `⚡ PERF: ${entry.message}`;
}
}
});
// 'performance' is a custom category, so pass it as the first argument
logger.info('performance', 'Model loading completed', { duration: 2341 });
// Outputs: ⚡ PERF: Model loading completed (2341ms)
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.