ArchDoc Generator
AI-powered architecture documentation generator. A powerful CLI tool for generating comprehensive architecture documentation automatically.
ArchDoc Generator is an intelligent tool that analyzes your codebase and generates comprehensive, accurate architectural documentation automatically. It supports any programming language and uses AI-powered agents to understand your project structure, dependencies, patterns, security, and data flows.
Table of Contents
- Features
- Quick Start
- MCP Integration
- Search Strategy Performance
- CLI Usage
- Programmatic Usage
- Configuration
- What Gets Generated
- Available Agents
- Architecture Highlights
- Supported Languages
- Future Work & Roadmap
- Common Questions
- Contributing
- License
Features
- 11 Specialized AI Agents: File Structure, Dependencies, Patterns, Flows, Schemas, Security, Error Handling Architecture, Data Contracts, Technical Debt, Repository KPI, and Architecture synthesis.
- RAG-Powered Queries: Query your architecture docs with natural language using FREE local embeddings (TF-IDF + Graph-based retrieval).
- Repository Health Dashboard: LLM-powered KPI analysis with actionable insights on code quality, testing, architecture health, and technical debt.
- RAG Vector Search + Hybrid Retrieval: Semantic similarity search (FREE local TF-IDF or cloud providers) combined with dependency graph analysis; finds files by meaning AND structure. See the docs.
- JSON-First Architecture (v0.3.37+): All agent outputs are stored as JSON in .archdoc/cache/ with Markdown as rendered views, enabling fast local queries, multi-format exports, and zero-LLM-cost lookups.
- Generation Performance Metrics: Track agent execution times, token usage, costs, and confidence scores in metadata.
- 17 Languages Out-of-the-Box: TypeScript, Python, Java, Go, C#, C/C++, Kotlin, PHP, Ruby, Rust, Scala, Swift, CSS, HTML, JSON, XML, Flex/ActionScript.
- AI-Powered: Uses LangChain with Claude 4.5, OpenAI o1/GPT-4o, Gemini 2.5, or Grok 3.
- Comprehensive Analysis: Structure, dependencies, patterns, flows, schemas, security, and executive-level KPIs.
- Markdown Output: Clean, version-controllable documentation with smart navigation and dynamic table of contents.
- Iterative Refinement: Self-improving analysis with quality checks and gap detection (LangGraph-based workflow).
- Customizable: Prompt-based agent selection and configuration without code changes.
- LangSmith Tracing: Full observability of AI workflows with detailed token tracking and multi-step traces.
- Security Analysis: Vulnerability detection, authentication review, and crypto analysis.
- Extensible: Add support for any language via configuration; no code changes required.
- Delta Analysis (v0.3.37+): Automatic change detection reduces costs by 60-90% on incremental runs. Uses Git or file hashing to only analyze changed files.
- MCP Integration (v0.3.30+): Native Model Context Protocol support for Cursor, Claude Code, VS Code + Copilot, and Claude Desktop; access ArchDoc as a native tool in your AI assistant.
- Recursive .gitignore Support: Automatically loads .gitignore patterns from any directory level (root and subdirectories), ensuring consistent file filtering across monorepo structures and nested projects.
- Branch-Based Augmented Generation (v0.3.37+): Delta analysis supports Git branches, tags, and commits; compare against main, develop, or any specific reference to generate augmented documentation focusing only on changes.
Quick Start
Installation
# Using npm
npm install -g @techdebtgpt/archdoc-generator
# Using yarn
yarn global add @techdebtgpt/archdoc-generator
# Using pnpm
pnpm add -g @techdebtgpt/archdoc-generator
Interactive Setup (Recommended)
Run the interactive configuration wizard:
archdoc config --init
This will:
- Prompt you to choose an LLM provider (Anthropic/OpenAI/Google).
- Ask for your API key.
- Create .archdoc.config.json with your configuration.
- Validate your setup.
Basic Usage
# Analyze current directory
archdoc analyze
# Analyze specific project
archdoc analyze /path/to/your/project
# Custom output location
archdoc analyze --output ./docs
# Verbose output for debugging
archdoc analyze --verbose
# Branch-based augmented generation (delta analysis)
archdoc analyze --since main # Compare against main branch
archdoc analyze --since develop # Compare against develop branch
archdoc analyze --since v1.0.0 # Compare against specific tag
# Force full analysis (skip delta analysis)
archdoc analyze --force
# Quick analysis (faster, fewer tokens)
archdoc analyze --depth quick
For complete CLI options and advanced usage, see the CLI Usage section below.
Query Generated Documentation (RAG)
Once documentation is generated, query it with natural language using Retrieval-Augmented Generation (RAG):
# Interactive query mode
archdoc query
# Ask a specific question
archdoc query "Which services handle user authentication?"
# Query with file context (find related files)
archdoc query "Show all files related to payment processing"
# Explain a specific file (find its role and dependencies)
archdoc explain src/services/auth.service.ts
# Analyze architecture impact
archdoc impact src/utils/ --show-dependents
Key Features:
- Zero-Cost Local Search: Uses FREE local TF-IDF embeddings (no API calls)
- Hybrid Retrieval: Combines semantic search with dependency graph analysis
- Smart Caching: Queries reuse .archdoc/cache/ JSON files instead of re-running the LLM
- Multi-Format: Search by function name, file path, architecture role, or natural language
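To make the "local TF-IDF" idea above concrete, here is a minimal sketch of TF-IDF ranking with cosine similarity. It is an illustration only, not ArchDoc's actual implementation: the function names (`tokenize`, `buildTfIdf`, `rank`) and the smoothed IDF formula are assumptions chosen for this example.

```typescript
// Illustrative TF-IDF + cosine similarity ranking (hypothetical helper names).
type Vector = Map<string, number>;

// Split text into lowercase alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Weight each token by its IDF; repeated tokens accumulate (term frequency * IDF).
function toVector(tokens: string[], idf: Map<string, number>): Vector {
  const vec: Vector = new Map();
  for (const term of tokens) {
    const weight = idf.get(term);
    if (weight === undefined) continue; // ignore terms unseen in the corpus
    vec.set(term, (vec.get(term) ?? 0) + weight);
  }
  return vec;
}

// Build one TF-IDF vector per document, plus the IDF table for later queries.
function buildTfIdf(docs: string[]): { vectors: Vector[]; idf: Map<string, number> } {
  const tokenized = docs.map(tokenize);
  const docFreq = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    }
  }
  const idf = new Map<string, number>();
  for (const [term, df] of docFreq) {
    // Smoothed IDF (an assumption): common terms keep a small positive weight.
    idf.set(term, Math.log((1 + docs.length) / (1 + df)) + 1);
  }
  return { vectors: tokenized.map((t) => toVector(t, idf)), idf };
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, w] of a) {
    dot += w * (b.get(term) ?? 0);
    normA += w * w;
  }
  for (const w of b.values()) normB += w * w;
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return file paths ordered from most to least similar to the query.
function rank(query: string, paths: string[], docs: string[]): string[] {
  const { vectors, idf } = buildTfIdf(docs);
  const q = toVector(tokenize(query), idf);
  return paths
    .map((path, i) => ({ path, score: cosine(q, vectors[i]) }))
    .sort((x, y) => y.score - x.score)
    .map((r) => r.path);
}
```

Because everything here is plain arithmetic over token counts, a search like this needs no API calls, which is the property the "Zero-Cost Local Search" bullet describes.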
Example Outputs:
Q: "Where is the payment logic?"
A:
- src/services/payment/stripe.service.ts (role: Payment Processor)
- src/handlers/checkout.handler.ts (role: API Endpoint)
- src/models/transaction.model.ts (role: Data Model)
Q: "What breaks if we refactor authentication?"
A:
- Affected: src/middleware/auth.ts
- Depends on: 12 endpoints, 3 services, 2 guards
- Risk level: HIGH (critical path)
See docs/VECTOR_SEARCH.md for advanced RAG configuration and docs/USER_GUIDE.md for the complete command reference.
MCP Integration
Model Context Protocol (MCP) allows AI assistants to access ArchDoc tools directly. Use ArchDoc in:
- Cursor - AI code editor
- Claude Code - Claude's code tool
- VS Code + GitHub Copilot
- Claude Desktop - Claude's desktop application
Quick Setup
# Configure ArchDoc
archdoc config --init
# Set up for your client (choose one)
archdoc setup-mcp cursor # For Cursor
archdoc setup-mcp claude-code # For Claude Code
archdoc setup-mcp vscode # For VS Code + Copilot
archdoc setup-mcp claude-desktop # For Claude Desktop
# Restart your AI client and start using ArchDoc!
Example Uses
"Use archdoc to generate documentation for this project"
"Query the architecture: What authentication system is used?"
"Analyze dependencies and get recommendations"
"Check if this file follows our architecture"
See docs/MCP-SETUP.md for detailed setup instructions and advanced features.
Vector Search & Embeddings Performance
We benchmarked 6 configurations (including OpenAI embeddings) on a real-world 6,187-file NestJS project. Graph + Local embeddings is the clear winner!
Quick Comparison:
| Configuration | Speed | Cost | Accuracy | Winner? |
|---|---|---|---|---|
| Graph + Local | 6.1 min | $0.08 | 84.8% | YES |
| Hybrid + Local | 6.4 min | $0.09 | 84.3% | Good |
| Smart + Local | 6.3 min | $0.08 | 84.6% | Good |
| Keyword-only | 7.3 min | $0.09 | 84.6% | Fallback |
| OpenAI | 11.7 min | $0.29 | 82.9% | NO |
Key Findings:
- Graph + Local: Fastest, cheapest, most accurate (best overall)
- OpenAI: 92% slower, 3.4x more expensive, 1.9 percentage points less accurate (NOT recommended)
- Local embeddings (free) outperform OpenAI embeddings (paid) for code analysis
Complete Analysis: See Search Strategy Benchmark for:
- Per-agent clarity scores (11 agents × 6 configurations)
- Why Graph + Local won (structural > semantic for code)
- Why OpenAI underperformed (8192 token limit, context loss, batching overhead)
- Configuration examples for all use cases
- Memory usage and technical deep-dive
Also see: Vector Search Guide - Complete guide to vector search with integrated recommendations
CLI Usage
Available Commands
| Command | Description | Example |
|---|---|---|
| archdoc help | Show comprehensive help | archdoc help |
| archdoc analyze | Generate comprehensive documentation | archdoc analyze /path/to/project |
| archdoc analyze --c4 | Generate C4 architecture model | archdoc analyze --c4 |
| archdoc config --init | Interactive configuration setup | archdoc config --init |
| archdoc config --list | Show current configuration | archdoc config --list |
| archdoc export | Export docs to different formats | archdoc export .arch-docs --format html |
| archdoc setup-mcp <client> | Set up MCP for AI client | archdoc setup-mcp cursor |
Tip: Run archdoc help for a comprehensive guide with examples, configuration options, and common workflows.
Documentation Generation
# Analyze current directory
archdoc analyze
# Analyze specific project
archdoc analyze /path/to/your/project
# Custom output location
archdoc analyze --output ./docs
# Enhanced analysis with user focus (runs all agents with extra attention to specified topics)
archdoc analyze --prompt "security vulnerabilities and authentication patterns"
archdoc analyze --prompt "database schema design and API architecture"
# Analysis depth modes
archdoc analyze --depth quick # Fast, less detailed (2 iterations, 70% threshold)
archdoc analyze --depth normal # Balanced (5 iterations, 80% threshold) - default
archdoc analyze --depth deep # Thorough, most detailed (10 iterations, 90% threshold)
# Disable iterative refinement for faster results
archdoc analyze --no-refinement
# Verbose output for debugging
archdoc analyze --verbose
# Delta Analysis (Cost Optimization) - Automatically enabled
# Only analyzes changed/new files, saving 60-90% on token costs
archdoc analyze # Automatic delta analysis (default)
archdoc analyze --since main # Compare against main branch
archdoc analyze --since abc123def # Compare against specific commit
archdoc analyze --force # Force full analysis (ignore delta)
Delta Analysis (Cost Optimization)
ArchDoc automatically performs delta analysis to reduce costs on incremental runs. Only changed and new files are analyzed, typically saving 60-90% on token costs.
How it works:
- Git projects: Uses Git to detect files changed since the last commit or a specific commit/branch/tag
- Non-Git projects: Uses file hashing to detect changes since the last analysis
- Automatic: Delta analysis is enabled by default - no configuration needed
- Cache integration: Cached results from previous runs are automatically loaded and merged
# Automatic delta analysis (default behavior)
archdoc analyze
# Compare against a specific Git commit/branch/tag
archdoc analyze --since main
archdoc analyze --since abc123def
archdoc analyze --since v1.0.0
# Force full analysis (analyze all files, ignore delta analysis)
archdoc analyze --force
# Delta analysis with focused prompt
archdoc analyze --prompt "security vulnerabilities" --since HEAD~1
When to use --force:
- First-time analysis of a project
- When you want to ensure all files are analyzed regardless of changes
- After major refactoring where change detection might miss dependencies
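For non-Git projects, the file-hashing path described above boils down to comparing a stored snapshot of content hashes with a fresh one. The sketch below illustrates that comparison under stated assumptions: the `Snapshot` shape and helper names are hypothetical, and a small FNV-1a hash stands in for whatever content hash the tool really uses.

```typescript
// Hypothetical snapshot shape: file path -> content hash from a previous run.
type Snapshot = Record<string, string>;

interface Delta {
  added: string[];
  changed: string[];
  removed: string[];
}

// FNV-1a (32-bit), used only as a dependency-free stand-in for a real content hash.
function fnv1a(content: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hash every file's content (files: path -> content).
function snapshotOf(files: Record<string, string>): Snapshot {
  const snapshot: Snapshot = {};
  for (const path of Object.keys(files)) {
    snapshot[path] = fnv1a(files[path]);
  }
  return snapshot;
}

// Classify paths as added, changed, or removed between two snapshots.
function diffSnapshots(previous: Snapshot, current: Snapshot): Delta {
  const delta: Delta = { added: [], changed: [], removed: [] };
  for (const path of Object.keys(current)) {
    if (previous[path] === undefined) delta.added.push(path);
    else if (previous[path] !== current[path]) delta.changed.push(path);
  }
  for (const path of Object.keys(previous)) {
    if (current[path] === undefined) delta.removed.push(path);
  }
  return delta;
}
```

Only the `added` and `changed` paths would need fresh analysis; results for untouched files can come from the cache, which is where the token savings on incremental runs come from.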
C4 Architecture Model Generation
The C4 orchestrator now supports all advanced features from documentation generation, including:
- Vector Search: Semantic file retrieval with local/OpenAI/Google embeddings
- Dependency Graph: Built-in import and module analysis
- Cost Tracking: Real-time token and cost monitoring with budget limits
- LangSmith Tracing: Full observability with custom run names
- Agent Skip Logic: Automatically skips agents with no relevant data
# Generate C4 model for current directory
archdoc analyze --c4
# Generate C4 model with vector search (uses config settings)
archdoc analyze --c4
# Generate C4 model for specific project
archdoc analyze /path/to/project --c4
# Custom output location for C4 model
archdoc analyze --c4 --output ./architecture-docs
# C4 model with verbose output and cost limit
archdoc analyze --c4 --verbose --max-cost 1.0
# Quick analysis (1 question per level, fastest)
archdoc analyze --c4 --depth quick
# Deep analysis (4 questions per level, comprehensive)
archdoc analyze --c4 --depth deep
Note: Vector search mode is configured in .archdoc.config.json via the searchMode.mode setting. The C4 orchestrator will automatically use your configured search mode (vector or keyword) and embeddings provider.
Configuration Management
# Interactive configuration wizard (recommended for first-time setup)
archdoc config --init
# List current configuration
archdoc config --list
# Get specific configuration value
archdoc config --get llmProvider
archdoc config --get anthropicApiKey
# Set configuration value
archdoc config --set llmProvider=anthropic
archdoc config --set anthropicApiKey=your-api-key
# Reset configuration to defaults
archdoc config --reset
Export and Format Options
# Single-file output (default: multi-file)
archdoc analyze --single-file
# Export as JSON
archdoc analyze --single-file --format json
# Export as HTML
archdoc analyze --single-file --format html
# Export as Markdown (default)
archdoc analyze --single-file --format markdown
# Export existing documentation to different formats
archdoc export .arch-docs --format html --output ./docs.html
archdoc export .arch-docs --format json --output ./docs.json
archdoc export .arch-docs --format confluence --output ./confluence.md
# Export with custom template
archdoc export .arch-docs --format html --template ./my-template.html --output ./custom-docs.html
Vector Search & Hybrid Retrieval
# Vector search with local embeddings (FREE, default)
archdoc analyze --search-mode vector
# Keyword search (faster, simpler)
archdoc analyze --search-mode keyword
# Hybrid retrieval (semantic + structural)
archdoc analyze --search-mode vector --retrieval-strategy hybrid
# Configure in .archdoc.config.json for persistence:
{
"searchMode": {
"mode": "vector",
"embeddingsProvider": "local",
"strategy": "hybrid",
"vectorWeight": 0.6,
"graphWeight": 0.4
}
}
# See docs/VECTOR_SEARCH.md for complete documentation
What Files Are Excluded?
Both File Scanner and Vector Search automatically exclude common build/dependency folders with language-specific patterns:
Language-Specific Exclusions (automatically detected from project languages):
- TypeScript/JavaScript: node_modules/, dist/, build/, .next/, out/
- Python: venv/, __pycache__/, .pytest_cache/, dist/, build/
- Java: target/, build/, .gradle/, .m2/
- Go: vendor/, bin/
- Rust: target/
- PHP: vendor/
- Ruby: vendor/, tmp/
- C#: bin/, obj/, packages/
- C/C++: build/, bin/, obj/
- Kotlin: build/
- Scala: target/, out/
- Swift: build/
- Dart: build/, .dart_tool/, .packages/
- And more...
Common System Exclusions (applies to all projects):
- Version control: .git/, .svn/, .hg/
- Test files: .test., .spec., __tests__/, test_, *_test.*
- IDE directories: .idea/, .vscode/, .vs/
- Build caches: .cache/, .parcel-cache/, .nyc_output/
- Framework-specific: .next/, .nuxt/, .svelte-kit/, .docusaurus/
- Generated code: Coverage reports, logs, OS files (.DS_Store, Thumbs.db)
Gitignore Support:
- Recursively loads all .gitignore files at any directory level (root and subdirectories)
- Patterns from .gitignore files are used as-is (no automatic modification)
- Static patterns (when no .gitignore exists) use the **/ prefix for recursive matching
- Works with all languages and monorepo structures
- Default: respectGitignore: true
Customize Exclusions in .archdoc.config.json:
{
"scan": {
"excludePatterns": [
"**/node_modules/**", // JavaScript/TypeScript
"**/vendor/**", // PHP, Go
"**/target/**", // Java, Rust
"**/venv/**", // Python virtual env
"**/my-custom-folder/**" // Your own exclusions
],
"respectGitignore": true // Honor .gitignore (default: true)
}
}
Example: On a 6,187-file NestJS project, vector search processes ~889 source files (14%), focusing on actual code rather than dependencies.
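To show how exclude patterns such as **/node_modules/** behave at any directory depth, here is a small sketch that converts a subset of glob syntax into a regular expression. It is a hedged illustration, not the scanner's real matcher: `globToRegExp` and `isExcluded` are hypothetical names, and only `**` and `*` are handled.

```typescript
// Convert a small glob subset ("**/", "**", "*") into an anchored RegExp.
// Assumption: paths are relative and use "/" as the separator.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      source += ".*"; // anything, including "/"
      i += 2;
    } else if (glob[i] === "*") {
      source += "[^/]*"; // anything within a single path segment
      i += 1;
    } else {
      // Escape regex metacharacters in literal text.
      source += glob[i].replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

// A path is excluded when any pattern matches it.
function isExcluded(path: string, excludePatterns: string[]): boolean {
  return excludePatterns.some((pattern) => globToRegExp(pattern).test(path));
}
```

With the patterns from the config above, `isExcluded("packages/api/node_modules/x/index.js", ["**/node_modules/**"])` is true while `isExcluded("src/app.ts", ["**/node_modules/**"])` is false, which is why a single pattern covers nested packages in a monorepo.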
Advanced Usage
# Incremental updates (preserves existing docs, adds new analysis)
archdoc analyze --prompt "new feature area to document"
# (Automatically detects existing docs and runs in incremental mode)
# Full regeneration even if docs exist
archdoc analyze --clean
# Specify LLM provider and model
archdoc analyze --provider anthropic --model claude-sonnet-4-5-20250929
archdoc analyze --provider openai --model gpt-4o
archdoc analyze --provider google --model gemini-2.0-flash-exp
# Budget control (halt if cost exceeds limit)
archdoc analyze --max-cost 10.0 # Stop if cost exceeds $10
# Custom refinement settings
archdoc analyze --refinement-iterations 10 --refinement-threshold 90 --refinement-improvement 15
CLI Options Reference
archdoc analyze [path] [options]
Options:
| Option | Description | Default |
|---|---|---|
| --output <dir> | Output directory | .arch-docs |
| --c4 | Generate C4 architecture model (Context/Containers/Components) | false |
| --prompt <text> | Enhance analysis with focus area (all agents still run) | |
| --depth <level> | Analysis depth: quick, normal, deep | normal |
| --provider <name> | LLM provider: anthropic, openai, xai, google | |
| --model <name> | Specific model to use | |
| --refinement | Enable iterative refinement | true |
| --refinement-iterations <n> | Max refinement iterations | 5 |
| --refinement-threshold <n> | Clarity threshold % | 80 |
| --force | Force full analysis (ignore delta analysis) | false |
| --since <commit> | Git commit/branch/tag for delta analysis (Git projects only) | |
| --no-clean | Don't clear output directory | |
| --verbose | Show detailed progress | |
C4 Model Generation
Generate structured C4 architecture diagrams with PlantUML output:
# Generate C4 model
archdoc analyze --c4
# Generate for specific project
archdoc analyze /path/to/project --c4 --output ./architecture
# Output includes:
# - c4-model.json (structured data)
# - context.puml (system context diagram)
# - containers.puml (container diagram)
# - components.puml (component diagram)
Programmatic Usage
Use the library in your Node.js applications:
Standard Documentation
import {
  DocumentationOrchestrator,
  AgentRegistry,
  FileSystemScanner,
} from '@techdebtgpt/archdoc-generator';
// Setup registry with agents
const registry = new AgentRegistry();
const scanner = new FileSystemScanner();
const orchestrator = new DocumentationOrchestrator(registry, scanner);
// Generate documentation
const docs = await orchestrator.generateDocumentation('/path/to/project', {
  maxTokens: 100000,
  parallel: true,
  iterativeRefinement: {
    enabled: true,
    maxIterations: 5,
    clarityThreshold: 80,
  },
});
console.log('Generated:', docs.summary);
C4 Architecture Model
import {
  C4ModelOrchestrator,
  AgentRegistry,
  FileSystemScanner,
} from '@techdebtgpt/archdoc-generator';
// Setup registry with agents
const registry = new AgentRegistry();
const scanner = new FileSystemScanner();
const orchestrator = new C4ModelOrchestrator(registry, scanner);
// Generate C4 model
const result = await orchestrator.generateC4Model('/path/to/project');
console.log('C4 Context:', result.c4Model.context);
console.log('Containers:', result.c4Model.containers);
console.log('Components:', result.c4Model.components);
// PlantUML diagrams available in result.plantUMLModel
See the API Reference for complete programmatic documentation.
Configuration
Environment Variables
| Variable | Description |
|---|---|
| ANTHROPIC_API_KEY | Anthropic Claude API key |
| OPENAI_API_KEY | OpenAI GPT API key |
| GOOGLE_API_KEY | Google Gemini API key |
| XAI_API_KEY | xAI Grok API key |
| DEFAULT_LLM_PROVIDER | Default provider (e.g., anthropic) |
| DEFAULT_LLM_MODEL | Default model (e.g., claude-sonnet-4-5-20250929) |
| LANGCHAIN_TRACING_V2 | Enable LangSmith tracing (true) |
| LANGCHAIN_API_KEY | LangSmith API key |
| LANGCHAIN_PROJECT | LangSmith project name |
See the Configuration Guide for detailed options.
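As a rough illustration of how these variables fit together, the sketch below resolves a provider, model, and API key from an environment object. This is not the package's own loader: `resolveLlmSettings` is a hypothetical helper, and the fallback to "anthropic" when DEFAULT_LLM_PROVIDER is unset is an assumption made for the example.

```typescript
type Provider = "anthropic" | "openai" | "google" | "xai";

// Provider -> API key variable, following the environment variable table.
const API_KEY_VARS: Record<Provider, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  google: "GOOGLE_API_KEY",
  xai: "XAI_API_KEY",
};

interface LlmSettings {
  provider: Provider;
  model: string | undefined;
  apiKey: string;
  tracing: boolean;
}

function isProvider(value: string): value is Provider {
  return Object.prototype.hasOwnProperty.call(API_KEY_VARS, value);
}

// `env` is passed in (for example process.env) so the function is easy to test.
function resolveLlmSettings(env: Record<string, string | undefined>): LlmSettings {
  // Assumption: default to "anthropic" when no provider is configured.
  const provider = env["DEFAULT_LLM_PROVIDER"] ?? "anthropic";
  if (!isProvider(provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  const keyVar = API_KEY_VARS[provider];
  const apiKey = env[keyVar];
  if (!apiKey) {
    throw new Error(`Missing ${keyVar} for provider ${provider}`);
  }
  return {
    provider,
    model: env["DEFAULT_LLM_MODEL"],
    apiKey,
    tracing: env["LANGCHAIN_TRACING_V2"] === "true",
  };
}
```

Failing fast on a missing key, as this sketch does, surfaces misconfiguration before any tokens are spent.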
What Gets Generated
Standard Documentation
The tool generates a multi-file documentation structure:
.arch-docs/
├── index.md             # Table of contents with smart navigation
├── architecture.md      # High-level system design
├── file-structure.md    # Project organization
├── dependencies.md      # External & internal deps
├── patterns.md          # Design patterns detected
├── code-quality.md      # Quality metrics (if data exists)
├── flows.md             # Data & control flows
├── schemas.md           # Data models
├── security.md          # Security vulnerability analysis
├── error-handling.md    # Error propagation and resilience architecture
├── data-contracts.md    # DTO/entity/model boundaries and mapping patterns
├── technical-debt.md    # Debt hotspots and cleanup priorities
├── recommendations.md   # Improvement suggestions
├── kpi.md               # Repository health KPI dashboard (NEW!)
├── metadata.md          # Generation metadata + performance metrics
└── changelog.md         # Documentation update history
What's New:
- kpi.md: LLM-generated repository health dashboard with actionable insights on code quality, testing coverage, architecture health, dependency management, and technical debt.
- error-handling.md: Error boundary strategy, exception translation, resilience patterns, and risk analysis.
- data-contracts.md: DTO/entity/model structure quality, validation strategy, and mapper consistency checks.
- technical-debt.md: Debt scoring, hotspot files, quick wins, and strategic cleanup initiatives.
- Generation Performance Metrics: Added to metadata.md, showing agent confidence scores, execution times, token efficiency, and cost breakdown.
C4 Architecture Model
When using --c4, generates structured architecture diagrams:
.arch-docs-c4/
├── c4-model.json     # Complete C4 model (JSON)
├── context.puml      # System Context (Level 1)
├── containers.puml   # Container Diagram (Level 2)
└── components.puml   # Component Diagram (Level 3)
C4 Model Levels:
- Context: Shows the system boundary, actors (users), and external systems
- Containers: Shows deployable units (APIs, web apps, databases, microservices)
- Components: Shows internal modules and their relationships within containers
Available Agents
Each agent specializes in a specific analysis task using LLM-powered intelligence:
| Agent | Purpose | Priority | Output File | Notes |
|---|---|---|---|---|
| File Structure | Project organization, entry points | HIGH | file-structure.md | Always runs |
| Dependency Analyzer | External deps, internal imports | HIGH | dependencies.md | Always runs |
| Architecture Analyzer | High-level design, components | HIGH | architecture.md | Always runs |
| Pattern Detector | Design patterns, anti-patterns | MEDIUM | patterns.md | Always runs |
| Flow Visualization | Control & data flows with diagrams | MEDIUM | flows.md | Always runs |
| Schema Generator | Data models, interfaces, type definitions | MEDIUM | schemas.md | Only if schemas detected |
| Security Analyzer | Vulnerabilities, auth, secrets, crypto | MEDIUM | security.md | Always runs |
| Error Handling Architecture | Error boundaries, translation, resilience | MEDIUM | error-handling.md | Always runs |
| Data Contracts | DTO/entity/model boundaries and mapping | MEDIUM | data-contracts.md | Always runs |
| Technical Debt | Debt hotspots, maintainability, priorities | MEDIUM | technical-debt.md | Always runs |
| KPI Analyzer | Repository health, executive KPI dashboard | MEDIUM-HIGH | kpi.md | Always runs |
Schema Generator Smart Behavior:
The Schema Generator agent is intelligent - it only generates output when it detects actual schema files:
Detects:
- Database: Prisma schemas (.prisma), TypeORM entities (@Entity), Sequelize models
- API: DTOs (.dto.ts), OpenAPI/Swagger definitions
- GraphQL: Type definitions (.graphql, .gql)
- Types: TypeScript interfaces, type definitions (focused schema files only)
Behavior:
- If NO schemas found: Generates schemas.md with a "No schema definitions found" message
- If schemas found: Generates comprehensive documentation with Mermaid ER/class diagrams
- Uses __FORCE_STOP__ to avoid unnecessary LLM calls when no schemas exist
Why "No schemas"?
- Project may use embedded types in service/controller files (not dedicated schema files)
- Database-less projects (e.g., static site generators, CLI tools)
- API-only projects using inline interfaces
This is not a failure - it's smart detection saving you tokens and cost!
KPI Analyzer Features:
- Overall repository health score (0-100%)
- Component scores: Code quality, testing, architecture, dependencies, complexity, error handling, data contracts, and technical debt
- Detailed metrics with ASCII visualizations
- 8+ actionable insights with prioritized action items
- Executive-friendly language with quantifiable targets
Architecture Highlights
Multi-Agent System
The orchestrator coordinates agents to perform analysis.
┌─────────────────────────────┐
│ Documentation Orchestrator  │
└──────────────┬──────────────┘
               │
     ┌─────────┴─────────┐
     │  Agent Registry   │
     └─────────┬─────────┘
               │
┌───▼────┐ ┌───▼────┐ ┌───▼────┐
│ Agent 1│ │ Agent 2│ │ Agent N│
└────────┘ └────────┘ └────────┘
Self-Refining Analysis
Each agent autonomously improves its analysis through iterative refinement. It evaluates its own output, identifies gaps, searches for relevant code, and refines until quality thresholds are met.
Learn how the self-refinement workflow works.
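The refinement loop can be pictured as "keep improving until the clarity threshold or the iteration cap is reached". The sketch below uses the iteration and threshold values listed for --depth in this README; everything else (the `Draft` shape, the `refineOnce` callback) is a hypothetical stand-in for one evaluate/search/rewrite pass, not the real LangGraph workflow.

```typescript
type Depth = "quick" | "normal" | "deep";

// Iteration caps and clarity thresholds as listed for --depth in this README.
const DEPTH_PRESETS: Record<Depth, { maxIterations: number; clarityThreshold: number }> = {
  quick: { maxIterations: 2, clarityThreshold: 70 },
  normal: { maxIterations: 5, clarityThreshold: 80 },
  deep: { maxIterations: 10, clarityThreshold: 90 },
};

// Hypothetical draft shape: the text so far plus a 0-100 clarity score.
interface Draft {
  text: string;
  clarity: number;
}

// Run refinement passes until the draft is clear enough or the cap is hit.
function refine(
  initial: Draft,
  depth: Depth,
  refineOnce: (draft: Draft) => Draft
): { draft: Draft; iterations: number } {
  const { maxIterations, clarityThreshold } = DEPTH_PRESETS[depth];
  let draft = initial;
  let iterations = 0;
  while (iterations < maxIterations && draft.clarity < clarityThreshold) {
    draft = refineOnce(draft);
    iterations += 1;
  }
  return { draft, iterations };
}
```

The two stopping conditions explain the cost trade-off between depth modes: a quick run may stop at the cap with a lower score, while a deep run can spend up to ten passes chasing a 90% threshold.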
LangChain LCEL Integration
All agents use LangChain Expression Language (LCEL) for composable AI workflows with unified LangSmith tracing.
Language Support
ArchDoc Generator supports 17 programming and markup languages out-of-the-box with zero configuration:
Programming Languages
| Language | Extensions | Import Detection | Framework Support |
|---|---|---|---|
| TypeScript/JavaScript | .ts, .tsx, .js, .jsx, .mjs, .cjs | ES6 imports, CommonJS require | NestJS, Express, React, Angular, Vue, Next.js |
| Python | .py, .pyi, .pyx | from...import, import | Django, Flask, FastAPI, Pyramid |
| Java | .java | import statements | Spring Boot, Quarkus, Micronaut |
| Go | .go | import blocks | Gin, Echo, Fiber, Chi |
| C# | .cs, .csx | using statements | ASP.NET, Entity Framework |
| C/C++ | .c, .cpp, .cc, .cxx, .h, .hpp, .hh | #include directives | Linux, POSIX |
| Kotlin | .kt, .kts | import statements | Spring, Ktor, Micronaut |
| PHP | .php | use, require | Laravel, Symfony |
| Ruby | .rb, .rake | require statements | Rails, Sinatra |
| Rust | .rs | use statements | Tokio, Actix, Rocket |
| Scala | .scala | import statements | Akka, Play |
| Swift | .swift | import statements | SwiftUI, Vapor |
Web & Data Languages
| Language | Extensions | Detection | Notes |
|---|---|---|---|
| CSS | .css, .scss, .sass | @import rules | Theme and variable detection |
| HTML | .html, .htm | src, href attributes | Script/link/image extraction |
| JSON | .json | N/A | Configuration file analysis |
| XML | .xml | xi:include elements | XInclude support |
| Flex/ActionScript | .as, .mxml | import statements | Flash/Flex project support |
Multi-Language Projects
The scanner automatically detects all supported languages in your project:
# Just run the command - no configuration needed!
archdoc analyze ./my-project
# Example output:
# Found 487 imports across 17 file types
# - TypeScript: 234 imports
# - Python: 123 imports
# - Rust: 89 imports
# - CSS: 41 imports
Custom Language Support
Need support for a language not listed? No code changes required!
Add custom language configurations via .archdoc.config.json:
{
"languages": {
"custom": {
"myLanguage": {
"displayName": "My Language",
"filePatterns": {
"extensions": [".mylang"]
},
"importPatterns": {
"myImport": "^import\\s+([^;]+);"
}
}
}
}
}
See Custom Language Configuration Guide for complete documentation on:
- Adding new languages
- Extending built-in language configurations
- Custom import pattern syntax
- Language-specific frameworks and keywords
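To see what an import pattern like the one in the config above would capture, here is a small sketch that applies it to source text. It is an assumption-laden illustration: `CustomLanguage` and `extractImports` are hypothetical names, and applying each pattern with the global and multiline flags is a guess at sensible behavior rather than documented tool behavior.

```typescript
// Hypothetical type mirroring the custom language entry in .archdoc.config.json.
interface CustomLanguage {
  displayName: string;
  filePatterns: { extensions: string[] };
  importPatterns: Record<string, string>;
}

const myLanguage: CustomLanguage = {
  displayName: "My Language",
  filePatterns: { extensions: [".mylang"] },
  importPatterns: { myImport: "^import\\s+([^;]+);" },
};

// Collect the first capture group of every import pattern match.
function extractImports(source: string, language: CustomLanguage): string[] {
  const found: string[] = [];
  for (const pattern of Object.values(language.importPatterns)) {
    // Assumption: patterns are matched line by line, hence the "m" flag.
    const regex = new RegExp(pattern, "gm");
    let match: RegExpExecArray | null;
    while ((match = regex.exec(source)) !== null) {
      if (match[0] === "") {
        regex.lastIndex += 1; // guard against zero-length matches looping forever
        continue;
      }
      if (match[1] !== undefined) found.push(match[1].trim());
    }
  }
  return found;
}
```

For a file containing `import core.io;` and `import net.http;`, this returns the two module names, which is the kind of data a dependency graph needs from a language the tool does not know natively.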
Future Work & Roadmap
We're building breakthrough features to transform how teams manage architecture documentation. See our detailed roadmap for comprehensive plans.
Upcoming Features
We've organized our future work into five foundational epics:
- Core MCP Integration: Native support for Cursor, Claude Code, and VS Code.
- Token & Cost Optimization: JSON-first internal format and delta analysis.
- Developer-Centric Query Interface: Natural language queries and impact analysis.
- Observability & CI Guardrails: Drift detection and architecture scorecards.
- Extensibility & Ecosystem: Custom agent API and Architecture-as-Code.
View Full Roadmap & Technical Details
Contributing
We welcome contributions! See the Contributing Guide for details on:
- Development setup
- Creating custom agents
- Testing guidelines
- Code style and standards
- Pull request process
Community Guidelines
- Code of Conduct - Our pledge to foster an open and welcoming environment
- Security Policy - How to report security vulnerabilities responsibly
- Issue Templates - Bug reports, feature requests, and more
- Pull Request Template - Guidelines for submitting changes
Resources
- Website: techdebtgpt.com
- GitHub: github.com/techdebtgpt/architecture-doc-generator
- Documentation: Full Documentation
- Discussions: GitHub Discussions
- Issues: Report Issues
Common Questions
Q: Why does Schema Generator say "No schema definitions found"?
A: This is not a failure - it's smart detection! The Schema Generator only generates output when it detects dedicated schema files:
What it detects:
- Prisma: schema.prisma, *.prisma
- TypeORM: @Entity(), *.entity.ts
- DTOs: *.dto.ts, API schemas
- GraphQL: *.graphql, *.gql
- OpenAPI: swagger.json, openapi.yaml
Common causes of "No schemas":
- Analyzing a subdirectory only: Schema files in prisma/ won't be found if you run on src/ only
  - Avoid: archdoc analyze ./src (misses ./prisma/schema.prisma)
  - Prefer: archdoc analyze . (includes all directories)
- Embedded types: Types in service/controller files (not dedicated schema files)
- Database-less projects: Static sites, CLI tools, frontend-only apps
- Inline interfaces: TypeScript interfaces mixed with business logic
Solution: Run analysis from project root, not subdirectories.
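The subdirectory pitfall is easy to see with a filename-based check. The sketch below matches only the file-name patterns listed in this answer; it is a simplified assumption of how detection might work (`findSchemaFiles` is a hypothetical helper), and it ignores content-based signals such as the @Entity() decorator.

```typescript
// File-name patterns taken from the "What it detects" list in this answer.
const SCHEMA_FILE_PATTERNS: RegExp[] = [
  /\.prisma$/,
  /\.entity\.ts$/,
  /\.dto\.ts$/,
  /\.(graphql|gql)$/,
  /(^|\/)(swagger\.json|openapi\.yaml)$/,
];

// Keep only the paths that look like dedicated schema files.
function findSchemaFiles(paths: string[]): string[] {
  return paths.filter((path) => SCHEMA_FILE_PATTERNS.some((re) => re.test(path)));
}

// Hypothetical project layout used to compare the two scan roots.
const projectFiles = [
  "prisma/schema.prisma",
  "src/users/user.dto.ts",
  "src/app.service.ts",
];

// Scanning from the project root sees the Prisma schema.
const fromRoot = findSchemaFiles(projectFiles);

// Scanning only src/ never sees anything under prisma/.
const fromSrc = findSchemaFiles(projectFiles.filter((p) => p.startsWith("src/")));
```

Here `fromRoot` contains both the Prisma schema and the DTO, while `fromSrc` contains only the DTO, mirroring the difference between `archdoc analyze .` and `archdoc analyze ./src`.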
Q: What files are excluded from vector search?
A: Vector search automatically excludes:
- Dependencies: node_modules/, vendor/, target/
- Build outputs: dist/, build/, out/, bin/, obj/
- Test files: .test., .spec., __tests__/, test_
- Git: .git/ (and respects .gitignore by default)
From 6,187 total files, only ~889 source files (14%) are indexed for optimal performance.
Q: Which search strategy should I use?
A: For production, use Hybrid (default):
- Combines semantic similarity (60%) + dependency graph (40%)
- Best balance of quality and performance
- Only 7% slower than vector-only, but 28% better architectural insights
For fast iteration, use Vector-only or Smart.
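The 60/40 split can be read as a weighted sum of two per-file scores. Below is a minimal sketch of that combination; the `Candidate` shape and `hybridRank` name are hypothetical, and the sketch assumes both scores are already normalized to the range 0 to 1, which the README does not state.

```typescript
// Hypothetical candidate: one file with a semantic score and a graph score.
interface Candidate {
  path: string;
  vectorScore: number; // semantic similarity, assumed to be in [0, 1]
  graphScore: number; // dependency-graph relevance, assumed to be in [0, 1]
}

// Combine both signals with the weights from the searchMode config example.
function hybridRank(
  candidates: Candidate[],
  vectorWeight = 0.6,
  graphWeight = 0.4
): Array<Candidate & { score: number }> {
  return candidates
    .map((c) => ({
      ...c,
      score: vectorWeight * c.vectorScore + graphWeight * c.graphScore,
    }))
    .sort((a, b) => b.score - a.score);
}
```

A file that is only moderately similar in wording but central in the dependency graph can outrank a file that merely shares vocabulary with the query, which is the "meaning AND structure" behavior described for hybrid retrieval.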
Q: How much does it cost?
A: Using local embeddings (FREE) with Claude Haiku:
- Small project (1K files): ~$0.10-0.20
- Medium project (5K files): ~$0.35-0.45
- Large project (10K+ files): ~$0.60-0.80
Tip: Use --depth quick to reduce cost by ~30%.
Q: Can I use it on private/closed-source code?
A: Yes! Your code is only sent to the LLM provider (Anthropic/OpenAI/Google) and is not stored or shared. Use local embeddings (embeddingsProvider: "local") for completely offline semantic search.
Q: How do I add support for my custom language?
A: No code changes needed! Add to .archdoc.config.json:
{
"languages": {
"custom": {
"myLanguage": {
"displayName": "My Language",
"filePatterns": {
"extensions": [".mylang"]
},
"importPatterns": {
"myImport": "^import\\s+([^;]+);"
}
}
}
}
}
See Custom Language Guide for details.
License
Apache License 2.0 - see the LICENSE file for details.
Made with ❤️ by TechDebtGPT