JSPM

  • Downloads 7242
  • License MIT

Semantic code search for Magento 2 — index, search, MCP server

Package Exports

  • magector
  • magector/src/mcp-server.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (magector) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

Magector

Technology-aware MCP server for Magento 2 and Adobe Commerce with intelligent indexing and search.

Magector is a Model Context Protocol (MCP) server that deeply understands Magento 2 and Adobe Commerce. It builds a semantic vector index of your entire codebase — 18,000+ files across hundreds of modules — and exposes 47 tools that let AI assistants search, navigate, and understand the code with domain-specific intelligence. Instead of grepping for keywords, your AI asks "how are checkout totals calculated?" and gets ranked, relevant results in under 50ms, enriched with Magento pattern detection (plugins, observers, controllers, DI preferences, layout XML, and 20+ more).


Why Magector

Magento 2 and Adobe Commerce have 18,000+ PHP, XML, JS, PHTML, and GraphQL files spread across hundreds of modules. The codebase relies heavily on indirection — plugins intercept methods defined in other modules, observers react to events dispatched elsewhere, di.xml rewires interfaces to concrete classes, and layout XML stitches blocks and templates together. No single file tells the full story.

Generic search tools — grep, IDE search, or the keyword matching built into AI assistants — can't bridge this gap. They find literal strings but can't connect "how does checkout calculate totals?" to TotalsCollector.php when the word "totals" appears in hundreds of unrelated files.

Magector solves this with three layers of intelligence:

  1. Semantic vector index — every file is embedded into a 384-dimensional space (ONNX, all-MiniLM-L6-v2) where meaning matters more than keywords. A search for "payment capture" returns CaptureOperation.php because the embeddings are close, not because the file contains the word "capture".

  2. Magento technology awareness — 20+ pattern detectors identify plugins, observers, controllers, blocks, cron jobs, GraphQL resolvers, DI preferences, layout XML, and more. Every search result is enriched with what kind of Magento component it is, so the AI client understands the code's role in the system.

  3. Adaptive learning (SONA) — Magector tracks which results you actually use and adjusts future rankings with MicroLoRA feedback, getting smarter over time without any API calls.

The result: your AI assistant calls one MCP tool and gets ranked, pattern-enriched results in 10-45ms — instead of burning tokens grepping through dozens of wrong files. High relevance accuracy means the AI reads fewer, more targeted files, which optimizes context window usage, reduces API costs, and accelerates development cycles.

| Approach | Semantic matches | Magento-aware | Speed (18K files) |
| --- | --- | --- | --- |
| grep / ripgrep | No | No | 100-500ms |
| IDE search | No | No | 200-1000ms |
| GitHub search | Partial | No | 500-2000ms |
| Magector | Yes | Yes | 10-45ms |
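
To make the semantic vector index layer more concrete, here is a minimal sketch of cosine similarity, a common way to measure how close two embeddings are. The 3-dimensional vectors are made-up stand-ins for the 384-dimensional all-MiniLM-L6-v2 embeddings, and cosineSimilarity is an illustrative helper rather than a function from the Magector codebase. The README does not state which distance metric the HNSW index uses, so treat the choice of cosine as an assumption.

```javascript
// Cosine similarity between two equal-length vectors (illustrative helper).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors with made-up values, standing in for real embeddings.
const query = [0.9, 0.1, 0.0];            // "payment capture"
const captureOperation = [0.8, 0.2, 0.1]; // a semantically close file
const unrelated = [0.0, 0.1, 0.9];        // a semantically distant file

const close = cosineSimilarity(query, captureOperation);
const far = cosineSimilarity(query, unrelated);
console.log(close > far); // true
```

A file ranks high when its vector points in nearly the same direction as the query vector, regardless of whether the two share any literal words.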

Features

  • Semantic search -- find code by meaning, not exact keywords
  • 99.2% accuracy -- validated with 101 E2E test queries across 16 tool categories, plus 557 Rust-level test cases
  • Hybrid search -- combines semantic vector similarity with keyword re-ranking for best-of-both-worlds results
  • Structured JSON output -- results include file path, class name, methods list, role badges, and content snippets for minimal round-trips
  • Persistent serve mode -- keeps ONNX model and HNSW index resident in memory, eliminating cold-start latency
  • Incremental re-indexing -- background file watcher detects changes and updates the index without restart (tombstone + compact strategy)
  • ONNX embeddings -- native 384-dim transformer embeddings via ONNX Runtime
  • ~36K vectors -- indexes the complete Magento 2 / Adobe Commerce codebase including framework internals
  • Magento-aware -- understands controllers, plugins, observers, blocks, resolvers, repositories, and 20+ Magento patterns
  • Adobe Commerce compatible -- works with both Magento Open Source and Adobe Commerce (B2B, Staging, and all Commerce-specific modules)
  • AST-powered -- tree-sitter parsing for PHP and JavaScript extracts classes, methods, namespaces, and inheritance
  • Cross-tool discovery -- tool descriptions include keywords and "See also" references so AI clients find the right tool on the first try
  • SONA feedback learning -- self-adjusting search that learns from MCP tool call patterns (e.g., search → find_plugin refines future rankings for similar queries)
  • SONA v2 with MicroLoRA + EWC++ -- rank-2 low-rank adapter (1536 params, ~6KB) adjusts query embeddings based on learned patterns; Elastic Weight Consolidation prevents catastrophic forgetting during online learning
  • Diff analysis -- risk scoring and change classification for git commits and staged changes
  • Complexity analysis -- cyclomatic complexity, function count, and hotspot detection across modules
  • Fast -- 10-45ms queries via persistent serve process, batched ONNX embedding with adaptive thread scaling
  • LLM description enrichment -- generate natural-language descriptions of di.xml files using Claude, stored in SQLite, and prepend them to embedding text so descriptions influence vector search ranking (not just post-retrieval display)
  • MCP server -- 47 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
  • Clean architecture -- Rust core handles all indexing/search, Node.js MCP server delegates to it
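
The MicroLoRA figures in the list above (rank 2, 1536 params, ~6KB) can be checked with a small sketch. It assumes the adapter consists of two float32 matrices, A (2 x 384) and B (384 x 2), applied as q' = q + B(Aq). That layout is the usual low-rank adapter form, but it is an assumption here, not something the README confirms, and the names createAdapter and applyAdapter are hypothetical. EWC++ is not modelled.

```javascript
const DIM = 384; // embedding size stated in the README
const RANK = 2;  // adapter rank stated in the README

// Two small matrices: A is RANK x DIM, B is DIM x RANK (assumed layout).
function createAdapter() {
  return {
    A: new Float32Array(RANK * DIM),
    B: new Float32Array(DIM * RANK),
  };
}

// q' = q + B (A q): project down to RANK dims, back up, add to the query.
function applyAdapter({ A, B }, q) {
  const low = new Float32Array(RANK);
  for (let r = 0; r < RANK; r++) {
    for (let d = 0; d < DIM; d++) low[r] += A[r * DIM + d] * q[d];
  }
  const out = Float32Array.from(q);
  for (let d = 0; d < DIM; d++) {
    for (let r = 0; r < RANK; r++) out[d] += B[d * RANK + r] * low[r];
  }
  return out;
}

const adapter = createAdapter();
const paramCount = adapter.A.length + adapter.B.length;
const sizeBytes = adapter.A.byteLength + adapter.B.byteLength;
console.log(paramCount, sizeBytes); // 1536 6144
```

With 2 x 384 + 384 x 2 = 1536 float32 values at 4 bytes each, the adapter occupies 6144 bytes, which matches the ~6KB quoted above.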

Architecture

flowchart LR
  subgraph node ["Node.js Layer"]
    direction TB
    G["CLI<br/>init · index · search · describe"]
    E["MCP Server<br/>47 tools · LRU cache"]
    F["Persistent Serve Process"]
    G --> F
    E --> F
  end

  F -->|"stdin/stdout JSON"| rust

  subgraph rust ["Rust Core"]
    direction TB
    A["AST Parser<br/>PHP · JS · XML"]
    B["Pattern Detection<br/>20+ Magento patterns"]
    B2["Description Enrichment<br/>LLM-powered di.xml summaries"]
    C["ONNX Embedder<br/>all-MiniLM-L6-v2 · 384d"]
    D["HNSW Vector Search<br/>hybrid reranking · SONA"]
    A --> B --> B2 --> C --> D
  end

  style rust fill:#f4a460,color:#000
  style node fill:#68b684,color:#000

Indexing Pipeline

flowchart LR
  A["Source File"] --> B["AST Parser"]
  B --> C["Pattern Detection"]
  C --> D["Text Enrichment"]
  D --> D2{"Descriptions DB?"}
  D2 -->|Yes| D3["Prepend LLM Description"]
  D2 -->|No| E["ONNX Embedding"]
  D3 --> E
  E --> F[("HNSW Index")]
  A --> G["Metadata"] --> F

Search Pipeline

flowchart LR
  Q["Query"] --> E1["Synonym Enrichment"]
  E1 --> E2["ONNX Embedding"]
  E2 --> H["HNSW Search"]
  H --> R["Hybrid Reranking"]
  R --> SA["SONA Adjustment"]
  SA --> J["Structured JSON"]
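
The hybrid reranking stage of this pipeline can be sketched as follows. This is a simplified illustration, not Magector's Rust implementation: it blends the vector score with a plain term-overlap score instead of BM25, and the weight alpha = 0.7, the text field, and the function names are all assumptions.

```javascript
// Fraction of query terms that appear in the candidate's text (0..1).
function keywordScore(query, text) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = terms.filter((t) => haystack.includes(t)).length;
  return hits / terms.length;
}

// Blend vector similarity with keyword overlap; alpha is a made-up weight.
function hybridRerank(query, candidates, alpha = 0.7) {
  return candidates
    .map((c) => ({
      ...c,
      hybrid: alpha * c.score + (1 - alpha) * keywordScore(query, c.text),
    }))
    .sort((x, y) => y.hybrid - x.hybrid);
}

// Made-up candidates as they might come back from the HNSW search stage.
const ranked = hybridRerank('totals collector', [
  { path: 'A.php', score: 0.8, text: 'class Helper' },
  { path: 'TotalsCollector.php', score: 0.78, text: 'class TotalsCollector collects totals' },
]);
console.log(ranked[0].path); // TotalsCollector.php
```

The second candidate starts slightly behind on vector score but overtakes the first because both query terms occur in its text.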

Components

| Component | Technology | Purpose |
| --- | --- | --- |
| Embeddings | ort (ONNX Runtime) | all-MiniLM-L6-v2, 384 dimensions |
| Vector search | hnsw_rs + hybrid reranking | Approximate nearest neighbor + keyword boosting |
| PHP parsing | tree-sitter-php | Class, method, namespace extraction |
| JS parsing | tree-sitter-javascript | AMD/ES6 module detection |
| Pattern detection | Custom Rust | 20+ Magento-specific patterns |
| CLI | clap | Command-line interface (index, search, serve, validate) |
| Unified metadata | rusqlite (bundled SQLite) | LLM descriptions, method-chain enrichment, process state, cache; all in .magector/data.db |
| SONA | Custom Rust | Feedback learning with MicroLoRA + EWC++ |
| MCP server | @modelcontextprotocol/sdk | AI tool integration with structured JSON output |
| Config data | JSON exports in .magector/config-data/ | One-time core_config_data exports per environment for config tracing |

Security

Magector operates on source code indexed from potentially-untrusted vendor/ dependencies and is driven by an LLM that may be manipulated via prompt injection in indexed comments, docblocks, or markdown. The following hardening applies as of v2.15.1:

Path traversal protection

All tools that accept a path argument (magento_read, magento_grep, magento_ast_search, magento_find_dataobject_issues) route the input through safePath() / safeRelPath() helpers in src/mcp-server.js. These:

  1. Resolve the argument against MAGENTO_ROOT with path.resolve() (normalizes .., symlinks are not followed during validation).
  2. Reject any resolved path that does not lie inside MAGENTO_ROOT.

This prevents a hostile vendor/ comment from instructing the LLM to e.g. magento_read ../../home/user/.ssh/id_rsa. Both the standalone case handlers and their magento_batch counterparts share the same chokepoint.
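
The two validation steps can be sketched as below. This is an illustration of the technique under stated assumptions, not the actual safePath() from src/mcp-server.js: the signature, the error message, and the /srv/magento root are hypothetical. Like the behavior described above, the sketch normalizes .. segments but does not follow symlinks.

```javascript
const path = require('node:path');

// Resolve userPath against root and refuse anything that lands outside it.
function safePath(root, userPath) {
  const base = path.resolve(root);
  const resolved = path.resolve(base, userPath);
  const rel = path.relative(base, resolved);
  const escapes =
    rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Path escapes MAGENTO_ROOT: ${userPath}`);
  }
  return resolved;
}

const root = '/srv/magento'; // hypothetical MAGENTO_ROOT
console.log(safePath(root, 'app/etc/di.xml')); // inside the root: allowed
try {
  safePath(root, '../../home/user/.ssh/id_rsa');
} catch (err) {
  console.log(err.message); // traversal attempt: rejected
}
```

Comparing the relative path rather than using a plain string prefix check avoids accepting sibling directories such as /srv/magento-backup.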

Shell injection hardening in auto-update

src/update.js fetches the latest field from the npm registry and re-execs itself with the new version string. Previously this was interpolated into a shell command; a tampered registry response could inject shell metacharacters. As of v2.15.1:

  • The re-exec passes argv as an array to a no-shell spawner (no intermediate shell).
  • A semver-strict isSafeVersion() validator rejects any version string containing metacharacters or that does not match X.Y.Z / X.Y.Z-prerelease form.
  • Fails closed: the auto-update is silently skipped rather than running a malformed version.
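
A minimal sketch of this hardening pattern follows. The regular expression, the buildReexecArgv helper, and the npx magector@<version> command line are assumptions made for illustration; the real isSafeVersion() in src/update.js may accept a different grammar and the real re-exec may use different arguments.

```javascript
// Accept only X.Y.Z or X.Y.Z-prerelease; everything else is rejected.
const SEMVER_STRICT = /^\d+\.\d+\.\d+(-[0-9A-Za-z]+(\.[0-9A-Za-z]+)*)?$/;

function isSafeVersion(version) {
  return typeof version === 'string' && SEMVER_STRICT.test(version);
}

// Build an argv array for a no-shell spawn, or null to skip the update.
// The command layout is hypothetical.
function buildReexecArgv(version, args) {
  if (!isSafeVersion(version)) return null; // fail closed
  return ['npx', `magector@${version}`, ...args];
}

console.log(buildReexecArgv('2.15.1', ['mcp']));           // argv array
console.log(buildReexecArgv('2.15.1; rm -rf ~', ['mcp'])); // null
```

The resulting array would be handed to something like child_process.spawn(argv[0], argv.slice(1)), which does not start a shell by default, so metacharacters in an argument are never interpreted.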

Unix socket permissions

The serve-proxy Unix socket at .magector/serve.sock is created with chmod 0600 immediately after listen(). On multi-user systems, another local account can no longer connect and query the vector index (which would leak indexed source snippets). The chmod is best-effort on platforms that don't support it (logged to .magector/magector.log).

Reporting vulnerabilities

If you find a security issue, please open an issue on the GitHub repo and mark it as security-related. Do not post reproducers that leak actual source contents from private codebases.


Quick Start

Prerequisites

1. Initialize in Your Project

cd /path/to/your/magento2  # or Adobe Commerce project
npx magector init

This single command handles the entire setup:

flowchart LR
  A["npx magector init"] --> B["Verify<br/>Project"]
  B --> C["Download<br/>ONNX Model"]
  C --> D["Index<br/>Codebase"]
  D --> E["Detect IDE<br/>Cursor · Claude Code"]
  E --> E2["API Key<br/>(optional)"]
  E2 --> F["Write MCP<br/>Config"]
  F --> G["Update<br/>.gitignore"]

2. Search

npx magector search "product price calculation"
npx magector search "checkout totals collector" -l 20

3. Re-index After Changes

npx magector index

4. IDE Setup Only (Skip Indexing)

npx magector setup

CLI Reference

Rust Core CLI

magector-core <COMMAND>

Commands:
  index       Index a Magento codebase
  search      Search the index semantically
  serve       Start persistent server mode (stdin/stdout JSON protocol)
  describe    Generate LLM descriptions for di.xml files (requires ANTHROPIC_API_KEY)
  validate    Run validation suite (downloads Magento if needed)
  download    Download Magento 2 Open Source
  stats       Show index statistics
  embed       Generate embedding for text

index

magector-core index [OPTIONS]

Options:
  -m, --magento-root <PATH>          Path to Magento root directory
  -d, --database <PATH>              Index database path [default: ./.magector/index.db]
  -c, --model-cache <PATH>           Model cache directory [default: ./models]
      --descriptions-db <PATH>       Path to descriptions SQLite DB (descriptions are prepended to embeddings)
  -v, --verbose                      Enable verbose output

When --descriptions-db is provided (or auto-detected as data.db next to the index), descriptions are prepended to the embedding text as "Description: {text}\n\n" before the raw file content. This places semantic terms within the 256-token ONNX window, significantly improving retrieval of di.xml files for natural-language queries.
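
The prepending step can be illustrated in a few lines. The buildEmbeddingText name and the sample description are hypothetical; only the "Description: {text}\n\n" prefix format comes from the text above, and the real logic lives in the Rust core.

```javascript
// Prepend an optional LLM description in the format the README describes.
function buildEmbeddingText(content, description) {
  if (!description) return content;
  return `Description: ${description}\n\n${content}`;
}

const text = buildEmbeddingText(
  '<config><preference for="A" type="B"/></config>',
  'Wires interface A to implementation B.' // made-up description
);
console.log(text.startsWith('Description: Wires')); // true
```

Because the description comes first, its natural-language terms are the part of the text least likely to be cut off by the embedding model's token window.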

search

magector-core search <QUERY> [OPTIONS]

Options:
  -d, --database <PATH>   Index database path [default: ./.magector/index.db]
  -l, --limit <N>         Number of results [default: 10]
  -f, --format <FORMAT>   Output format: text, json [default: text]

describe

magector-core describe [OPTIONS]

Options:
  -m, --magento-root <PATH>   Path to Magento root directory
  -o, --output <PATH>         Output SQLite database [default: ./.magector/data.db]
      --force                 Re-describe all files (ignore cache)

Generates natural-language descriptions of di.xml files using the Anthropic API (Claude Sonnet). Requires ANTHROPIC_API_KEY environment variable. Descriptions are stored in a SQLite database and used during indexing to enrich embeddings. Only files with changed content hashes are re-described (incremental by default).

serve

magector-core serve [OPTIONS]

Options:
  -d, --database <PATH>            Index database path [default: ./.magector/index.db]
  -c, --model-cache <PATH>         Model cache directory [default: ./models]
  -m, --magento-root <PATH>        Magento root (enables file watcher)
      --descriptions-db <PATH>     Path to descriptions SQLite DB
      --watch-interval <SECS>      File watcher poll interval [default: 60]

Starts a persistent process that reads JSON queries from stdin and writes JSON responses to stdout. Keeps the ONNX model and HNSW index resident in memory for fast repeated queries.

When --magento-root is provided, a background file watcher polls for changed files every --watch-interval seconds and incrementally re-indexes them without restart. Modified and deleted files are soft-deleted (tombstoned) in the HNSW index; new vectors are appended. When tombstoned entries exceed 20% of total vectors, the index is automatically compacted by rebuilding the HNSW graph.
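
The tombstone-and-compact strategy can be sketched with a toy in-memory index. This is an illustration only: the TombstoneIndex class is hypothetical, a flat array stands in for the HNSW graph, and counting tombstoned entries in the "total vectors" denominator is an assumption. Only the 20% threshold comes from the text above.

```javascript
const COMPACT_THRESHOLD = 0.2; // 20% threshold stated in the README

class TombstoneIndex {
  constructor() {
    this.entries = []; // { path, vector, deleted }
    this.compactions = 0;
  }

  // Re-indexing a file tombstones its old vector and appends the new one.
  upsert(path, vector) {
    this.remove(path, false);
    this.entries.push({ path, vector, deleted: false });
    this.maybeCompact();
  }

  // Soft delete: mark the entry instead of restructuring the index.
  remove(path, compact = true) {
    for (const e of this.entries) {
      if (e.path === path && !e.deleted) e.deleted = true;
    }
    if (compact) this.maybeCompact();
  }

  tombstoneRatio() {
    if (this.entries.length === 0) return 0;
    return this.entries.filter((e) => e.deleted).length / this.entries.length;
  }

  // Stand-in for rebuilding the HNSW graph: drop tombstoned entries.
  maybeCompact() {
    if (this.tombstoneRatio() > COMPACT_THRESHOLD) {
      this.entries = this.entries.filter((e) => !e.deleted);
      this.compactions += 1;
    }
  }
}

const index = new TombstoneIndex();
for (let i = 0; i < 10; i++) index.upsert(`file${i}.php`, [i]);
index.remove('file0.php'); // 1/10 tombstoned: below threshold
index.remove('file1.php'); // 2/10 = 20%: threshold not exceeded yet
index.remove('file2.php'); // 3/10 > 20%: compaction runs
console.log(index.entries.length, index.compactions); // 7 1
```

Soft deletes keep individual updates cheap, while the threshold bounds how much dead weight searches have to skip before a rebuild is triggered.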

Protocol (one JSON object per line):

// Request:
{"command":"search","query":"product price","limit":10}

// Response:
{"ok":true,"data":[{"id":123,"score":0.85,"metadata":{...}}]}

// Stats request:
{"command":"stats"}

// Watcher status:
{"command":"watcher_status"}
// Response:
{"ok":true,"data":{"running":true,"tracked_files":18234,"last_scan_changes":3,"interval_secs":60}}

// Descriptions (all LLM descriptions from SQLite DB):
{"command":"descriptions"}
// Response:
{"ok":true,"data":{"app/code/Magento/Catalog/etc/di.xml":{"hash":"...","description":"...","model":"claude-sonnet-4-5-20250929","timestamp":1769875137},...}}

// Describe (generate descriptions + auto-reindex affected files):
{"command":"describe"}
// Response:
{"ok":true,"data":{"files_found":371,"described":5,"skipped":366,"errors":0,"described_paths":["app/code/..."]}}

// SONA feedback:
{"command":"feedback","signals":[{"type":"refinement_to_plugin","query":"checkout totals","timestamp":1700000000000}]}
// Response:
{"ok":true,"data":{"learned":1}}

// SONA status:
{"command":"sona_status"}
// Response:
{"ok":true,"data":{"learned_patterns":5,"total_observations":12}}
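
A client for this line-delimited protocol needs two pieces: a serializer that emits one JSON object per line, and a parser that tolerates responses arriving in arbitrary chunks. The sketch below shows both as pure functions; encodeRequest and createLineParser are hypothetical names, not part of Magector's API.

```javascript
// Serialize one request as a single JSON line.
function encodeRequest(request) {
  return JSON.stringify(request) + '\n';
}

// Returns a function that accepts raw stdout chunks and calls onMessage
// once per complete line, buffering partial lines between chunks.
function createLineParser(onMessage) {
  let buffer = '';
  return (chunk) => {
    buffer += chunk.toString();
    let newline;
    while ((newline = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) onMessage(JSON.parse(line));
    }
  };
}

const received = [];
const feed = createLineParser((msg) => received.push(msg));
feed('{"ok":true,"data":{"learn'); // response split across two chunks
feed('ed_patterns":5,"total_observations":12}}\n');
console.log(received[0].data.learned_patterns); // 5
```

In practice these would be attached to a child process started with child_process.spawn: feed goes on the child's stdout 'data' event, and encodeRequest({ command: 'stats' }) is written to its stdin. The binary path and serve arguments depend on your installation.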

Node.js CLI

npx magector init [path]        # Full setup: index + IDE config
npx magector index [path]       # Index (or re-index) Magento codebase
npx magector search <query>     # Search indexed code
npx magector describe [path]    # Generate LLM descriptions for di.xml files
npx magector stats              # Show indexer statistics
npx magector setup [path]       # IDE setup only (no indexing)
npx magector mcp                # Start MCP server
npx magector help               # Show help

The describe command and magento_describe MCP tool require an Anthropic API key. During npx magector init, you are prompted to paste your key (optional). If provided, it is stored in the MCP config file as the ANTHROPIC_API_KEY environment variable so the MCP server can use it automatically. You can also set it manually later by adding "ANTHROPIC_API_KEY": "sk-..." to the env section in .mcp.json or ~/.cursor/mcp.json.

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| MAGENTO_ROOT | Path to Magento installation | Current directory |
| MAGECTOR_DB | Path to index database | ./.magector/index.db |
| MAGECTOR_BIN | Path to magector-core binary | Auto-detected |
| MAGECTOR_MODELS | Path to ONNX model directory | ~/.magector/models/ |
| MAGECTOR_INDEX_TIMEOUT | Indexing wall-clock timeout in milliseconds. Override for very large codebases or CPU-constrained environments. | 14400000 (4 h) |
| MAGECTOR_THREADS | Max ONNX intra-op + rayon parsing threads. Equivalent to the --threads CLI flag. | Half of CPU cores |
| OMP_NUM_THREADS | Fallback thread limit if MAGECTOR_THREADS is not set (de facto standard for ONNX/OpenMP). | |
| MAGECTOR_BATCH_SIZE | Embedding batch size (higher = faster, more RAM). Equivalent to --batch-size. | 256 |
| ANTHROPIC_API_KEY | API key for description generation (describe command) | |

Constraining CPU usage during indexing

Indexing a large enterprise codebase (~80K files) can saturate CPU during PHASE 2 (ONNX embedding generation). To keep a developer machine responsive while indexing, lower the thread count:

npx magector index --threads 2                  # use only 2 cores for both parsing and embedding
MAGECTOR_THREADS=2 npx magector index           # equivalent via env var
OMP_NUM_THREADS=2 npx magector index            # also honored as a fallback

The --threads flag and MAGECTOR_THREADS / OMP_NUM_THREADS env vars constrain both the rayon thread pool used by PHASE 1 (parallel AST parsing) and the ONNX intra-op thread pool used by PHASE 2 (embedding inference). The active thread source is logged at startup so you can verify it took effect:

INFO Rayon global pool: 2 threads (available: 16)
INFO ONNX intra_threads: 2 (available: 16, source: --threads flag)
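
The way a thread count might be chosen from these sources can be sketched as below. The precedence order (flag, then MAGECTOR_THREADS, then OMP_NUM_THREADS, then half of the cores) is an assumption inferred from the descriptions above, and resolveThreads is a hypothetical helper, not the Rust core's actual logic.

```javascript
// Returns { threads, source }. Precedence flag > MAGECTOR_THREADS >
// OMP_NUM_THREADS > half of the cores is an assumption of this sketch.
function resolveThreads({ flag, env = {}, cpuCount }) {
  const candidates = [
    [flag, '--threads flag'],
    [env.MAGECTOR_THREADS, 'MAGECTOR_THREADS'],
    [env.OMP_NUM_THREADS, 'OMP_NUM_THREADS'],
  ];
  for (const [raw, source] of candidates) {
    const n = Number.parseInt(raw, 10);
    if (Number.isInteger(n) && n > 0) return { threads: n, source };
  }
  // Documented default: half of the CPU cores (at least one).
  return { threads: Math.max(1, Math.floor(cpuCount / 2)), source: 'default' };
}

console.log(resolveThreads({ flag: 2, cpuCount: 16 }));
console.log(resolveThreads({ env: { OMP_NUM_THREADS: '4' }, cpuCount: 16 }));
console.log(resolveThreads({ cpuCount: 16 }));
```

Returning the source alongside the count mirrors the startup log line, which reports where the active value came from.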

For very large or CPU-constrained runs, you may also need to extend the wall-clock timeout (default 4 hours):

MAGECTOR_INDEX_TIMEOUT=28800000 npx magector index --threads 2   # 8 h timeout, 2 threads

Resume after timeout or interrupt

Indexing writes a crash-safe checkpoint to disk every 50 batches (~12,800 files). If the process is killed or times out mid-run, just re-run npx magector index — it auto-resumes from the last checkpoint:

npx magector index
# ♻️  Resuming from previous run: 38400 vectors across 12200 files already indexed
# ✓ Found 79771 total files; 12200 already indexed, 67571 remaining to process

The indexer collects already-embedded file paths from the existing DB, filters them out of file discovery, preserves the existing HNSW state, and only parses/embeds the files that aren't in the DB yet. Partial resume also picks up new files added to the tree since the previous run.
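
The filtering step described above amounts to a set difference between discovered files and files already in the DB. A minimal sketch, with a hypothetical planResume helper and made-up file names:

```javascript
// Split discovered files into already-indexed and remaining work.
function planResume(discoveredPaths, indexedPaths) {
  const indexed = new Set(indexedPaths);
  const remaining = discoveredPaths.filter((p) => !indexed.has(p));
  return {
    alreadyIndexed: discoveredPaths.length - remaining.length,
    remaining,
  };
}

const plan = planResume(
  ['a.php', 'b.php', 'c.xml', 'new.php'], // current file discovery
  ['a.php', 'b.php']                      // paths found in the existing DB
);
console.log(plan.alreadyIndexed, plan.remaining); // 2 indexed, 2 remaining
```

Because the comparison runs against the current file discovery, a file added after the interrupted run (new.php here) lands in the remaining set automatically.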

To force a full rebuild (e.g. after a schema change or if you want to discard stale vectors), pass --force:

npx magector index --force

MCP Server Tools

The MCP server exposes 47 tools for AI-assisted Magento 2 and Adobe Commerce development. All search tools return structured JSON with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.

Output Format

All search tools return structured JSON:

{
  "results": [
    {
      "rank": 1,
      "score": 0.892,
      "path": "vendor/magento/module-catalog/Model/ProductRepository.php",
      "module": "Magento_Catalog",
      "className": "ProductRepository",
      "namespace": "Magento\\Catalog\\Model",
      "methods": ["save", "getById", "getList", "delete", "deleteById"],
      "magentoType": "repository",
      "fileType": "php",
      "badges": ["repository"],
      "snippet": "class ProductRepository implements ProductRepositoryInterface..."
    }
  ],
  "count": 1
}

Key fields:

  • methods -- list of method names in the class (avoids needing to read the file)
  • badges -- role indicators: plugin, controller, observer, repository, graphql-resolver, model, block
  • snippet -- first 300 characters of indexed content for quick assessment
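
Because the output is structured, a client can filter results without opening any files. The sketch below uses a trimmed copy of the sample payload above; findByBadgeAndMethod is a hypothetical helper, not part of the MCP server.

```javascript
// Sample payload copied from the README's output format (trimmed).
const response = {
  results: [
    {
      rank: 1,
      score: 0.892,
      path: 'vendor/magento/module-catalog/Model/ProductRepository.php',
      className: 'ProductRepository',
      methods: ['save', 'getById', 'getList', 'delete', 'deleteById'],
      badges: ['repository'],
    },
  ],
  count: 1,
};

// Keep results carrying a given badge and exposing a given method.
function findByBadgeAndMethod(payload, badge, method) {
  return payload.results
    .filter((r) => r.badges.includes(badge) && r.methods.includes(method))
    .map((r) => r.path);
}

console.log(findByBadgeAndMethod(response, 'repository', 'getById'));
```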

Search Tools

Tool Description
magento_search Semantic search -- find any PHP class, method, XML config, template, or GraphQL schema by natural language
magento_find_class Find PHP class, interface, abstract class, or trait by name
magento_find_method Find method implementations across the codebase

Magento-Specific Finders

Tool Description
magento_find_config Find XML configuration (di.xml, events.xml, routes.xml, system.xml, webapi.xml, module.xml, layout)
magento_find_template Find PHTML template files for frontend or admin rendering
magento_find_plugin Find interceptor plugins (before/after/around methods) and di.xml declarations. Resolves plugin PHP files and extracts interceptor method signatures (v2.5)
magento_find_fieldset Find fieldset.xml definitions controlling data copy between entities (order→quote, quote→order). Shows fields per aspect (to_order, to_edit) (v2.5)
magento_find_observer Find event observers and events.xml declarations
magento_find_preference Find DI preference overrides -- which class implements an interface
magento_find_controller Find MVC controllers by frontend or admin route path
magento_find_block Find Block classes for view rendering
magento_find_graphql Find GraphQL schema definitions, resolvers, types, queries, and mutations
magento_find_api Find REST/SOAP API endpoints in webapi.xml
magento_find_cron Find cron job definitions in crontab.xml
magento_find_db_schema Find database table definitions in db_schema.xml (declarative schema)

Flow & Dependency Tracing

Tool Description
magento_trace_flow Trace execution flow from an entry point (route, API, GraphQL, event, cron) -- maps controller → plugins → observers → templates with code snippets (v2.5)
magento_trace_shipping_chain Trace the complete shipping rate chain: carriers → collectRates plugins → rate modifiers → totals collectors → fieldset mappings (v2.5)
magento_trace_dependency Trace DI graph for a class/interface -- preferences, plugins, virtualTypes, argument overrides (parses all di.xml, no index needed)
magento_find_event_flow Trace complete event chain: dispatchers → observers → handler PHP classes (parses events.xml + vector search)
magento_find_event_dispatchers Find all PHP locations where a specific event is dispatched -- exact grep matching with method context and surrounding code (v2.3)
magento_find_layout Find layout XML files by handle or content -- lists blocks, containers, and referenceBlock declarations
magento_trace_data_flow Trace how a data attribute flows: find all setters (magic setter, setData, addData) and getters (magic getter, getData) across PHP and XML (v2.3)
magento_trace_call_chain Trace internal method call chain: follows $this->method(), $this->dep->method(), and dispatch() calls to build an execution tree (v2.2)

magento_trace_flow auto-detects the entry type from the entry point's pattern (/V1/... → API, snake_case → event, camelCase → GraphQL, path/segments → route); pass entryType to override the detection. Use depth: "shallow" (entry + config + plugins) or depth: "deep" (adds observers, layout, templates, DI preferences).
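
The pattern-based detection can be approximated with a few regular expressions. This is a sketch of the four rules listed above, not the server's actual code: the detectEntryType name, the exact expressions, and the return values other than "graphql" are assumptions.

```javascript
// Guess the entry type from the shape of the entry point string.
function detectEntryType(entryPoint) {
  if (/^\/V\d+\//.test(entryPoint)) return 'api';                    // /V1/...
  if (entryPoint.includes('/')) return 'route';                      // path/segments
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/.test(entryPoint)) return 'event';   // snake_case
  if (/^[a-z]+([A-Z][a-z0-9]*)+$/.test(entryPoint)) return 'graphql'; // camelCase
  return null; // ambiguous: the caller should pass entryType explicitly
}

console.log(detectEntryType('/V1/products'));            // api
console.log(detectEntryType('sales_order_place_after')); // event
console.log(detectEntryType('placeOrder'));              // graphql
console.log(detectEntryType('checkout/cart/add'));       // route
```

A single lowercase word matches none of the rules, which is the kind of input where an explicit entryType is useful.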

Impact & Testing

Tool Description
magento_impact_analysis Analyze impact of changing a class -- finds use statements, DI references, direct instantiations, and type hints across the codebase
magento_find_test Find PHPUnit tests for a given class/method -- searches Test/ directories for coverage, mocks, and assertions
magento_find_implementors Find all classes implementing a given PHP interface -- scans implements keywords and di.xml <preference> declarations (v2.2)
magento_find_callers Find all call sites of a method across PHP and XML files -- ->method() and ::method() calls (v2.2)
magento_find_di_wiring Complete DI picture for a class: preferences, plugins, constructor args, virtual types, and argument overrides from di.xml (v2.2)

Diagnostics

Tool Description
magento_error_parser Parse Magento error messages and map to root cause, affected files, and fix suggestions (10 known patterns)
magento_performance_profile Profile a Magento subsystem (checkout_totals, order_place, product_save, etc.) for performance bottlenecks -- plugins, observers, and complexity hotspots

Analysis Tools

Tool Description
magento_analyze_diff Analyze git diffs for risk scoring and change classification
magento_complexity Analyze cyclomatic complexity, function count, and line count

Utility Tools

Tool Description
magento_module_structure Show complete module structure -- controllers, models, blocks, plugins, observers, configs
magento_index Trigger re-indexing of the codebase (also kicks off background enrichment)
magento_describe Generate LLM descriptions for di.xml files (requires ANTHROPIC_API_KEY), stored in .magector/data.db, auto-reindexes affected files
magento_stats View index statistics
magento_batch Execute multiple tool queries in parallel in one MCP roundtrip. Supports all search, find, grep, read, and null-risk tools. Use to avoid N×3-5s roundtrip overhead.
magento_grep Exact text/regex search across PHP/XML/PHTML files (grep -rn -E internally). Supports filesOnly mode (like grep -l), context lines, ignoreCase, include patterns. (v2.9)
magento_read Read a specific file with optional methodName extraction (~10× fewer tokens than reading the whole file) and startLine/endLine range. (v2.10)
magento_trace_api Trace REST/GraphQL API endpoint from URL to implementation: webapi.xml → service interface → DI preference → method body. One call replaces 4-5 grep+read steps. (v2.11)
magento_trace_config Trace a config path end-to-end: system.xml admin definition → PHP classes that consume the value → actual DB values from config-data exports. Accepts exact path or keyword search. (v2.17)
magento_find_trigger Find database triggers across the codebase
magento_find_table_usage Find all PHP code referencing a specific database table

Null-Safety Analysis (v2.12–v2.15)

Tool Description
magento_ast_search Structural PHP code search using tree-sitter. Named patterns: dataobject-set-null (detect setX(null) anti-pattern), unchecked-method-chain (detect $this->dep->method() chains). Pattern arg is an enum, not free-text. Executed in Rust serve process — no external dependency. (v2.16)
magento_enrich Build the method-chain enrichment index. Scans all vendor/ PHP files for ->firstMethod()->secondMethod() chains and detects null guards in surrounding code. Stores results in .magector/data.db (SQLite, via Rust serve). Runs automatically after magento_index. (v2.13, moved to Rust v2.16)
magento_find_null_risks Query the enrichment index for method chains without null guards. O(1) SQLite query instead of file scanning. Pass firstMethod to filter (e.g., "getPayment" → all ->getPayment()->anything() without null guard). Requires magento_enrich. (v2.13)
magento_find_dataobject_issues Detect setX(null) anti-pattern on Magento DataObject subclasses. setX(null) stores ['x' => null] in _data, so hasX() (via array_key_exists) returns true even when the value is null, creating false-positive guard conditions. Use during field-lifecycle audits or when debugging "value persists but shouldn't" bugs. Uses tree-sitter. (v2.15, tree-sitter v2.16)

Search Enhancements (v2.1)

  • Hybrid BM25+vector search -- combines text frequency scoring with semantic vector similarity for better exact class name matches
  • Query expansion -- automatically expands queries with Magento domain synonyms (plugin → interceptor, checkout → cart/quote/totals, etc.)
  • Module filtering -- moduleFilter parameter on magento_search to limit results by vendor/module pattern. Accepts a single string or array of strings. Supports wildcards, e.g., "Vendor_*" or ["Acme_PaymentGateway", "Acme_FreeShipping"]
  • Non-blocking reindex -- old index stays usable during background rebuild; new index is built to a temp path and swapped in atomically on completion
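
The moduleFilter semantics from the list above (single string or array, with * wildcards) can be sketched as follows. The helper names are hypothetical, and case-sensitive whole-name matching is an assumption about how the real filter behaves.

```javascript
// Convert a pattern such as "Vendor_*" into an anchored RegExp.
function wildcardToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Accepts a single pattern or an array of patterns.
function matchesModuleFilter(moduleName, moduleFilter) {
  const patterns = Array.isArray(moduleFilter) ? moduleFilter : [moduleFilter];
  return patterns.some((p) => wildcardToRegExp(p).test(moduleName));
}

console.log(matchesModuleFilter('Acme_PaymentGateway', 'Acme_*')); // true
console.log(matchesModuleFilter('Magento_Catalog', 'Acme_*'));     // false
console.log(
  matchesModuleFilter('Acme_FreeShipping', ['Acme_PaymentGateway', 'Acme_FreeShipping'])
); // true
```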

Deep Code Analysis (v2.2)

  • magento_find_implementors -- find all classes implementing a PHP interface (PHP implements + di.xml <preference>)
  • magento_find_callers -- find all call sites of a method across PHP and XML files
  • magento_find_di_wiring -- complete DI picture: preferences, plugins, constructor args, virtual types, argument overrides
  • magento_trace_call_chain -- trace internal method execution chain: $this->method(), $this->dep->method(), and dispatch() calls with event→observer resolution

Data Flow & Event Tracing (v2.3)

  • magento_trace_data_flow -- trace all setters and getters for a data attribute (magic methods, setData/getData, addData, constants, XML references). Answers "who writes/reads custom_discounted_price_incl_tax on Quote\Address?"
  • magento_find_event_dispatchers -- grep-based exact search for all PHP locations dispatching a specific event, with method context and surrounding code. Complements magento_find_event_flow with higher precision.
  • magento_find_plugin area context -- enriched output shows DI area (frontend/adminhtml/global/graphql) and explicit di.xml plugin registrations when targetClass is provided

Tool Cross-References

Each tool description includes "See also" hints to help AI clients chain tools effectively:

graph LR
  cls["find_class"] --> plg["find_plugin"]
  cls --> prf["find_preference"]
  cls --> mtd["find_method"]
  cfg["find_config"] --> obs["find_observer"]
  cfg --> prf
  cfg --> api["find_api"]
  plg --> cls
  plg --> mtd
  tpl["find_template"] --> blk["find_block"]
  blk --> tpl
  blk --> cfg
  dbs["find_db_schema"] --> cls
  gql["find_graphql"] --> cls
  gql --> mtd
  ctl["find_controller"] --> cfg
  trc["trace_flow"] -.-> ctl
  trc -.-> plg
  trc -.-> obs
  trc -.-> tpl
  trc -.-> api
  trc -.-> gql
  dep["trace_dependency"] --> prf
  dep --> plg
  evf["find_event_flow"] --> obs
  imp["impact_analysis"] --> dep
  imp --> cls
  tst["find_test"] --> cls
  err["error_parser"] --> dep
  lay["find_layout"] --> blk

  style cls fill:#4a90d9,color:#fff
  style mtd fill:#4a90d9,color:#fff
  style cfg fill:#e8a838,color:#000
  style plg fill:#d94a4a,color:#fff
  style obs fill:#d94a4a,color:#fff
  style prf fill:#e8a838,color:#000
  style api fill:#e8a838,color:#000
  style tpl fill:#68b684,color:#000
  style blk fill:#68b684,color:#000
  style dbs fill:#9b59b6,color:#fff
  style gql fill:#9b59b6,color:#fff
  style ctl fill:#4a90d9,color:#fff
  style trc fill:#2ecc71,color:#000

Query Examples

magento_search("how are checkout totals calculated")
magento_search("product price with tier pricing and catalog rules")
magento_find_class("ProductRepositoryInterface")
magento_find_method("getById")
magento_find_config("di.xml plugin for ProductRepository")
magento_find_plugin({ targetClass: "Topmenu" })
magento_find_observer("sales_order_place_after")
magento_find_preference("StoreManagerInterface")
magento_find_api("/V1/orders")
magento_find_controller("catalog/product/view")
magento_find_graphql("placeOrder")
magento_find_db_schema("sales_order")
magento_find_cron("indexer")
magento_find_block("cart totals")
magento_find_template("minicart")
magento_analyze_diff({ commitHash: "abc123" })
magento_complexity({ module: "Magento_Catalog", threshold: 10 })
magento_describe()
magento_trace_flow({ entryPoint: "checkout/cart/add", depth: "deep" })
magento_trace_flow({ entryPoint: "/V1/products" })
magento_trace_flow({ entryPoint: "placeOrder", entryType: "graphql" })
magento_trace_flow({ entryPoint: "sales_order_place_after" })
magento_trace_data_flow({ attributeKey: "custom_discounted_price_incl_tax", modelClass: "Quote\\Address" })
magento_find_event_dispatchers({ eventName: "custom_discount_rule_validation_before" })
magento_find_implementors({ interfaceName: "ProductRepositoryInterface" })
magento_find_callers({ methodName: "collectTotals", className: "TotalsCollector" })
magento_find_di_wiring({ className: "CartManagementInterface" })
magento_trace_call_chain({ className: "Magento\\Quote\\Model\\QuoteManagement", methodName: "submit" })

Supported Platforms

Pre-built binaries are provided for the following platforms:

| Platform | Architecture | Package |
| --- | --- | --- |
| macOS | ARM64 (Apple Silicon) | @magector/cli-darwin-arm64 |
| Linux | x86_64 | @magector/cli-linux-x64 |
| Linux | ARM64 | @magector/cli-linux-arm64 |
| Windows | x86_64 | @magector/cli-win32-x64 |

Note: macOS Intel (x86_64) is not supported as a pre-built binary. Intel Mac users can build from source.


Validation

Magector is validated at two levels:

  1. E2E MCP accuracy tests -- 101 queries across 16 tool categories via stdio JSON-RPC
  2. Rust-level validation -- 557 test cases across 50+ categories against Magento 2.4.7

E2E Accuracy (MCP Tools)

---
config:
  themeVariables:
    pie1: "#4caf50"
    pie2: "#f44336"
---
pie title Test Pass Rate (101 queries)
  "Passed (101)" : 101
  "Failed (0)" : 0

| Metric | Value |
|--------|-------|
| Grade | A+ (99.2/100) |
| Pass rate | 101/101 (100%) |
| Precision | 98.7% |
| MRR | 99.3% |
| NDCG@10 | 98.7% |
| Index size | 35,795 vectors |
| Query time | 10-45ms |
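
For readers unfamiliar with the ranking metrics above, here is a minimal sketch of how MRR and NDCG@k are conventionally computed. It is not Magector's test harness: the function names are hypothetical, and the ideal ordering is derived from the returned list only, which is a common simplification.

```javascript
// Each "relevance" array holds one query's results in ranked order:
// 0 = irrelevant, higher values = more relevant.

// Reciprocal rank of the first relevant hit (0 if none is relevant).
function reciprocalRank(relevance) {
  const idx = relevance.findIndex((r) => r > 0);
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// Mean reciprocal rank across all queries.
function meanReciprocalRank(runs) {
  return runs.reduce((sum, r) => sum + reciprocalRank(r), 0) / runs.length;
}

// Discounted cumulative gain over the top k results.
function dcg(relevance, k) {
  return relevance
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// NDCG@k: DCG divided by the DCG of the best possible ordering.
function ndcg(relevance, k = 10) {
  const ideal = [...relevance].sort((a, b) => b - a);
  const idealDcg = dcg(ideal, k);
  return idealDcg === 0 ? 0 : dcg(relevance, k) / idealDcg;
}

console.log(meanReciprocalRank([[1, 0], [0, 1]]), ndcg([0, 1]));
```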

Integration Tests

66 integration tests covering MCP protocol compliance, tool schemas, tool calls (including magento_describe), analysis tools, and stdout JSON integrity.

Running Tests

# E2E accuracy tests (101 queries, requires indexed codebase)
npm run test:accuracy
npm run test:accuracy:verbose

# Integration tests (66 tests)
npm test

# SONA/MicroLoRA benefit evaluation (180 queries, baseline vs post-training)
npm run test:sona-eval
npm run test:sona-eval:verbose

# Rust validation (557 test cases)
cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index

Project Structure

magector/
├── src/                          # Node.js source
│   ├── cli.js                    # CLI entry point (npx magector <command>)
│   ├── mcp-server.js             # MCP server (47 tools, structured JSON output)
│   ├── binary.js                 # Platform binary resolver
│   ├── model.js                  # ONNX model resolver/downloader
│   ├── init.js                   # Full init command (index + IDE config)
│   ├── magento-patterns.js       # Magento pattern detection (JS)
│   ├── templates/                # IDE rules templates
│   │   ├── cursorrules.js        # .cursorrules content
│   │   └── claude-md.js          # CLAUDE.md content
│   └── validation/               # JS validation suite
│       ├── validator.js
│       ├── benchmark.js
│       ├── test-queries.js
│       ├── test-data-generator.js
│       └── accuracy-calculator.js
├── tests/                        # Automated tests
│   ├── mcp-server.test.js        # Integration tests (66 tests)
│   ├── mcp-accuracy.test.js      # E2E accuracy tests (101 queries)
│   ├── mcp-sona.test.js          # SONA feedback integration tests (8 tests)
│   ├── mcp-sona-eval.test.js     # SONA/MicroLoRA benefit evaluation (180 queries)
│   ├── describe-benefit-eval.test.js  # Description enrichment benefit evaluation
│   └── results/                  # Test result artifacts
│       ├── accuracy-report.json
│       └── sona-eval-report.json
├── platforms/                    # Platform-specific binary packages
│   ├── darwin-arm64/             # macOS ARM (Apple Silicon)
│   ├── linux-x64/                # Linux x64
│   ├── linux-arm64/              # Linux ARM64
│   └── win32-x64/                # Windows x64
├── rust-core/                    # Rust high-performance core
│   ├── Cargo.toml
│   ├── src/
│   │   ├── main.rs               # Rust CLI (index, search, serve, validate)
│   │   ├── lib.rs                # Library exports
│   │   ├── indexer.rs             # Core indexing with progress output
│   │   ├── embedder.rs            # ONNX embedding (MiniLM-L6-v2)
│   │   ├── vectordb.rs            # HNSW vector database + hybrid search + tombstones
│   │   ├── watcher.rs             # File watcher for incremental re-indexing
│   │   ├── ast.rs                 # Tree-sitter AST (PHP + JS)
│   │   ├── magento.rs             # Magento pattern detection (Rust)
│   │   ├── describe.rs            # LLM description generation + SQLite storage
│   │   ├── sona.rs                # SONA feedback learning + MicroLoRA + EWC++
│   │   └── validation.rs          # 557 test cases, validation framework
│   └── models/                   # ONNX model files (auto-downloaded)
│       ├── all-MiniLM-L6-v2.onnx
│       └── tokenizer.json
├── .github/
│   └── workflows/
│       └── release.yml           # Cross-compile + publish CI
├── scripts/
│   └── setup.sh                  # Claude Code MCP setup script
├── config/
│   └── mcp-config.json           # MCP server configuration template
├── package.json
├── .gitignore
├── LICENSE
└── README.md

How It Works

1. Indexing

Magector scans every .php, .js, .xml, .phtml, and .graphqls file in a Magento 2 or Adobe Commerce codebase:

  1. AST parsing -- Tree-sitter extracts class names, namespaces, methods, inheritance, and interface implementations from PHP and JavaScript files
  2. Pattern detection -- Identifies Magento-specific patterns: controllers, models, repositories, plugins, observers, blocks, GraphQL resolvers, admin grids, cron jobs, and more
  3. Search text enrichment -- Combines AST metadata with Magento pattern keywords to create semantically rich text representations
  4. Description enrichment -- If a descriptions SQLite DB is present, LLM-generated natural-language descriptions are prepended to the embedding text as "Description: {text}\n\n", placing semantic DI concepts (preferences, plugins, virtual types, subsystem names) within the 256-token ONNX window
  5. Embedding -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2
  6. Indexing -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search
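
Steps 3 and 4 above can be sketched as follows. Only the "Description: {text}\n\n" prefix comes from this README; the function name, the input field names, and the ordering of the remaining parts are assumptions made for illustration (the real logic lives in rust-core/src/indexer.rs).

```javascript
// Hypothetical sketch: builds the text that is handed to the embedder.
// The description goes first so it lands inside the model's token window.
function buildEmbeddingText(file) {
  const header = file.description
    ? `Description: ${file.description}\n\n`
    : "";
  // AST metadata and pattern keywords, joined into one enrichment line.
  const metadata = [
    file.className,
    ...(file.methods || []),
    ...(file.patternKeywords || []),
  ]
    .filter(Boolean)
    .join(" ");
  return header + [metadata, file.content].filter(Boolean).join("\n");
}

console.log(
  buildEmbeddingText({
    description: "Wires checkout plugins and preferences.",
    patternKeywords: ["di", "plugin", "preference"],
    content: "<config>...</config>",
  })
);
```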

2. Searching

  1. Query text is enriched with pattern synonyms (e.g., "controller" adds "action execute http request dispatch")
  2. The enriched query is embedded into the same 384-dimensional vector space
  3. HNSW finds the nearest neighbors by cosine similarity
  4. Hybrid reranking boosts results with keyword matches in path and search text
  5. SONA adjustment -- MicroLoRA adapts the query embedding based on learned patterns; EWC++ prevents forgetting earlier learning
  6. Results are returned as structured JSON with file path, class name, methods, role badges, and content snippet
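
The hybrid reranking in step 4 can be illustrated with a small sketch: cosine similarity supplies the base score, and each query term found in the path or search text adds a fixed boost. The boost weight of 0.05, the function names, and the candidate field names are assumptions; Magector's actual scoring in the Rust core may differ.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical reranker: semantic score plus a small boost per keyword hit.
function hybridRerank(query, queryVec, candidates, keywordBoost = 0.05) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return candidates
    .map((c) => {
      const haystack = `${c.path} ${c.searchText}`.toLowerCase();
      const hits = terms.filter((t) => haystack.includes(t)).length;
      const score = cosineSimilarity(queryVec, c.vector) + keywordBoost * hits;
      return { ...c, score };
    })
    .sort((x, y) => y.score - x.score);
}

const ranked = hybridRerank("checkout totals", [1, 0], [
  { path: "Model/Foo.php", searchText: "foo", vector: [1, 0] },
  { path: "Model/Quote/TotalsCollector.php", searchText: "collect totals", vector: [1, 0] },
]);
console.log(ranked.map((r) => r.path));
```

With identical vectors, the keyword hit on "totals" is what moves TotalsCollector.php to the top.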

3. Persistent Serve Mode

The MCP server spawns a persistent Rust process (magector-core serve) that keeps the ONNX model and HNSW index loaded in memory. Queries are sent as JSON over stdin and responses are returned via stdout -- eliminating the ~2.6s cold-start overhead of loading the model per query. If the serve process is unavailable, the server falls back to a single-shot execFileSync call.

flowchart LR
  subgraph startup ["Startup (once)"]
    S1["Load Model"] --> S2["Load Index"] --> S3["Ready Signal"]
  end
  startup --> query
  subgraph query ["Per Query (10-45ms)"]
    Q1["stdin JSON"] --> Q2["Embed"] --> Q3["HNSW Search"] --> Q4["Rerank"] --> Q5["stdout JSON"]
  end
  subgraph fallback ["Fallback"]
    F1["execFileSync ~2.6s"]
  end

  style startup fill:#e8f4e8,color:#000
  style query fill:#e8e8f4,color:#000
  style fallback fill:#f4e8e8,color:#000
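
On the Node.js side, talking to a long-lived child process over stdout requires reassembling complete messages from arbitrary chunks. The sketch below assumes one JSON document per line, which this README does not state explicitly; the class name JsonLineDecoder is hypothetical.

```javascript
// Assumption: the serve process writes one JSON document per line.
// Chunks from a stream may split a line anywhere, so we buffer until "\n".
class JsonLineDecoder {
  constructor(onMessage) {
    this.buffer = "";
    this.onMessage = onMessage;
  }

  push(chunk) {
    this.buffer += chunk;
    let newline;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (line) this.onMessage(JSON.parse(line));
    }
  }
}

// Demo: a response arriving in two chunks is delivered as one message.
const demoMessages = [];
const demoDecoder = new JsonLineDecoder((msg) => demoMessages.push(msg));
demoDecoder.push('{"results":[');
demoDecoder.push('"TotalsCollector.php"]}\n');
console.log(demoMessages);
```

With a real child process, the decoder would be fed from the stdout "data" event, for example `child.stdout.on("data", (c) => decoder.push(c.toString()))`.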

4. File Watcher (Incremental Re-indexing)

When the serve process is started with --magento-root, a background thread polls the filesystem for changes every 60 seconds (configurable via --watch-interval). Changed files are incrementally re-indexed without restarting the server.

Since hnsw_rs does not support point deletion, Magector uses a tombstone strategy: old vectors for modified/deleted files are marked as tombstoned and filtered out of search results. New vectors are appended. When tombstoned entries exceed 20% of total vectors, the HNSW graph is automatically rebuilt (compacted) to reclaim memory and restore search performance.

flowchart LR
  W1["Sleep 60s"] --> W2["Scan Filesystem"] --> W3{"Changes?"}
  W3 -->|No| W1
  W3 -->|Yes| W4["Tombstone Old Vectors"] --> W5["Parse + Embed New Files"] --> W6["Append to HNSW"] --> W7{"Tombstone > 20%?"}
  W7 -->|Yes| W8["Compact / Rebuild HNSW"] --> W9["Save to Disk"]
  W7 -->|No| W9
  W9 --> W1

  style W4 fill:#f4e8e8,color:#000
  style W5 fill:#e8f4e8,color:#000
  style W8 fill:#e8e8f4,color:#000
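
The tombstone-and-compact cycle can be modelled with a plain array standing in for the HNSW graph. This is a sketch under that simplification: the class and method names are hypothetical, and only the 20% threshold comes from the description above.

```javascript
// Hypothetical model of the tombstone strategy. A flat array stands in
// for the HNSW graph, which cannot delete points in place.
class TombstoneIndex {
  constructor(compactThreshold = 0.2) {
    this.entries = []; // { path, vector, dead }
    this.compactThreshold = compactThreshold;
  }

  // Mark every vector belonging to a file as tombstoned.
  tombstone(path) {
    for (const e of this.entries) {
      if (e.path === path) e.dead = true;
    }
  }

  // Re-index a changed file: tombstone old vectors, append the new one.
  upsert(path, vector) {
    this.tombstone(path);
    this.entries.push({ path, vector, dead: false });
    return this.maybeCompact();
  }

  // Search results must only ever see live entries.
  live() {
    return this.entries.filter((e) => !e.dead);
  }

  tombstoneRatio() {
    if (this.entries.length === 0) return 0;
    return (this.entries.length - this.live().length) / this.entries.length;
  }

  // Rebuild (here: drop dead entries) once tombstones exceed the threshold.
  maybeCompact() {
    if (this.tombstoneRatio() > this.compactThreshold) {
      this.entries = this.live();
      return true;
    }
    return false;
  }
}
```

Usage: calling `upsert` for a path that is already indexed leaves the old vector in place but hidden from `live()`, until enough tombstones accumulate to trigger a rebuild.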

5. MCP Integration

The MCP server delegates all search/index operations to the Rust core binary. Analysis tools (diff, complexity) use ruvector JS modules directly.

sequenceDiagram
  participant Dev
  participant AI
  participant MCP
  participant Rust
  participant HNSW

  Dev->>AI: "checkout totals?"
  AI->>MCP: magento_search(...)
  MCP->>Rust: JSON query
  Rust->>HNSW: embed + search
  HNSW-->>Rust: candidates
  Rust-->>MCP: JSON results
  MCP-->>AI: paths, methods, badges
  AI-->>Dev: TotalsCollector.php

6. SONA Feedback Learning

The MCP server tracks sequences of tool calls and sends feedback signals to the Rust process. Over time, this adjusts search result rankings based on observed usage patterns.

How it works: The Node.js SessionTracker watches for follow-up tool calls after magento_search. If a user searches and then immediately calls magento_find_plugin, SONA learns that similar queries should boost plugin results. The learned weights are persisted to a .sona file alongside the index.

| MCP Call Sequence | Signal | Effect on Future Searches |
|-------------------|--------|---------------------------|
| magento_search → magento_find_plugin (within 30s) | refinement_to_plugin | Boosts plugin results |
| magento_search → magento_find_class (within 30s) | refinement_to_class | Boosts class matches |
| magento_search → magento_find_config (within 30s) | refinement_to_config | Boosts config/XML results |
| magento_search → magento_find_observer (within 30s) | refinement_to_observer | Boosts observer results |
| magento_search → magento_find_controller (within 30s) | refinement_to_controller | Boosts controller results |
| magento_search → magento_find_block (within 30s) | refinement_to_block | Boosts block results |
| magento_search → magento_trace_flow (within 30s) | trace_after_search | Boosts controller results |
| magento_search(Q1) → magento_search(Q2) (within 60s) | query_refinement | Tracked for analysis |

Characteristics:

  • Score adjustments are capped at ±0.15 to avoid overwhelming semantic similarity
  • Learning rate decays with repeated observations (diminishing returns)
  • Learned weights are keyed by normalized, order-independent query term hashes
  • Always active -- no feature flags or build-time opt-in required
  • Persisted via bincode to <db_path>.sona (e.g., .magector/index.db.sona)
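
The first three characteristics can be sketched in a few lines. The ±0.15 cap is taken from the list above; the base learning rate of 0.05, the 1/n decay schedule, and all names are assumptions, and a sorted term string stands in for the hash used by the real implementation.

```javascript
const MAX_ADJUSTMENT = 0.15; // cap stated in this README

// Normalized, order-independent key: lowercase, split, dedupe, sort.
// (The real implementation hashes the terms; a joined string is used here.)
function queryKey(query) {
  const terms = query.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean);
  return [...new Set(terms)].sort().join(" ");
}

function clamp(value) {
  return Math.max(-MAX_ADJUSTMENT, Math.min(MAX_ADJUSTMENT, value));
}

// Hypothetical weight store: one weight per (query key, result type).
class FeedbackWeights {
  constructor(baseRate = 0.05) {
    this.baseRate = baseRate; // assumed value
    this.weights = new Map();
  }

  // Record a positive signal, e.g. magento_search followed by magento_find_plugin.
  learn(query, resultType) {
    const key = `${queryKey(query)}|${resultType}`;
    const entry = this.weights.get(key) || { weight: 0, seen: 0 };
    entry.seen += 1;
    // Learning rate decays as 1/n, so repeated signals add less each time.
    entry.weight = clamp(entry.weight + this.baseRate / entry.seen);
    this.weights.set(key, entry);
  }

  // Score adjustment to add to a result of the given type.
  adjustment(query, resultType) {
    const entry = this.weights.get(`${queryKey(query)}|${resultType}`);
    return entry ? entry.weight : 0;
  }
}
```

Because the key ignores word order and case, "Checkout totals" and "totals checkout" share the same learned weight.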

SONA v2: MicroLoRA + EWC++

SONA v2 adds embedding-level adaptation via a MicroLoRA adapter and Elastic Weight Consolidation:

| Component | Parameters | Purpose |
|-----------|------------|---------|
| MicroLoRA | 1536 (rank-2, 2×384×2) | Adjusts query embeddings before HNSW search |
| EWC++ | Fisher matrix (384 values) | Prevents catastrophic forgetting during online learning |
  • adjust_query_embedding() applies the LoRA transform + L2 normalization before vector search; cosine similarity guard (≥0.90) skips destructive adjustments
  • learn_with_embeddings() updates LoRA weights from feedback signals with EWC regularization (λ=2000) and decaying learning rate
  • 3-tier scoring with negative learning: positive signals boost the followed feature type, mild negative learning (0.1×) demotes unrelated types
  • V1→V2 persistence format is backward-compatible (auto-upgrades on load)
cd rust-core && cargo build --release
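
To make the MicroLoRA step concrete, here is a sketch of a rank-2 low-rank adjustment with the cosine guard described above. It assumes the usual LoRA form (embedding plus B·A·embedding, then L2 normalization); the matrix layout and function names are assumptions, and the real adjust_query_embedding() in rust-core/src/sona.rs may be structured differently.

```javascript
const DIM = 384; // embedding size
const RANK = 2; // LoRA rank
const MIN_COSINE = 0.9; // guard threshold from this README
// Two matrices of RANK x DIM values each: 2 * 384 * 2 = 1536 parameters.
const PARAMETER_COUNT = 2 * DIM * RANK;

function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function l2Normalize(v) {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? [...v] : v.map((x) => x / norm);
}

// down: RANK rows of length dim (projects into the low-rank space).
// up:   dim rows of length RANK (projects back to embedding space).
function adjustQueryEmbedding(embedding, down, up) {
  const low = down.map((row) => dot(row, embedding));
  const delta = up.map((row) => dot(row, low));
  const adjusted = l2Normalize(embedding.map((x, i) => x + delta[i]));
  // Skip adjustments that would move the query too far from the original.
  const similarity = dot(l2Normalize(embedding), adjusted);
  return similarity >= MIN_COSINE ? adjusted : embedding;
}

console.log(PARAMETER_COUNT);
```

The functions work for any dimension, which keeps small hand-checked examples possible; in Magector the embedding would have 384 components.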

7. LLM Description Enrichment

Magector can generate natural-language descriptions of di.xml files using the Anthropic API and embed them directly into the vector index. This improves search ranking for semantic queries about dependency injection: in the A/B experiment reported below, MRR rose from 0.031 to 0.330.

Workflow:

# 1. Generate descriptions (one-time, incremental — only re-describes changed files)
ANTHROPIC_API_KEY=sk-... npx magector describe /path/to/magento

# 2. Re-index with descriptions embedded into vectors
npx magector index /path/to/magento

Or via the MCP tool: magento_describe() generates descriptions and auto-reindexes affected files in one step.

How it works: Each di.xml file is sent to Claude Sonnet with a prompt optimized for semantic search retrieval. The resulting description (~70 words) is stored in a SQLite database (.magector/data.db). During indexing, descriptions are prepended to the embedding text as "Description: {text}\n\n" before the raw file content, placing semantic terms (preferences, plugins, virtual types, subsystem names) within the ONNX model's 256-token window.

Measured impact (A/B experiment, 25 queries, Magento 2.4.7, 17,891 vectors, 371 described files):

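| Metric | Without Descriptions | With Descriptions | Delta |
|--------|----------------------|-------------------|-------|
| Precision@K | 1.6% | 20.3% | +18.7% |
| MRR | 0.031 | 0.330 | +0.30 |
| NDCG@10 | 0.037 | 0.369 | +0.33 |
| di.xml results/query | 0.2 | 3.0 | +2.8 |

Query win rate: 76%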

Magento Patterns Detected

mindmap
  root((Patterns))
    PHP
      Controller
      Model
      Repository
      Block
      Helper
      ViewModel
    Interception
      Plugin
      Observer
      Preference
    XML
      di.xml
      events.xml
      webapi.xml
      routes.xml
      crontab.xml
      db_schema.xml
    Frontend
      Template
      JavaScript
      GraphQL

Magector understands these Magento 2 architectural patterns:

| Pattern | Detection Method | Example |
|---------|------------------|---------|
| Controller | Path + execute() method | Controller/Adminhtml/Order/View.php |
| Model | Path + extends AbstractModel | Model/Product.php |
| Repository | Path + implements RepositoryInterface | Model/ProductRepository.php |
| Block | Path + extends AbstractBlock | Block/Product/View.php |
| Plugin | Path + before/after/around methods | Plugin/Product/SavePlugin.php |
| Observer | Path + implements ObserverInterface | Observer/ProductSaveObserver.php |
| GraphQL Resolver | Path + implements ResolverInterface | Model/Resolver/Products.php |
| Helper | Path under Helper/ | Helper/Data.php |
| Cron | Path under Cron/ | Cron/CleanExpiredQuotes.php |
| Console Command | Path + extends Command | Console/Command/IndexerReindex.php |
| Data Provider | Path + DataProvider | Ui/DataProvider/Product/Listing.php |
| ViewModel | Path + implements ArgumentInterface | ViewModel/Product/Breadcrumbs.php |
| Setup Patch | Path + Patch/Data or Patch/Schema | Setup/Patch/Data/AddAttribute.php |
| di.xml | Path matching | etc/di.xml, etc/frontend/di.xml |
| events.xml | Path matching | etc/events.xml |
| webapi.xml | Path matching | etc/webapi.xml |
| layout XML | Path under layout/ | view/frontend/layout/catalog_product_view.xml |
| Template | .phtml extension | view/frontend/templates/product/view.phtml |
| JavaScript | .js with AMD/ES6 detection | view/frontend/web/js/view/minicart.js |
| GraphQL Schema | .graphqls extension | etc/schema.graphqls |
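
A few of the detection rules in the table can be expressed as a small classifier that combines path checks with extracted method names. This is an illustrative sketch covering only a subset of patterns; the function name, the returned labels, and the exact regular expressions are assumptions rather than the rules in src/magento-patterns.js or rust-core/src/magento.rs.

```javascript
// Hypothetical classifier: path conventions plus method names from the AST.
function detectMagentoType(path, methods = []) {
  // XML and frontend files are identified by path or extension alone.
  if (/(^|\/)etc\/([a-z_]+\/)?di\.xml$/.test(path)) return "di.xml";
  if (path.endsWith(".phtml")) return "template";
  if (path.endsWith(".graphqls")) return "graphql_schema";

  // Plugins: under Plugin/ and exposing before*/after*/around* methods.
  const hasInterceptor = methods.some((m) => /^(before|after|around)[A-Z]/.test(m));
  if (/(^|\/)Plugin\//.test(path) && hasInterceptor) return "plugin";

  // Controllers: under Controller/ and exposing execute().
  if (/(^|\/)Controller\//.test(path) && methods.includes("execute")) {
    return "controller";
  }

  // Helpers and cron jobs are detected by directory only.
  if (/(^|\/)Helper\//.test(path)) return "helper";
  if (/(^|\/)Cron\//.test(path)) return "cron";
  return "unknown";
}

console.log(detectMagentoType("Plugin/Product/SavePlugin.php", ["beforeSave"]));
```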

Configuration

Cursor IDE Rules

Copy .cursorrules to your Magento project root for optimized AI-assisted development. The rules instruct the AI to:

  1. Use Magector MCP tools before reading files manually
  2. Write effective semantic queries
  3. Follow Magento development patterns
  4. Interpret search results correctly

Excluding Directories (.magectorignore)

Magector automatically skips common non-project directories during indexing:

  • vendor/ — Composer dependencies (100K-500K files)
  • node_modules/ — npm packages
  • generated/ — DI-compiled files
  • var/ — cache, logs, sessions
  • pub/static/ — deployed static assets
  • dev/tests/, dev/tools/ — Magento development tools
  • Test/, Tests/, test/, tests/ — test directories
  • .git/ — version control

For project-specific exclusions, create a .magectorignore file in your Magento project root:

# .magectorignore — additional directories to exclude from Magector indexing
# One pattern per line, gitignore-like syntax

# Custom exclusions
pub/media
setup
update
phpserver
bin
lib/internal

Pattern rules:

  • Lines starting with # are comments
  • Empty lines are ignored
  • Trailing slashes are stripped (vendor/ → vendor)
  • Patterns without / match directory names anywhere in the tree
  • Patterns with / match relative paths from the project root
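
The pattern rules above translate into a short parser and matcher. The sketch below implements only the five rules listed; whether Magector supports further gitignore features such as wildcards or negation is not stated here, so none are assumed. Function names are hypothetical.

```javascript
// Parse .magectorignore content into a list of normalized patterns.
function parseIgnoreFile(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith("#")) // skip blanks and comments
    .map((line) => line.replace(/\/+$/, "")); // strip trailing slashes
}

// Decide whether a file path (relative to the project root) is excluded.
function isIgnored(relativePath, patterns) {
  const directories = relativePath.split("/").slice(0, -1);
  return patterns.some((pattern) =>
    pattern.includes("/")
      ? // Patterns with "/" are anchored at the project root.
        relativePath === pattern || relativePath.startsWith(pattern + "/")
      : // Patterns without "/" match a directory name at any depth.
        directories.includes(pattern)
  );
}

const patterns = parseIgnoreFile("# custom\n\nvendor/\npub/media\nbin\n");
console.log(patterns, isIgnored("pub/media/catalog/a.jpg", patterns));
```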

Config Data (core_config_data exports)

The magento_trace_config tool can show actual database config values alongside code analysis. Export your core_config_data table as JSON and place files in .magector/config-data/:

# MySQL 8.0+ with --json flag
mysql -u user -p magento_db -e "SELECT scope, scope_id, path, value FROM core_config_data" --json > .magector/config-data/CZ-production.json

# Older MySQL (no --json): pipe through python3
mysql -u user -p magento_db -B -e "SELECT scope, scope_id, path, value FROM core_config_data" | \
  python3 -c "import sys,json; lines=sys.stdin.read().strip().split('\n'); h=lines[0].split('\t'); \
  rows=[dict(zip(h,l.split('\t'))) for l in lines[1:]]; [r.update({'scope_id':int(r['scope_id'])}) for r in rows]; \
  json.dump(rows,sys.stdout,indent=2)" > .magector/config-data/CZ-production.json

# Or from n8n/API/any tool that produces:
# [{scope, scope_id, path, value}, ...]

File naming: Use {country}-{environment}.json, e.g.:

  • CZ-production.json
  • SK-staging.json
  • IT-production.json

When magento_trace_config traces a config path, it automatically looks up values from all available exports and shows them per environment.
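
As a rough illustration of that lookup, the sketch below takes already-parsed export files keyed by file name and collects the values for one config path. The function names and the result shape are assumptions; only the {country}-{environment}.json naming and the scope, scope_id, path, value row fields come from this README.

```javascript
// Split "CZ-production.json" into its country and environment parts.
function parseExportName(fileName) {
  const match = /^([^-]+)-(.+)\.json$/.exec(fileName);
  return match ? { country: match[1], environment: match[2] } : null;
}

// Hypothetical lookup: exportsByFile maps file names to arrays of rows.
function lookupConfigValue(exportsByFile, configPath) {
  const results = [];
  for (const [fileName, rows] of Object.entries(exportsByFile)) {
    const source = parseExportName(fileName);
    if (!source) continue; // ignore files that do not follow the naming scheme
    for (const row of rows) {
      if (row.path === configPath) {
        results.push({
          ...source,
          scope: row.scope,
          scope_id: row.scope_id,
          value: row.value,
        });
      }
    }
  }
  return results;
}

const sampleExports = {
  "CZ-production.json": [
    { scope: "default", scope_id: 0, path: "web/secure/use_in_frontend", value: "1" },
  ],
  "SK-staging.json": [
    { scope: "default", scope_id: 0, path: "web/secure/use_in_frontend", value: "0" },
  ],
};
console.log(lookupConfigValue(sampleExports, "web/secure/use_in_frontend"));
```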

Model Configuration

The ONNX model (all-MiniLM-L6-v2) is automatically downloaded on first run to ~/.magector/models/. To use a different location:

magector-core index -m /path/to/magento -c /custom/model/path

Development

Building from Source

git clone https://github.com/krejcif/magector.git
cd magector

# Install Node.js dependencies
npm install

# Build the Rust core
cd rust-core
cargo build --release
cd ..

# The CLI will automatically find the dev binary at rust-core/target/release/magector-core
node src/cli.js help

Building

# Rust core
cd rust-core
cargo build --release

# Run unit tests
cargo test

# Run validation
cargo run --release -- validate

Testing

# Integration tests (66 tests, requires indexed codebase)
npm test

# E2E accuracy tests (101 queries)
npm run test:accuracy
npm run test:accuracy:verbose

# Run without index (unit + schema tests only)
npm run test:no-index

# Rust unit tests (37 tests including SONA + descriptions)
cd rust-core && cargo test

# SONA integration tests (8 tests)
node tests/mcp-sona.test.js

# SONA/MicroLoRA benefit evaluation (180 queries)
npm run test:sona-eval

# Rust validation (557 test cases)
cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index

Adding New Magento Patterns

  1. Add pattern detection in rust-core/src/magento.rs
  2. Add search text enrichment in rust-core/src/indexer.rs
  3. Add validation test cases in rust-core/src/validation.rs
  4. Add E2E accuracy test cases in tests/mcp-accuracy.test.js
  5. Rebuild and run validation to verify:
cargo build --release
./target/release/magector-core validate -m ./magento2 --skip-index
npm run test:accuracy

Adding MCP Tools

  1. Define the tool schema in src/mcp-server.js (ListToolsRequestSchema handler)
  2. Include keyword-rich descriptions and cross-tool "See also" references
  3. Implement the handler in the CallToolRequestSchema handler
  4. Return structured JSON via formatSearchResults()
  5. Add E2E test cases in tests/mcp-accuracy.test.js
  6. Test with Claude Code or the MCP inspector

Technical Details

Embedding Model

  • Model: all-MiniLM-L6-v2
  • Dimensions: 384
  • Pooling: Mean pooling with attention mask
  • Normalization: L2 normalized
  • Runtime: ONNX Runtime (via ort crate)
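
Mean pooling with an attention mask followed by L2 normalization is the standard way to turn per-token outputs into one sentence vector. The sketch below shows the arithmetic on plain arrays; it is not the embedder.rs implementation, and the function name is hypothetical.

```javascript
// tokenEmbeddings: one vector per token; attentionMask: 1 = real token, 0 = padding.
function meanPool(tokenEmbeddings, attentionMask) {
  const dim = tokenEmbeddings[0].length;
  const sum = new Array(dim).fill(0);
  let count = 0;

  tokenEmbeddings.forEach((token, i) => {
    if (attentionMask[i] === 0) return; // padding does not contribute
    count += 1;
    for (let d = 0; d < dim; d++) sum[d] += token[d];
  });

  // Average over real tokens, then scale to unit length.
  const mean = sum.map((x) => x / Math.max(count, 1));
  const norm = Math.hypot(...mean) || 1;
  return mean.map((x) => x / norm);
}

// Two real tokens and one padding token (which is ignored).
console.log(meanPool([[3, 0], [0, 4], [100, 100]], [1, 1, 0]));
```

Because the output has unit length, cosine similarity between two pooled vectors reduces to a dot product.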

Vector Index

  • Algorithm: HNSW (Hierarchical Navigable Small World)
  • Library: hnsw_rs
  • Parameters: M=32, max_layers=16, ef_construction=200
  • Distance metric: Cosine similarity
  • Hybrid search: Semantic nearest-neighbor + keyword reranking in path and search text + SONA/MicroLoRA feedback adjustments
  • Incremental updates: Tombstone soft-delete + periodic HNSW rebuild (compact)
  • Persistence: Bincode V2 binary serialization (backward-compatible with V1)

Index Structure

Each indexed file produces a vector entry with metadata:

struct IndexMetadata {
    path: String,
    file_type: String,          // php, xml, js, template, graphql
    magento_type: String,       // controller, model, block, plugin, ...
    class_name: Option<String>,
    namespace: Option<String>,
    methods: Vec<String>,       // extracted method names
    search_text: String,        // enriched searchable text
    is_controller: bool,
    is_plugin: bool,
    is_observer: bool,
    is_model: bool,
    is_block: bool,
    is_repository: bool,
    is_resolver: bool,
    // ... 20+ pattern flags
}

Performance Characteristics

| Operation | Time | Notes |
|-----------|------|-------|
| Full index (36K vectors) | ~1 min | Parallel parsing + batched ONNX embedding |
| Single query (warm) | 10-45ms | Persistent serve process, HNSW + rerank |
| Single query (cold) | ~2.6s | Includes ONNX model + index load |
| Embedding generation | ~2ms | ONNX Runtime with CoreML/CUDA |
| Batch embedding (32) | ~30ms | Batched ONNX inference |
| Model load | ~500ms | One-time at startup |
| Index save/load | <1s | Bincode binary serialization |

Performance Optimizations

  • Persistent serve mode -- Rust process keeps ONNX model + HNSW index in memory via stdin/stdout JSON protocol
  • Query cache -- LRU cache (200 entries) avoids re-embedding identical queries
  • Hybrid reranking -- combines semantic similarity with keyword matching for better precision
  • Batched ONNX embedding -- 32 texts per inference call (vs. 1-at-a-time), 3-5x faster embedding
  • Dynamic thread scaling -- ONNX intra-op threads scale to CPU core count
  • Thread-local AST parsers -- each rayon thread gets its own tree-sitter parser (no mutex contention)
  • Bincode persistence -- binary serialization replaces JSON (3-5x faster save/load, ~5x smaller files)
  • Adaptive HNSW capacity -- pre-sized to actual vector count
  • Parallel HNSW insert -- batch insert uses hnsw_rs parallel insertion on load and index
  • Tuned ef_search -- optimized search parameters for 36K vector index (ef_search=50 for search, 64 for hybrid)
  • SONA feedback learning -- learns from MCP tool call patterns to adjust search rankings; MicroLoRA adapts query embeddings, EWC++ prevents forgetting

Roadmap

gantt
  title Roadmap
  dateFormat YYYY-MM
  axisFormat %b
  section Done
    Hybrid search       :done, 2025-01, 30d
    Serve mode          :done, 2025-02, 30d
    JSON output         :done, 2025-03, 15d
    Cross-tool hints    :done, 2025-03, 15d
    E2E tests           :done, 2025-03, 15d
    Adobe Commerce      :done, 2025-03, 15d
  section Next
    SONA feedback       :done, 2025-04, 30d
    Incremental index   :done, 2025-04, 30d
    SONA v2 MicroLoRA   :done, 2025-05, 15d
    LLM descriptions    :done, 2025-06, 30d
    Method chunking     :active, 2025-07, 30d
    Intent detection    :2025-08, 30d
    Type filtering      :2025-09, 30d
  section Future
    VSCode extension    :2025-10, 60d
    Web UI              :2025-12, 60d
  • [x] Hybrid search (semantic + keyword re-ranking)
  • [x] Persistent serve mode (eliminates cold-start latency)
  • [x] Structured JSON output (methods, badges, snippets)
  • [x] Cross-tool discovery hints for AI clients
  • [x] E2E accuracy test suite (101 queries)
  • [x] Adobe Commerce support (B2B, Staging, and all Commerce-specific modules)
  • [x] SONA feedback learning (search rankings adapt to MCP tool call patterns)
  • [x] SONA v2 with MicroLoRA + EWC++ (embedding-level adaptation, prevents catastrophic forgetting)
  • [x] LLM description enrichment (generate di.xml descriptions via Claude, store in SQLite, embed into vectors for improved search ranking)
  • [ ] Method-level chunking (per-method vectors for direct method search; in progress)
  • [ ] Query intent classification (auto-detect "give me XML" vs "give me PHP")
  • [ ] Filtered search by file type at the vector level
  • [x] Incremental indexing (background file watcher with tombstone + compact strategy)
  • [ ] VSCode extension
  • [ ] Web UI for browsing results

Troubleshooting

All MCP server activity is logged to .magector/magector.log in the Magento project root. The log persists across MCP restarts and uses the format:

[2026-04-12T18:30:00.000Z] [LEVEL] message

Log Levels

| Level | Meaning |
|-------|---------|
| INFO | Normal operations: startup config, tool completion, search fallbacks, enrichment progress |
| WARN | Recoverable issues: slow grep queries (>5s), missing data.db, file read errors, serve process disconnects |
| ERR | Failures: AST query errors, transaction rollbacks, serve process errors, tool execution errors |
| REQ | Every tool call with full input parameters (JSON) |
| RES | Tool completion with elapsed time in milliseconds |
| QUERY | Rust serve process queries (search, feedback) |
| CACHE | Search cache hits |
| INDEX | Background reindex progress |
| SERVE | Rust serve process stderr (watcher events, model loading) |
| FATAL | Server startup failures |
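
If you prefer to analyze the log programmatically rather than with grep, the line format shown above is easy to parse. The sketch below relies only on the `[timestamp] [LEVEL] message` layout; the function names are hypothetical and the sample messages are invented for the example.

```javascript
// Matches "[2026-04-12T18:30:00.000Z] [LEVEL] message".
const LOG_LINE = /^\[([^\]]+)\] \[([A-Z]+)\] (.*)$/;

function parseLogLine(line) {
  const match = LOG_LINE.exec(line);
  if (!match) return null; // e.g. continuation lines of a stack trace
  return { timestamp: new Date(match[1]), level: match[2], message: match[3] };
}

// Count entries per level, skipping lines that do not match the format.
function countByLevel(lines) {
  const counts = {};
  for (const line of lines) {
    const entry = parseLogLine(line);
    if (entry) counts[entry.level] = (counts[entry.level] || 0) + 1;
  }
  return counts;
}

// Sample lines; the message texts are made up for illustration.
const sampleLog = [
  "[2026-04-12T18:30:00.000Z] [INFO] example startup message",
  "[2026-04-12T18:30:01.250Z] [ERR] example failure message",
  "    at continuation line without a prefix",
  "[2026-04-12T18:30:02.000Z] [ERR] another example failure",
];
console.log(countByLevel(sampleLog));
```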

Common Diagnostic Commands

# Recent errors
grep '\[ERR\]\|\[FATAL\]' .magector/magector.log | tail -20

# Tool timing (find slow tools)
grep '\[RES\]' .magector/magector.log | tail -20

# Enrichment/null-risk analysis
grep 'enrich:\|null_risks:' .magector/magector.log | tail -20

# AST search (tree-sitter) issues
grep 'ast_search:' .magector/magector.log | tail -20

# Batch query breakdown (per-tool timing)
grep 'batch\[' .magector/magector.log | tail -20

# Slow grep queries
grep 'grep: slow\|grep: timed' .magector/magector.log | tail -20

# Full startup sequence
grep 'server starting\|Config:\|primary\|Serve process' .magector/magector.log | tail -30

What Gets Logged (v2.14+)

Every tool call logs [REQ] with input parameters and [RES] with elapsed time. Additionally:

  • magento_ast_search — tree-sitter pattern, target path, execution time, result count, query errors
  • magento_enrich — file count, progress every 10k files, read errors, transaction failures, final summary
  • magento_find_null_risks — query parameters, result count, query timing, missing DB warnings
  • magento_batch — query list on entry, per-sub-tool timing and errors
  • magento_grep — slow query warnings (>5s), timeout detection
  • magento_read — file-not-found with error codes, failed method extractions

License

MIT License. See LICENSE for details.


Contributing

Contributions are welcome. Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/improvement)
  3. Add tests for new functionality
  4. Run validation to ensure accuracy doesn't regress: npm run test:accuracy
  5. Submit a pull request

Built with Rust and Node.js for the Magento and Adobe Commerce community.