JSPM

  • Created
  • Published
  • Downloads 2372
  • Score
    100M100P100Q128065F
  • License MIT

Semantic duplicate pattern detection for AI-generated code - finds similar implementations that waste AI context tokens

Package Exports

  • @aiready/pattern-detect

Readme

@aiready/pattern-detect

Semantic duplicate pattern detection for AI-generated code

Finds semantically similar but syntactically different code patterns that waste AI context and confuse models.

🏛️ Architecture

                           🎯 USER
                             │
                             ▼
                   🎛️  CLI (orchestrator)
                             │
                             ▼
                     🏢 HUB (core)
                             │
      ┌──────────────────────┼───────────┬───────────┐
      ▼                      ▼           ▼           ▼
┌─────────────┐      📦 CONTEXT  🔧 CONSIST  📚 DOC
│ 📊 PATTERN  │ ⬅ YOU ARE HERE   ENCY        DRIFT
│   DETECT    │       ANALYZER
│  ✅ Ready   │      ✅ Ready    ✅ Ready    🔜 Soon
└─────────────┘

🌍 Language Support

Currently Supported (64% market coverage):

  • TypeScript (.ts, .tsx) - AST-based pattern extraction
  • JavaScript (.js, .jsx) - AST-based pattern extraction
  • Python (.py) - Function/class pattern extraction, similarity scoring

Roadmap:

  • 🔜 Java (Q3 2026) - Method/class patterns, Spring annotations
  • 🔜 Go (Q4 2026) - Function patterns, interface implementations
  • 🔜 Rust (Q4 2026) - Function/trait patterns, macro detection
  • 🔜 C# (Q1 2027) - Method/class patterns, LINQ queries

🚀 Quick Start

Zero config, works out of the box:

# Run without installation (recommended)
npx @aiready/pattern-detect ./src

# Or use the unified CLI (includes all AIReady tools)
npx @aiready/cli scan ./src

# Or install globally for simpler command and faster runs
npm install -g @aiready/pattern-detect
aiready-patterns ./src

🎯 Input & Output

Input: Path to your source code directory

aiready-patterns ./src

Output: Terminal report + optional JSON file (saved to .aiready/ directory)

📊 Duplicate Pattern Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📁 Files analyzed: 47
⚠️  Duplicate patterns: 12 files with 23 issues
💰 Wasted tokens: 8,450

CRITICAL (6 files)
  src/handlers/users.ts - 4 duplicates (1,200 tokens)
  src/handlers/posts.ts - 3 duplicates (950 tokens)

✨ Smart Defaults (Zero Config)

  • Auto-excludes test files (**/*.test.*, **/*.spec.*, **/__tests__/**)
  • Auto-excludes build outputs (dist/, build/, .next/)
  • Auto-excludes dependencies (node_modules/)
  • Adaptive threshold: Adjusts similarity detection based on codebase size
  • Pattern classification: Automatically categorizes duplicates (API handlers, validators, etc.)

Override defaults with --include-tests or --exclude <patterns> as needed

🎯 What It Does

AI tools generate similar code in different ways because they lack awareness of your codebase patterns. This tool:

  • Semantic detection: Finds functionally similar code (not just copy-paste) using Jaccard similarity on AST tokens
  • Pattern classification: Groups duplicates by type (API handlers, validators, utilities, etc.)
  • Token cost analysis: Shows wasted AI context budget
  • Refactoring guidance: Suggests specific fixes per pattern type

How It Works

The tool uses Jaccard similarity to compare code semantically:

  1. Parses TypeScript/JavaScript files into Abstract Syntax Trees (AST)
  2. Extracts semantic tokens (identifiers, operators, keywords) from each function
  3. Calculates Jaccard similarity between token sets: |A ∩ B| / |A ∪ B|
  4. Groups similar functions above the similarity threshold

This approach catches duplicates even when variable names or minor logic differs.

Example Output

📁 Files analyzed: 47
⚠  Duplicate patterns found: 23
💰 Token cost (wasted): 8,450

🌐 api-handler      12 patterns
✓  validator        8 patterns
🔧 utility          3 patterns

1. 87% 🌐 api-handler
   src/api/users.ts:15 ↔ src/api/posts.ts:22
   432 tokens wasted
   → Create generic handler function

⚙️ Key Options

# Basic usage
aiready patterns ./src

# Focus on obvious duplicates
aiready patterns ./src --similarity 0.9

# Include smaller patterns
aiready patterns ./src --min-lines 3

# Export results (saved to .aiready/ by default)
aiready patterns ./src --output json

# Or specify custom path
aiready patterns ./src --output json --output-file custom-report.json

📁 Output Files: By default, all output files are saved to the .aiready/ directory in your project root. You can override this with --output-file.

🎛️ Tuning Guide

Main Parameters

Parameter Default Effect Use When
--similarity 0.4 Similarity threshold (0-1) Want more/less sensitive detection
--min-lines 5 Minimum lines per pattern Include/exclude small functions
--min-shared-tokens 8 Tokens that must match Control comparison strictness

Quick Tuning Scenarios

Want more results? (catch subtle duplicates)

# Lower similarity threshold
aiready patterns ./src --similarity 0.3

# Include smaller functions  
aiready patterns ./src --min-lines 3

# Both together
aiready patterns ./src --similarity 0.3 --min-lines 3

Want fewer but higher quality results? (focus on obvious duplicates)

# Higher similarity threshold
aiready patterns ./src --similarity 0.8

# Larger patterns only
aiready patterns ./src --min-lines 10

Analysis too slow? (optimize for speed)

# Focus on substantial functions
aiready patterns ./src --min-lines 10

# Reduce comparison candidates
aiready patterns ./src --min-shared-tokens 12

Parameter Tradeoffs

Adjustment More Results Faster Higher Quality Tradeoff
Lower --similarity More false positives
Lower --min-lines Includes trivial duplicates
Higher --similarity Misses subtle duplicates
Higher --min-lines Misses small but important patterns

Common Workflows

First run (broad discovery):

aiready patterns ./src  # Default settings

Focus on critical issues (production ready):

aiready patterns ./src --similarity 0.8 --min-lines 8

Catch everything (comprehensive audit):

aiready patterns ./src --similarity 0.3 --min-lines 3

Performance optimization (large codebases):

aiready patterns ./src --min-lines 10 --min-shared-tokens 10

📁 Configuration File

Create an aiready.json or aiready.config.json file in your project root:

{
  "scan": {
    "include": ["src/**/*.{ts,tsx,js,jsx}"],
    "exclude": ["**/*.test.*", "**/dist/**"]
  },
  "tools": {
    "pattern-detect": {
      "minSimilarity": 0.6,
      "minLines": 8,
      "maxResults": 20,
      "minSharedTokens": 10,
      "maxCandidatesPerBlock": 100
    }
  },
  "output": {
    "format": "console",
    "file": ".aiready/pattern-report.json"
  }
}

Configuration Options:

Option Type Default Description
minSimilarity number 0.4 Similarity threshold (0-1)
minLines number 5 Minimum lines to consider
maxResults number 10 Max results to display in console
minSharedTokens number 8 Min tokens that must match
maxCandidatesPerBlock number 100 Performance tuning limit
approx boolean true Use approximate candidate selection
severity string 'all' Filter: 'critical', 'high', 'medium', 'all'

Use the unified CLI for all AIReady tools:

npm install -g @aiready/cli

# Pattern detection
aiready patterns ./src

# Context analysis (token costs, fragmentation)
aiready context ./src

# Consistency checking (naming, patterns)
aiready consistency ./src

# Full codebase analysis
aiready scan ./src

Related packages:

🌐 Visit Our Website

Try AIReady tools online and optimize your codebase: getaiready.dev


Made with 💙 by the AIReady team | Website