@tb.p/dd-cursor 2.0.0 (MIT)

A comprehensive command-line tool for finding and removing duplicate files using content-based hashing

Package Exports

  • @tb.p/dd-cursor
  • @tb.p/dd-cursor/index.js

This package does not declare an "exports" field, so the exports above were detected and optimized automatically by JSPM. If a package subpath is missing, consider filing an issue against the original package (@tb.p/dd-cursor) asking it to add an "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

File Deduplication Tool

A comprehensive command-line tool for finding and removing duplicate files using content-based hashing. This npx package provides an extensive set of parameters and validation rules for advanced file-management scenarios.

Quick Start

# Install and run
npx @tb.p/dd

# Find duplicates in specific directory
npx @tb.p/dd --targets /path/to/directory

# Scan multiple directories
npx @tb.p/dd --targets "dir1;dir2;dir3"

# Dry run to see what would be moved
npx @tb.p/dd --dry-run

Features

🔍 Advanced Detection

  • Content-based duplicate detection using BLAKE3 hashing (default)
  • Support for multiple hash algorithms (SHA-1, SHA-256, SHA-512, MD5, BLAKE2, BLAKE3)
  • Parallel processing for improved performance
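
The flag for choosing the hash algorithm is not listed in this README, so the name below is an assumption rather than a documented option (check npx @tb.p/dd --help for the real one):

# Hypothetical flag: use SHA-256 instead of the BLAKE3 default
npx @tb.p/dd --hash sha256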

📁 Multi-Directory Support

  • Scan multiple directories simultaneously
  • Semicolon-separated directory lists
  • Recursive directory traversal

🛡️ Safety Features

  • Dry-run mode to preview operations
  • Confirmation prompts before moving files
  • Safe file operations (move to subdirectory, never delete)
  • System file exclusion

🎯 Advanced Filtering

  • File extension filtering (include/exclude)
  • Regex pattern matching
  • Size-based filtering (min/max size)
  • Date-based filtering (newer/older than)

Duplicate Handling

  • Automatic Move: Duplicates are automatically moved to a duplicates/ subdirectory, preserving path structure (default)
  • Dry Run: Use --dry-run to preview what would be moved without making changes

📊 Output Formats

  • Table (human-readable)
  • JSON (programmatic processing)
  • CSV (spreadsheet analysis)
  • XML (structured data)
  • YAML (configuration-like)

🔧 Advanced Options

  • Configurable parallel processing
  • Memory limit controls
  • SQLite database storage for persistent data
  • Caching and resume functionality
  • Detailed reporting and statistics
  • Comprehensive validation
  • Minimal output by default with verbose mode for detailed progress
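
The flags controlling parallelism and memory are not named in the Key Parameters section below, so the following sketch uses hypothetical placeholders:

# Hypothetical flags: cap concurrency and memory use on very large scans
npx @tb.p/dd --parallel 4 --memory-limit 512MB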

Installation

npx @tb.p/dd

Global Installation

npm install -g @tb.p/dd

Basic Usage

Find Duplicates

# Current directory
npx @tb.p/dd

# Specific directory
npx @tb.p/dd --targets /path/to/photos

# Multiple directories
npx @tb.p/dd --targets "C:\Photos;D:\Backup;E:\Archive"

Duplicate Handling

# Move duplicates to duplicates/ subdirectory (automatic)
npx @tb.p/dd

# Move without confirmation (force)
npx @tb.p/dd --force

# Move with custom duplicates directory
npx @tb.p/dd --target-dir ./duplicates

Advanced Filtering

# Only image files larger than 1MB
npx @tb.p/dd --extensions jpg,png,gif --min-size 1MB

# Exclude system files and hidden files
npx @tb.p/dd --exclude-system --exclude-hidden

# Only files modified in the last 7 days
npx @tb.p/dd --newer-than 7d
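
The regex filters are not demonstrated above; a sketch, assuming the patterns follow JavaScript regular expression syntax:

# Only camera files (IMG_ prefix), skipping temp files
npx @tb.p/dd --include-pattern "^IMG_\d+" --exclude-pattern "\.tmp$"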

Key Parameters

Directory Selection

  • --targets <directories> - Semicolon-separated directory list (default: current directory)

File Filtering

  • --extensions <extensions> - Include specific file extensions (no dots, case-insensitive; default: all files)
  • --exclude-extensions <extensions> - Exclude specific file extensions (no dots, case-insensitive)
  • --include-pattern <pattern> - Include files matching regex
  • --exclude-pattern <pattern> - Exclude files matching regex
  • --min-size <size> - Minimum file size
  • --max-size <size> - Maximum file size
  • --newer-than <date> - Files newer than date
  • --older-than <date> - Files older than date
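
Combining the size and date filters (the 1MB and 30d value formats follow the examples elsewhere in this README; other formats are not documented here):

# Files between 1 MB and 100 MB, untouched for at least 30 days
npx @tb.p/dd --min-size 1MB --max-size 100MB --older-than 30d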

Duplicate Handling

  • --target-dir <dir> - Target directory for duplicate files (default: duplicates/ subdirectory)
  • --keep-strategy <strategy> - Which duplicate to keep: priority
  • Automatic Behavior: Duplicates are always moved to a subdirectory, preserving the original path structure
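
Putting these together, a run that collects duplicates into a custom directory without prompts:

# Keep the priority copy, move the rest into ./to-review, skip confirmations
npx @tb.p/dd --target-dir ./to-review --keep-strategy priority --force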

Output

  • --format <format> - Output format: table, json, csv, xml, yaml
  • --output <file> - Output file path
  • --report <file> - Generate detailed report
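
For example, machine-readable results plus a detailed report (file names here are arbitrary):

# JSON for scripts, with a separate report file
npx @tb.p/dd --format json --output results.json --report report.txt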

Safety

  • --dry-run - Preview without making changes
  • --force - Skip confirmation prompts
  • --exclude-system - Exclude system files
  • --exclude-hidden - Exclude hidden files
  • No File Deletion: Files are moved to duplicates/ subdirectory, never deleted

Database

  • --save <file> - SQLite database file to store deduplication data
  • --resume <file> - Resume from SQLite database
  • --db-info <file> - Show database information
  • --db-clean <file> - Clean database (remove orphaned entries)
  • --export-csv <file> - Export duplicates to CSV file
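
A typical maintenance sequence (file names are arbitrary; whether --export-csv reads from a saved database or a fresh scan is not documented, so pairing it with --resume is an assumption):

# Inspect and clean an existing database, then export its duplicates
npx @tb.p/dd --db-info ./dedupe.sqlite
npx @tb.p/dd --db-clean ./dedupe.sqlite
npx @tb.p/dd --resume ./dedupe.sqlite --export-csv ./duplicates.csv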

Utilities

  • --get-exts - Get comma-separated list of file extensions in target directories

Examples

Photo Library Cleanup

# Find duplicate photos
npx @tb.p/dd --targets "C:\Photos;D:\Backup\Photos" --extensions jpg,png,gif --min-size 100KB

# Move duplicates to duplicates/ subdirectory preserving path structure
npx @tb.p/dd --targets "C:\Photos;D:\Backup\Photos" --extensions jpg,png,gif --keep-strategy priority

Document Management

# Find duplicate documents
npx @tb.p/dd --targets "C:\Documents;D:\Archive" --extensions pdf,doc,docx --min-size 10KB

# Export duplicate list to CSV
npx @tb.p/dd --targets "C:\Documents;D:\Archive" --extensions pdf,doc,docx --format csv --output document-duplicates.csv

System Cleanup

# Find duplicates excluding system files
npx @tb.p/dd --targets "C:\Users" --exclude-system --exclude-hidden --min-size 1MB

# Move duplicates to duplicates/ subdirectory
npx @tb.p/dd --targets "C:\Users" --exclude-system --exclude-hidden --min-size 1MB

Database Operations

# Save deduplication data to SQLite database
npx @tb.p/dd --targets "C:\Photos;D:\Backup" --save ./dedupe.sqlite

# Resume from existing database (loads all parameters from meta table)
npx @tb.p/dd --resume ./dedupe.sqlite

# Move duplicates from existing database (automatic)
npx @tb.p/dd --resume ./dedupe.sqlite

# Override multiple parameters
npx @tb.p/dd --resume ./dedupe.sqlite --keep-strategy priority

Utility Operations

# Get list of file extensions in target directories
npx @tb.p/dd --get-exts --targets "C:\Photos;D:\Backup"
# Output: jpg,png,gif,bmp,tiff,raw,pdf,doc,docx,txt

# Get extensions from single directory
npx @tb.p/dd --get-exts --targets "C:\Documents"
# Output: pdf,doc,docx,txt,xlsx,pptx

Output Behavior

Default (Minimal) Output

Shows current phase and progress percentage:

Scanning Phase: 1250 files out of 5000 or 25.00%
Analysis Phase: 4563256 files out of 45674567 or 9.99%
Moving Phase: 25 duplicate groups processed

Verbose Output (--verbose)

Shows detailed file-by-file progress and processing details.
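
For example, a detailed but safe preview:

# Verbose progress without touching any files
npx @tb.p/dd --verbose --dry-run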

Output Formats

Table Format (Default)

📊 Duplicate Analysis Results:
──────────────────────────────────────────────────
Total files scanned: 1000
Duplicate groups found: 25
Duplicate files: 50
Space that can be saved: 2.5 MB

🔍 Duplicate Groups:
──────────────────────────────────────────────────

Group 1 (1.2 MB each):
  ✓ KEEP /path/to/file1.jpg
  ✗ MOVE /path/to/file2.jpg
  ✗ MOVE /path/to/file3.jpg

JSON Format

{
  "summary": {
    "totalFiles": 1000,
    "duplicateGroups": 25,
    "duplicateFiles": 50,
    "spaceSaved": 2621440
  },
  "duplicates": [
    {
      "hash": "abc123...",
      "size": 1258291,
      "files": [
        {
          "path": "/path/to/file1.jpg",
          "action": "keep"
        },
        {
          "path": "/path/to/file2.jpg",
          "action": "move"
        }
      ]
    }
  ]
}
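
Because the JSON output is structured, it pipes naturally into other tools. A sketch using jq (an external tool, not part of this package) to list the files that would be moved:

# Write JSON results, then extract the paths marked for moving
npx @tb.p/dd --format json --output results.json --dry-run
jq -r '.duplicates[].files[] | select(.action == "move") | .path' results.json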

Safety Features

Data Protection

  • Dry Run Mode: Preview operations without making changes
  • Confirmation Prompts: Ask before moving files
  • Safe Operations: Move duplicates to subdirectory instead of deleting
  • Force Mode: Skip confirmations for automated scripts
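
A cautious workflow that uses these features in order, previewing first and then running unattended:

# 1. Preview what would be moved
npx @tb.p/dd --targets /data --dry-run

# 2. Once satisfied, run without prompts (e.g. from a scheduled job)
npx @tb.p/dd --targets /data --force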

Error Handling

  • Graceful Degradation: Continue processing despite individual file errors
  • Detailed Error Messages: Clear indication of what went wrong
  • Validation Failures: Stop before processing with invalid parameters

System Protection

  • System File Exclusion: Avoid modifying critical system files
  • Hidden File Handling: Option to exclude hidden files
  • Permission Handling: Graceful handling of permission errors

Performance

Scalability

  • Parallel Processing: Configurable number of parallel hash calculations
  • Memory Management: Configurable memory limits for large file sets
  • Progress Reporting: Real-time progress indication for long operations

Optimization

  • Database: Resume interrupted operations using SQLite database
  • Streaming: Process files without loading entire content into memory
  • Efficient Hashing: Use appropriate hash algorithms for the use case

Requirements

  • Node.js 14.0 or higher
  • Sufficient disk space for temporary files
  • Read access to source directories
  • Write access to target directories (for move/copy actions)
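
To verify the Node.js version before running:

node --version   # should print v14.0.0 or later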

License

MIT

Contributing

Contributions are welcome! Please read the project specification and validation rules before submitting pull requests.

Support

For issues and questions, please refer to the documentation or create an issue in the project repository.