File Deduplication Tool
A comprehensive command-line tool for finding duplicate files using content-based hashing and moving them safely out of the way. This NPX package provides extensive parameters and validation for advanced file management scenarios.
Quick Start
# Install and run
npx @tb.p/dd
# Find duplicates in specific directory
npx @tb.p/dd --targets /path/to/directory
# Scan multiple directories
npx @tb.p/dd --targets "dir1;dir2;dir3"
# Dry run to see what would be moved
npx @tb.p/dd --dry-run
Features
🔍 Advanced Detection
- Content-based duplicate detection using BLAKE3 hashing (default)
- Support for multiple hash algorithms (SHA-1, SHA-256, SHA-512, MD5, BLAKE2, BLAKE3)
- Parallel processing for improved performance
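Content-based detection means two files count as duplicates only when the hashes of their contents match, regardless of file name, location, or timestamp. As a rough sketch of the general idea using standard GNU tools (sha256sum, sort, and uniq are assumptions here, not part of this package):
# Illustration only: hash every file, sort by digest, then print groups of identical digests
find /path/to/photos -type f -exec sha256sum {} + | sort | uniq -w64 --all-repeated=separate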
📁 Multi-Directory Support
- Scan multiple directories simultaneously
- Semicolon-separated directory lists
- Recursive directory traversal
🛡️ Safety Features
- Dry-run mode to preview operations
- Confirmation prompts before moving files
- Safe file operations (move to subdirectory, never delete)
- System file exclusion
🎯 Advanced Filtering
- File extension filtering (include/exclude)
- Regex pattern matching
- Size-based filtering (min/max size)
- Date-based filtering (newer/older than)
⚡ Duplicate Handling
- Automatic Move: Duplicates are automatically moved to the duplicates/ subdirectory, preserving path structure (default); see the illustration below
- Dry Run: Use --dry-run to preview what would be moved without making changes
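For illustration only (hypothetical paths; the exact base location of the duplicates directory may differ), "preserving path structure" means a duplicate's relative path is recreated under the duplicates directory rather than everything being dumped into one flat folder:
# Hypothetical example of a preserved path structure
# duplicate found at:  <scan root>/2021/vacation/IMG_0001.jpg
# moved to:            <duplicates dir>/2021/vacation/IMG_0001.jpg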
📊 Output Formats
- Table (human-readable)
- JSON (programmatic processing)
- CSV (spreadsheet analysis)
- XML (structured data)
- YAML (configuration-like)
🔧 Advanced Options
- Configurable parallel processing
- Memory limit controls
- SQLite database storage for persistent data
- Caching and resume functionality
- Detailed reporting and statistics
- Comprehensive validation
- Minimal output by default with verbose mode for detailed progress
Installation
NPX (Recommended)
npx @tb.p/dd
Global Installation
npm install -g @tb.p/dd
Basic Usage
Find Duplicates
# Current directory
npx @tb.p/dd
# Specific directory
npx @tb.p/dd --targets /path/to/photos
# Multiple directories
npx @tb.p/dd --targets "C:\Photos;D:\Backup;E:\Archive"Duplicate Handling
# Move duplicates to duplicates/ subdirectory (automatic)
npx @tb.p/dd
# Move without confirmation (force)
npx @tb.p/dd --force
# Move with custom duplicates directory
npx @tb.p/dd --target-dir ./duplicates
Advanced Filtering
# Only image files larger than 1MB
npx @tb.p/dd --extensions jpg,png,gif --min-size 1MB
# Exclude system files and hidden files
npx @tb.p/dd --exclude-system --exclude-hidden
# Only files modified in the last 7 days
npx @tb.p/dd --newer-than 7d
Key Parameters
Directory Selection
- --targets <directories> - Semicolon-separated directory list (default: current directory)
File Filtering
- --extensions <extensions> - Include specific file extensions (no dots, case insensitive) (default: all files)
- --exclude-extensions <extensions> - Exclude specific file extensions (no dots, case insensitive)
- --include-pattern <pattern> - Include files matching regex
- --exclude-pattern <pattern> - Exclude files matching regex
- --min-size <size> - Minimum file size
- --max-size <size> - Maximum file size
- --newer-than <date> - Files newer than date
- --older-than <date> - Files older than date
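These filters can be combined in a single run; for example (the regex and size values below are illustrative, not defaults):
# Preview JPEGs named like IMG_1234 that are between 500KB and 50MB
npx @tb.p/dd --extensions jpg --include-pattern "IMG_\d{4}" --min-size 500KB --max-size 50MB --dry-run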
Duplicate Handling
- --target-dir <dir> - Target directory for duplicate files (default: duplicates/ subdirectory)
- --keep-strategy <strategy> - Which duplicate to keep: priority
- Automatic Behavior: Duplicates are always moved to the duplicates/ subdirectory, preserving path structure
Output
- --format <format> - Output format: table, json, csv, xml, yaml
- --output <file> - Output file path
- --report <file> - Generate detailed report
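The JSON output (see the JSON Format section below) is convenient for scripting; for example, assuming the separate jq tool is installed (it is not part of this package):
# Write JSON results to a file, then inspect just the summary with jq
npx @tb.p/dd --format json --output results.json --dry-run
jq '.summary' results.json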
Safety
- --dry-run - Preview without making changes
- --force - Skip confirmation prompts
- --exclude-system - Exclude system files
- --exclude-hidden - Exclude hidden files
- No File Deletion: Files are moved to the duplicates/ subdirectory, never deleted
Database
- --save <file> - SQLite database file to store deduplication data
- --resume <file> - Resume from SQLite database
- --db-info <file> - Show database information
- --db-clean <file> - Clean database (remove orphaned entries)
- --export-csv <file> - Export duplicates to CSV file
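A typical sequence using these flags might look like the following (the flag combinations are illustrative; see Database Operations under Examples for the documented save/resume usage):
# Save a scan, inspect the database, clean it, then resume from it
npx @tb.p/dd --targets "C:\Photos" --save ./dedupe.sqlite
npx @tb.p/dd --db-info ./dedupe.sqlite
npx @tb.p/dd --db-clean ./dedupe.sqlite
npx @tb.p/dd --resume ./dedupe.sqlite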
Utilities
- --get-exts - Get comma-separated list of file extensions in target directories
Examples
Photo Library Cleanup
# Find duplicate photos
npx @tb.p/dd --targets "C:\Photos;D:\Backup\Photos" --extensions jpg,png,gif --min-size 100KB
# Move duplicates to duplicates/ subdirectory preserving path structure
npx @tb.p/dd --targets "C:\Photos;D:\Backup\Photos" --extensions jpg,png,gif --keep-strategy priorityDocument Management
# Find duplicate documents
npx @tb.p/dd --targets "C:\Documents;D:\Archive" --extensions pdf,doc,docx --min-size 10KB
# Export duplicate list to CSV
npx @tb.p/dd --targets "C:\Documents;D:\Archive" --extensions pdf,doc,docx --format csv --output document-duplicates.csvSystem Cleanup
# Find duplicates excluding system files
npx @tb.p/dd --targets "C:\Users" --exclude-system --exclude-hidden --min-size 1MB
# Move duplicates to duplicates/ subdirectory
npx @tb.p/dd --targets "C:\Users" --exclude-system --exclude-hidden --min-size 1MBDatabase Operations
# Save deduplication data to SQLite database
npx @tb.p/dd --targets "C:\Photos;D:\Backup" --save ./dedupe.sqlite
# Resume from existing database (loads all parameters from meta table)
npx @tb.p/dd --resume ./dedupe.sqlite
# Move duplicates from existing database (automatic)
npx @tb.p/dd --resume ./dedupe.sqlite
# Override multiple parameters
npx @tb.p/dd --resume ./dedupe.sqlite --keep-strategy priority
Utility Operations
# Get list of file extensions in target directories
npx @tb.p/dd --get-exts --targets "C:\Photos;D:\Backup"
# Output: jpg,png,gif,bmp,tiff,raw,pdf,doc,docx,txt
# Get extensions from single directory
npx @tb.p/dd --get-exts --targets "C:\Documents"
# Output: pdf,doc,docx,txt,xlsx,pptx
Output Behavior
Default (Minimal) Output
Shows current phase and progress percentage:
Scanning Phase: 1250 files out of 5000 or 25.00%
Analysis Phase: 4563256 files out of 45674567 or 9.99%
Moving Phase: 25 duplicate groups processed
Verbose Output (--verbose)
Shows detailed file-by-file progress and processing details.
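For example (the target path is a placeholder), the same preview run with and without detailed progress:
# Minimal progress (default)
npx @tb.p/dd --targets /path/to/photos --dry-run
# Detailed file-by-file progress
npx @tb.p/dd --targets /path/to/photos --dry-run --verbose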
Output Formats
Table Format (Default)
📊 Duplicate Analysis Results:
──────────────────────────────────────────────────
Total files scanned: 1000
Duplicate groups found: 25
Duplicate files: 50
Space that can be saved: 2.5 MB
🔍 Duplicate Groups:
──────────────────────────────────────────────────
Group 1 (1.2 MB each):
✓ KEEP /path/to/file1.jpg
✗ MOVE /path/to/file2.jpg
✗ MOVE /path/to/file3.jpg
JSON Format
{
  "summary": {
    "totalFiles": 1000,
    "duplicateGroups": 25,
    "duplicateFiles": 50,
    "spaceSaved": 2621440
  },
  "duplicates": [
    {
      "hash": "abc123...",
      "size": 1258291,
      "files": [
        {
          "path": "/path/to/file1.jpg",
          "action": "keep"
        },
        {
          "path": "/path/to/file2.jpg",
          "action": "move"
        }
      ]
    }
  ]
}
Safety Features
Data Protection
- Dry Run Mode: Preview operations without making changes
- Confirmation Prompts: Ask before moving files
- Safe Operations: Move duplicates to subdirectory instead of deleting
- Force Mode: Skip confirmations for automated scripts
Error Handling
- Graceful Degradation: Continue processing despite individual file errors
- Detailed Error Messages: Clear indication of what went wrong
- Validation Failures: Stop before processing with invalid parameters
System Protection
- System File Exclusion: Avoid modifying critical system files
- Hidden File Handling: Option to exclude hidden files
- Permission Handling: Graceful handling of permission errors
Performance
Scalability
- Parallel Processing: Configurable number of parallel hash calculations
- Memory Management: Configurable memory limits for large file sets
- Progress Reporting: Real-time progress indication for long operations
Optimization
- Database: Resume interrupted operations using SQLite database
- Streaming: Process files without loading entire content into memory
- Efficient Hashing: Use appropriate hash algorithms for the use case
Requirements
- Node.js 14.0 or higher
- Sufficient disk space for temporary files
- Read access to source directories
- Write access to target directories (for move/copy actions)
License
MIT
Contributing
Contributions are welcome! Please read the project specification and validation rules before submitting pull requests.
Support
For issues and questions, please refer to the documentation or create an issue in the project repository.