JSPM

file-duplicates

1.0.2
  • License ISC

Recursively search for duplicates of the target file or buffer in the specified directory, returning the corresponding absolute paths.

Package Exports

  • file-duplicates

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (file-duplicates) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

file-duplicates

While developing a Node app, I had to check for potential duplicates of files uploaded by users. I wrote this package to address that problem: given a file path or buffer as input, it returns the absolute paths of duplicate files found under the specified directory (or under the working directory if none is specified). Files are compared by their SHA-1 checksums, so the probability of an accidental collision is extremely low. An array of patterns can be provided to ignore specific files or folders. Both sync and async APIs are provided.
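The checksum-based matching described above can be illustrated with a minimal sketch. This is not the package's actual implementation: `sha1` and `findDuplicatesSketch` are hypothetical names, ignore patterns are left out, and only Node's built-in `crypto`, `fs`, `os` and `path` modules are used.

```javascript
var crypto = require("crypto");
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical helper: hex SHA-1 digest of a Buffer or string.
function sha1(data) {
    return crypto.createHash("sha1").update(data).digest("hex");
}

// Hypothetical helper: walk dirPath recursively and collect the absolute
// paths of files whose SHA-1 digest equals the digest of targetBuffer.
function findDuplicatesSketch(targetBuffer, dirPath) {
    var targetDigest = sha1(targetBuffer);
    var matches = [];
    fs.readdirSync(dirPath).forEach(function(name) {
        var fullPath = path.resolve(dirPath, name);
        var stats = fs.statSync(fullPath);
        if (stats.isDirectory()) {
            matches = matches.concat(findDuplicatesSketch(targetBuffer, fullPath));
        } else if (stats.isFile() && sha1(fs.readFileSync(fullPath)) === targetDigest) {
            matches.push(fullPath);
        }
    });
    return matches;
}

// Demo on a throwaway directory: two files share the target content, one does not.
var root = fs.mkdtempSync(path.join(os.tmpdir(), "fd-sketch-"));
fs.mkdirSync(path.join(root, "nested"));
fs.writeFileSync(path.join(root, "a.txt"), "same content");
fs.writeFileSync(path.join(root, "nested", "b.txt"), "same content");
fs.writeFileSync(path.join(root, "c.txt"), "other content");

// Logs the absolute paths of a.txt and nested/b.txt (order may vary).
console.log(findDuplicatesSketch(Buffer.from("same content"), root));
```

The sketch reads every candidate file fully into memory, which keeps it short but is only reasonable for small files.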

Information

Package file-duplicates
Node Version >= 4.8.4

Installation

Install the package as a dependency:

npm install --save file-duplicates

Usage

var fd = require("file-duplicates");
var fs = require("fs");
var filePath = "path/to/file";
var dirPath = "path/to/dir";

// async - promise
fd.find(filePath, dirPath).then(function(paths) {
    console.log(paths);
}).catch(function(err) {
    throw err;
});

// async - ignore patterns - callback
fd.find(filePath, dirPath, [".*", "node_modules", "**/*.txt", "path/to/specific/fileOrFolder"], function(err, paths) {
    if (err)
        throw err;
    console.log(paths);
});

// async with buffer
fs.readFile(filePath, function(err, buffer) {
    if (err)
        throw err;
    // search with the package (fd), not with the fs module
    fd.find(buffer, dirPath, [".*", "*.js"]).then(function(paths) {
        console.log(paths);
    }).catch(function(err) {
        throw err;
    });
});

// sync (if dirPath is not provided, the working directory is used)
var paths = fd.findSync(filePath, [".*", "*.js"]);

API

/**
 * Recursively search for duplicates of the target file or buffer in the specified directory, returning the corresponding absolute paths (ASYNC).
 * @param {string|Buffer} pathOrBuffer - Path or buffer of the file to search.
 * @param {string} [dirPath] - Directory which represents the starting point of the search. Default is the working directory.
 * @param {Array} [ignorePatterns] - An array of patterns that will be excluded from the search (e.g. [".*", "node_modules", "*.txt", "path/to/file", "path/to/directory"]).
 * @param {function} [cb] - Callback of type function(err, result). If not provided, a Promise will be returned instead.
 * @return {Promise} - If no callback is provided, a Promise fulfilled with an array of absolute paths to duplicated files; otherwise the array is passed to the callback.
 */
function find(pathOrBuffer, dirPath, ignorePatterns, cb) { }


/**
 * Recursively search for duplicates of the target file or buffer in the specified directory, returning the corresponding absolute paths (SYNC).
 * @param {string|Buffer} pathOrBuffer - Path or buffer of the file to search.
 * @param {string} [dirPath] - Directory which represents the starting point of the search. Default is the working directory.
 * @param {Array} [ignorePatterns] - An array of patterns that will be excluded from the search (e.g. [".*", "node_modules", "*.txt", "path/to/file", "path/to/directory"]).
 * @return {Array} - An array of absolute paths to duplicated files.
 */
function findSync(pathOrBuffer, dirPath, ignorePatterns) { }
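Because dirPath, ignorePatterns and cb are all optional, a call such as findSync(filePath, [".*", "*.js"]) passes the pattern array in the second position. The sketch below shows one way such arguments could be normalized by inspecting their types. It is an assumption made for illustration, not the package's actual code, and `normalizeArgs` is a hypothetical name.

```javascript
// Hypothetical helper: map the optional positional arguments onto named fields.
// The type checks below are assumptions, not the package's documented behavior.
function normalizeArgs(pathOrBuffer, dirPath, ignorePatterns, cb) {
    if (typeof dirPath === "function") {
        // (pathOrBuffer, cb)
        cb = dirPath;
        dirPath = undefined;
        ignorePatterns = undefined;
    } else if (Array.isArray(dirPath)) {
        // (pathOrBuffer, ignorePatterns[, cb])
        cb = ignorePatterns;
        ignorePatterns = dirPath;
        dirPath = undefined;
    } else if (typeof ignorePatterns === "function") {
        // (pathOrBuffer, dirPath, cb)
        cb = ignorePatterns;
        ignorePatterns = undefined;
    }
    return {
        pathOrBuffer: pathOrBuffer,
        dirPath: dirPath || process.cwd(),      // default: working directory
        ignorePatterns: ignorePatterns || [],   // default: ignore nothing
        cb: typeof cb === "function" ? cb : null // null means "return a Promise"
    };
}

// The pattern array in second position is recognized as ignorePatterns.
console.log(normalizeArgs("path/to/file", [".*", "*.js"]));
```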

Notes

  • If dirPath is not provided, the search starts at the working directory (the one returned by process.cwd(), i.e. the directory from which the node command is invoked).
  • I suggest always using absolute paths. If you want to use relative paths, make sure they are relative to the directory specified as the second argument, or to the working directory (see above) if no directory is provided.
  • For big files I strongly suggest using the async version, which uses a stream to read files chunk by chunk (the sync version instead loads the entire file into memory before computing the checksum).
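The difference between the two reading strategies in the last note can be sketched as follows. This is only an illustration built on Node's `crypto` and `fs` modules, not the package's source, and `sha1Stream` and `sha1Sync` are hypothetical names.

```javascript
var crypto = require("crypto");
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical helper: hash a file chunk by chunk, so only one chunk
// is held in memory at a time.
function sha1Stream(filePath) {
    return new Promise(function(resolve, reject) {
        var hash = crypto.createHash("sha1");
        fs.createReadStream(filePath)
            .on("error", reject)
            .on("data", function(chunk) { hash.update(chunk); })
            .on("end", function() { resolve(hash.digest("hex")); });
    });
}

// Hypothetical helper: load the whole file into memory, then hash it.
function sha1Sync(filePath) {
    return crypto.createHash("sha1").update(fs.readFileSync(filePath)).digest("hex");
}

// Demo on a temporary file: both strategies produce the same digest.
var demoFile = path.join(os.tmpdir(), "file-duplicates-demo-" + process.pid + ".txt");
fs.writeFileSync(demoFile, "example content");
sha1Stream(demoFile).then(function(digest) {
    console.log(digest === sha1Sync(demoFile)); // true
    fs.unlinkSync(demoFile);
});
```

Both helpers return the same digest for the same file; they differ only in how much of the file is held in memory while hashing.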

License

MIT License