sensitive-data-masker
A high-performance TypeScript library for detecting and masking sensitive data in strings. Protect PII, API keys, tokens, credentials, and other confidential information with intelligent masking algorithms and configurable accuracy levels.
Features
- 🛡️ 200+ Detection Patterns: Comprehensive coverage for modern security needs
- ⚡ High Performance: Optimized regex engine with pattern caching
- 🎯 Accuracy Control: Configure detection sensitivity (high/medium/low)
- 🔧 Flexible Masking: Smart partial masking that preserves readability
- 📦 Zero Dependencies: Lightweight and secure
- 🌍 International Support: Handles US, UK, Canadian, and international formats
- 🔍 Pattern Filtering: Include or exclude specific pattern types
- 📊 Detailed Results: Get match counts, positions, and masked values
Installation
npm install sensitive-data-masker
yarn add sensitive-data-masker
Quick Start
import { mask, hasSensitiveContent, getPatternMatches } from 'sensitive-data-masker';
// Basic usage - intelligent partial masking
const text = 'My email is john@example.com and my SSN is 123-45-6789';
const result = mask(text);
console.log(result.output);
// "My email is **hn@example.c** and my SSN is **3-45-67**"
console.log(result.found);
// { email: 1, ssn: 1 }
// Check if content contains sensitive data
const isSensitive = hasSensitiveContent(text);
console.log(isSensitive); // true
// Get detailed pattern matches with positions
const matches = getPatternMatches(text);
console.log(matches);
// [
// {
// pattern: 'email',
// matches: [{ match: 'john@example.com', startIndex: 12, endIndex: 27 }]
// },
// {
// pattern: 'ssn',
// matches: [{ match: '123-45-6789', startIndex: 43, endIndex: 53 }]
// }
// ]
API Reference
mask(input: string, options?: MaskingOptions): MaskResult
Masks sensitive content in a string using intelligent partial masking.
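To make "partial masking" concrete, here is a minimal sketch that reproduces the shape of the Quick Start outputs, where two characters are hidden at each end of a value. The `partialMask` helper and its `edge` parameter are hypothetical and not part of the library's API; the library's actual algorithm may differ.

```typescript
// Hypothetical helper, NOT the library's implementation: hides the first and
// last `edge` characters of a value and keeps the middle readable.
function partialMask(value: string, maskChar = '*', edge = 2): string {
  // Values too short to keep a readable middle are masked completely.
  if (value.length <= edge * 2) {
    return maskChar.repeat(value.length);
  }
  const middle = value.slice(edge, value.length - edge);
  return maskChar.repeat(edge) + middle + maskChar.repeat(edge);
}

console.log(partialMask('john@example.com')); // "**hn@example.c**"
console.log(partialMask('123-45-6789', '#')); // "##3-45-67##"
```

Keeping the middle visible leaves enough context to recognize which value was masked, while the hidden ends make the original harder to reconstruct.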
Options
interface MaskingOptions {
maskChar?: string; // Character used for masking (default: '*')
preserveLength?: boolean; // Preserve original length (default: false)
excludePatterns?: string[]; // Patterns to exclude from masking
onlyPatterns?: string[]; // Only mask these patterns
matchAccuracy?: 'high' | 'medium' | 'low'; // Detection sensitivity
}
Returns
interface MaskResult {
output: string; // Masked string
found: { [name: string]: number }; // Count of each pattern found
matches: string[]; // Original matched values
masked: string[]; // Masked versions of matches
}
hasSensitiveContent(input: string, options?): boolean
Quickly check if a string contains sensitive data without performing masking.
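One way a detection-only check can stay cheap is to stop at the first pattern that matches and never build a masked string. The sketch below illustrates that idea with two simplified regexes; `containsSensitive` and `detectors` are hypothetical names, and this is not a description of how the library implements `hasSensitiveContent`.

```typescript
// Hypothetical detector list; a real pattern set would be far larger.
const detectors: RegExp[] = [
  /[\w.+-]+@[\w-]+\.[\w.]+/, // simplified email
  /\b\d{3}-\d{2}-\d{4}\b/,   // US SSN written with dashes
];

// Stops at the first pattern that matches; no replacement strings are built.
// The regexes are non-global, so test() carries no state between calls.
function containsSensitive(input: string): boolean {
  return detectors.some(re => re.test(input));
}

console.log(containsSensitive('user@example.com')); // true
console.log(containsSensitive('hello world'));      // false
```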
import { hasSensitiveContent } from 'sensitive-data-masker';
hasSensitiveContent('user@example.com'); // true
hasSensitiveContent('hello world'); // false
// With options
hasSensitiveContent('sk-1234567890abcdef', {
matchAccuracy: 'high',
excludePatterns: ['genericId']
}); // true
getPatternMatches(input: string, options?): PatternMatch[]
Get detailed information about all pattern matches including their positions.
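The positions in the example output appear to be zero-based, with an inclusive `endIndex` (for instance, 'admin@test.com' starts at 9 and ends at 22). Assuming that convention, position tracking can be sketched with `String.prototype.matchAll`. The `findMatches` function, the `MatchPosition` interface, and the simplified email regex are illustrative assumptions, not the library's code.

```typescript
// Hypothetical shape mirroring the entries in the example output.
interface MatchPosition {
  match: string;
  startIndex: number;
  endIndex: number; // assumed inclusive, as the example output suggests
}

// Collects every match of a global regex together with its position.
// matchAll() requires the regex to carry the g flag.
function findMatches(input: string, regex: RegExp): MatchPosition[] {
  const positions: MatchPosition[] = [];
  for (const m of input.matchAll(regex)) {
    const start = m.index ?? 0;
    positions.push({
      match: m[0],
      startIndex: start,
      endIndex: start + m[0].length - 1,
    });
  }
  return positions;
}

// Deliberately simplified email regex, for illustration only.
const simpleEmail = /[\w.+-]+@[\w-]+\.[\w.]+/g;
console.log(findMatches('Contact: admin@test.com', simpleEmail));
// [ { match: 'admin@test.com', startIndex: 9, endIndex: 22 } ]
```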
import { getPatternMatches } from 'sensitive-data-masker';
const matches = getPatternMatches('Contact: admin@test.com and key: sk-123abc');
console.log(matches);
// [
// {
// pattern: 'email',
// matches: [{ match: 'admin@test.com', startIndex: 9, endIndex: 22 }]
// },
// {
// pattern: 'openaiApiKey',
// matches: [{ match: 'sk-123abc', startIndex: 33, endIndex: 41 }]
// }
// ]
Advanced Usage
Custom Masking Options
import { mask } from 'sensitive-data-masker';
// Custom masking character
const result = mask('API key: sk-1234567890abcdef', { maskChar: '#' });
console.log(result.output);
// "API key: ##-1234567890ab##"
// Preserve original length
const result2 = mask('secret123', { preserveLength: true });
console.log(result2.output);
// "*********" (full length masked)
// Use high accuracy mode (fewer false positives)
const result3 = mask('sk-1234567890abcdef', { matchAccuracy: 'high' });
console.log(result3.output);
// "**-1234567890ab**"
Pattern Filtering
// Only mask specific patterns
const result = mask('Email: user@test.com, API: sk-123', {
onlyPatterns: ['email', 'openaiApiKey']
});
// Exclude certain patterns
const result2 = mask('Email: user@test.com, UUID: 123e4567-e89b-12d3-a456-426614174000', {
excludePatterns: ['uuid', 'genericId']
});
// Combine with accuracy control
const result3 = mask(sensitiveText, {
matchAccuracy: 'high',
excludePatterns: ['uuid']
});
Supported Pattern Categories
The library detects sensitive data across 25 categories with 200+ patterns:
🆔 Personal Identifiable Information (PII)
- Email addresses (multiple formats)
- Phone numbers (US, International, E.164)
- Social Security Numbers (US with various formats)
- Driver's license numbers, Medical record numbers
- Tax IDs (TIN/EIN), Canadian SIN, UK National Insurance Numbers
☁️ Cloud Provider Credentials
- AWS: Access keys, secret keys, session tokens, account IDs
- AWS Resources: EC2, S3, RDS, Lambda ARNs, VPC IDs
- Azure: Subscription IDs, client secrets, resource IDs
- Google Cloud: API keys, service account keys, project IDs
💳 Financial & Payment Services
- Credit card numbers (Visa, MasterCard, Amex, Discover)
- Stripe: Secret keys, publishable keys, webhook secrets
- PayPal: Access tokens, client IDs
- Square: Access tokens, application IDs
- Bank account numbers (US routing numbers, IBAN)
🤖 AI Provider Credentials
- OpenAI: API keys, organization IDs
- Anthropic/Claude: API keys
- Google AI: Gemini API keys, Vertex AI tokens
- Hugging Face: Access tokens, API keys
- Other AI: Groq, Perplexity, Replicate, Together AI
🔐 Authentication & Security
- JWT tokens, Bearer tokens
- OAuth access tokens, refresh tokens
- API keys in headers (X-API-Key, Authorization)
- Session IDs, CSRF tokens
- Generic secret patterns in environment variables
🔧 Developer Tools & Services
- GitHub: Personal access tokens, app tokens
- Slack: Bot tokens, webhook URLs, app secrets
- Discord: Bot tokens, webhook URLs
- Analytics: Google Analytics, Mixpanel, Amplitude
- Monitoring: Datadog, New Relic, Sentry keys
🗄️ Database & Storage
- Database connection strings (PostgreSQL, MySQL, MongoDB)
- File Storage: S3 bucket URLs, Azure Blob Storage
- CDN: CloudFront URLs, Azure CDN
- Redis connection strings, Elasticsearch URLs
🔑 Cryptographic Materials
- RSA private keys, SSH private keys
- EC private keys, DSA private keys
- X.509 certificates, PGP private key blocks
- JSON Web Keys (JWK), PKCS#8 keys
🌐 Network & Location
- IPv4/IPv6 addresses, MAC addresses
- Geographic coordinates (latitude/longitude)
- Private network ranges, subnet masks
- URL patterns with embedded secrets
📱 Communication Services
- Messaging: Twilio, SendGrid, Mailgun keys
- Social Media: Twitter, Facebook, Instagram tokens
- Email Services: Mailchimp, Postmark, SparkPost
- SMS/Voice: Nexmo, Plivo, MessageBird
🛠️ Infrastructure & DevOps
- Container Registries: Docker Hub, ECR, GCR tokens
- CI/CD: Jenkins, GitLab CI, CircleCI tokens
- Deployment: Vercel, Netlify, Heroku tokens
- Monitoring: PagerDuty, Datadog, New Relic
🏢 Enterprise & Business
- CRM: Salesforce, HubSpot tokens
- E-commerce: Shopify, WooCommerce keys
- Business Tools: Slack, Microsoft Teams tokens
- Analytics: Google Analytics, Adobe Analytics
🎯 Generic Patterns
- UUID v4, Generic IDs
- Base64 encoded secrets
- Hex-encoded keys (32, 64, 128 bit)
- Custom secret patterns in configuration files
🔍 URL & Reference Patterns
- URLs with embedded tokens
- Database connection URIs
- API endpoints with keys
- Webhook URLs with secrets
💾 Version Control & Code
- Git repository URLs with tokens
- Package manager tokens (npm, PyPI)
- Container registry credentials
- Code hosting platform tokens
Pattern Accuracy Levels
Control detection sensitivity to balance between security and false positives:
High Accuracy
- Most specific patterns with minimal false positives
- Examples: AWS access keys with AKIA prefix, specific API key formats
- Best for production environments
Medium Accuracy (Default)
- Balanced detection with reasonable false positive rates
- Examples: Generic API keys, common secret patterns
- Good for most use cases
Low Accuracy
- Broadest detection, may have higher false positive rates
- Examples: Generic IDs, loose pattern matching
- Useful for comprehensive scanning
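The three levels can be read as a threshold on how specific a pattern must be before it is applied. The sketch below assumes that a requested level keeps every pattern that is at least that specific, and that patterns without an explicit level count as 'medium'. Both points are assumptions rather than documented behavior, and `selectPatterns`, `PatternDef`, and the sample pattern names other than `genericId` are hypothetical.

```typescript
type Accuracy = 'high' | 'medium' | 'low';

// Hypothetical pattern shape, modeled on the structure under "Adding New Patterns".
interface PatternDef {
  name: string;
  regex: RegExp;
  matchAccuracy?: Accuracy;
}

// Higher rank = more specific pattern.
const RANK: Record<Accuracy, number> = { low: 0, medium: 1, high: 2 };

// ASSUMPTION: a requested level keeps patterns that are at least that specific,
// and patterns without an explicit level are treated as 'medium'.
function selectPatterns(patterns: PatternDef[], level: Accuracy = 'medium'): PatternDef[] {
  return patterns.filter(p => RANK[p.matchAccuracy ?? 'medium'] >= RANK[level]);
}

// Illustrative patterns only; the regexes are simplified.
const demoPatterns: PatternDef[] = [
  { name: 'awsAccessKey', regex: /AKIA[0-9A-Z]{16}/g, matchAccuracy: 'high' },
  { name: 'genericApiKey', regex: /api[_-]?key\s*[:=]\s*\S+/gi, matchAccuracy: 'medium' },
  { name: 'genericId', regex: /\b[0-9a-f]{16,}\b/gi, matchAccuracy: 'low' },
];

console.log(selectPatterns(demoPatterns, 'high').map(p => p.name));
// [ 'awsAccessKey' ]
console.log(selectPatterns(demoPatterns, 'low').map(p => p.name));
// [ 'awsAccessKey', 'genericApiKey', 'genericId' ]
```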
// Use high accuracy for production
const prodResult = mask(text, { matchAccuracy: 'high' });
// Use medium accuracy for development
const devResult = mask(text, { matchAccuracy: 'medium' });
// Use low accuracy for comprehensive scanning
const scanResult = mask(text, { matchAccuracy: 'low' });
TypeScript Support
Full TypeScript support with complete type definitions:
import { mask, hasSensitiveContent, getPatternMatches } from 'sensitive-data-masker';
import type { MaskResult, MaskingOptions } from 'sensitive-data-masker';
// Type-safe masking options
const options: MaskingOptions = {
maskChar: '#',
matchAccuracy: 'high',
excludePatterns: ['uuid']
};
const result: MaskResult = mask(text, options);
Real-World Examples
Log File Sanitization
import { mask } from 'sensitive-data-masker';
const logEntry = `
[2024-01-15 10:30:45] INFO User john@company.com logged in
[2024-01-15 10:31:12] DEBUG API call with key sk-1234567890abcdef
[2024-01-15 10:31:15] ERROR Payment failed for card 4111-1111-1111-1111
[2024-01-15 10:31:20] WARN SSN in request: 123-45-6789
`;
const sanitized = mask(logEntry);
console.log(sanitized.output);
// [2024-01-15 10:30:45] INFO User **hn@company.c** logged in
// [2024-01-15 10:31:12] DEBUG API call with key **-1234567890ab**
// [2024-01-15 10:31:15] ERROR Payment failed for card **11-1111-1111-11**
// [2024-01-15 10:31:20] WARN SSN in request: **3-45-67**
console.log(sanitized.found);
// { email: 1, openaiApiKey: 1, creditCard: 1, ssn: 1 }
Configuration File Security
const config = `
DATABASE_URL=postgresql://user:password123@localhost:5432/db
OPENAI_API_KEY=sk-1234567890abcdef1234567890abcdef
STRIPE_SECRET_KEY=sk_live_abcdef123456
ADMIN_EMAIL=admin@company.com
JWT_SECRET=super-secret-key-123
`;
const result = mask(config);
console.log(result.output);
// DATABASE_URL=postgresql://user:**ssword1**@localhost:5432/db
// OPENAI_API_KEY=**-1234567890abcdef1234567890ab**
// STRIPE_SECRET_KEY=**_live_abcdef12**
// ADMIN_EMAIL=**min@company.c**
// JWT_SECRET=**per-secret-key-1**
Multi-Environment Setup
import { mask } from 'sensitive-data-masker';
// Production: Mask everything with high accuracy
const prodResult = mask(sensitiveData, { matchAccuracy: 'high' });
// Development: Allow test emails but mask real API keys
const devResult = mask(sensitiveData, {
matchAccuracy: 'medium',
excludePatterns: ['email']
});
// Testing: Only mask financial data
const testResult = mask(sensitiveData, {
onlyPatterns: ['creditCard', 'bankAccount', 'ssn'],
matchAccuracy: 'high'
});
Data Pipeline Processing
import { hasSensitiveContent, mask } from 'sensitive-data-masker';
// Check if data needs processing
function processBatch(records: string[]) {
const results = records.map(record => {
if (hasSensitiveContent(record)) {
const masked = mask(record, { matchAccuracy: 'high' });
return {
data: masked.output,
hadSensitiveData: true,
patternsFound: Object.keys(masked.found)
};
}
return { data: record, hadSensitiveData: false };
});
return results;
}
Performance Considerations
- Optimized Regex Engine: Patterns are compiled and cached on first use
- Single-Pass Processing: Efficient string traversal with minimal overhead
- Memory Efficient: No unnecessary string copies or allocations
- Pattern Filtering: Use onlyPatterns when you know which types to look for
- Accuracy Optimization: Higher accuracy modes are faster due to more specific patterns
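The caching and filtering points above can be illustrated with a small lazy cache: a regex is compiled the first time its pattern is requested and reused afterwards, so restricting the pattern list means fewer regexes are compiled and run. This is a sketch of the general technique, not the library's internals; `patternSources`, `getRegex`, `countMatches`, and `compileCount` are hypothetical names.

```typescript
// Hypothetical pattern table with simplified regex sources.
const patternSources: Record<string, string> = {
  email: '[\\w.+-]+@[\\w-]+\\.[\\w.]+',
  ssn: '\\b\\d{3}-\\d{2}-\\d{4}\\b',
};

const compiled = new Map<string, RegExp>();
let compileCount = 0; // only here to make the caching visible

// Compiles a pattern on first use and returns the cached RegExp afterwards.
function getRegex(name: string): RegExp {
  let regex = compiled.get(name);
  if (!regex) {
    const source = patternSources[name];
    if (source === undefined) throw new Error(`Unknown pattern: ${name}`);
    regex = new RegExp(source, 'g');
    compiled.set(name, regex);
    compileCount++;
  }
  return regex;
}

// Counts matches for the requested patterns only.
function countMatches(input: string, onlyPatterns: string[]): Record<string, number> {
  const found: Record<string, number> = {};
  for (const name of onlyPatterns) {
    // String.prototype.match with a global regex returns all matches or null.
    const hits = input.match(getRegex(name));
    if (hits) found[name] = hits.length;
  }
  return found;
}

console.log(countMatches('a@b.io and c@d.io', ['email'])); // { email: 2 }
console.log(countMatches('x@y.io', ['email']));            // { email: 1 }
console.log(compileCount);                                  // 1 (ssn was never compiled)
```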
// Optimize for specific use cases
const emailsOnly = mask(text, { onlyPatterns: ['email'] }); // Faster
const highAccuracy = mask(text, { matchAccuracy: 'high' }); // Faster, fewer false positives
const comprehensive = mask(text, { matchAccuracy: 'low' }); // Slower, more thorough
Security Best Practices
- Always mask before logging: Ensure sensitive data is masked before writing to logs
- Use appropriate accuracy: Higher accuracy for production, lower for development/testing
- Store results securely: The matches array contains original sensitive values
- Regular updates: Keep the library updated for new pattern definitions
- Test your patterns: Verify masking works correctly with your specific data formats
- Environment-specific config: Use different settings for dev/staging/production
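As a sketch of the "mask before logging" practice, a logger can be wrapped so that every message passes through a masking function before it reaches the output. The wrapper below takes the masker as a parameter; in real code that parameter would presumably wrap the library's `mask()` and return only `result.output`, so the original values in `matches` never reach the log. `createSafeLogger`, `MaskFn`, and the stand-in `demoMask` are hypothetical.

```typescript
// Any function with this shape works; in real code it would wrap the
// library's mask() and return result.output.
type MaskFn = (input: string) => string;

// Hypothetical wrapper: every message passes through maskFn before it
// reaches the sink, so raw values never hit the log output.
function createSafeLogger(maskFn: MaskFn, sink: (line: string) => void = console.log) {
  return {
    info: (message: string) => sink(`[INFO] ${maskFn(message)}`),
    error: (message: string) => sink(`[ERROR] ${maskFn(message)}`),
  };
}

// Stand-in masker for the demo: hides anything that looks like a US SSN.
const demoMask: MaskFn = text => text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '***-**-****');

const lines: string[] = [];
const logger = createSafeLogger(demoMask, line => lines.push(line));
logger.info('SSN in request: 123-45-6789');
console.log(lines[0]); // "[INFO] SSN in request: ***-**-****"
```

Passing the sink as a parameter keeps the wrapper easy to test and lets each environment supply its own masking settings.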
Development
Prerequisites
- Node.js >= 18.12.0
- Yarn or npm
Setup
git clone https://github.com/bgauryy/sensitive-data-mask.git
cd sensitive-data-mask
yarn install
Commands
yarn build # Build the library
yarn dev # Build in watch mode
yarn lint # Run ESLint
yarn test # Run tests
yarn typecheck # Run TypeScript compiler checks
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Adding New Patterns
- Choose the appropriate category file in src/regexes/
- Add your pattern following the existing structure:
{
name: 'myPattern',
regex: /your-regex-here/gi,
description: 'Description of what this detects',
matchAccuracy: 'medium' // optional: 'high', 'medium', or 'low'
}
- Run tests to ensure no regressions
- Submit a PR with a clear description
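Before submitting, it can help to check a new pattern against a few strings that should and should not match. The sketch below assumes the pattern shape shown above and treats a missing `g` flag as a problem because the template regex uses `/gi`; that requirement is an inference, not a documented rule. `NewPattern`, `checkPattern`, and the `acmeToken` format are hypothetical.

```typescript
// Shape taken from the pattern structure shown above; the interface name is hypothetical.
interface NewPattern {
  name: string;
  regex: RegExp;
  description: string;
  matchAccuracy?: 'high' | 'medium' | 'low';
}

// Returns a list of problems; an empty list means the pattern passed.
function checkPattern(
  pattern: NewPattern,
  shouldMatch: string[],
  shouldNotMatch: string[],
): string[] {
  const problems: string[] = [];
  if (!pattern.regex.global) problems.push('regex is missing the g flag');
  // Test with a non-global copy so lastIndex state cannot leak between samples.
  const probe = new RegExp(pattern.regex.source, pattern.regex.flags.replace('g', ''));
  for (const sample of shouldMatch) {
    if (!probe.test(sample)) problems.push(`missed: ${sample}`);
  }
  for (const sample of shouldNotMatch) {
    if (probe.test(sample)) problems.push(`false positive: ${sample}`);
  }
  return problems;
}

// Illustrative pattern for a made-up token format ("acme_" + 24 hex characters).
const acmeToken: NewPattern = {
  name: 'acmeToken',
  regex: /\bacme_[0-9a-f]{24}\b/gi,
  description: 'Hypothetical ACME service token',
  matchAccuracy: 'high',
};

console.log(checkPattern(acmeToken, ['acme_0123456789abcdef01234567'], ['acme_short']));
// []
```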
License
MIT © guybary
Security
If you discover a security vulnerability, please email guybary@wix.com instead of using the issue tracker.
Made with ❤️ for developers who care about data security