JSPM

search-text-normalizer

1.0.1
    • ESM via JSPM
    • ES Module Entrypoint
    • Export Map
    • Keywords
    • License
    • Repository URL
    • TypeScript Types
    • README
    • Created
    • Published
    • Downloads 20
    • Score
      100M100P100Q41045F
    • License MIT

    Multilingual text normalization utilities for search accuracy

    Package Exports

    • search-text-normalizer
    • search-text-normalizer/index.js

    This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (search-text-normalizer) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

    Readme

    Search Text Normalizer

    Text normalization utilities for different languages to improve search accuracy by normalizing text variations.

    Installation

    npm install search-text-normalizer

    Usage

    const { normalizeText } = require('search-text-normalizer');
    
    // Generic normalization
    normalizeText('Hello, World!'); // "hello world"
    
    // Language-specific normalization
    normalizeText('مَرْحَباً بِالعَالَم', 'ar'); // "مرحبا بالعالم" (Arabic)
    normalizeText('שָׁלוֹם עוֹלָם', 'he'); // "שלומ עולמ" (Hebrew)
    normalizeText('Γεῖα σου κόσμε', 'el'); // "γεια σου κοσμε" (Greek)
    normalizeText('Café résumé', 'la'); // "cafe resume" (Latin)
    normalizeText('ܫܠܵܡܵܐ ܥܵܠܡܵܐ', 'sy'); // "ܫܠܡܐ ܥܠܡܐ" (Syriac)

    TypeScript

    import { normalizeText, SupportedLanguage } from 'search-text-normalizer';
    
    const text: string = normalizeText('Hello, World!');
    const arabicText: string = normalizeText('مرحبا', 'ar' as SupportedLanguage);

    Supported Languages

    • ar - Arabic (removes diacritics, normalizes character variants)
    • he - Hebrew (removes vowel points, normalizes final letters)
    • el - Greek (removes accents, normalizes sigma)
    • la - Latin (removes diacritics from accented characters)
    • sy - Syriac (removes vowel points and punctuation marks)

    Features

    • Removes diacritics and vowel points
    • Normalizes character variants and punctuation
    • Converts to lowercase and trims whitespace
    • TypeScript support with full type definitions
    • Zero dependencies for production use
    • Comprehensive test suite with Jest

    Development

    npm run build         # Generate TypeScript declarations
    npm test              # Run tests
    npm run test:coverage # Run with coverage
    npm run test:watch    # Watch mode

    License

    MIT