JSPM

@nlptools/tokenizer

0.0.2
License: MIT

Tokenization utilities - HuggingFace tokenizers wrapper for NLPTools

Package Exports

  • @nlptools/tokenizer
  • @nlptools/tokenizer/dist/index.mjs

This package does not declare an exports field, so the exports above were automatically detected and optimized by JSPM instead. If a package subpath is missing, consider filing an issue on the original package (@nlptools/tokenizer) requesting support for the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
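For illustration, a minimal "exports" field matching the detected entrypoints above could look like this in the package's package.json (a sketch, not the package's actual manifest):

```json
{
  "name": "@nlptools/tokenizer",
  "type": "module",
  "exports": {
    ".": "./dist/index.mjs"
  }
}
```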

Readme

@nlptools/tokenizer


Tokenization utilities - HuggingFace tokenizers wrapper for NLPTools

This package provides convenient access to HuggingFace tokenization utilities through the NLPTools ecosystem. It includes fast, client-side tokenization for various LLM models and supports both browser and Node.js environments.

Installation

# Install with npm
npm install @nlptools/tokenizer

# Install with yarn
yarn add @nlptools/tokenizer

# Install with pnpm
pnpm add @nlptools/tokenizer
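Because the package ships an ES module entrypoint over JSPM, it can also be loaded in the browser without a build step via an import map. The CDN URL below follows JSPM's ga.jspm.io naming convention; treat the exact path as an assumption:

```html
<script type="importmap">
{
  "imports": {
    "@nlptools/tokenizer": "https://ga.jspm.io/npm:@nlptools/tokenizer@0.0.2/dist/index.mjs"
  }
}
</script>
<script type="module">
  import { Tokenizer } from "@nlptools/tokenizer";
</script>
```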

Usage

Basic Setup

import { Tokenizer } from "@nlptools/tokenizer";

Available Functions

  • Tokenizer - Main tokenizer class for encoding and decoding text
  • encode() - Convert text to token IDs and tokens
  • decode() - Convert token IDs back to text
  • tokenize() - Split text into token strings
  • AddedToken - Custom token configuration class

Example Usage

import { Tokenizer } from "@nlptools/tokenizer";

// Load tokenizer from HuggingFace Hub
const modelId = "HuggingFaceTB/SmolLM3-3B";
const tokenizerJson = await fetch(
  `https://huggingface.co/${modelId}/resolve/main/tokenizer.json`,
).then((res) => res.json());
const tokenizerConfig = await fetch(
  `https://huggingface.co/${modelId}/resolve/main/tokenizer_config.json`,
).then((res) => res.json());

// Create tokenizer instance
const tokenizer = new Tokenizer(tokenizerJson, tokenizerConfig);

// Encode text
const encoded = tokenizer.encode("Hello World");
console.log(encoded.ids); // [9906, 4435]
console.log(encoded.tokens); // ['Hello', 'ĠWorld']
console.log(encoded.attention_mask); // [1, 1]

// Decode back to text
const decoded = tokenizer.decode(encoded.ids);
console.log(decoded); // 'Hello World'

// Get token count
const tokenCount = tokenizer.encode("This is a sentence.").ids.length;
console.log(`Token count: ${tokenCount}`);

Features

  • πŸš€ Fast & Lightweight: Zero-dependency implementation for client-side use
  • πŸ”§ Model Compatible: Works with HuggingFace model tokenizers
  • πŸ“± Cross-Platform: Supports both browser and Node.js environments
  • πŸ“¦ TypeScript First: Full type safety with comprehensive API
  • 🌐 HuggingFace Hub: Direct integration with model repositories

References

This package incorporates and builds upon the following excellent open source projects:

  • HuggingFace tokenizers

License

MIT © Demo Macro