@himorishige/noren-dict-reloader
An extension package for the Noren PII masking library that provides functionality to dynamically load and periodically update (hot-reload) redaction policies and custom dictionaries from remote URLs.
What's New in v0.5.0
- Performance Improvements: 30% code reduction for better performance and smaller bundle size
- Optimized Fetch Logic: Unified conditional GET with optional timeout support
- Simplified API: Streamlined CompileOptions and reduced complexity
- Enhanced Error Handling: Improved error messages and timeout management
- Memory Efficiency: Optimized Map management and reduced object allocation
Migration Notes
Some rarely-used CompileOptions have been removed in v0.5.0:
- enableContextualConfidence - use enableConfidenceScoring instead
- contextualSuppressionEnabled - functionality consolidated into core detection
- contextualBoostEnabled - functionality consolidated into core detection
- allowDenyConfig.disableDefaults - simplified to boolean options only
Features
- Dynamic Configuration Loading: Loads policy and dictionary files over HTTP(S) and applies them to Noren's Registry.
- Efficient Update-Checking: Uses HTTP ETag headers for differential checks, reducing network traffic by only downloading files when they have changed.
- Hot-Reloading: Periodically reloads configurations in the background to keep them up-to-date without application restarts.
- Flexible Retry Logic: If an update fails, it retries using exponential backoff and jitter to avoid overwhelming the server.
- Custom Compilation: Allows users to freely implement the logic for transforming loaded policies and dictionaries into a Registry.
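The retry behaviour listed above can be illustrated with a small delay calculator. This is a minimal sketch, not the package's implementation: the name backoffDelayMs, its parameters, the default values, and the choice of "full jitter" (a uniformly random delay below an exponentially growing ceiling) are all assumptions made for illustration.

```typescript
// Hypothetical helper: names and defaults are illustrative, not part of the package API.
function backoffDelayMs(
  attempt: number, // 0 for the first retry, 1 for the second, ...
  baseMs = 1000, // assumed initial delay
  maxMs = 60000, // assumed upper bound on the delay
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  // Exponential growth, capped so that repeated failures do not wait forever
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt)
  // Full jitter: pick uniformly in [0, ceiling) so clients do not retry in lockstep
  return Math.floor(random() * ceiling)
}

// Usage: delays for the first four retries (values vary from run to run)
const delays = [0, 1, 2, 3].map((attempt) => backoffDelayMs(attempt))
console.log(delays)
```

Randomizing the whole delay, rather than adding a small random offset, spreads retries from many clients more evenly after a shared outage, at the cost of occasionally retrying very soon.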
Installation
pnpm add @himorishige/noren-dict-reloader @himorishige/noren-core
Basic Usage
import { Registry } from '@himorishige/noren-core';
import { PolicyDictReloader } from '@himorishige/noren-dict-reloader';
// Define a compile function to transform the policy and dictionaries into a Registry
function compile(policy, dicts) {
  const registry = new Registry(policy);
  // Implement logic here to parse the contents of dicts,
  // create custom detectors, and register them using registry.use().
  console.log('Compiled with new policy and dictionaries.');
  return registry;
}
// Initialize the reloader
const reloader = new PolicyDictReloader({
  policyUrl: 'https://example.com/noren-policy.json',
  dictManifestUrl: 'https://example.com/noren-manifest.json',
  compile,
  intervalMs: 60000, // Check for updates every 60 seconds
  requestTimeoutMs: 10000, // v0.5.0: Request timeout in milliseconds
  maxConcurrent: 5, // v0.5.0: Max concurrent dictionary downloads
  onSwap: (newRegistry, changed) => {
    console.log('Configuration updated. Changed files:', changed);
    // Here, you would swap the application's Registry instance with the new one
  },
  onError: (error) => {
    console.error('Failed to reload dictionary:', error);
  },
});
// Start the hot-reloading process
await reloader.start();
// Get the initial compiled Registry instance to start using it
const initialRegistry = reloader.getCompiled();
Dictionary files and manifest
The reloader expects a manifest JSON at dictManifestUrl, and one or more dictionary JSON files referenced by it.
- Manifest format:
{
  "dicts": [
    { "id": "company", "url": "https://example.com/dicts/company-dict.json" }
  ]
}
- Dictionary format (one file per logical group):
{
  "entries": [
    {
      "pattern": "EMP\\d{5}",
      "type": "employee_id",
      "risk": "high",
      "description": "Employee ID format: EMP followed by 5 digits"
    }
  ]
}
Notes:
- pattern: JavaScript RegExp source string (no leading/trailing slashes). It will typically be compiled with flags gu.
- type: a string PII type. Can be custom in addition to built-ins.
- risk: one of low|medium|high.
- description: optional, for documentation only.
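The field rules in the notes above can be checked before a dictionary is published. The following is a minimal sketch of such a check; validateEntry is a hypothetical helper written for this example and is not exported by the package. It assumes the pattern will be compiled with flags gu, as the notes suggest.

```typescript
// Hypothetical validator for a single dictionary entry; not part of the package API.
const RISKS: readonly string[] = ['low', 'medium', 'high']

// Returns a list of problems; an empty list means the entry looks usable.
function validateEntry(entry: unknown): string[] {
  if (typeof entry !== 'object' || entry === null) return ['entry must be an object']
  const { pattern, type, risk, description } = entry as Record<string, unknown>
  const problems: string[] = []
  if (typeof pattern !== 'string' || pattern === '') {
    problems.push('pattern must be a non-empty string')
  } else {
    try {
      // Compile with the same flags the notes mention, so syntax errors surface early
      new RegExp(pattern, 'gu')
    } catch {
      problems.push(`pattern is not a valid RegExp source: ${pattern}`)
    }
  }
  if (typeof type !== 'string' || type === '') problems.push('type must be a non-empty string')
  if (typeof risk !== 'string' || !RISKS.includes(risk)) {
    problems.push('risk must be one of low, medium, high')
  }
  if (description !== undefined && typeof description !== 'string') {
    problems.push('description must be a string when present')
  }
  return problems
}

// Usage: the sample entry from the dictionary format shown earlier
const problems = validateEntry({ pattern: 'EMP\\d{5}', type: 'employee_id', risk: 'high' })
console.log(problems.length === 0 ? 'entry ok' : problems)
```

Running a check like this in CI keeps a malformed pattern from being silently skipped at load time.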
Templates:
- See example/manifest.template.json and example/dictionary.template.json in this package.
- A more complete example is in examples/dictionary-files/company-dict.json at the repo root.
Example: compile() that registers dictionary entries
Below is a minimal compile function that turns the loaded dictionaries into custom detectors and registers them with Registry.
import type { Detector, PiiType, Policy } from '@himorishige/noren-core'
import { Registry } from '@himorishige/noren-core'
type DictEntry = { pattern: string; type: string; risk: 'low' | 'medium' | 'high'; description?: string }
type DictFile = { entries?: DictEntry[] }
function compile(policy: unknown, dicts: unknown[]) {
  const registry = new Registry((policy ?? {}) as Policy)
  const detectors: Detector[] = []
  for (const d of dicts) {
    const entries = (d as DictFile).entries ?? []
    for (const e of entries) {
      // Skip entries that are missing a required field
      if (!e?.pattern || !e?.type || !e?.risk) continue
      let re: RegExp
      try {
        re = new RegExp(e.pattern, 'gu')
      } catch {
        // Skip entries whose pattern is not a valid RegExp source
        continue
      }
      detectors.push({
        id: `dict:${e.type}:${e.pattern}`,
        priority: 100,
        match: (u) => {
          for (const m of u.src.matchAll(re)) {
            if (m.index === undefined) continue
            u.push({
              type: e.type as PiiType,
              start: m.index,
              end: m.index + m[0].length,
              value: m[0],
              risk: e.risk,
            })
          }
        },
      })
    }
  }
  // Optionally, provide custom maskers or additional context hints
  // ('社員番号' is Japanese for "employee number"):
  // registry.use(detectors, { employee_id: (h) => `EMP_***${h.value.slice(-4)}` }, ['社員番号', 'employee'])
  registry.use(detectors)
  return registry
}
Local files and custom loaders
If you cannot host files over HTTP(S), you can override how files are loaded using the load option.
Quick start for local files on Node.js using file://:
import { PolicyDictReloader, fileLoader } from '@himorishige/noren-dict-reloader'
const reloader = new PolicyDictReloader({
  policyUrl: 'file:///abs/path/to/policy.json',
  dictManifestUrl: 'file:///abs/path/to/manifest.json',
  compile,
  load: fileLoader, // enables file:// support; HTTP(S) remains available
})
await reloader.start()
Notes:
- fileLoader computes ETag from the SHA-256 of file contents and uses file mtime as Last-Modified.
- Non-file:// URLs are delegated to the built-in HTTP(S) loader.
- You can supply your own loader as a LoaderFn to fetch from custom stores.
- file:// URLs must be absolute and cannot include query or hash. Invalid URLs throw an error.
- File I/O errors include details (path and original error message) to aid debugging.
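To make the first note concrete, the sketch below derives an ETag from the SHA-256 of a file's contents and a Last-Modified value from its mtime, using Node.js built-ins. It is an illustration only: describeFile, the FileMeta shape, and the exact ETag string format are assumptions, and the package's fileLoader may differ in all three.

```typescript
import { createHash } from 'node:crypto'
import { mkdtempSync, readFileSync, statSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical metadata shape; the package's real type may differ.
type FileMeta = { etag: string; lastModified: string; text: string }

// Reads a file and derives cache validators from its contents and mtime.
function describeFile(path: string): FileMeta {
  const text = readFileSync(path, 'utf8')
  const hex = createHash('sha256').update(text, 'utf8').digest('hex')
  return {
    etag: `"sha256:${hex}"`, // assumed format: quoted, content-derived tag
    lastModified: statSync(path).mtime.toUTCString(), // HTTP-date style string
    text,
  }
}

// Usage: unchanged contents yield the same ETag, so a reload can be skipped
const dir = mkdtempSync(join(tmpdir(), 'noren-example-'))
const file = join(dir, 'policy.json')
writeFileSync(file, '{"rules":{}}')
const first = describeFile(file)
const unchanged = describeFile(file).etag === first.etag
console.log(first.etag, unchanged)
```

Because the tag depends only on the contents, touching a file without editing it changes Last-Modified but not the ETag.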
Restrict file access with baseDir
You can restrict which files can be read by the file loader using createFileLoader with a baseDir option. The loader resolves symlinks via realpath() and rejects paths outside baseDir.
import { PolicyDictReloader, createFileLoader } from '@himorishige/noren-dict-reloader'
const load = createFileLoader(undefined, { baseDir: '/app/config' })
const reloader = new PolicyDictReloader({
  policyUrl: 'file:///app/config/policy.json',
  dictManifestUrl: 'file:///app/config/manifest.json',
  compile,
  load,
})
await reloader.start()
Security notes:
- Prefer setting baseDir when using file:// to mitigate path traversal via symlinks.
- Queries and fragments on file:// URLs are rejected.
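The containment rule behind baseDir can be sketched as a path comparison. This is a hedged illustration rather than the package's code: isInsideBaseDir is a hypothetical name, and the sketch covers only the lexical comparison. Per the description above, the real loader also resolves symlinks with realpath() first, which this example leaves out.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path'

// Hypothetical helper: true when `target` is `baseDir` itself or lies beneath it.
// Symlinks are NOT resolved here; call realpath() on both paths first in real use.
function isInsideBaseDir(baseDir: string, target: string): boolean {
  const rel = relative(resolve(baseDir), resolve(target))
  if (rel === '') return true // same directory
  if (isAbsolute(rel)) return false // e.g. a different drive on Windows
  // Any path that must climb out of baseDir starts with a ".." segment
  return rel !== '..' && !rel.startsWith(`..${sep}`)
}

// Usage
console.log(isInsideBaseDir('/app/config', '/app/config/policy.json')) // true
console.log(isInsideBaseDir('/app/config', '/app/config/../secrets.json')) // false
```

Comparing whole path segments, rather than testing a string prefix, avoids treating a sibling such as /app/configuration as if it were inside /app/config.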
Cloudflare Workers examples (KV / R2)
You can implement a LoaderFn for Cloudflare Workers.
KV loader:
import type { LoaderFn } from '@himorishige/noren-dict-reloader'
// Hex-encoded SHA-256 digest of a string, used to derive a content-based ETag
async function sha256Hex(s: string): Promise<string> {
  const d = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(s))
  return [...new Uint8Array(d)].map(b => b.toString(16).padStart(2, '0')).join('')
}
export const kvLoader = (kv: KVNamespace): LoaderFn => {
  return async (url, prev) => {
    // For a custom scheme such as kv://manifest.json, the standard URL parser
    // puts "manifest.json" in host and leaves pathname empty, so combine both.
    const u = new URL(url)
    const key = `${u.host}${u.pathname}`.replace(/^\/+/, '')
    const text = await kv.get(key, 'text')
    if (text == null) throw new Error(`KV ${key} not found`)
    const etag = `W/"sha256:${await sha256Hex(text)}"`
    // Unchanged contents: report 304 and reuse the previous metadata
    if (prev?.etag === etag) return { status: 304, meta: prev }
    let json: unknown
    try { json = JSON.parse(text) } catch { json = text }
    return { status: 200, meta: { etag, text, json } }
  }
}
// Usage
// new PolicyDictReloader({
//   policyUrl: 'kv://policy.json',
//   dictManifestUrl: 'kv://manifest.json',
//   compile,
//   load: kvLoader(env.MY_KV),
// })
R2 loader:
import type { LoaderFn } from '@himorishige/noren-dict-reloader'
export const r2Loader = (bucket: R2Bucket): LoaderFn => {
  return async (url, prev) => {
    const key = new URL(url).pathname.slice(1)
    const obj = await bucket.get(key)
    if (!obj) throw new Error(`R2 ${key} not found`)
    const etag = obj.etag
    const lastModified = obj.uploaded?.toUTCString()
    if (prev?.etag === etag) return { status: 304, meta: prev }
    const text = await obj.text()
    let json: unknown
    try { json = JSON.parse(text) } catch { json = text }
    return { status: 200, meta: { etag, lastModified, text, json } }
  }
}
Bundle-embedded (no remote fetch)
If you don't need hot reload, you can embed JSON at build time and call your compile() directly:
// import policy/dicts via bundler (or inline JSON)
import policy from './policy.json'
import dictA from './dictA.json'
import dictB from './dictB.json'
// using the compile() example above
const registry = compile(policy, [dictA, dictB])
// start using `registry` right away
Tips
- Ensure your server sends ETag or Last-Modified and proper CORS headers if used in browsers.
- onSwap receives a changed list that may include: policy, manifest, dict:<id>, dict-removed:<id>.
- forceReload() will bust caches by adding a _bust query param when needed.
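The changed list passed to onSwap can be grouped before logging or acting on it. The sketch below assumes the list is an array of strings in the forms named above; summarizeChanges and ChangeSummary are hypothetical names introduced for this example, not part of the package.

```typescript
// Hypothetical summary shape for the entries in onSwap's `changed` list.
type ChangeSummary = {
  policy: boolean
  manifest: boolean
  updatedDicts: string[]
  removedDicts: string[]
}

const UPDATED_PREFIX = 'dict:'
const REMOVED_PREFIX = 'dict-removed:'

// Groups the raw keys so a handler can react to each kind of change separately.
function summarizeChanges(changed: readonly string[]): ChangeSummary {
  const summary: ChangeSummary = { policy: false, manifest: false, updatedDicts: [], removedDicts: [] }
  for (const key of changed) {
    if (key === 'policy') summary.policy = true
    else if (key === 'manifest') summary.manifest = true
    else if (key.startsWith(REMOVED_PREFIX)) summary.removedDicts.push(key.slice(REMOVED_PREFIX.length))
    else if (key.startsWith(UPDATED_PREFIX)) summary.updatedDicts.push(key.slice(UPDATED_PREFIX.length))
    // Unknown keys are ignored so newer reloader versions do not break the handler
  }
  return summary
}

// Usage, e.g. inside onSwap: (newRegistry, changed) => { ... }
const summary = summarizeChanges(['policy', 'dict:company', 'dict-removed:legacy'])
console.log(summary)
```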