JSPM

Portable Programmatic Evaluation Framework - Claim-driven, deterministic evaluation for experiments

Package Exports

  • ppef
  • ppef/aggregation
  • ppef/aggregation/__tests__/aggregators.unit.test
  • ppef/aggregation/__tests__/pipeline.unit.test
  • ppef/aggregation/aggregators
  • ppef/aggregation/index
  • ppef/aggregation/pipeline
  • ppef/claims
  • ppef/claims/index
  • ppef/collector
  • ppef/collector/__tests__/result-collector.unit.test
  • ppef/collector/__tests__/schema.unit.test
  • ppef/collector/index
  • ppef/collector/result-collector
  • ppef/collector/schema
  • ppef/executor
  • ppef/executor/__tests__/binary-sut.unit.test
  • ppef/executor/__tests__/checkpoint-hash-bug.diagnostic.test
  • ppef/executor/__tests__/checkpoint-manager.integration.test
  • ppef/executor/__tests__/checkpoint-manager.unit.test
  • ppef/executor/__tests__/checkpoint-merge-bug.diagnostic.test
  • ppef/executor/__tests__/checkpoint-merge-bug.unit.test
  • ppef/executor/__tests__/checkpoint-storage.unit.test
  • ppef/executor/__tests__/executor.unit.test
  • ppef/executor/__tests__/memory-monitor.unit.test
  • ppef/executor/__tests__/parallel-checkpoint-merge.integration.test
  • ppef/executor/__tests__/parallel-executor.integration.test
  • ppef/executor/__tests__/parallel-executor.unit.test
  • ppef/executor/__tests__/resource-calculator.unit.test
  • ppef/executor/__tests__/run-id.unit.test
  • ppef/executor/__tests__/worker-entry.integration.test
  • ppef/executor/__tests__/worker-entry.unit.test
  • ppef/executor/__tests__/worker-threads-executor.unit.test
  • ppef/executor/binary-sut
  • ppef/executor/checkpoint-manager
  • ppef/executor/checkpoint-storage
  • ppef/executor/checkpoint-types
  • ppef/executor/executor
  • ppef/executor/index
  • ppef/executor/memory-monitor
  • ppef/executor/parallel-executor
  • ppef/executor/resource-calculator
  • ppef/executor/run-id
  • ppef/executor/worker-entry
  • ppef/executor/worker-executor
  • ppef/executor/worker-threads-executor
  • ppef/registry
  • ppef/registry/case-registry
  • ppef/registry/index
  • ppef/registry/sut-registry
  • ppef/renderers
  • ppef/renderers/index
  • ppef/renderers/latex-renderer
  • ppef/renderers/types
  • ppef/robustness
  • ppef/robustness/__tests__/perturbations.unit.test
  • ppef/robustness/index
  • ppef/robustness/perturbations
  • ppef/statistical
  • ppef/types
  • ppef/types/aggregate
  • ppef/types/case
  • ppef/types/claims
  • ppef/types/evaluator
  • ppef/types/index
  • ppef/types/perturbation
  • ppef/types/result
  • ppef/types/sut

Readme

PPEF - Portable Programmatic Evaluation Framework

A claim-driven, deterministic evaluation framework for experiments. PPEF provides a structured approach to testing and validating software components through reusable test cases, statistical aggregation, and claim-based evaluation.

Published npm package with dual ESM/CJS output. Single runtime dependency: commander.

Features

  • Type-safe: Strict TypeScript with generic SUT, Case, and Evaluator abstractions
  • Registry: Centralized registries for Systems Under Test (SUTs) and evaluation cases with role/tag filtering
  • Execution: Deterministic execution with worker threads, checkpointing, memory monitoring, and binary SUT support
  • Statistical: Mann-Whitney U test, Cohen's d, confidence intervals
  • Aggregation: Summary stats, pairwise comparisons, and rankings across runs
  • Evaluation: Four built-in evaluators — claims, robustness, metrics, and exploratory
  • Rendering: LaTeX table generation for thesis integration
  • CLI: Five commands for running, validating, planning, aggregating, and evaluating experiments

Installation

# Install as a dependency
pnpm add ppef

# Or use locally for development
git clone https://github.com/Mearman/ppef.git
cd ppef
pnpm install
pnpm build

Development

pnpm install              # Install dependencies
pnpm build                # TypeScript compile + CJS wrapper generation
pnpm typecheck            # Type-check only (tsc --noEmit)
pnpm lint                 # ESLint + Prettier with auto-fix
pnpm test                 # Run all tests with coverage (c8 + tsx + Node native test runner)

Run a single test file:

npx tsx --test src/path/to/file.test.ts
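
Test files use the Node native runner directly (node:test + node:assert, see Conventions). A minimal sketch of such a file; the file name and assertion are illustrative, not taken from the package:

// example.unit.test.ts: minimal node:test + node:assert test file (illustrative)
import { test } from 'node:test';
import assert from 'node:assert/strict';

test('sums metric values deterministically', () => {
  const values = [1, 2, 3];
  const total = values.reduce((acc, v) => acc + v, 0);
  assert.equal(total, 6);
});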

CLI (after build):

ppef run          # Execute experiments
ppef validate     # Validate configuration
ppef plan         # Dry-run execution plan
ppef aggregate    # Post-process results
ppef evaluate     # Run evaluators on results

Architecture

Data Flow Pipeline

SUTs + Cases (Registries)
    → Executor (runs SUTs against cases, deterministic runIds)
    → EvaluationResult (canonical schema)
    → ResultCollector (validates + filters)
    → Aggregation Pipeline (summary stats, comparisons, rankings)
    → Evaluators (claims, robustness, metrics, exploratory)
    → Renderers (LaTeX tables for thesis)
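
A rough sketch of the aggregation stage of this pipeline, assuming the three aggregation functions accept an array of collected EvaluationResult records (their exact signatures and option shapes are assumptions):

// Illustrative only: the exact signatures of the aggregation functions are assumptions.
import { computeSummaryStats, computeComparison, computeRankings } from 'ppef/aggregation';
import type { EvaluationResult } from 'ppef/types';

const results: EvaluationResult[] = [];        // normally produced by the ResultCollector

const summary = computeSummaryStats(results);  // per-SUT summary statistics
const comparison = computeComparison(results); // pairwise comparisons between SUTs
const rankings = computeRankings(results);     // rankings across runs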

Module Map (src/)

  • types/: All canonical type definitions (result, sut, case, claims, evaluator, aggregate, perturbation)
  • registry/: SUTRegistry and CaseRegistry — generic registries with role/tag filtering
  • executor/: Orchestrator with worker threads, checkpointing, memory monitoring, binary SUT support
  • collector/: Result aggregation and JSON schema validation
  • statistical/: Mann-Whitney U test, Cohen's d, confidence intervals
  • aggregation/: computeSummaryStats(), computeComparison(), computeRankings(), pipeline
  • evaluators/: Four built-in evaluators + extensible registry (see below)
  • claims/: Claim type definitions
  • robustness/: Perturbation configs and robustness metric types
  • renderers/: LaTeX table renderer
  • cli/: Five commands with config loading, module loading, output writing

Key Abstractions

SUT (SUT<TInputs, TResult>): Generic System Under Test. Has id, config, and run(inputs). Roles: primary, baseline, oracle.
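
A minimal SUT sketch, assuming a plain-object implementation with an async run() and the role carried in config, and a SUTRegistry constructed without arguments and exposing a register() method (all of these details are assumptions):

// Sketch of defining and registering a SUT. Registry method names are assumptions.
import { SUTRegistry } from 'ppef/registry';
import type { SUT } from 'ppef/types';

interface SortInputs { values: number[] }
interface SortResult { sorted: number[] }

const quicksortSut: SUT<SortInputs, SortResult> = {
  id: 'quicksort-v1',
  config: { role: 'primary' },                 // roles: primary, baseline, oracle
  async run(inputs) {
    return { sorted: [...inputs.values].sort((a, b) => a - b) };
  },
};

const suts = new SUTRegistry();
suts.register(quicksortSut);                   // hypothetical method name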

CaseDefinition (CaseDefinition<TInput, TInputs>): Two-phase resource factory — getInput() loads a resource once, getInputs() returns algorithm-specific inputs.
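
A sketch of the two-phase factory, assuming getInput() is async and getInputs() receives the loaded resource as its argument (both assumptions):

// Sketch of a CaseDefinition. Fields beyond getInput/getInputs are assumptions.
import { readFile } from 'node:fs/promises';
import type { CaseDefinition } from 'ppef/types';

interface SortInputs { values: number[] }

const randomIntegersCase: CaseDefinition<number[], SortInputs> = {
  id: 'random-integers-1k',                    // hypothetical identifier field
  async getInput() {
    // Phase 1: load the shared resource once
    const raw = await readFile('fixtures/random-1k.json', 'utf8');
    return JSON.parse(raw) as number[];
  },
  getInputs(input) {
    // Phase 2: derive algorithm-specific inputs from the loaded resource
    return { values: input };
  },
};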

Evaluator (Evaluator<TConfig, TInput, TOutput>): Extensible evaluation with validateConfig(), evaluate(), summarize(). Four built-in types:

  • ClaimsEvaluator — tests explicit hypotheses with statistical significance
  • RobustnessEvaluator — sensitivity analysis under perturbations
  • MetricsEvaluator — multi-criterion threshold/baseline/target-range evaluation
  • ExploratoryEvaluator — hypothesis-free analysis (rankings, pairwise comparisons, correlations, case-class effects)
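
A custom evaluator can be sketched against the same interface; the export path for the Evaluator type and the exact method signatures shown here are assumptions:

// Illustrative custom evaluator. Method signatures are assumptions based on the
// interface described above (validateConfig/evaluate/summarize).
import type { Evaluator } from 'ppef/types';

interface LatencyConfig { maxMeanMs: number }
interface LatencyInput { runtimesMs: number[] }
interface LatencyOutput { meanMs: number; pass: boolean }

const latencyEvaluator: Evaluator<LatencyConfig, LatencyInput, LatencyOutput> = {
  validateConfig(config) {
    return Number.isFinite(config.maxMeanMs) && config.maxMeanMs > 0;
  },
  evaluate(input, config) {
    const meanMs = input.runtimesMs.reduce((a, b) => a + b, 0) / input.runtimesMs.length;
    return { meanMs, pass: meanMs <= config.maxMeanMs };
  },
  summarize(outputs) {
    const passed = outputs.filter((o) => o.pass).length;
    return `${passed}/${outputs.length} cases within the latency budget`;
  },
};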

EvaluationResult: Canonical output schema capturing run identity (deterministic SHA-256 runId), correctness, metrics, output artefacts, and provenance.
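
For orientation only, a result record along those lines might look like the following; every field name here is an assumption, and the canonical schema lives in ppef/types/result and ppef/collector/schema:

// Purely illustrative shape; consult ppef/types for the real EvaluationResult schema.
const exampleResult = {
  runId: 'e3b0c44298fc1c14',                   // deterministic SHA-256-derived identity
  sutId: 'quicksort-v1',
  caseId: 'random-integers-1k',
  correctness: { passed: true },
  metrics: { runtimeMs: 12.4, peakMemoryMb: 38 },
  artefacts: { outputPath: 'results/quicksort-v1/random-integers-1k.json' },
  provenance: { nodeVersion: process.version, createdAt: new Date().toISOString() },
};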

Subpath Exports

Each module is independently importable:

import { SUTRegistry } from 'ppef/registry';
import { EvaluationResult } from 'ppef/types';
import { computeSummaryStats } from 'ppef/aggregation';

Available subpaths: ppef/types, ppef/registry, ppef/executor, ppef/collector, ppef/statistical, ppef/aggregation, ppef/evaluators, ppef/claims, ppef/robustness, ppef/renderers.

Conventions

  • TypeScript strict mode, ES2023 target, ES modules
  • Node.js native test runner (node:test + node:assert) — not Vitest/Jest
  • Coverage via c8 (text + html + json-summary in ./coverage/)
  • Conventional commits enforced via commitlint + husky
  • Semantic release from all branches
  • No any types — use unknown with type guards
  • Executor produces deterministic runId via SHA-256 hash of inputs
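
A sketch of how a deterministic run identifier can be derived; the actual normalisation in ppef/executor/run-id may differ, and the key ordering shown only canonicalises a flat inputs object:

// Sketch: derive a stable id by hashing a canonical serialisation of the inputs.
// The real run-id logic in ppef/executor/run-id may normalise differently.
import { createHash } from 'node:crypto';

function deterministicRunId(inputs: Record<string, unknown>): string {
  const canonical = JSON.stringify(inputs, Object.keys(inputs).sort());
  return createHash('sha256').update(canonical).digest('hex');
}

// Same inputs yield the same runId on every run.
deterministicRunId({ sut: 'quicksort-v1', case: 'random-integers-1k', seed: 42 });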

License

MIT