Package Exports
- ppef
- ppef/aggregation
- ppef/aggregation/__tests__/aggregators.unit.test
- ppef/aggregation/__tests__/pipeline.unit.test
- ppef/aggregation/aggregators
- ppef/aggregation/index
- ppef/aggregation/pipeline
- ppef/claims
- ppef/claims/index
- ppef/collector
- ppef/collector/__tests__/result-collector.unit.test
- ppef/collector/__tests__/schema.unit.test
- ppef/collector/index
- ppef/collector/result-collector
- ppef/collector/schema
- ppef/executor
- ppef/executor/__tests__/binary-sut.unit.test
- ppef/executor/__tests__/checkpoint-hash-bug.diagnostic.test
- ppef/executor/__tests__/checkpoint-manager.integration.test
- ppef/executor/__tests__/checkpoint-manager.unit.test
- ppef/executor/__tests__/checkpoint-merge-bug.diagnostic.test
- ppef/executor/__tests__/checkpoint-merge-bug.unit.test
- ppef/executor/__tests__/checkpoint-storage.unit.test
- ppef/executor/__tests__/executor.unit.test
- ppef/executor/__tests__/memory-monitor.unit.test
- ppef/executor/__tests__/parallel-checkpoint-merge.integration.test
- ppef/executor/__tests__/parallel-executor.integration.test
- ppef/executor/__tests__/parallel-executor.unit.test
- ppef/executor/__tests__/resource-calculator.unit.test
- ppef/executor/__tests__/run-id.unit.test
- ppef/executor/__tests__/worker-entry.integration.test
- ppef/executor/__tests__/worker-entry.unit.test
- ppef/executor/__tests__/worker-threads-executor.unit.test
- ppef/executor/binary-sut
- ppef/executor/checkpoint-manager
- ppef/executor/checkpoint-storage
- ppef/executor/checkpoint-types
- ppef/executor/executor
- ppef/executor/index
- ppef/executor/memory-monitor
- ppef/executor/parallel-executor
- ppef/executor/resource-calculator
- ppef/executor/run-id
- ppef/executor/worker-entry
- ppef/executor/worker-executor
- ppef/executor/worker-threads-executor
- ppef/registry
- ppef/registry/case-registry
- ppef/registry/index
- ppef/registry/sut-registry
- ppef/renderers
- ppef/renderers/index
- ppef/renderers/latex-renderer
- ppef/renderers/types
- ppef/robustness
- ppef/robustness/__tests__/perturbations.unit.test
- ppef/robustness/index
- ppef/robustness/perturbations
- ppef/statistical
- ppef/types
- ppef/types/aggregate
- ppef/types/case
- ppef/types/claims
- ppef/types/evaluator
- ppef/types/index
- ppef/types/perturbation
- ppef/types/result
- ppef/types/sut
Readme
PPEF - Portable Programmatic Evaluation Framework
A claim-driven, deterministic evaluation framework for experiments. PPEF provides a structured approach to testing and validating software components through reusable test cases, statistical aggregation, and claim-based evaluation.
Published npm package with dual ESM/CJS output. Single runtime dependency: commander.
Features
- Type-safe: Strict TypeScript with generic SUT, Case, and Evaluator abstractions
- Registry: Centralized registries for Systems Under Test (SUTs) and evaluation cases with role/tag filtering
- Execution: Deterministic execution with worker threads, checkpointing, memory monitoring, and binary SUT support
- Statistical: Mann-Whitney U test, Cohen's d, confidence intervals
- Aggregation: Summary stats, pairwise comparisons, and rankings across runs
- Evaluation: Four built-in evaluators — claims, robustness, metrics, and exploratory
- Rendering: LaTeX table generation for thesis integration
- CLI: Five commands for running, validating, planning, aggregating, and evaluating experiments
Installation
# Install as a dependency
pnpm add ppef
# Or use locally for development
git clone https://github.com/Mearman/ppef.git
cd ppef
pnpm install
pnpm build

Development
pnpm install # Install dependencies
pnpm build # TypeScript compile + CJS wrapper generation
pnpm typecheck # Type-check only (tsc --noEmit)
pnpm lint # ESLint + Prettier with auto-fix
pnpm test # Run all tests with coverage (c8 + tsx + Node native test runner)

Run a single test file:
npx tsx --test src/path/to/file.test.ts

CLI (after build):
ppef run # Execute experiments
ppef validate # Validate configuration
ppef plan # Dry-run execution plan
ppef aggregate # Post-process results
ppef evaluate # Run evaluators on results

Architecture
Data Flow Pipeline
SUTs + Cases (Registries)
→ Executor (runs SUTs against cases, deterministic runIds)
→ EvaluationResult (canonical schema)
→ ResultCollector (validates + filters)
→ Aggregation Pipeline (summary stats, comparisons, rankings)
→ Evaluators (claims, robustness, metrics, exploratory)
→ Renderers (LaTeX tables for thesis)

Module Map (src/)
| Module | Purpose |
|---|---|
| `types/` | All canonical type definitions (result, sut, case, claims, evaluator, aggregate, perturbation) |
| `registry/` | SUTRegistry and CaseRegistry — generic registries with role/tag filtering |
| `executor/` | Orchestrator with worker threads, checkpointing, memory monitoring, binary SUT support |
| `collector/` | Result aggregation and JSON schema validation |
| `statistical/` | Mann-Whitney U test, Cohen's d, confidence intervals |
| `aggregation/` | computeSummaryStats(), computeComparison(), computeRankings(), pipeline |
| `evaluators/` | Four built-in evaluators + extensible registry (see below) |
| `claims/` | Claim type definitions |
| `robustness/` | Perturbation configs and robustness metric types |
| `renderers/` | LaTeX table renderer |
| `cli/` | Five commands with config loading, module loading, output writing |
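To make the effect-size measure in `statistical/` concrete, here is a standalone sketch of Cohen's d using the pooled standard deviation. The helper names (`mean`, `sampleVariance`, `cohensD`) are hypothetical and are not the ppef API; this only illustrates the underlying formula.

```typescript
// Standalone sketch of Cohen's d (pooled standard deviation).
// Hypothetical helper names -- not the ppef statistical/ API.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

// d = (mean(a) - mean(b)) / pooledStdDev
function cohensD(a: number[], b: number[]): number {
  const pooled = Math.sqrt(
    ((a.length - 1) * sampleVariance(a) + (b.length - 1) * sampleVariance(b)) /
      (a.length + b.length - 2),
  );
  return (mean(a) - mean(b)) / pooled;
}
```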
Key Abstractions
SUT (SUT<TInputs, TResult>): Generic System Under Test. Has id, config, and run(inputs). Roles: primary, baseline, oracle.
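A minimal sketch of what an SUT might look like. The interface shape below is inferred from this description (only `id`, `config`, `run`, and the role names are stated above), so treat it as an assumption rather than the published ppef type.

```typescript
// Hypothetical sketch of the SUT abstraction -- inferred from the README,
// not copied from the published ppef types.
interface SUT<TInputs, TResult> {
  id: string;
  role: 'primary' | 'baseline' | 'oracle';
  config: Record<string, unknown>;
  run(inputs: TInputs): TResult | Promise<TResult>;
}

// A trivial baseline SUT that sorts numbers with the native sort.
const baselineSort: SUT<number[], number[]> = {
  id: 'baseline-native-sort',
  role: 'baseline',
  config: {},
  run: (inputs) => [...inputs].sort((a, b) => a - b),
};
```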
CaseDefinition (CaseDefinition<TInput, TInputs>): Two-phase resource factory — getInput() loads a resource once, getInputs() returns algorithm-specific inputs.
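The two-phase factory might be used as sketched below; the field names other than `getInput`/`getInputs` are illustrative assumptions, not the real ppef type.

```typescript
// Hypothetical sketch of the two-phase case factory described above.
// Field names beyond getInput()/getInputs() are assumptions.
interface CaseDefinition<TInput, TInputs> {
  id: string;
  tags: string[];
  getInput(): TInput | Promise<TInput>; // phase 1: load the resource once
  getInputs(input: TInput): TInputs;    // phase 2: derive algorithm-specific inputs
}

// Example: load a text fixture once, then derive token arrays from it.
const tokenCase: CaseDefinition<string, string[]> = {
  id: 'tokens-small',
  tags: ['unit', 'text'],
  getInput: () => 'the quick brown fox',
  getInputs: (text) => text.split(' '),
};
```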
Evaluator (Evaluator<TConfig, TInput, TOutput>): Extensible evaluation with validateConfig(), evaluate(), summarize(). Four built-in types:
- ClaimsEvaluator — tests explicit hypotheses with statistical significance
- RobustnessEvaluator — sensitivity analysis under perturbations
- MetricsEvaluator — multi-criterion threshold/baseline/target-range evaluation
- ExploratoryEvaluator — hypothesis-free analysis (rankings, pairwise comparisons, correlations, case-class effects)
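A custom evaluator plugged into this extension point might look like the sketch below. The method signatures are guesses from the three method names listed above; the real generic parameters and return types in ppef may differ.

```typescript
// Hypothetical evaluator sketch -- method signatures are inferred from the
// README (validateConfig/evaluate/summarize), not the published ppef API.
interface Evaluator<TConfig, TInput, TOutput> {
  id: string;
  validateConfig(config: TConfig): string[]; // returns validation errors, if any
  evaluate(config: TConfig, input: TInput): TOutput;
  summarize(outputs: TOutput[]): string;
}

// A toy threshold check in the spirit of MetricsEvaluator.
const thresholdEvaluator: Evaluator<{ max: number }, number, boolean> = {
  id: 'toy-threshold',
  validateConfig: (c) =>
    Number.isFinite(c.max) ? [] : ['max must be a finite number'],
  evaluate: (c, value) => value <= c.max,
  summarize: (outputs) =>
    `${outputs.filter(Boolean).length}/${outputs.length} passed`,
};
```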
EvaluationResult: Canonical output schema capturing run identity (deterministic SHA-256 runId), correctness, metrics, output artefacts, and provenance.
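A deterministic SHA-256 run identifier of this kind can be sketched with Node's `node:crypto` module. The exact fields ppef hashes are not documented here, so the `parts` object below is purely illustrative.

```typescript
import { createHash } from 'node:crypto';

// Illustrative sketch of a deterministic runId: hash canonicalised inputs.
// The fields chosen here are assumptions, not ppef's actual hash inputs.
function computeRunId(parts: {
  sutId: string;
  caseId: string;
  config: unknown;
}): string {
  // Serialising with a fixed key order keeps the hash stable across runs.
  const canonical = JSON.stringify({
    sutId: parts.sutId,
    caseId: parts.caseId,
    config: parts.config,
  });
  return createHash('sha256').update(canonical).digest('hex');
}
```

Identical inputs always yield the same 64-character hex id, which is what makes re-runs and checkpoint merges comparable.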
Subpath Exports
Each module is independently importable:
import { SUTRegistry } from 'ppef/registry';
import { EvaluationResult } from 'ppef/types';
import { computeSummaryStats } from 'ppef/aggregation';

Available subpaths: ppef/types, ppef/registry, ppef/executor, ppef/collector, ppef/statistical, ppef/aggregation, ppef/evaluators, ppef/claims, ppef/robustness, ppef/renderers.
Conventions
- TypeScript strict mode, ES2023 target, ES modules
- Node.js native test runner (`node:test` + `node:assert`) — not Vitest/Jest
- Coverage via c8 (text + html + json-summary in `./coverage/`)
- Conventional commits enforced via commitlint + husky
- Semantic release from all branches
- No `any` types — use `unknown` with type guards
- Executor produces deterministic `runId` via SHA-256 hash of inputs
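The "no `any`, use `unknown` with type guards" convention looks like this in practice. The `MetricSample` shape below is a made-up example, not a ppef type.

```typescript
// Illustrative type guard matching the "unknown + type guards" convention.
// MetricSample is a hypothetical shape, not part of ppef.
interface MetricSample {
  name: string;
  value: number;
}

function isMetricSample(x: unknown): x is MetricSample {
  return (
    typeof x === 'object' &&
    x !== null &&
    typeof (x as Record<string, unknown>).name === 'string' &&
    typeof (x as Record<string, unknown>).value === 'number'
  );
}
```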
License
MIT