Benchmark methodology

How the benchmark results are produced. The benchmarks are intentionally conservative: no number is published until the harness emits result_kind: "actual_benchmark" with is_placeholder: false and claims_allowed: true. Reproduce any result locally with yarn benchmark (raw output: benchmarks/results/latest.actual.json).
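The publication gate described above can be sketched as a small predicate. This is a minimal illustration, not the harness's actual code: the `ResultEnvelope` type and the `isPublishable` function are hypothetical names, and only the three field names and their required values come from this document.

```typescript
// Hypothetical envelope shape: only these three gate fields are named in
// this document; the real results file carries more fields.
interface ResultEnvelope {
  result_kind: string;
  is_placeholder: boolean;
  claims_allowed: boolean;
}

// Returns true only when all three publication conditions hold.
function isPublishable(result: ResultEnvelope): boolean {
  return (
    result.result_kind === "actual_benchmark" &&
    result.is_placeholder === false &&
    result.claims_allowed === true
  );
}
```

A check of this kind could run against the parsed contents of benchmarks/results/latest.actual.json before any number is copied into public material.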

Methodology card: benchmark proof without fake numbers — same scenario, same input size, warmup before timing, real versions, commit attached, and claims allowed only from actual benchmark output.

Competitor set

| Engine | Adapter key | Role |
| --- | --- | --- |
| Neuron-JS | @sebasoft/neuron-js | First-party rules engine under test. |
| json-rules-engine | json-rules-engine | Closest default Node.js JSON rules-engine competitor. |
| JsonLogic | json-logic-js | Portable JSON predicate format competitor. |
| Hand-coded TypeScript | hand-coded-typescript | Baseline for direct conditional logic without engine overhead. |
| rule-engine-js | rule-engine-js | Smaller modern competitor selected because it installs/builds in this repository. |

Scenario matrix

| Scenario | Inputs represented | Why it exists |
| --- | --- | --- |
| pricing-discount | tier, region, coupon, cart total, account age | Shows business-rule pricing decisions and validation/explanation overhead. |
| eligibility-approval | age, country, verification status, risk score, account flags | Shows policy/approval style decisions with clear pass/fail outcomes. |
| workflow-routing | channel, urgency, customer segment, confidence score, escalation flags | Shows deterministic workflow routing and trace usefulness. |

Input-size matrix

| Profile | Decisions | Usage |
| --- | --- | --- |
| smoke | 100 | Correctness and trace sanity. |
| small | 1,000 | Local development feedback. |
| medium | 10,000 | Chartable throughput. |
| large | 100,000 | Optional; run only if runtime remains practical in CI/local machines. |

How each metric is measured

  • Fairness gate. Before timing, every engine must reproduce the scenario's canonical decision (e.g. pricing finalTotal: 105, discountAmount: 20); the run aborts on any mismatch, so all engines are timed doing equivalent work.
  • Throughput / p50 / p95. Warmup iterations run untimed, then measured iterations run in batches; per-decision latency is averaged per batch (so per-call timer overhead does not dominate sub-microsecond engines). Throughput is total measured decisions over total measured seconds.
  • Cold start. Median wall-clock time, across several fresh Node processes, to import the engine and execute the first decision; the timer starts inside the process, immediately before the import, so Node's own startup is excluded.
  • Bundle size. esbuild bundles and minifies the engine's public surface (ESM, node platform); the output byte length is recorded. The hand-coded baseline has no library dependency (0).
  • Validation / explanation overhead. Neuron-JS only: the per-decision latency delta of running validateScript (or, for explanation overhead, explainExecution) around an otherwise identical execution. The other engines provide no equivalent step, so their delta is recorded as 0.
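The fairness gate and the batched timing loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the harness's implementation: `assertCanonical`, `percentile`, and `measure` are hypothetical names, the JSON string comparison and the nearest-rank percentile rule are choices made here for brevity, and the document does not say how the real harness compares decisions or computes percentiles.

```typescript
type Decide = (input: unknown) => unknown;

// Fairness gate: throw if an engine's decision differs from the canonical one.
// Assumption: JSON string equality, which is sensitive to key order.
function assertCanonical(engine: string, actual: unknown, expected: unknown): void {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`fairness gate failed for ${engine}`);
  }
}

// Nearest-rank percentile over an ascending-sorted array (assumed rule).
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

function measure(
  decide: Decide,
  input: unknown,
  warmup: number,
  batches: number,
  batchSize: number,
) {
  // Warmup iterations run untimed.
  for (let i = 0; i < warmup; i++) decide(input);

  const perDecisionMs: number[] = [];
  let totalMs = 0;
  for (let b = 0; b < batches; b++) {
    // One timer read per batch, so timer overhead is amortized across
    // batchSize decisions instead of being paid on every call.
    const start = performance.now();
    for (let i = 0; i < batchSize; i++) decide(input);
    const elapsed = performance.now() - start;
    totalMs += elapsed;
    perDecisionMs.push(elapsed / batchSize); // batch-averaged latency
  }

  perDecisionMs.sort((a, b) => a - b);
  const measured = batches * batchSize;
  return {
    measured_iterations: measured,
    // Total measured decisions over total measured seconds.
    throughput_decisions_per_second: measured / (totalMs / 1000),
    p50_ms: percentile(perDecisionMs, 50),
    p95_ms: percentile(perDecisionMs, 95),
  };
}
```

In this sketch the percentiles are taken over batch averages, so a single slow decision inside a batch is smoothed out rather than appearing as its own sample. That is the trade-off of avoiding a timer read around every call.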

Result fields

Each row in the results file carries these fields (units and definitions are mirrored in the machine-readable result schema):

| Field | Unit | Meaning |
| --- | --- | --- |
| engine | identifier | Engine under test (fixed enum). |
| scenario | identifier | Scenario slug. |
| input_size | profile | Named workload profile (smoke/small/medium/large). |
| warmup_iterations | decisions | Unmeasured warmup decisions before timing. |
| measured_iterations | decisions | Measured decisions in the timing window. |
| throughput_decisions_per_second | decisions/second | Measured decisions ÷ elapsed measured seconds. |
| p50_ms / p95_ms | milliseconds | Median / 95th-percentile per-decision latency. |
| cold_start_ms | milliseconds | Import + first decision in a fresh process. |
| bundle_size_minified_bytes | bytes | Minified bundle of the engine's public surface. |
| validation_overhead_ms | milliseconds | Validation-enabled vs disabled per-decision delta (Neuron-JS). |
| explanation_overhead_ms | milliseconds | Trace-enabled vs disabled per-decision delta (Neuron-JS). |
| node_version / package_version / commit_sha | provenance | Run environment and source state. |
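For readers consuming the results file from code, the row could be typed along these lines. The field names come from the table above; the TypeScript types, the `Engine` union (assumed here to reuse the adapter keys from the competitor set), and the helper `throughput` are assumptions for illustration. The machine-readable result schema remains the authoritative definition.

```typescript
// Assumption: the fixed engine enum matches the adapter keys listed earlier.
type Engine =
  | "@sebasoft/neuron-js"
  | "json-rules-engine"
  | "json-logic-js"
  | "hand-coded-typescript"
  | "rule-engine-js";

type InputSize = "smoke" | "small" | "medium" | "large";

// Decisions per profile, as listed in the input-size matrix.
const PROFILE_DECISIONS: Record<InputSize, number> = {
  smoke: 100,
  small: 1_000,
  medium: 10_000,
  large: 100_000,
};

// Field names from the result-fields table; value types are assumed.
interface BenchmarkRow {
  engine: Engine;
  scenario: string;
  input_size: InputSize;
  warmup_iterations: number;
  measured_iterations: number;
  throughput_decisions_per_second: number;
  p50_ms: number;
  p95_ms: number;
  cold_start_ms: number;
  bundle_size_minified_bytes: number;
  validation_overhead_ms: number;
  explanation_overhead_ms: number;
  node_version: string;
  package_version: string;
  commit_sha: string;
}

// Measured decisions divided by elapsed measured seconds.
function throughput(measuredIterations: number, elapsedSeconds: number): number {
  if (elapsedSeconds <= 0) throw new Error("elapsed time must be positive");
  return measuredIterations / elapsedSeconds;
}
```

As a worked instance of the throughput definition with made-up inputs, 10,000 measured decisions over 0.5 elapsed seconds gives `throughput(10_000, 0.5)`, which is 20,000 decisions/second.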

No fabricated numbers

Numbers are published only from a measured actual_benchmark run; placeholder fixtures are never used for public claims. Results reflect one machine, Node version, and commit — reproduce them before citing, and avoid "fastest"/"best" framing beyond what a named scenario and input size support.