Refresh v4.4.3 benchmarks with clean post-polish numbers; flag stale README table

joshfactorial · claude · joshfactorial · commit a1ebd4da1e11 · 2026-05-18T20:08:50.000-05:00
Re-ran the ecoli SE and PE benchmarks (4 threads, 10x coverage) on a quiet
machine after all the polish work landed. The ChangeLog table now reflects the
actual v4.4.3 final numbers:

  ecoli_se wall: 14:55 -> 1:35  (9.4x faster, was 7.6x with pre-polish numbers)
  ecoli_pe wall: 14:46 -> 1:35  (9.4x faster, was 8.5x)
  ecoli_se CPU:  3,227 s -> 331 s  (9.7x less)
  ecoli_pe CPU:  3,168 s -> 338 s  (9.4x less)

vs NEAT 2.1 single-threaded baseline:
  SE: 7.9x faster, 56% less CPU
  PE: 12.8x faster, 72% less CPU
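
For anyone re-deriving these ratios, here is a minimal sketch of the arithmetic, assuming the wall times are read as M:SS and the speedup is simply before divided by after. The helper names `to_seconds` and `speedup` are illustrative only and are not part of NEAT. Because the clock values shown here are rounded to whole seconds, a ratio recomputed from them can differ from the reported one in the last digit.

```python
def to_seconds(clock: str) -> int:
    """Convert an 'M:SS' or 'H:MM:SS' wall-clock string to whole seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times smaller 'after' is than 'before'."""
    return before_s / after_s


# Figures taken from the commit message above (ecoli SE, v4.4.2 -> v4.4.3).
se_wall = speedup(to_seconds("14:55"), to_seconds("1:35"))
se_cpu = speedup(3227, 331)

# NEAT 2.1 single-threaded SE baseline, from the ChangeLog (12:28 -> 1:35).
se_vs_neat21 = speedup(to_seconds("12:28"), to_seconds("1:35"))

print(f"ecoli_se wall:   {se_wall:.1f}x faster")
print(f"ecoli_se CPU:    {se_cpu:.1f}x less")
print(f"SE vs NEAT 2.1:  {se_vs_neat21:.1f}x faster")
```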

Also added a note at the top of README "Estimated runtimes" pointing readers
to ChangeLog v4.4.3 for current numbers; the v4.4.0 runtimes in that
section now overstate current runtimes by roughly 5-10x.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/ChangeLog.md b/ChangeLog.md
@@ -7,19 +7,19 @@ chunk size now auto-tuning by default).
 
 | Metric                  | v4.4.2     | v4.4.3     | Improvement     |
 |-------------------------|------------|------------|-----------------|
-| ecoli SE wall time      | 14:55      | 1:57       | 7.6× faster     |
-| ecoli PE wall time      | 14:46      | 1:44       | 8.5× faster     |
-| ecoli SE total CPU      | 3,227 s    | 356 s      | 9.1× less       |
-| ecoli PE total CPU      | 3,168 s    | 368 s      | 8.6× less       |
+| ecoli SE wall time      | 14:55      | 1:35       | 9.4× faster     |
+| ecoli PE wall time      | 14:46      | 1:35       | 9.4× faster     |
+| ecoli SE total CPU      | 3,227 s    | 331 s      | 9.7× less       |
+| ecoli PE total CPU      | 3,168 s    | 338 s      | 9.4× less       |
 | Peak resident memory    | 549 MB     | 175 MB     | 3.1× less       |
 | Peak heap (memray)      | 1.27 GB    | 0.32 GB    | 4× less         |
 | Per-worker memory       | O(N×cov)   | O(1)       | bounded         |
 | `pysam.sort` calls      | 2          | 0          | gone            |
 | BAM correctness         | 0.06% dups | strict     | fixed           |
 
 **Versus NEAT 2.1 (single-threaded baseline):**
-- SE: 12:28 → 1:57 (6.4× faster, 53% less CPU)
-- PE: 20:12 → 1:44 (11.6× faster, 70% less CPU)
+- SE: 12:28 → 1:35 (7.9× faster, 56% less CPU)
+- PE: 20:12 → 1:35 (12.8× faster, 72% less CPU)
 
 **Scale-test (c_elegans 10× coverage, 4 threads, 100 Mb genome — ~7× the
 ecoli reference):**
diff --git a/README.md b/README.md
@@ -268,6 +268,8 @@ seqkit shuffle -s 42 reads_r2.fastq.gz -o reads_r2.shuffled.fastq.gz
 
 ### Estimated runtimes
 
+> **Note:** The tables below are from the original NEAT 4.4 (v4.4.0) benchmark. NEAT v4.4.3 is roughly **9× faster on multi-threaded ecoli** and uses **~3× less memory** thanks to the performance work landed in that release. The relative shape of the tables (size scaling, contig vs. size mode tradeoffs) remains accurate, but absolute runtimes should be divided by ~5–10× for v4.4.3+. See ChangeLog v4.4.3 for a detailed before/after table on ecoli and a c_elegans scale-test.
+
 To give users a sense of how long `neat read-simulator` runs may take, we benchmarked NEAT 4.4 on several reference genomes. All runs were paired-end, with read length of 150 bp, coverage of 10, fragment mean of 300 bp, and fragment standard deviation of 50 bp. Runtimes are reported as the average across three unique runs (`Avg. time (ms)`) and the corresponding runtime in minutes. Cells marked with N/A indicate that NEAT was not able to run to completion.
 
 Benchmarks were run on a System76 Meerkat with a 13th Gen Intel Core i3-1315U (8 logical cores, up to 4.50 GHz) and 16 GiB RAM, using a 512 GB SSD and Ubuntu 24.04.3 LTS (Linux kernel 6.14). Actual runtimes will vary depending on your hardware.