Clean up comments for readability

bean1352 · bean1352 · commit 674bd85cc3b9 · 2026-07-02T11:26:19.000+02:00
diff --git a/modules/module-mongodb-storage/src/storage/implementation/MongoSyncBucketStorage.ts b/modules/module-mongodb-storage/src/storage/implementation/MongoSyncBucketStorage.ts
@@ -634,14 +634,12 @@ export abstract class MongoSyncBucketStorage
   /**
    * How many operations to sample when estimating a bucket's row count.
    *
-   * {@link storage.estimateDistinctRows} recovers the true row count from how often the sample lands on the
-   * same row twice ("collisions"). A bucket with `R` rows produces collisions only once the sample size
-   * approaches `sqrt(R)`, and needs roughly `sqrt(100 * R)` before they carry a usable signal. `R` is unknown
-   * up front but is bounded by the operation count, so sampling `sqrt(200 * operations)` operations yields on
-   * the order of 100 expected collisions even in the worst case of one row per operation - enough to keep the
-   * estimate stable rather than swinging with sampling noise. Clamped to [MIN, MAX] to bound per-bucket cost;
-   * above the MAX-implied width the estimate degrades gracefully (only for buckets both very wide and barely
-   * fragmented, which are not the fragmented offenders the report exists to surface).
+   * {@link storage.estimateDistinctRows} infers the row count from how often the sample lands on the same
+   * row twice, so the sample must be large enough to contain such repeats. Sampling `sqrt(200 * operations)`
+   * operations yields on the order of 100 expected repeats even in the worst case of one row per operation,
+   * which keeps the estimate stable instead of swinging with sampling noise. The clamp bounds per-bucket
+   * cost; past the cap only very wide, barely fragmented buckets lose accuracy, and those are not the
+   * offenders the report exists to surface.
    */
   protected bucketRowSampleTarget(operations: number): number {
     const target = Math.ceil(Math.sqrt(200 * operations));
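
Review note: the `sqrt(200 * operations)` choice can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration. It assumes the usual birthday-problem approximation (about `n^2 / (2 * R)` repeats for `n` draws spread over `R` rows), which is presumably the reasoning behind the factor of 200. `MIN_SAMPLE`, `MAX_SAMPLE`, `sampleTargetSketch` and `approxExpectedRepeats` are hypothetical names and values; the real clamp bounds are not visible in this hunk.

```typescript
// Hypothetical clamp bounds; the actual MIN/MAX are not shown in this diff.
const MIN_SAMPLE = 1_000;
const MAX_SAMPLE = 50_000;

// Mirrors the visible line of bucketRowSampleTarget, plus an assumed clamp.
function sampleTargetSketch(operations: number): number {
  const target = Math.ceil(Math.sqrt(200 * operations));
  return Math.min(MAX_SAMPLE, Math.max(MIN_SAMPLE, target));
}

// Birthday-style approximation (an assumption, not taken from the source):
// n draws spread over R rows produce roughly n^2 / (2R) repeated rows.
function approxExpectedRepeats(sampleSize: number, rows: number): number {
  return (sampleSize * sampleSize) / (2 * rows);
}

const operations = 1_000_000;
const sampleSize = sampleTargetSketch(operations);
// Widest possible bucket: as many rows as operations.
const repeats = approxExpectedRepeats(sampleSize, operations);
console.log(sampleSize, repeats.toFixed(1));
```

Because `R` can never exceed the operation count, any narrower bucket gives more repeats than this for the same sample size, as long as the clamp does not cut the target short.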
diff --git a/packages/service-core/src/storage/bucket-report.ts b/packages/service-core/src/storage/bucket-report.ts
@@ -95,16 +95,17 @@ export function resolveBucketReportLimit(limit?: number): number {
 }
 
 /**
- * Estimate the true distinct row count of a bucket from a sample of its operations.
+ * Estimate the true distinct row count of a bucket from a random sample of its operations.
  *
- * Each operation is included in the sample with probability `r = sampledOps / operations`, so a row with
- * `k` operations is seen with probability `1 - (1 - r)^k`. Assuming operations are spread roughly evenly
- * across rows (so each of `R` rows has about `operations / R` of them), the expected number of distinct
- * rows in the sample is `R * (1 - (1 - r)^(operations / R))`. This is monotonic in `R`, so we binary-search
- * for the `R` that matches the observed distinct count.
+ * The signal is repetition: a sample that keeps landing on the same rows means few rows, while a sample
+ * where every operation lands on a new row means many. Formally, each operation is included in the sample
+ * with probability `r = sampledOps / operations`, so a row with `k` operations appears with probability
+ * `1 - (1 - r)^k`. Assuming operations are spread roughly evenly across `R` rows (`k = operations / R`),
+ * the expected number of distinct rows in the sample is `R * (1 - (1 - r)^(operations / R))`. That grows
+ * with `R`, so a binary search finds the `R` matching the observed distinct count.
  *
- * The naive `distinctRows / r` over-counts rows (and so under-states fragmentation) whenever the sample
- * already covered most rows - exactly the highly-fragmented buckets the report exists to surface.
+ * The naive `distinctRows / r` ignores repetition and over-counts rows (under-stating fragmentation) on
+ * exactly the highly fragmented buckets the report exists to surface.
  *
  * Pure (no I/O) so it is unit-testable; storage adapters supply the sampled counts.
  */
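
Review note: for readers who want to see the binary search described in the comment above, here is a minimal sketch under the same even-spread assumption. It is not the actual `estimateDistinctRows` implementation; the function names, the argument order and the stopping rule are assumptions made for illustration.

```typescript
// Expected number of distinct rows in the sample if `rows` rows share
// `operations` operations evenly and each operation is sampled with probability r.
function expectedDistinct(rows: number, operations: number, r: number): number {
  return rows * (1 - Math.pow(1 - r, operations / rows));
}

// Hypothetical stand-in for estimateDistinctRows; the signature is an assumption.
function estimateDistinctRowsSketch(
  operations: number,
  sampledOps: number,
  distinctRows: number
): number {
  if (operations <= 0 || sampledOps <= 0 || distinctRows <= 0) return 0;
  // A full scan needs no extrapolation: the observed count is exact.
  if (sampledOps >= operations) return distinctRows;

  const r = sampledOps / operations;
  let lo = distinctRows; // there cannot be fewer rows than were observed
  let hi = operations;   // there cannot be more rows than operations

  // expectedDistinct grows with `rows`, so bisect until the bracket is tight.
  for (let i = 0; i < 60 && hi - lo > 0.5; i++) {
    const mid = (lo + hi) / 2;
    if (expectedDistinct(mid, operations, r) < distinctRows) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return Math.round((lo + hi) / 2);
}

// 100,000 operations, 5% sampled, 994 distinct rows seen in the sample.
const estimate = estimateDistinctRowsSketch(100_000, 5_000, 994);
const naive = 994 / (5_000 / 100_000);
console.log({ estimate, naive });
```

In this example the sample has already seen nearly every row, so the search settles at about 1,000 rows, while the naive `distinctRows / r` reports about 19,880. When the sample contains no repeats at all, the search runs up to the operation count, which is the one-row-per-operation case.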