
Commit 31877f0: Update README.md
1 parent: 438fdc8

File tree: 1 file changed (+11, -45 lines)


examples/k_module_problem/README.md

Lines changed: 11 additions & 45 deletions
```diff
@@ -52,51 +52,6 @@ Generation 2 (crossover):
 
 **Key insight**: Evolution discovers correct modules in different individuals and **crossover combines them**. This is the "Building Block Hypothesis" - complex solutions are assembled from simpler discovered components.
 
-## Theoretical Analysis
-
-| Method | Expected Evaluations | Why |
-|--------|---------------------|-----|
-| **Random Search** | ~312 (50% of space) | Pure luck |
-| **Pass@100 (LLM)** | ~100 calls, ~15% success | Independent samples, no learning |
-| **Iterative Refinement** | ~312+ | No gradient, random walk |
-| **Evolution (pop=20)** | ~40-60 | Parallel exploration + crossover |
-
-The gap widens exponentially with more modules:
-- K=5 modules: Iterative ~1,562, Evolution ~70
-- K=6 modules: Iterative ~7,812, Evolution ~90
-
-### Note on Pass@k with Closed Models
-
-The pass@k metric (probability of finding solution in k independent attempts) is commonly used to evaluate LLM capabilities. However:
-
-- **Open models** (local): Can generate k responses in parallel with `n=k` parameter
-- **Closed models** (API): Most don't support `n>1`, requiring k separate API calls
-
-For this comparison, we include a **random baseline** that simulates pass@k without an LLM. This establishes the "no learning" baseline.
-
-### Random Baseline Results (100 trials, 100 samples each)
-
-| Metric | Value |
-|--------|-------|
-| **Success rate (pass@100)** | 16% (16/100 trials found solution) |
-| **Avg samples to solution** | 43.3 (when found) |
-| **Min samples** | 5 (lucky guess) |
-| **Max samples** | 91 |
-
-**Pass@k breakdown:**
-
-| k | Empirical | Theoretical |
-|---|-----------|-------------|
-| 1 | 0% | 0.2% |
-| 10 | 1% | 1.6% |
-| 20 | 4% | 3.2% |
-| 50 | 9% | 7.7% |
-| 100 | 16% | 14.8% |
-
-The empirical results closely match the theoretical prediction `pass@k ≈ 1 - (624/625)^k`.
-
-Any method that beats this baseline is demonstrating actual optimization, not just random sampling.
-
 ## Running the Experiment
 
 ### Prerequisites
```
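The theoretical prediction `pass@k ≈ 1 - (624/625)^k` in the removed section can be sanity-checked with a short simulation. The sketch below assumes a 625-configuration search space with a single correct solution (consistent with the "~312 = 50% of space" figure; the `SPACE` constant and function names are illustrative, not from the repo):

```python
import random

SPACE = 625        # assumed search space size (e.g. 4 modules x 5 options = 5^4)
P_HIT = 1 / SPACE  # exactly one correct configuration assumed

def theoretical_pass_at_k(k: int) -> float:
    """pass@k for uniform random sampling with replacement."""
    return 1 - (1 - P_HIT) ** k

def simulate_pass_at_k(k: int, trials: int = 10_000, seed: int = 0) -> float:
    """Empirical pass@k: fraction of trials where any of k samples hits."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.randrange(SPACE) == 0 for _ in range(k))  # 0 = the solution
        for _ in range(trials)
    )
    return hits / trials

for k in (1, 10, 20, 50, 100):
    print(f"k={k:3d}  theory={theoretical_pass_at_k(k):.3f}  "
          f"sim={simulate_pass_at_k(k):.3f}")
```

With 10,000 simulated trials the empirical curve tracks the closed-form values closely (about 0.002 at k=1 up to about 0.148 at k=100), which matches the 0.2% and 14.8% theoretical entries in the removed pass@k table.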
```diff
@@ -159,6 +114,17 @@ This generates:
 
 ## Experimental Results
 
+### Random Baseline (100 trials, 100 samples each)
+
+| Metric | Value |
+|--------|-------|
+| **Success rate (pass@100)** | 16% (16/100 trials found solution) |
+| **Avg samples to solution** | 43.3 (when found) |
+| **Min samples** | 5 (lucky guess) |
+| **Max samples** | 91 |
+
+This establishes the "no learning" baseline. Any method that beats this is demonstrating actual optimization, not just random sampling.
+
 ### Iterative Refinement Results (3 trials, 100 iterations max)
 
 | Trial | Iterations | Result | Best Score |
```
