Commit 33633ac

wordsearch prototype
1 parent 4e045a1 commit 33633ac

28 files changed

Lines changed: 1408 additions & 1373 deletions

experiments/wordsearch/README.md

Lines changed: 14 additions & 14 deletions
@@ -22,7 +22,7 @@ This project flips that around. The filler is generated to **mimic the
 statistical texture of natural language** along all eight reading directions.
 When the noise itself looks word-like, the target words blend in, and the
 puzzle becomes meaningfully harder — not because of grid size, but because of
-*camouflage*.
+_camouflage_.

 ---

@@ -49,7 +49,7 @@ largely determined by how well the filler hides the targets.
 ### Markov Models

 A **Markov model** predicts the next item in a sequence based on the recent
-history. For text, we model it at the *character* level: given the previous
+history. For text, we model it at the _character_ level: given the previous
 few characters, what is the probability distribution over the next character?

 Training is simply counting. We slide a window over a reference text and
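The counting step can be sketched in JavaScript. The class below is hypothetical. It borrows the `train(text, order)` and `predict(context)` names from section 5.1 of idea.md and stores counts in nested `Map`s (context to next-character counts). It is an illustration, not the repository's code.

```javascript
// Hypothetical order-N character model; nested Maps: context -> (char -> count).
class MarkovModel {
  constructor() {
    this.order = 0;
    this.counts = new Map();
  }

  // Slide a window of `order` characters over the text and count which
  // character follows each window.
  train(text, order) {
    this.order = order;
    this.counts = new Map();
    for (let i = order; i < text.length; i++) {
      const context = text.slice(i - order, i);
      if (!this.counts.has(context)) this.counts.set(context, new Map());
      const row = this.counts.get(context);
      row.set(text[i], (row.get(text[i]) ?? 0) + 1);
    }
    return this;
  }

  // Turn the counts for `context` into probabilities that sum to 1.
  // An unseen context yields an empty Map.
  predict(context) {
    const row = this.counts.get(context);
    if (!row) return new Map();
    let total = 0;
    for (const c of row.values()) total += c;
    return new Map([...row].map(([ch, c]) => [ch, c / total]));
  }
}
```

For example, `new MarkovModel().train("THE THEN THEY", 2).predict("HE")` gives a space, `N` and `Y` one third each, while `predict("TH")` puts all of the mass on `E`.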
@@ -117,10 +117,10 @@ while empty cells remain:

 The key insight is the **combination step**. A single cell is read as part of
 potentially several different lines (one per direction), so its letter should
-be plausible for *all* of them at once. We support several combiners:
+be plausible for _all_ of them at once. We support several combiners:

 | Combiner | Behaviour |
-|-----------|----------------------------------------------------------------------------------|
+| --------- | -------------------------------------------------------------------------------- |
 | `product` | Multiply probabilities — AND-like; favours letters that all directions agree on. |
 | `sum` | Average the distributions — OR-like; more permissive. |
 | `max` | Take the single strongest directional vote. |
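The sketch below shows one way to implement the combiners over per-direction `Map<char, prob>` predictions. The `combine` function is hypothetical. The `vote` method comes from the combiner list in idea.md. Two choices are design assumptions rather than documented behaviour: `product` works in log space, and an empty distribution is returned when no letter survives.

```javascript
// Merge per-direction predictions (each a Map<char, prob>) into one
// normalised Map. Method names follow the table; the rest is illustrative.
function combine(dists, method = "product") {
  const letters = new Set();
  for (const d of dists) for (const ch of d.keys()) letters.add(ch);
  const probsOf = (ch) => dists.map((d) => d.get(ch) ?? 0);

  const raw = new Map();
  if (method === "product") {
    // AND-like: add log-probabilities; a letter any direction rules out is dropped.
    const logs = new Map();
    for (const ch of letters) {
      const ps = probsOf(ch);
      if (ps.every((p) => p > 0)) logs.set(ch, ps.reduce((acc, p) => acc + Math.log(p), 0));
    }
    // Shift by the largest log before exponentiating to avoid underflow.
    const maxLog = Math.max(...logs.values());
    for (const [ch, l] of logs) raw.set(ch, Math.exp(l - maxLog));
  } else if (method === "sum") {
    // OR-like: average the distributions.
    for (const ch of letters) raw.set(ch, probsOf(ch).reduce((a, p) => a + p, 0) / dists.length);
  } else if (method === "max") {
    // Each letter keeps its strongest single directional probability.
    for (const ch of letters) raw.set(ch, Math.max(...probsOf(ch)));
  } else if (method === "vote") {
    // Each direction votes for its own most likely letter.
    for (const d of dists) {
      let best = null;
      let bestP = -1;
      for (const [ch, p] of d) {
        if (p > bestP) {
          best = ch;
          bestP = p;
        }
      }
      if (best !== null) raw.set(best, (raw.get(best) ?? 0) + 1);
    }
  } else {
    throw new Error(`unknown combiner: ${method}`);
  }

  let total = 0;
  for (const v of raw.values()) total += v;
  if (total === 0) return new Map(); // e.g. no letter survives `product`
  return new Map([...raw].map(([ch, v]) => [ch, v / total]));
}
```

An empty result from `product` means the directions share no candidate letter. Falling back to `sum` in that case would be one reasonable policy.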
@@ -140,11 +140,11 @@ them:
 ### vs. Random / Frequency-Weighted Filler

 The overwhelmingly common approach is to fill empty cells with **uniformly
-random** letters, or occasionally letters drawn from a language's *unigram*
+random** letters, or occasionally letters drawn from a language's _unigram_
 frequency table (so `E` and `T` appear more than `Q` and `Z`). Both ignore
-*sequence* structure entirely. We model the **conditional** distribution
-(order-N), so the filler exhibits realistic letter *transitions*, not just
-realistic letter *counts*.
+_sequence_ structure entirely. We model the **conditional** distribution
+(order-N), so the filler exhibits realistic letter _transitions_, not just
+realistic letter _counts_.

 ### vs. Single-Direction Text Generation

@@ -153,13 +153,13 @@ generated in **one** direction (left to right). A wordsearch is read in
 **eight**. A naive Markov fill that only respects, say, the horizontal
 direction would still produce nonsense vertically and diagonally. Our
 **multi-directional combination** is the core difference: each cell is
-optimised to be plausible across *all* the lines it participates in
+optimised to be plausible across _all_ the lines it participates in
 simultaneously.

 ### vs. Dictionary / Constraint-Solver Filler

 Some sophisticated generators (closer to crossword construction) try to make
-the filler spell *real* words in multiple directions using dictionaries and
+the filler spell _real_ words in multiple directions using dictionaries and
 constraint solvers. That is computationally expensive, often infeasible for
 dense grids, and produces a different feel (it leaks real words, which can be
 distracting). We deliberately aim for **plausible-but-not-real** texture:
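Multi-directional filling needs, for each empty cell, the run of already-filled letters that leads into it along each of the eight directions. idea.md only says that `directions.js` exports the vectors and a context helper. The sketch below therefore rests on several assumptions:

- the grid is an array of rows with `null` for empty cells;
- y grows downward;
- a context stops at the first empty cell or the grid edge;
- `DIRECTIONS`, `contextAlong` and `contextsFor` are hypothetical names.

```javascript
// The eight reading directions as [dx, dy] unit vectors (y grows downward).
const DIRECTIONS = {
  N: [0, -1], S: [0, 1], E: [1, 0], W: [-1, 0],
  NE: [1, -1], NW: [-1, -1], SE: [1, 1], SW: [-1, 1],
};

// Read up to `n` filled cells that precede (x, y) when travelling along
// [dx, dy], returned in reading order so the last character is the one
// adjacent to the target cell. Stops at the grid edge or an empty cell.
function contextAlong(grid, x, y, [dx, dy], n) {
  const chars = [];
  for (let k = 1; k <= n; k++) {
    const cx = x - k * dx;
    const cy = y - k * dy;
    if (cy < 0 || cy >= grid.length || cx < 0 || cx >= grid[cy].length) break;
    const ch = grid[cy][cx];
    if (ch == null) break;
    chars.push(ch);
  }
  return chars.reverse().join("");
}

// Collect the non-empty context for every direction at (x, y); directions
// with nothing filled before the cell are left out.
function contextsFor(grid, x, y, n) {
  const out = {};
  for (const [name, vec] of Object.entries(DIRECTIONS)) {
    const ctx = contextAlong(grid, x, y, vec, n);
    if (ctx.length > 0) out[name] = ctx;
  }
  return out;
}
```

Each entry of `contextsFor` would then be passed to the model's `predict`, and the resulting distributions merged by the configured combiner.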
@@ -172,12 +172,12 @@ Many generators make puzzles "harder" simply by enlarging the grid or adding
 more words. Our difficulty lever is **statistical camouflage**: by matching
 the noise to the language model, target words blend into their surroundings
 regardless of grid size. The Markov order and combiner give fine-grained,
-*qualitative* control over difficulty.
+_qualitative_ control over difficulty.

 ### Summary

 | Approach | Sequence-aware | Multi-directional | Produces real words | Cost |
-|--------------------------------|:--------------:|:-----------------:|:-------------------:|:----------:|
+| ------------------------------ | :------------: | :---------------: | :-----------------: | :--------: |
 | Uniform random filler |||| Low |
 | Unigram-frequency filler |||| Low |
 | Single-direction Markov |||| Low |
@@ -189,7 +189,7 @@ regardless of grid size. The Markov order and combiner give fine-grained,
 ## Configuration

 | Option | Default | Notes |
-|-----------|------------|------------------------------------------------|
+| --------- | ---------- | ---------------------------------------------- |
 | grid size | 15 × 15 | Width × height of the lattice. |
 | order (N) | 3 | Markov context length. |
 | combiner | `product` | How directional predictions are merged. |
@@ -244,4 +244,4 @@ This is an experiment under active design. Milestones:

 - Per-direction weighting of contributions.
 - Difficulty estimation heuristics.
-- Shareable puzzle URLs (encode grid + word list).
+- Shareable puzzle URLs (encode grid + word list).

experiments/wordsearch/idea.md

Lines changed: 26 additions & 15 deletions
@@ -12,6 +12,7 @@ backwards), making the hidden target words harder to spot.
 ## 2. Goals & Non-Goals

 ### Goals
+
 - Train an order-N Markov model from arbitrary reference text.
 - Place a user-supplied set of target words on the lattice.
 - Fill remaining cells using directional Markov predictions combined
@@ -20,14 +21,15 @@ backwards), making the hidden target words harder to spot.
 required).

 ### Non-Goals
+
 - Server-side processing (everything runs client-side).
 - Solving / auto-finding words (generation only, for now).
 - Multi-language morphology (we operate on raw character sequences).

 ## 3. Definitions

 | Term | Meaning |
-|-----------------|----------------------------------------------------------------|
+| --------------- | -------------------------------------------------------------- |
 | Lattice / Grid | 2D array of cells, each holding a single character. |
 | Direction | One of 8 unit vectors: N, S, E, W, NE, NW, SE, SW. |
 | Order (N) | Number of preceding characters used as Markov context. |
@@ -64,6 +66,7 @@ manifest.webmanifest
 ## 5. Component Specifications

 ### 5.1 MarkovModel
+
 - `train(text, order)`: builds nested frequency maps
 `context -> { char -> count }`.
 - `predict(context)`: returns a normalised `Map<char, prob>`.
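The configuration defaults list back-off as enabled, but this diff does not show the exact back-off rule. A common choice, assumed here, is to fall back to the longest suffix of the context that was seen in training, ending at plain letter frequencies (order 0). `BackoffModel` is a hypothetical name, and this is a sketch of simple back-off rather than any smoothed scheme.

```javascript
// Keeps counts for every order 0..maxOrder so prediction can back off to a
// shorter context when the full one was never seen in training.
class BackoffModel {
  constructor(maxOrder) {
    this.maxOrder = maxOrder;
    // tables[k]: context of length k -> Map(char -> count).
    this.tables = Array.from({ length: maxOrder + 1 }, () => new Map());
  }

  train(text) {
    for (let i = 0; i < text.length; i++) {
      for (let k = 0; k <= this.maxOrder && k <= i; k++) {
        const ctx = text.slice(i - k, i);
        const table = this.tables[k];
        if (!table.has(ctx)) table.set(ctx, new Map());
        const row = table.get(ctx);
        row.set(text[i], (row.get(text[i]) ?? 0) + 1);
      }
    }
    return this;
  }

  // Try the longest usable suffix of `context` first, then shorter ones,
  // finishing with the order-0 table keyed by the empty string.
  predict(context) {
    const start = Math.min(this.maxOrder, context.length);
    for (let k = start; k >= 0; k--) {
      const row = this.tables[k].get(context.slice(context.length - k));
      if (row) {
        let total = 0;
        for (const c of row.values()) total += c;
        return new Map([...row].map(([ch, c]) => [ch, c / total]));
      }
    }
    return new Map(); // only reached before any training
  }
}
```

A unit test for back-off, as the testing strategy suggests, can feed an unseen context and check that the shorter-context distribution comes back.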
@@ -72,43 +75,49 @@ manifest.webmanifest
 - Serialisable to/from JSON for caching.

 ### 5.2 Grid & Directions
+
 - `Grid(width, height)` stores cells; `get/set(x, y)`, `inBounds(x, y)`.
 - `directions.js` exports the 8 vectors and a helper to read the
 context string of length ≤ N preceding a cell along a direction.

 ### 5.3 Placement
+
 - Randomly position each target word along a random direction,
 rejecting placements that conflict with already-placed letters
 (unless overlapping letters match).
 - Mark placed cells as **locked** (never overwritten by the filler).

 ### 5.4 Fill Order (Adjacency)
+
 - Maintain a priority queue of empty cells keyed by adjacency score.
 - Cells with more filled neighbours are filled first; ties broken
 randomly. Re-score neighbours after each fill.

 ### 5.5 Prediction Combination
+
 For a target empty cell, for each of the 8 directions that has a
 defined (non-empty) preceding context, request a prediction. Then
 combine the resulting distributions using a configurable method:

-| Combiner | Description |
-|------------|---------------------------------------------------------|
-| `product` | Multiply probabilities (logarithmic, AND-like). |
-| `sum` | Average / weighted sum (OR-like). |
-| `max` | Take the strongest single directional vote. |
-| `vote` | Each direction votes for its argmax; majority wins. |
+| Combiner | Description |
+| --------- | --------------------------------------------------- |
+| `product` | Multiply probabilities (logarithmic, AND-like). |
+| `sum` | Average / weighted sum (OR-like). |
+| `max` | Take the strongest single directional vote. |
+| `vote` | Each direction votes for its argmax; majority wins. |

 The combined distribution is sampled (or argmax-selected per config)
 to choose the cell's letter.

 ### 5.6 UI
+
 - Controls: grid size, Markov order, combiner method, sampling mode,
 reference text input/upload, target word list.
 - Render grid (highlight locked target letters in debug mode).
 - Regenerate / export (text, PNG) actions.

 ### 5.7 PWA
+
 - `manifest.webmanifest` + service worker caching app shell for
 offline use. Installable on desktop/mobile.

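Section 5.4's fill order can be sketched as follows. To keep the sketch short, a full scan of the grid stands in for the priority queue. Picking the empty cell with the most filled neighbours on every step selects the same cells as re-scoring a queue after each fill, just more slowly (O(cells²) overall). `filledNeighbours`, `nextCell`, `fillGrid` and the `chooseLetter` callback are hypothetical; the callback stands in for the predict, combine and sample steps.

```javascript
// Count filled cells among the up-to-8 neighbours of (x, y).
function filledNeighbours(grid, x, y) {
  let n = 0;
  for (let dy = -1; dy <= 1; dy++) {
    for (let dx = -1; dx <= 1; dx++) {
      if (dx === 0 && dy === 0) continue;
      const row = grid[y + dy];
      if (row && row[x + dx] != null) n++;
    }
  }
  return n;
}

// Empty cell with the most filled neighbours; ties are broken with `rng`
// (any function returning a number in [0, 1), e.g. Math.random).
function nextCell(grid, rng = Math.random) {
  let best = [];
  let bestScore = -1;
  for (let y = 0; y < grid.length; y++) {
    for (let x = 0; x < grid[y].length; x++) {
      if (grid[y][x] != null) continue;
      const s = filledNeighbours(grid, x, y);
      if (s > bestScore) {
        bestScore = s;
        best = [[x, y]];
      } else if (s === bestScore) {
        best.push([x, y]);
      }
    }
  }
  return best.length === 0 ? null : best[Math.floor(rng() * best.length)];
}

// Fill until no empty cell remains. `chooseLetter(grid, x, y)` must return a
// character; returning null would leave the cell empty and loop forever.
function fillGrid(grid, chooseLetter, rng = Math.random) {
  for (let cell = nextCell(grid, rng); cell; cell = nextCell(grid, rng)) {
    const [x, y] = cell;
    grid[y][x] = chooseLetter(grid, x, y);
  }
  return grid;
}
```

Locked target letters are already non-null when filling starts, so this loop never overwrites them.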
@@ -131,13 +140,13 @@ while empty cells remain:

 ## 7. Configuration Defaults

-| Option | Default |
-|--------------|-------------|
-| grid size | 15 × 15 |
-| order (N) | 3 |
-| combiner | `product` |
-| sampling | `weighted` |
-| back-off | enabled |
+| Option | Default |
+| --------- | ---------- |
+| grid size | 15 × 15 |
+| order (N) | 3 |
+| combiner | `product` |
+| sampling | `weighted` |
+| back-off | enabled |

 ## 8. Implementation Plan (Milestones)

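The defaults table maps onto a frozen options object. The `sampling` option then decides how a combined distribution becomes a letter. Section 5.5 only says the distribution is "sampled (or argmax-selected per config)". The key names, `withDefaults`, `sampleLetter` and the `"argmax"` mode string are therefore assumptions.

```javascript
// Defaults mirroring the table; key names are illustrative assumptions.
const DEFAULTS = Object.freeze({
  width: 15,
  height: 15,
  order: 3,
  combiner: "product",
  sampling: "weighted",
  backoff: true,
});

// Caller-supplied options override the defaults key by key.
function withDefaults(options = {}) {
  return { ...DEFAULTS, ...options };
}

// Pick a letter from a Map<char, prob>: the most probable one ("argmax"),
// or a random draw proportional to probability ("weighted").
function sampleLetter(dist, sampling = DEFAULTS.sampling, rng = Math.random) {
  if (dist.size === 0) return null;
  if (sampling === "argmax") {
    let best = null;
    let bestP = -Infinity;
    for (const [ch, p] of dist) {
      if (p > bestP) {
        best = ch;
        bestP = p;
      }
    }
    return best;
  }
  if (sampling !== "weighted") throw new Error(`unknown sampling mode: ${sampling}`);
  // Walk the cumulative distribution until the random point is passed.
  let total = 0;
  for (const p of dist.values()) total += p;
  let r = rng() * total;
  let last = null;
  for (const [ch, p] of dist) {
    last = ch;
    r -= p;
    if (r < 0) return ch;
  }
  return last; // guards against floating-point round-off at the top end
}
```

Passing `rng` explicitly keeps generation reproducible. This may matter for the shareable-URL stretch goal, though that link is speculative.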
@@ -151,11 +160,13 @@ while empty cells remain:
 7. **M7 — Polish**: presets, sample reference texts, accessibility.

 ## 9. Testing Strategy
+
 - Unit tests for model back-off, combiner math, placement conflicts.
 - Property test: every locked target word is readable post-fill.
 - Visual/manual QA for generated grid plausibility.

 ## 10. Stretch Goals
+
 - Per-direction weighting of contributions.
 - Difficulty estimation heuristics.
-- Shareable puzzle URLs (encode grid + word list).
+- Shareable puzzle URLs (encode grid + word list).
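The placement rules of section 5.3 and the property test listed under Testing Strategy fit together. Record where each word was placed, then check after filling that the same cells still spell it. The names `tryPlace`, `placeWord` and `isReadable` are hypothetical. The sketch assumes an array-of-rows grid with `null` for empty cells and a `Set` of `"x,y"` keys for locked cells.

```javascript
// The 8 reading directions as [dx, dy]: N, S, E, W, NE, NW, SE, SW (y grows downward).
const DIRS = [[0, -1], [0, 1], [1, 0], [-1, 0], [1, -1], [-1, -1], [1, 1], [-1, 1]];

// Write `word` from (x, y) along [dx, dy] if it fits: every cell must be
// inside the grid and either empty or already holding the same letter.
// On success the cells are recorded in `locked` as "x,y" keys.
function tryPlace(grid, locked, word, x, y, [dx, dy]) {
  const cells = [];
  for (let i = 0; i < word.length; i++) {
    const cx = x + i * dx;
    const cy = y + i * dy;
    if (cy < 0 || cy >= grid.length || cx < 0 || cx >= grid[cy].length) return false;
    const existing = grid[cy][cx];
    if (existing != null && existing !== word[i]) return false;
    cells.push([cx, cy]);
  }
  cells.forEach(([cx, cy], i) => {
    grid[cy][cx] = word[i];
    locked.add(`${cx},${cy}`);
  });
  return true;
}

// Random start cell and direction, retried up to `attempts` times.
// Returns the placement so it can be checked later, or null on failure.
function placeWord(grid, locked, word, rng = Math.random, attempts = 200) {
  for (let t = 0; t < attempts; t++) {
    const dir = DIRS[Math.floor(rng() * DIRS.length)];
    const y = Math.floor(rng() * grid.length);
    const x = Math.floor(rng() * grid[y].length);
    if (tryPlace(grid, locked, word, x, y, dir)) return { x, y, dir };
  }
  return null;
}

// Property check: does the grid still spell `word` at this placement?
function isReadable(grid, word, x, y, [dx, dy]) {
  for (let i = 0; i < word.length; i++) {
    const row = grid[y + i * dy];
    if (!row || row[x + i * dx] !== word[i]) return false;
  }
  return true;
}
```

A property test would place a random word list, run the filler, and assert `isReadable` for every recorded placement.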
