"publish docs", "regenerate notebooks", "update dev note", "add API
8
-
reference", any request that touches `fern/`.
7
+
"publish docs", "regenerate notebooks", "update dev note", any request
8
+
that touches `fern/`.
9
9
---
10
10
11
11
# Data Designer Docs Maintenance
@@ -16,7 +16,7 @@ Current URL: **`datadesigner.docs.buildwithfern.com/nemo/datadesigner`** (see `i
16
16
17
17
## Scope Rule
18
18
19
-
**ALL doc edits happen under `fern/`.** The legacy `docs/` directory is the original MkDocs source. `docs/notebook_source/*.py` remains canonical for notebook code, but **do not add new top-level prose pages under `docs/`**. Concept pages, recipes, plugins, code reference, and Dev Notes prose live under `fern/versions/latest/pages/`.
19
+
**ALL doc edits happen under `fern/`.** The legacy `docs/` directory is the original MkDocs source. `docs/notebook_source/*.py` remains canonical for notebook code, but **do not add new top-level prose pages under `docs/`**. Concept pages, recipes, plugins, and Dev Notes prose live under `fern/versions/latest/pages/`.
20
20
21
21
## Versioning Model
22
22
@@ -39,7 +39,7 @@ For future Fern-native releases, do not copy page trees by hand on `main`. The r
├── code-reference/ ← gitignored; populated by `fern docs md generate`
60
59
└── versions/
61
60
├── latest.yml ← authoring navigation tree
62
61
└── latest/pages/ ← authoring MDX content
@@ -401,47 +400,6 @@ import notebook from "@/components/notebooks/1-the-basics";
401
400
402
401
The converter (`fern/scripts/ipynb-to-fern-json.py`) **auto-strips the leading Colab badge cell** — `<NotebookViewer>` renders its own banner from the `colabUrl` prop. Don't manually re-add it.
403
402
404
-
## Python API Reference (`libraries:`)
405
-
406
-
`docs.yml` keeps a Fern-native `libraries:` block for the config package. Local generation uses `py2fern` through `make generate-fern-api-reference` and writes multiple gitignored trees under `fern/code-reference/`:
407
-
408
-
- `data-designer/` for `data_designer.config`
409
-
- `interface/` for `data_designer.interface`
410
-
- `engine/seed-readers/`
411
-
- `engine/processors/`
412
-
- `engine/mcp/`
413
-
- `engine/column-generators/`
414
-
415
-
To populate locally:
416
-
417
-
```bash
418
-
make generate-fern-api-reference
419
-
```
420
-
421
-
This does not require Fern auth. Re-run when the upstream Python source changes. If you need to compare with Fern's native generator, use `make generate-fern-api-reference-native` with Fern auth.
422
-
423
-
The generated trees are wired into `versions/latest.yml` under `Code Reference`:
424
-
425
-
- `Config` contains prose pages plus `Config API` from `../code-reference/data-designer/data_designer/config`
426
-
- `Interface` contains prose pages plus `Interface API` from `../code-reference/interface/data_designer/interface`
427
-
- `Engine Extension API` contains prose pages plus the seed reader, processor, MCP runtime, and column generator API folders
428
-
429
-
There is no `Topic Overviews` section. Prose reference pages live beside the generated folders under `fern/versions/latest/pages/code_reference/`.
430
-
431
-
To add another generated package, update the `generate-fern-api-reference` target and add the matching `folder:` entry under the right `Code Reference` section. Only add a `libraries:` entry when Fern's native generator should know about that source:
Pyright needs a regular Python package (with `__init__.py`). The `data_designer` namespace itself is PEP 420 (no `__init__.py`), so always point at a sub-package one level deeper.
444
-
445
403
## MDX Gotchas (the ones that bit during migration)
`fern check` must pass before commit. The local broken-link checker has known false positives: it computes URLs from file paths instead of from slugified nav titles, so cross-section absolute links are sometimes flagged incorrectly. Spot-check by clicking through the dev server.
469
427
470
-
To generate the API reference for local preview:
471
-
472
-
```bash
473
-
make generate-fern-api-reference # py2fern; populates fern/code-reference/ (gitignored)
474
-
```
475
-
476
-
If the "Python API" sidebar folder is empty, you forgot this step.
5. Hooks `ProcessorRunner` for pre-batch and post-batch stages
37
37
38
-
`AsyncTaskScheduler` runs on a dedicated async loop with frontier-driven dispatch, semaphore-based capacity limits, salvage rounds for failed tasks, and order-dependent locks for columns that must execute sequentially. Ready frontier tasks are admitted through a virtual-time fair queue so one hot column or model-backed generator cannot consume the whole submission window before peer work gets a turn.
38
+
`AsyncTaskScheduler` runs on a dedicated async loop with frontier-driven dispatch, task-admission leases, salvage rounds for failed tasks, and order-dependent locks for columns that must execute sequentially. Ready frontier tasks enter `FairTaskQueue`, are selected through virtual-time ordering, and are committed only after `TaskAdmissionController` acquires the required scheduler resources.
Row-group admission is fixed by default in the dataset-builder path: the configured row-group concurrency is the hard in-flight cap. The scheduler also has an internal adaptive row-group mode for direct use that only raises a soft target up to that cap; it is additive ramp-up, not AIMD shrink/recovery behavior.
134
+
135
+
When request admission is available, async scheduling may use request-pressure snapshots as a read-only advisory during fair-queue selection. A request-pressured task can be skipped for an eligible peer without mutating request-admission state; provider/model/domain request limits remain owned by request admission.
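The select-then-commit flow with virtual-time ordering described in this section can be illustrated with a small sketch. This is not the real `FairTaskQueue` or `TaskAdmissionController`: the class `SketchFairQueue`, the helper `admit_ready`, the `skip` parameter, the unit task cost, and the plain counter standing in for scheduler-resource leases are all hypothetical names and simplifications chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SketchFairQueue:
    """Toy virtual-time fair queue (hypothetical; not the real FairTaskQueue)."""

    queues: dict[str, deque] = field(default_factory=dict)
    virtual_time: dict[str, float] = field(default_factory=dict)

    def push(self, group: str, task: str) -> None:
        # Simplification: a new group starts at virtual time 0.
        self.queues.setdefault(group, deque()).append(task)
        self.virtual_time.setdefault(group, 0.0)

    def select_next(self, skip: frozenset[str] = frozenset()):
        """Peek at the head task of the eligible group with the lowest virtual time.

        Groups listed in `skip` (for example, request-pressured ones) are passed
        over without mutating any queue state.
        """
        eligible = [g for g, q in self.queues.items() if q and g not in skip]
        if not eligible:
            return None
        group = min(eligible, key=lambda g: self.virtual_time[g])
        return group, self.queues[group][0]

    def commit(self, group: str, cost: float = 1.0) -> str:
        """Remove the selected task and charge its group; call only after admission."""
        self.virtual_time[group] += cost
        return self.queues[group].popleft()


def admit_ready(queue: SketchFairQueue, capacity: int) -> list[str]:
    """Admit tasks until the (assumed) in-flight capacity is used up."""
    admitted: list[str] = []
    in_flight = 0
    while in_flight < capacity:
        choice = queue.select_next()
        if choice is None:
            break
        group, _task = choice
        in_flight += 1  # stands in for acquiring a scheduler-resource lease
        admitted.append(queue.commit(group))
    return admitted
```

With three tasks queued for one column and one task for a peer, the peer is admitted second rather than waiting behind the whole first column, which is the fairness property the paragraph above relies on.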
136
+
132
137
## Design Decisions
133
138
134
139
- **Dual execution engines behind one API.** The sequential engine is simpler and easier to debug; the async engine adds row-group parallelism for throughput. Users switch via an environment variable without changing their code.
135
140
- **DAG-driven ordering** ensures columns with dependencies (e.g., a judge column that depends on a text column) are generated in the correct order, regardless of the order they appear in the config.
136
-
- **Fair async admission** keeps the scheduler flowing across ready columns and model groups. Global semaphores still bound memory/coroutine growth, while per-group virtual-time queues prevent a large ready frontier from degenerating into a column-by-column wave. LLM admission caps are peer-sensitive: a solo model group can fill available global capacity, but once another scheduling group has queued work the saturated group yields until peers get admission slots or admitted tasks complete.
141
+
- **Fair async admission with bounded borrow by default** keeps the scheduler flowing across ready columns and model groups. `FairTaskQueue.select_next(...)` chooses eligible ready work, `TaskAdmissionController` leases scheduler resources before spawn, and `FairTaskQueue.commit(...)` removes the selected task only after admission succeeds. The default `BoundedBorrowTaskAdmissionPolicyConfig` computes a strict per-group share, lets solo groups borrow only up to a capacity-derived reserve, and makes borrowed groups yield when eligible peer pressure appears. Passing `bounded_borrow=None` selects strict-fair admission for tests and benchmark comparisons. Per-group virtual-time ordering prevents a large ready frontier from degenerating into a column-by-column wave, and scheduler-resource accounting remains separate from provider/model request admission.
137
142
- **Salvage rounds in async mode** retry failed tasks after all other tasks in a round complete, improving resilience against transient LLM failures without blocking the entire generation.
138
143
- **Unified DAG construction.** `topologically_sort_column_configs` (in `execution_graph.py`) determines column ordering using Kahn's algorithm; the runtime `ExecutionGraph` adds strategy-aware dependency tracking for the async scheduler.
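For readers unfamiliar with Kahn's algorithm, the DAG-driven ordering mentioned in the list above can be sketched as follows. This is a generic illustration, not the code in `execution_graph.py`: the function name `topological_order` and the plain `dict` of column name to dependency names are assumptions made for the example.

```python
from collections import deque


def topological_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order columns so each one comes after every column it depends on.

    `dependencies` maps a column name to the names it depends on
    (a hypothetical input shape chosen for this sketch).
    """
    # In-degree here is the number of dependencies not yet emitted.
    indegree = {name: len(deps) for name, deps in dependencies.items()}
    dependents: dict[str, list[str]] = {name: [] for name in dependencies}
    for name, deps in dependencies.items():
        for dep in deps:
            if dep not in dependencies:
                raise KeyError(f"{name!r} depends on unknown column {dep!r}")
            dependents[dep].append(name)

    # Start from columns with no dependencies, in config order.
    ready = deque(name for name, count in indegree.items() if count == 0)
    order: list[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in dependents[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any column left unemitted is part of a dependency cycle.
    if len(order) != len(dependencies):
        raise ValueError("dependency cycle detected")
    return order
```

A judge column listed before the text column it depends on still comes out after it, which matches the "regardless of the order they appear in the config" behavior described above.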