Commit 2bc4c49

Add ADR 0004: Alignment training data preparation policy (Nemotron-informed) (#18)
Locks in the v0.3 alignment training data + LoRA + masking + evaluation
policy before implementation begins. Pure documentation, no code change.

What's new
----------

docs/adr/0004-alignment-training-data-preparation-policy.md

Decision content (§2):

  2.1 7-domain prompt pool composition (50k prompts)
      chat-en 30%, chat-zh 20%, code 15%, math 10%, long-context 10%,
      multi-turn 10%, tool-calls 5%
      + adversarial/OOD 1k held-out (no hard gate)
  2.2 On-policy verifier rollout config: greedy, sink+window,
      multi-system-prompt rotation, block-aligned hidden state capture
      (block_size=4 deployment)
  2.3 LoRA configuration: o_proj target, rank 128, alpha 512
      (Nemotron-informed) with mandatory A/B/C validation before
      locking in
  2.4 Loss formulation:
      1.0 * smooth_L1 repr alignment
      + 0.5 * KL distill (T=2, top-20)
      + 0.1 * mask recovery
      + position-dependent masking schedule
        p_mask = 0.3 + 0.4 * (pos_in_block / block_size)
  2.5 Per-verifier data isolation policy + versioning convention
  2.6 Greedy-only training assumption (matches ADR 0001 §2.2)
  2.7 Per-slice acceptance gates for v1 ship:
      aggregate >= 0.40 @ K=2
      chat-en >= 0.45, chat-zh >= 0.40, code >= 0.25, math >= 0.30,
      long-context >= 0.30, multi-turn >= 0.35, tool-calls >= 0.40
  2.8 Evaluation metrics: acceptance rate, TPF (tokens-per-forward),
      mean acceptance length, speedup vs vanilla AR

Section 3 — Where Nemotron findings apply (and where they don't):

  3.1 Adopt directly: o_proj-only LoRA, position-dependent masking,
      block-aligned capture, TPF + acceptance length metrics
  3.2 Do not adopt: single-model self-spec architecture, joint
      AR-diffusion pretraining objective, custom CUDA kernels,
      14B-scale absolute throughput numbers

Section 4 — Five rejected alternatives with reasons.
Section 5 — Consequences (positive + accepted trade-offs + implications for code).
Section 6 — Validation criteria for ADR completion.

Sources informing the ADR
-------------------------

Fu et al. 2026, 'Nemotron-Labs-Diffusion: A Tri-Mode Language Model
Unifying Autoregressive, Diffusion, and Self-Speculation Decoding'
(NVIDIA technical report, May 2026).

arXiv:2512.14067, 'Efficient-DLM: From Autoregressive to Diffusion
Language Models, and Beyond in Speed' (Fu et al., Dec 2025).

HuggingFace model card linear_spec_lora subfolder of
nvidia/Nemotron-Labs-Diffusion-14B (LoRA target/rank/alpha confirmed).

Why ADR 0004 ships now (not when v0.3 starts)
---------------------------------------------

The data-prep choices materially constrain the trainer implementation.
Writing the ADR after starting the trainer means either retrofitting
decisions (waste) or churning the ADR mid-implementation (worse).

Cost of writing this ADR pre-v0.3: ~3 hours of writing.
Benefit: cleaner v0.3 PR sequence with locked-in inputs/outputs.

Documentation updates
---------------------

- docs/adr/README.md: ADR 0004 added to the index.
- README.md: ADR badge updated to '0001 | 0002 | 0003 | 0004'; ADR list
  at the bottom gets a one-paragraph summary entry for 0004.

No code change; full test suite still passes (451 passed, 100% coverage).

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
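
The numeric policy stated in the commit message (the §2.1 pool split, the §2.4 masking schedule, and the §2.7 acceptance gates) can be sketched in Python as below. This is an illustrative sketch only, not code from the repository: every function and constant name is hypothetical, `pos_in_block` is assumed to be 0-indexed, and a slice with no reported acceptance rate is assumed to count as failing. The ADR itself may define these details differently.

```python
# Illustrative sketch; names are hypothetical, values are copied from the
# commit message for ADR 0004.

BLOCK_SIZE = 4  # "block_size=4 deployment" (section 2.2)

# Section 2.1: prompt pool composition, in whole percent of 50k prompts.
POOL_PERCENT = {
    "chat-en": 30, "chat-zh": 20, "code": 15, "math": 10,
    "long-context": 10, "multi-turn": 10, "tool-calls": 5,
}

# Section 2.7: acceptance-rate gates at K=2.
AGGREGATE_GATE = 0.40
SLICE_GATES = {
    "chat-en": 0.45, "chat-zh": 0.40, "code": 0.25, "math": 0.30,
    "long-context": 0.30, "multi-turn": 0.35, "tool-calls": 0.40,
}


def p_mask(pos_in_block: int, block_size: int = BLOCK_SIZE) -> float:
    """Masking probability from section 2.4.

    Assumption: pos_in_block is 0-indexed, so it ranges over
    0 .. block_size - 1 and later positions are masked more often.
    """
    if not 0 <= pos_in_block < block_size:
        raise ValueError("pos_in_block must lie in [0, block_size)")
    return 0.3 + 0.4 * (pos_in_block / block_size)


def pool_counts(total: int = 50_000) -> dict[str, int]:
    """Number of prompts per domain for a pool of `total` prompts."""
    return {domain: total * pct // 100 for domain, pct in POOL_PERCENT.items()}


def failing_gates(aggregate: float, per_slice: dict[str, float]) -> list[str]:
    """Names of gates that are not met; an empty list means all gates pass.

    Assumption: a slice missing from per_slice is treated as failing.
    """
    failures = ["aggregate"] if aggregate < AGGREGATE_GATE else []
    failures += [
        name for name, gate in SLICE_GATES.items()
        if per_slice.get(name, 0.0) < gate
    ]
    return failures
```

Under the 0-indexed assumption and `block_size=4`, the schedule rises from 0.3 at the first position of a block to 0.6 at the last, and the seven domain counts sum to the full 50k pool.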
1 parent ae60d45 commit 2bc4c49

3 files changed

Lines changed: 427 additions & 3 deletions

README.md

Lines changed: 10 additions & 1 deletion
@@ -3,7 +3,7 @@
 [![CI](https://github.com/FluffyAIcode/Kakeya-LLM-Inference-engine/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/FluffyAIcode/Kakeya-LLM-Inference-engine/actions/workflows/ci.yaml)
 [![Release](https://img.shields.io/badge/release-v0.1.0-blue)](https://github.com/FluffyAIcode/Kakeya-LLM-Inference-engine/releases/tag/v0.1.0)
 [![Platform](https://img.shields.io/badge/platform-Apple%20Silicon-lightgrey)](docs/local-inference-engine.md)
-[![ADRs](https://img.shields.io/badge/ADRs-0001%20%7C%200002%20%7C%200003%20%7C%200006-green)](docs/adr/)
+[![ADRs](https://img.shields.io/badge/ADRs-0001%20%7C%200002%20%7C%200003%20%7C%200004%20%7C%200006-green)](docs/adr/)
 
 Runs the speculative-decoding architecture designed in the prior product
 discussion using **real, public** weights:
@@ -468,6 +468,15 @@ explicitly rejected.
 and what intermediate step ships in v0.2 — `PooledVerifier`
 wrapper that makes pool memory accounting accurate without
 touching the model forward.
+- [ADR 0004 — Alignment training data preparation policy
+(Nemotron-informed)](docs/adr/0004-alignment-training-data-preparation-policy.md):
+the v0.3 alignment training data + LoRA + masking + per-slice
+evaluation policy, locked in before implementation begins.
+Adopts NVIDIA Nemotron-Labs-Diffusion's `o_proj`-only LoRA
+configuration (rank 128, α 512) pending an A/B/C validation
+experiment, plus 7-domain prompt pool, block-aligned hidden
+state capture, position-dependent masking, and per-slice
+acceptance gates.
 - [ADR 0006 — Project positioning as local agent
 infrastructure](docs/adr/0006-local-agent-infrastructure-positioning.md):
 the strategic positioning decision that Kakeya is **local agent
