
Commit 8479ce7

kckern and claude committed
docs: add YAML refactoring design document
Design for migrating from legacy MJS data files to YAML-based multi-canon architecture with build-time compilation.

Key decisions:
- Build-time YAML → JS compilation (no runtime parsing)
- Canon priority: explicit → LDS default → auto-detect fallback
- Lazy load COC mapping (881KB) only when convertCanon() called
- Generic convertCanon() API instead of canon-specific functions
- Return canon field only when auto-detected to different canon

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 65e8bf8 commit 8479ce7

1 file changed: 233 additions, 0 deletions

# YAML Data Refactoring Design

**Date:** 2026-01-12
**Status:** Approved
**Goal:** Refactor data loading from legacy MJS files to a YAML-based multi-canon architecture

## Background

The legacy data files (`scriptdata.mjs`, `scriptlang.mjs`, `scriptregex.mjs`, `coc.mjs`, `coc-mapping.mjs`) have been archived and replaced with a YAML-based structure under `data/canons/` and `data/shared/`. The source code still imports the archived files and needs refactoring.

## Decisions

| Decision | Choice |
|----------|--------|
| Deployment target | NPM package only |
| YAML handling | Build-time compilation to JS |
| Canon priority | Explicit → LDS default → auto-detect fallback |
| COC mapping loading | Lazy load on first use |
| Canon conversion API | Explicit `convertCanon()` function only |
| Result canon field | Included only when it differs from the expected canon |

## Build Pipeline

### YAML Compilation Step

A new script, `build/compile-yaml.mjs`, compiles the YAML files into JS modules:

```
data/                          src/data/ (compiled, gitignored)
├── canons/                    ├── canons/
│   ├── bible/                 │   ├── bible/
│   │   ├── _structure.yml  →  │   │   ├── structure.mjs
│   │   └── en.yml          →  │   │   └── en.mjs
│   ├── lds/                   │   ├── lds/
│   │   ├── _structure.yml  →  │   │   ├── structure.mjs
│   │   └── en.yml          →  │   │   └── en.mjs
│   └── coc/                   │   └── coc/
│       ├── _structure.yml  →  │       ├── structure.mjs
│       └── en.yml          →  │       └── en.mjs
└── shared/                    └── shared/
    ├── en.yml              →      ├── en.mjs
    └── ko.yml              →      └── ko.mjs
```

Additionally, `_archive/data/coc-mapping.mjs` is compiled to `src/data/canons/coc/mapping.mjs`.
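A minimal sketch of the two pure pieces of `build/compile-yaml.mjs`, assuming the path layout in the tree above: mapping a source YAML path to its compiled module path, and serializing parsed data as an ES module with a default export. The helper names `toOutputPath` and `toModuleSource` are hypothetical. Parsing itself would be delegated to a YAML library (for example, `load` from js-yaml) and is omitted here.

```javascript
// Hypothetical helpers for build/compile-yaml.mjs (sketch only).
// YAML parsing is assumed to happen elsewhere, e.g. with a YAML library.

// Map a source path under data/ to its compiled module under src/data/.
// `_structure.yml` loses its leading underscore; other files keep their name.
function toOutputPath(yamlPath) {
  if (!yamlPath.startsWith('data/') || !yamlPath.endsWith('.yml')) {
    throw new Error(`Not a YAML data file: ${yamlPath}`);
  }
  const parts = yamlPath.slice('data/'.length, -'.yml'.length).split('/');
  const base = parts.pop();
  parts.push(base === '_structure' ? 'structure' : base);
  return `src/data/${parts.join('/')}.mjs`;
}

// Serialize already-parsed YAML data as an ES module with a default export.
// JSON text is a valid JavaScript expression, so no parser is needed at runtime.
function toModuleSource(data, sourcePath) {
  return (
    `// Generated from ${sourcePath} by compile-yaml.mjs. Do not edit.\n` +
    `export default ${JSON.stringify(data, null, 2)};\n`
  );
}

// Example:
// toOutputPath('data/canons/bible/_structure.yml')
//   → 'src/data/canons/bible/structure.mjs'
```

Emitting JSON inside `export default` keeps the compiled modules free of any runtime YAML dependency, which is what the build-time compilation decision is meant to achieve.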

### Build Command

```bash
npm run build
# 1. compile-yaml.mjs: YAML → JS in src/data/
# 2. build.mjs: Bundle to dist/
```
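The order matters because `build.mjs` bundles modules that import from `src/data/`, which only exists after compilation. A hedged sketch of a sequential runner that stops at the first failing step; `runBuild` and the step objects are hypothetical and not part of the existing `build/build.mjs`.

```javascript
// Hypothetical sequential runner for `npm run build`: each step must finish
// before the next starts, and the first failure aborts the build.
async function runBuild(steps) {
  const completed = [];
  for (const { name, run } of steps) {
    try {
      await run();
    } catch (err) {
      throw new Error(`Build step "${name}" failed: ${err.message}`);
    }
    completed.push(name);
  }
  return completed;
}

// Example wiring (compileYaml and bundle are hypothetical step functions):
// await runBuild([
//   { name: 'compile-yaml', run: compileYaml }, // YAML → JS in src/data/
//   { name: 'bundle', run: bundle },            // src/ → dist/
// ]);
```

npm also runs a script named `prebuild` automatically before `build`, which is another way to guarantee that the compile step runs first.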

### Gitignore Addition

```
src/data/
```

## Data Loading Architecture

### New Module: `src/lib/data-loader.mjs`

```javascript
// Eagerly loaded (small, always needed)
import bibleStructure from '../data/canons/bible/structure.mjs';
import ldsStructure from '../data/canons/lds/structure.mjs';
import sharedEn from '../data/shared/en.mjs';

// Lazily loaded caches
let cocStructure = null;
let cocMapping = null;
const languageCache = {};

export function getCanonStructure(canon) {
  // Returns structure, lazy-loads COC if needed
  // Merges parent structure if canon extends another
}

export function getLanguageData(canon, lang) {
  // Returns merged: shared/{lang} + canons/{canon}/{lang}
  // Uses deep-merge utility
}

export async function getCocMapping() {
  // Lazy loads the 881KB mapping only when convertCanon() is called
  if (!cocMapping) {
    const module = await import('../data/canons/coc/mapping.mjs');
    cocMapping = module.default;
  }
  return cocMapping;
}
```
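The `getLanguageData` stub above merges `shared/{lang}` with `canons/{canon}/{lang}` through a deep-merge utility and caches per language. A minimal sketch of that behavior, assuming canon values override shared ones, arrays are replaced rather than concatenated, and the loaders are injected. `deepMerge` here is a stand-in, not the project's `deep-merge.mjs`, and `loadShared`/`loadCanonLang` are hypothetical. Because lazy loading goes through dynamic `import()`, which is asynchronous, this version returns a Promise.

```javascript
// Stand-in deep merge (assumption: not the project's deep-merge.mjs).
// Plain objects merge recursively; arrays and scalars from `override` win.
function deepMerge(base, override) {
  const isPlainObject = (v) =>
    v !== null && typeof v === 'object' && !Array.isArray(v);
  if (!isPlainObject(base) || !isPlainObject(override)) {
    return override === undefined ? base : override;
  }
  const result = { ...base };
  for (const [key, value] of Object.entries(override)) {
    result[key] = Object.prototype.hasOwnProperty.call(base, key)
      ? deepMerge(base[key], value)
      : value;
  }
  return result;
}

// Builds a getLanguageData(canon, lang) that caches each canon/language pair
// after the first request. loadShared(lang) and loadCanonLang(canon, lang)
// are hypothetical async loaders (e.g. wrappers around dynamic import()).
function createLanguageLoader(loadShared, loadCanonLang) {
  const cache = new Map(); // "canon:lang" → Promise of merged data
  return function getLanguageData(canon, lang) {
    const key = `${canon}:${lang}`;
    if (!cache.has(key)) {
      const merged = Promise.all([loadShared(lang), loadCanonLang(canon, lang)])
        .then(([shared, canonData]) => deepMerge(shared, canonData))
        .catch((err) => {
          cache.delete(key); // do not cache failures
          throw err;
        });
      cache.set(key, merged);
    }
    return cache.get(key);
  };
}
```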

### Loading Strategy

| Data | Loading | Reason |
|------|---------|--------|
| Bible structure | Eager | Always needed (LDS extends it) |
| LDS structure | Eager | Default canon |
| COC structure | Lazy | Only for COC references |
| COC mapping | Lazy | Only for `convertCanon()` |
| Language data | Lazy + cached | Load per language on demand |
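Both lazy rows in the table follow the load-on-first-use pattern that `getCocMapping()` uses. One hedged variant caches the Promise instead of the resolved value, so callers that arrive while the first load is still in flight share it, and a failed load is retried on the next call rather than cached. `lazyOnce` is a hypothetical helper name.

```javascript
// Hypothetical helper: wrap a loader so it runs once, on first use.
// The Promise itself is cached, so callers arriving mid-load share it.
function lazyOnce(load) {
  let pending = null;
  return function get() {
    if (!pending) {
      pending = Promise.resolve()
        .then(load)
        .catch((err) => {
          pending = null; // let the next call retry instead of caching the failure
          throw err;
        });
    }
    return pending;
  };
}

// Possible use in data-loader.mjs (path taken from the design above):
// const getCocMapping = lazyOnce(
//   async () => (await import('../data/canons/coc/mapping.mjs')).default,
// );
```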

## API Changes

### Existing Functions (Backward Compatible)

Signatures unchanged:
- `lookupReference(query, language?, config?)`
- `generateReference(verseIds, language?, config?)`
- `detectReferences(text, language?, callback?)`
- `setLanguage(lang)` / `getLanguage()`

### New Config Option

```javascript
// Explicit canon
lookupReference("1 Nephi 1:1", "en", { canon: "lds" })

// Default (LDS)
lookupReference("1 Nephi 1:1", "en")

// Auto-detect fallback (verse 150 doesn't exist in LDS 1 Nephi 3)
lookupReference("1 Nephi 3:150", "en")
// → { ref: "1 Nephi 3:150", verse_ids: [...], canon: "coc" }
```
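The priority rule behind these three calls (explicit canon, then the default, then auto-detection) can be sketched as follows. `resolveInCanon` stands in for the real reference parser and is assumed to return verse ids or `null`. The fallback order in `FALLBACK_ORDER` and the choice to skip auto-detection when a canon was named explicitly are assumptions, not decisions recorded in this document.

```javascript
// Sketch of canon priority: explicit → default → auto-detect fallback.
// resolveInCanon(query, canon) is a hypothetical parser returning verse ids or null.
const FALLBACK_ORDER = ['lds', 'bible', 'coc']; // assumed order

function resolveWithCanonPriority(query, config, defaultCanon, resolveInCanon) {
  const explicit = config ? config.canon : undefined;
  const primary = explicit || defaultCanon;

  const verseIds = resolveInCanon(query, primary);
  if (verseIds) return { canon: primary, verseIds };

  // Assumption: an explicitly requested canon is never second-guessed.
  if (explicit) return null;

  for (const canon of FALLBACK_ORDER) {
    if (canon === primary) continue;
    const ids = resolveInCanon(query, canon);
    if (ids) return { canon, verseIds: ids };
  }
  return null;
}
```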

### New Functions

```javascript
// Convert between canons
convertCanon(verseIds, { from: 'coc', to: 'lds' })
// Returns: { verse_ids: number[], partial: boolean }

// Set/get default canon
setCanon('lds')
getCanon()
```
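A sketch of how these could behave, under stated assumptions: the mapping is a plain object from a source verse id to an array of target verse ids, `partial` is `true` when at least one input verse has no counterpart, and `setCanon` rejects unknown canon names. Since the COC mapping is loaded lazily through the async `getCocMapping()`, this `convertCanon` returns a Promise; its third parameter, `loadMapping`, is a hypothetical injection point for that loader, and `convertWithMapping` is a hypothetical helper.

```javascript
// Assumed mapping shape: { [sourceVerseId]: targetVerseId[] } for one direction.
function convertWithMapping(verseIds, mapping) {
  const targets = new Set(); // keeps first-seen order and drops duplicates
  let partial = false;
  for (const id of verseIds) {
    const mapped = mapping[id];
    if (!mapped || mapped.length === 0) {
      partial = true; // no counterpart in the target canon
      continue;
    }
    for (const target of mapped) targets.add(target);
  }
  return { verse_ids: [...targets], partial };
}

// loadMapping(from, to) is a hypothetical hook; in the design it would wrap getCocMapping().
async function convertCanon(verseIds, { from, to }, loadMapping) {
  if (from === to) return { verse_ids: [...verseIds], partial: false };
  const mapping = await loadMapping(from, to);
  return convertWithMapping(verseIds, mapping);
}

// Default canon state; 'lds' matches the "LDS default" decision.
const KNOWN_CANONS = ['lds', 'bible', 'coc'];
let defaultCanon = 'lds';

function setCanon(canon) {
  if (!KNOWN_CANONS.includes(canon)) {
    throw new Error(`Unknown canon: ${canon}`); // assumption: reject unknown names
  }
  defaultCanon = canon;
}

function getCanon() {
  return defaultCanon;
}
```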

### Result Canon Field

Include `canon` in the result only when it differs from the expected canon (the explicit `canon` option, or the default when none is given):

```javascript
// Using default (LDS), found in LDS → no canon field
lookupReference("John 3:16")
// → { ref: "John 3:16", verse_ids: [26136] }

// Using default (LDS), auto-detected COC → canon field included
lookupReference("1 Nephi 3:150")
// → { ref: "1 Nephi 3:150", verse_ids: [...], canon: "coc" }

// Explicit COC, found in COC → no canon field
lookupReference("1 Nephi 3:150", "en", { canon: "coc" })
// → { ref: "1 Nephi 3:150", verse_ids: [...] }
```
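The rule reduces to one comparison between the canon that matched and the canon the caller expected. A minimal sketch with hypothetical helper names `buildResult` and `canonOf`; the second shows how a caller can recover the canon from a result that omits the field.

```javascript
// Attach `canon` only when the matched canon differs from the expected one
// (the explicit option if given, otherwise the current default).
function buildResult(ref, verseIds, matchedCanon, expectedCanon) {
  const result = { ref, verse_ids: verseIds };
  if (matchedCanon !== expectedCanon) result.canon = matchedCanon;
  return result;
}

// A missing `canon` field means the expected canon was used.
function canonOf(result, expectedCanon) {
  return result.canon ?? expectedCanon;
}

// buildResult('John 3:16', [26136], 'lds', 'lds')
//   → { ref: 'John 3:16', verse_ids: [26136] }
// buildResult('1 Nephi 3:150', ids, 'coc', 'lds')
//   → { ref: '1 Nephi 3:150', verse_ids: ids, canon: 'coc' }
```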

## File Structure

```
src/
├── scriptures.mjs              # Main entry (updated imports)
├── canon-converter.mjs         # Renamed from scriptcanon.mjs
├── lib/
│   ├── data-loader.mjs         # NEW: lazy loading, caching, merging
│   ├── deep-merge.mjs          # KEEP: merges shared + canon data
│   └── options-resolver.mjs    # KEEP: resolves language/canon options
├── data/                       # Compiled JS (gitignored)
│   ├── canons/
│   │   ├── bible/
│   │   │   ├── structure.mjs
│   │   │   └── en.mjs
│   │   ├── lds/
│   │   │   ├── structure.mjs
│   │   │   └── en.mjs
│   │   └── coc/
│   │       ├── structure.mjs
│   │       ├── en.mjs
│   │       └── mapping.mjs
│   └── shared/
│       ├── en.mjs
│       └── ko.mjs

build/
├── build.mjs                   # Existing bundler
└── compile-yaml.mjs            # NEW: YAML → JS compiler

data/                           # Source YAML (committed)
├── canons/
│   ├── bible/
│   │   ├── _structure.yml
│   │   └── en.yml
│   ├── lds/
│   │   ├── _structure.yml
│   │   └── en.yml
│   └── coc/
│       ├── _structure.yml
│       └── en.yml
└── shared/
    ├── en.yml
    └── ko.yml
```

## Files to Delete

After refactoring is complete, remove these files from `src/lib/`:
- `yaml-loader.mjs` (needed only by the build script; move it to `build/`)
- `canon-loader.mjs` (replaced by `data-loader.mjs`)

## Implementation Tasks

1. Create `build/compile-yaml.mjs` script
2. Create `src/lib/data-loader.mjs` module
3. Update `src/scriptures.mjs` to use data-loader
4. Rename `scriptcanon.mjs` to `canon-converter.mjs` and make it generic
5. Add `canon` option to lookup/generate/detect functions
6. Add `setCanon()`/`getCanon()` functions
7. Implement auto-detect fallback logic
8. Update `build/build.mjs` to run compile-yaml first
9. Add `src/data/` to `.gitignore`
10. Update tests for new canon functionality
11. Move `yaml-loader.mjs` to `build/` directory
12. Remove `canon-loader.mjs`

## Testing Strategy

- All existing tests must pass (backward compatibility)
- Add tests for explicit canon selection
- Add tests for auto-detect fallback
- Add tests for `convertCanon()` with lazy loading
- Add tests for `setCanon()`/`getCanon()`
