| 1 | +--- |
| 2 | +title: "P1: Get Organized" |
| 3 | +status: inbox |
| 4 | +priority: P1 |
| 5 | +created: 2026-04-11 |
| 6 | +updated: 2026-04-11 |
| 7 | +author: "GitHub Copilot" |
| 8 | +goal: "Create a practical repo-organization plan for AI-DDTK that reduces sprawl first and adds semantic retrieval only where it materially improves discovery." |
| 9 | +--- |
| 10 | + |
| 11 | +<!-- TOC --> |
| 12 | + |
| 13 | +- [Phased Checklist (High-Level Progress)](#phased-checklist-high-level-progress) |
| 14 | +- [Overview](#overview) |
| 15 | +- [Goals](#goals) |
| 16 | +- [Non-Goals](#non-goals) |
| 17 | +- [Guiding Principle](#guiding-principle) |
| 18 | +- [Phase 0 — Inventory and Classification](#phase-0--inventory-and-classification) |
| 19 | +- [Phase 1 — Canonical Structure and Retention Rules](#phase-1--canonical-structure-and-retention-rules) |
| 20 | +- [Phase 2 — Build a Repo Metadata Catalog](#phase-2--build-a-repo-metadata-catalog) |
| 21 | +- [Phase 3 — Cleanup, Promotion, and Archival Pass](#phase-3--cleanup-promotion-and-archival-pass) |
| 22 | +- [Phase 4 — Add Hybrid Retrieval Where It Helps](#phase-4--add-hybrid-retrieval-where-it-helps) |
| 23 | +- [Phase 5 — Operating Rhythm and Ownership](#phase-5--operating-rhythm-and-ownership) |
| 24 | +- [Success Criteria](#success-criteria) |
| 25 | +- [Open Questions](#open-questions) |
| 26 | + |
| 27 | +<!-- /TOC --> |
| 28 | + |
| 29 | +--- |
| 30 | + |
| 31 | +## Phased Checklist (High-Level Progress) |
| 32 | + |
| 33 | +> This document should be updated as work is completed. Mark off items immediately rather than batching status updates later. |
| 34 | + |
| 35 | +- [ ] **Phase 0 — Inventory and Classification** |
| 36 | +- [ ] **Phase 1 — Canonical Structure and Retention Rules** |
| 37 | +- [ ] **Phase 2 — Build a Repo Metadata Catalog** |
| 38 | +- [ ] **Phase 3 — Cleanup, Promotion, and Archival Pass** |
| 39 | +- [ ] **Phase 4 — Add Hybrid Retrieval Where It Helps** |
| 40 | +- [ ] **Phase 5 — Operating Rhythm and Ownership** |
| 41 | + |
| 42 | +## Overview |
| 43 | + |
| 44 | +AI-DDTK has grown into a toolkit repo with code, docs, recipes, project tracking, experiments, generated reports, and operational artifacts. The current problem is not only search. It is lifecycle clarity: what is canonical, what is in progress, what is generated, what is temporary, and what should be archived. |
| 45 | + |
| 46 | +Embeddings can help with discovery, clustering, and semantic lookup across notes, reports, and docs. They do not replace naming rules, retention rules, or folder discipline. This plan treats semantic retrieval as a later layer added on top of a cleaner repo model. |
| 47 | + |
| 48 | +## Goals |
| 49 | + |
| 50 | +- [ ] Reduce ambiguity about where new files belong. |
| 51 | +- [ ] Separate source-of-truth files from generated artifacts and temporary outputs. |
| 52 | +- [ ] Create a machine-readable catalog of important files and directories. |
| 53 | +- [ ] Make cleanup repeatable instead of one-off. |
| 54 | +- [ ] Add semantic retrieval only after the repo has usable metadata and folder hygiene. |
| 55 | + |
| 56 | +## Non-Goals |
| 57 | + |
| 58 | +- [ ] Do not redesign the entire repo structure in one pass. |
| 59 | +- [ ] Do not build a full knowledge platform before cleanup basics exist. |
| 60 | +- [ ] Do not index every file blindly into a vector store. |
| 61 | +- [ ] Do not treat generated artifacts as equal to canonical docs or source code. |
| 62 | + |
| 63 | +## Guiding Principle |
| 64 | + |
| 65 | +Organize first, index second. |
| 66 | + |
| 67 | +If the repo lacks clear file lifecycle rules, embeddings will help search the mess without reducing the mess. The right sequence is: |
| 68 | + |
| 69 | +1. Inventory the repo. |
| 70 | +2. Classify files by lifecycle and purpose. |
| 71 | +3. Clean up and normalize the highest-noise areas. |
| 72 | +4. Add metadata-backed discovery. |
| 73 | +5. Add hybrid lexical + semantic retrieval where it clearly improves workflow. |
| 74 | + |
| 75 | +## Phase 0 — Inventory and Classification |
| 76 | + |
| 77 | +Purpose: build a factual snapshot of what exists before making structural changes. |
| 78 | + |
| 79 | +### Checklist |
| 80 | + |
| 81 | +- [ ] Generate a repo-wide file inventory using tracked files as the baseline. |
| 82 | +- [ ] Break the inventory down by top-level area: `tools/`, `experimental/`, `PROJECT/`, `docs/`, `recipes/`, `templates/`, `test/`, `bin/`, `temp/`. |
| 83 | +- [ ] For each area, label files into one of these classes: `canonical-source`, `documentation`, `project-tracking`, `generated-artifact`, `temporary`, `experimental`, `archive-candidate`. |
| 84 | +- [ ] Identify the noisiest zones by count and churn, not by intuition. |
| 85 | +- [ ] Flag directories that mix multiple lifecycles in one place. |
| 86 | +- [ ] Produce a first-pass list of files that are probably duplicated, stale, or misfiled. |
| 87 | +- [ ] Record which generated outputs are intentionally checked in versus accidentally lingering. |
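A minimal sketch of the inventory step, assuming tracked files come from `git ls-files` and that each top-level area gets a default lifecycle class as a starting guess. The `DEFAULT_CLASS` mapping, the `unclassified` placeholder, and the function names are illustrative assumptions, not existing AI-DDTK tooling; the real classification would be refined by hand per file.

```python
import subprocess
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical first-pass defaults: top-level area -> lifecycle class.
# These are guesses to be corrected during review, not settled policy.
DEFAULT_CLASS = {
    "tools": "canonical-source",
    "bin": "canonical-source",
    "templates": "canonical-source",
    "test": "canonical-source",
    "docs": "documentation",
    "recipes": "documentation",
    "PROJECT": "project-tracking",
    "experimental": "experimental",
    "temp": "temporary",
}


def tracked_files():
    """Return tracked paths, using git's NUL-separated output."""
    out = subprocess.run(
        ["git", "ls-files", "-z"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def classify(path):
    """Label one path with its top-level area and a default class."""
    parts = PurePosixPath(path).parts
    area = parts[0] if len(parts) > 1 else "(root)"
    # "unclassified" is a placeholder meaning "needs a human decision".
    return {
        "path": path,
        "area": area,
        "lifecycle_class": DEFAULT_CLASS.get(area, "unclassified"),
    }


def summarize(paths):
    """Count files per (area, lifecycle_class) pair."""
    return Counter(
        (row["area"], row["lifecycle_class"]) for row in map(classify, paths)
    )
```

Running `summarize(tracked_files())` from the repo root gives counts per area, which is one way to find the noisiest zones by count rather than by intuition. Churn would need a separate pass over `git log`.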
| 88 | + |
| 89 | +### Deliverable |
| 90 | + |
| 91 | +- [ ] A first-pass inventory snapshot stored in a machine-readable format such as JSON or CSV. |
| 92 | + |
| 93 | +### Exit Criteria |
| 94 | + |
| 95 | +- [ ] We can answer, with evidence, which parts of the repo are source, working notes, generated output, and probable cleanup targets. |
| 96 | + |
| 97 | +## Phase 1 — Canonical Structure and Retention Rules |
| 98 | + |
| 99 | +Purpose: define what belongs where and how long it should live. |
| 100 | + |
| 101 | +### Checklist |
| 102 | + |
| 103 | +- [ ] Define the canonical purpose of each top-level directory in one sentence. |
| 104 | +- [ ] Confirm `PROJECT/` is only for planning, tracking, inbox, working, and done states. |
| 105 | +- [ ] Confirm `temp/` is for sensitive and disposable runtime artifacts, not long-term reference docs. |
| 106 | +- [ ] Define what qualifies for `experimental/` and what conditions trigger promotion out of it. |
| 107 | +- [ ] Define which report outputs belong in-repo versus gitignored runtime storage. |
| 108 | +- [ ] Review `.gitignore` coverage for reports, screenshots, scans, auth state, and logs. |
| 109 | +- [ ] Write simple retention rules for generated content: keep, archive, or purge. |
| 110 | +- [ ] Decide what must always have an owning doc or README in dense directories. |
| 111 | + |
| 112 | +### Deliverable |
| 113 | + |
| 114 | +- [ ] A short policy section or reference doc update that names the lifecycle rules for canonical, experimental, generated, and temporary files. |
| 115 | + |
| 116 | +### Exit Criteria |
| 117 | + |
| 118 | +- [ ] A contributor can decide where a new file belongs without guessing. |
| 119 | + |
| 120 | +## Phase 2 — Build a Repo Metadata Catalog |
| 121 | + |
| 122 | +Purpose: create a lightweight system of record for discovery and cleanup. |
| 123 | + |
| 124 | +### Checklist |
| 125 | + |
| 126 | +- [ ] Define the metadata schema for the catalog. |
| 127 | +- [ ] Include at minimum: `path`, `area`, `file_type`, `lifecycle_class`, `owner_tool`, `canonical`, `generated`, `last_modified`, `status`, `notes`. |
| 128 | +- [ ] Add optional tags for themes such as `mcp`, `wpcc`, `playwright`, `local-wp`, `query-monitor`, `servers`, `project-doc`. |
| 129 | +- [ ] Generate the initial catalog automatically from the repo rather than maintaining it by hand. |
| 130 | +- [ ] Add a rule for how manual overrides are stored when auto-detection is wrong. |
| 131 | +- [ ] Mark high-value files explicitly as canonical references. |
| 132 | +- [ ] Mark low-value files explicitly as cleanup or archive candidates. |
| 133 | +- [ ] Decide where the catalog lives and whether it is checked in or regenerated. |
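One possible shape for a catalog entry, using the minimum fields listed above plus optional tags. The field types, the defaults, and the override format (a mapping from path to corrected fields) are assumptions for illustration; the plan leaves those decisions open.

```python
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass
class CatalogEntry:
    # Field names follow the checklist; types and defaults are assumed.
    path: str
    area: str
    file_type: str
    lifecycle_class: str
    owner_tool: str = ""
    canonical: bool = False
    generated: bool = False
    last_modified: str = ""  # assumed to be an ISO 8601 date string
    status: str = ""
    notes: str = ""
    tags: list[str] = field(default_factory=list)


def apply_overrides(entries, overrides):
    """Return new entries with manual corrections applied.

    `overrides` maps a path to the fields that auto-detection got wrong,
    e.g. {"some/path": {"lifecycle_class": "generated-artifact"}}.
    """
    return [replace(e, **overrides.get(e.path, {})) for e in entries]


def to_json(entries):
    """Serialize the catalog deterministically so diffs stay readable."""
    return json.dumps([asdict(e) for e in entries], indent=2, sort_keys=True)
```

Keeping overrides in a separate small file, rather than editing the generated catalog, means the catalog can be regenerated at any time without losing manual decisions.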
| 134 | + |
| 135 | +### Deliverable |
| 136 | + |
| 137 | +- [ ] A machine-readable catalog that can drive cleanup reports, folder summaries, and later search indexing. |
| 138 | + |
| 139 | +### Exit Criteria |
| 140 | + |
| 141 | +- [ ] We can query the repo by lifecycle and ownership instead of relying only on folder names. |
| 142 | + |
| 143 | +## Phase 3 — Cleanup, Promotion, and Archival Pass |
| 144 | + |
| 145 | +Purpose: reduce noise using the inventory and catalog rather than ad hoc decisions. |
| 146 | + |
| 147 | +### Checklist |
| 148 | + |
| 149 | +- [ ] Triage `PROJECT/1-INBOX` items into active, done, or misc states using existing doc rules. |
| 150 | +- [ ] Move finished project docs out of inbox. |
| 151 | +- [ ] Review `experimental/` for tools or docs that have effectively graduated. |
| 152 | +- [ ] Move obsolete or superseded planning docs to the appropriate archive location instead of leaving duplicates in place. |
| 153 | +- [ ] Consolidate duplicate instructions where one doc clearly supersedes another. |
| 154 | +- [ ] Remove or archive tracked generated artifacts that do not belong in the main repo surface. |
| 155 | +- [ ] Add missing README or index guidance in dense directories only where it reduces ambiguity. |
| 156 | +- [ ] Re-run the inventory after cleanup and measure the reduction in file count and in ambiguously classified files. |
| 157 | + |
| 158 | +### Deliverable |
| 159 | + |
| 160 | +- [ ] A visibly smaller and more legible repo surface, especially in project-tracking and experimental areas. |
| 161 | + |
| 162 | +### Exit Criteria |
| 163 | + |
| 164 | +- [ ] The highest-noise folders have fewer ambiguous files and clearer ownership. |
| 165 | + |
| 166 | +## Phase 4 — Add Hybrid Retrieval Where It Helps |
| 167 | + |
| 168 | +Purpose: improve discovery after structure exists. |
| 169 | + |
| 170 | +### Checklist |
| 171 | + |
| 172 | +- [ ] Start with hybrid retrieval, not embeddings alone. |
| 173 | +- [ ] Use lexical search for exact names, paths, commands, headings, and schema keys. |
| 174 | +- [ ] Use embeddings for semantic discovery across docs, reports, changelog entries, project notes, and scan outputs. |
| 175 | +- [ ] Index docs and operational artifacts first. |
| 176 | +- [ ] Add code chunks only when documentation is insufficient for the target workflow. |
| 177 | +- [ ] Exclude binaries, screenshots, lockfiles, auth state, and highly repetitive output unless there is a clear use case. |
| 178 | +- [ ] Test real queries against the index before expanding scope. |
| 179 | +- [ ] Define success queries such as: “show me all files related to LocalWP auth failures” or “find similar past WPCC performance investigations.” |
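The plan does not name a method for combining lexical and semantic results. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than the chosen design. It takes the ranked path lists from each retriever and merges them by rank alone, so the lexical and embedding scores never need to be on the same scale. The function name and the `k=60` default are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    documents ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

A usage example with placeholder ids: given a lexical ranking `["a", "b", "c"]` and a semantic ranking `["b", "d", "a"]`, the fused order is `["b", "a", "d", "c"]`, because `b` and `a` appear in both lists.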
| 180 | + |
| 181 | +### Deliverable |
| 182 | + |
| 183 | +- [ ] A narrow, high-signal retrieval layer aimed at discovery and clustering, not a substitute for repo organization. |
| 184 | + |
| 185 | +### Exit Criteria |
| 186 | + |
| 187 | +- [ ] Semantic lookup answers real questions faster than plain grep without pulling in obvious noise. |
| 188 | + |
| 189 | +## Phase 5 — Operating Rhythm and Ownership |
| 190 | + |
| 191 | +Purpose: keep the repo organized after the first cleanup pass. |
| 192 | + |
| 193 | +### Checklist |
| 194 | + |
| 195 | +- [ ] Assign an owner or review rule for repo hygiene changes. |
| 196 | +- [ ] Add a recurring cleanup cadence for inbox, experimental, and generated-output areas. |
| 197 | +- [ ] Add a lightweight checklist for “new file acceptance” so artifacts do not accumulate silently. |
| 198 | +- [ ] Require new generated-output directories to declare whether they are checked in or gitignored. |
| 199 | +- [ ] Review the metadata catalog on a schedule rather than only during cleanup crises. |
| 200 | +- [ ] Add a simple report that shows new files by lifecycle class since the last review. |
| 201 | +- [ ] Revisit the hybrid index scope after one or two cleanup cycles. |
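The "new files by lifecycle class" report could be as small as a diff between two catalog snapshots. This sketch assumes each snapshot is a list of rows with `path` and `lifecycle_class` keys; the function name and the snapshot format are hypothetical.

```python
from collections import defaultdict


def new_files_by_class(previous, current):
    """Group paths present in `current` but not in `previous` by class."""
    seen = {row["path"] for row in previous}
    report = defaultdict(list)
    for row in current:
        if row["path"] not in seen:
            report[row["lifecycle_class"]].append(row["path"])
    # Sorted output keeps the report stable between runs.
    return {cls: sorted(paths) for cls, paths in sorted(report.items())}
```

A growing `temporary` or `generated-artifact` bucket between reviews is a signal that the new-file acceptance checklist is being skipped.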
| 202 | + |
| 203 | +### Deliverable |
| 204 | + |
| 205 | +- [ ] A repeatable maintenance loop that prevents the repo from drifting back into ambiguity. |
| 206 | + |
| 207 | +### Exit Criteria |
| 208 | + |
| 209 | +- [ ] Repo organization becomes an operating habit, not a one-time project. |
| 210 | + |
| 211 | +## Success Criteria |
| 212 | + |
| 213 | +- [ ] The top-level repo areas each have a clear and enforced purpose. |
| 214 | +- [ ] New files can be classified quickly as canonical, generated, temporary, experimental, or project-tracking. |
| 215 | +- [ ] The noisiest folders have been reduced and normalized. |
| 216 | +- [ ] A metadata catalog exists and can be regenerated. |
| 217 | +- [ ] Semantic retrieval is scoped to the parts of the repo where it genuinely improves discovery. |
| 218 | +- [ ] Contributors can find the right file faster without memorizing tribal knowledge. |
| 219 | + |
| 220 | +## Open Questions |
| 221 | + |
| 222 | +- [ ] Should the metadata catalog live under `PROJECT/`, `tools/`, or a new repo-maintenance location? |
| 223 | +- [ ] Which generated artifacts are intentionally committed because they provide durable value? |
| 224 | +- [ ] Which `experimental/` items are actually production-grade and just waiting for promotion? |
| 225 | +- [ ] Should repo hygiene checks become part of `preflight.sh`, `post-flight.sh`, or a separate maintenance command? |
| 226 | +- [ ] Is the first retrieval target this repo alone, or this repo plus neighboring WordPress project artifacts and reports? |