| 1 | +# Agent Reference Document — READ THIS FIRST |
| 2 | + |
| 3 | +This document exists because Brendan has had to repeat himself hundreds of times. |
| 4 | +If you are an AI agent working on this project, READ THIS BEFORE DOING ANYTHING. |
| 5 | + |
| 6 | +## What guIDE Is |
| 7 | + |
| 8 | +A desktop IDE where users load ANY local GGUF model and it just works. Chat, tool calling, |
| 9 | +browsing, code generation — powered by whatever model the user chose. Model-agnostic. |
| 10 | +The app adapts at runtime via dynamic model profiles. |
| 11 | + |
| 12 | +## What Success Means |
| 13 | + |
| 14 | +- User loads any model. Asks it to do something. It does it coherently to the best of |
| 15 | + that model's actual ability. |
| 16 | +- If a model produces good output in LM Studio, it must produce equally good or better |
| 17 | + output in guIDE. The pipeline helps, never hinders. |
| 18 | +- Works out of the box. No hand-tuning per model. |
| 19 | + |
| 20 | +## What Success Does NOT Mean |
| 21 | + |
| 22 | +- Tailoring code to specific model names |
| 23 | +- Benchmarking one model and declaring victory |
| 24 | +- Guardrails/quality gates/kill switches that prevent models from working |
| 25 | +- Timeouts that mask underlying problems (timeouts = failure) |
| 26 | + |
| 27 | +## Dynamic Model Profiles ARE the Correct Architecture |
| 28 | + |
| 29 | +The profile system (family + size tier) IS the right approach. Different size models |
| 30 | +genuinely need different parameters. A 0.6B model needs different sampling than a 30B. |
| 31 | +This is NOT "hand-tuning per model" — it's per-family-per-size-tier configuration, |
| 32 | +which scales. The profile system is NOT a fallback — it IS the runtime. |
| 33 | + |
| 34 | +Unknown models get sensible defaults derived from the closest matching tier. |
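A minimal sketch of family + size-tier resolution, assuming a hypothetical profile table. The tier boundaries, parameter names, and values below are illustrative and are not taken from `modelProfiles.js`. It shows an unknown family falling back to the defaults for the matching size tier.

```javascript
// Hypothetical size tiers (parameter count in billions). Boundaries are assumptions.
const TIERS = [
  { name: 'tiny', maxParamsB: 1 },
  { name: 'small', maxParamsB: 4 },
  { name: 'medium', maxParamsB: 14 },
  { name: 'large', maxParamsB: Infinity },
];

// Hypothetical profile table: family -> tier -> sampling parameters.
const PROFILES = {
  default: {
    tiny: { temperature: 0.6, topP: 0.9 },
    small: { temperature: 0.7, topP: 0.9 },
    medium: { temperature: 0.7, topP: 0.95 },
    large: { temperature: 0.8, topP: 0.95 },
  },
  llama: {
    tiny: { temperature: 0.5 },
  },
};

// Pick the first tier whose upper bound covers the model size.
function sizeTier(paramsB) {
  return TIERS.find((t) => paramsB <= t.maxParamsB).name;
}

// Merge tier defaults with family-specific overrides for the same tier.
function resolveProfile(family, paramsB) {
  const tier = sizeTier(paramsB);
  const known = Object.hasOwn(PROFILES, family) && family !== 'default';
  const overrides = known ? PROFILES[family][tier] || {} : {};
  return {
    family: known ? family : 'default',
    tier,
    ...PROFILES.default[tier],
    ...overrides,
  };
}

console.log(resolveProfile('llama', 0.6));
console.log(resolveProfile('some-unknown-family', 30));
```

Nothing in the lookup refers to a specific model name. Only the family and the size tier select parameters.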
| 35 | + |
| 36 | +## Model Capabilities — Do NOT Underestimate |
| 37 | + |
| 38 | +- 0.6B models: CAN make tool calls, CAN chain a couple of them. They hallucinate |
| 39 | + and repeat themselves but they ARE capable. Don't restrict them to single calls |
| 40 | + without testing first. They've proven they can do it. |
| 41 | +- 1-4B models: Should handle multi-step tasks reliably. |
| 42 | +- 4B+: Should handle complex chains. |
| 43 | +- ALL models must produce COHERENT output. Even if smaller ones do less, they must |
| 44 | + not produce gibberish. |
| 45 | + |
| 46 | +## How to Work With Brendan |
| 47 | + |
| 48 | +### DO: |
| 49 | +- Test before implementing. Prove a problem exists before fixing it. |
| 50 | +- When shown a failing interaction, analyze what ACTUALLY happened. |
| 51 | +- If something works, leave it alone. "Looks good" is a valid answer. |
| 52 | +- Say "Brendan you're wrong" or "there's nothing else to do" when that's the truth. |
| 53 | +- Give honest opinions, even if they disagree with what Brendan said. |
| 54 | +- Find ROOT CAUSES, not bandaids. |
| 55 | +- Be concise. Do the work. Stop narrating. |
| 56 | + |
| 57 | +### DO NOT: |
| 58 | +- Manufacture problems. If there's nothing to fix, SAY SO. |
| 59 | +- Use cheerleader language: "smoking gun", "this changes everything", "game changer". |
| 60 | +- Agree with everything. Brendan needs honest pushback. |
| 61 | +- Run audit/fix loops that create new problems to fix later. |
| 62 | +- Implement changes based on hypotheses — test first. |
| 63 | +- Reference specific model names when discussing general architecture. |
| 64 | +- Apologize repeatedly. Just work. |
| 65 | +- Throw bandaids. If you can't find the root cause, say so. |
| 66 | + |
| 67 | +## Known Recurring Issues (as of Feb 18, 2026) |
| 68 | + |
| 69 | +### FIXED — Files Not Being Created |
| 70 | +- **Root cause found and fixed**: `projectPath` was null at startup because it's only set |
| 71 | + when user opens a folder via File > Open Folder. `_writeFile` joined basename with `''` → |
| 72 | + wrote to process CWD. Orphaned files confirmed at D:\models\models\, C:\Users\brend\IDE\, etc. |
| 73 | +- **Fix**: `_writeFile` and `_createDirectory` now return clear error when no project is open. |
| 74 | + Removed `|| ''` fallback. Added `files-changed` IPC notification so FileTree auto-refreshes. |
| 75 | +- **Note**: File Explorer New Folder/New File buttons — not yet investigated. |
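The guard described above can be sketched as follows. This is an illustration, not the actual `_writeFile`: the function name `resolveWritePath`, the return shape, and the error text are hypothetical. It shows why the `|| ''` fallback was harmful: joining `''` with a file name yields a relative path, which resolves against the process CWD.

```javascript
const path = require('node:path');

// Hypothetical helper: refuse to build a write target when no project is open.
function resolveWritePath(projectPath, relativePath) {
  if (!projectPath) {
    // No silent fallback to '' here: that would write into the process CWD.
    return { ok: false, error: 'No project folder is open. Use File > Open Folder first.' };
  }
  return { ok: true, fullPath: path.join(projectPath, relativePath) };
}

// The old behaviour, for comparison: path.join('', 'notes.txt') is just 'notes.txt'.
console.log(path.join('', 'notes.txt'));
console.log(resolveWritePath(null, 'notes.txt'));
console.log(resolveWritePath('/home/user/project', 'notes.txt'));
```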
| 76 | + |
| 77 | +### FIXED (Attempt 4) — Google Sign-In |
| 78 | +- **Root cause**: `onHeadersReceived` callback was `async` with `await` inside, which |
| 79 | + caused timing issues with Electron's webRequest callback mechanism. The `callback()` |
| 80 | + was delayed while `activateWithToken` ran, potentially blocking the OAuth redirect. |
| 81 | + Multiple strategies (4) all failed due to race conditions. |
| 82 | +- **Fix (v4)**: Replaced `onHeadersReceived` with `session.cookies.on('changed')` event. |
| 83 | + This is Electron's native cookie change event — fires synchronously when any cookie |
| 84 | + is set in the session, no timing race possible. Fallback: if cookie event doesn't fire |
| 85 | + within 2s of landing on /account, tries direct cookie read. |
| 86 | +- **Caveat**: Cannot test OAuth end-to-end in this environment. If it fails again, |
| 87 | + check logs at %APPDATA%/guIDE/logs/guide-main.log for `[OAuth]` entries. |
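A rough sketch of the event-plus-fallback pattern, assuming an object with the shape of Electron's `session.cookies` (`on('changed', ...)`, `removeListener`, and `get(filter)` returning a promise of cookies). The helper name `waitForAuthCookie` and the cookie name are hypothetical, and the timer here starts at call time rather than on landing on /account.

```javascript
// Resolve with the cookie value as soon as the 'changed' event reports it,
// or fall back to a direct cookie read after fallbackMs.
function waitForAuthCookie(cookies, cookieName, fallbackMs = 2000) {
  return new Promise((resolve) => {
    let timer = null;

    // Electron passes (event, cookie, cause, removed) to 'changed' listeners.
    const onChanged = (_event, cookie, _cause, removed) => {
      if (removed || cookie.name !== cookieName) return;
      clearTimeout(timer);
      cookies.removeListener('changed', onChanged);
      resolve(cookie.value);
    };
    cookies.on('changed', onChanged);

    // Fallback: the event never fired, so read the cookie store directly.
    timer = setTimeout(() => {
      cookies.removeListener('changed', onChanged);
      cookies.get({ name: cookieName }).then(
        (found) => resolve(found.length > 0 ? found[0].value : null),
        () => resolve(null),
      );
    }, fallbackMs);
  });
}
```

In the main process this would be called with the cookies object of the session used by the sign-in window. As the caveat above says, the real OAuth flow still needs an end-to-end test.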
| 88 | + |
| 89 | +### FIXED — Template Response Loop (0.6B) |
| 90 | +- **Root cause**: chatHistory persisted intermediate agentic turns (injected tool feedback, |
| 91 | + continue instructions) across separate user messages. For 0.6B models with limited |
| 92 | + attention, the pattern `user: [tool feedback]` → `model: "No further action"` was |
| 93 | + strongly reinforced, causing the model to repeat it regardless of new input. |
| 94 | +- **Fix**: After agentic loop completes, chatHistory is condensed to system + original |
| 95 | + user message + final model response. KV cache invalidated. |
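The condensing step might look roughly like this. The `{ role, content }` message shape and the function name `condenseHistory` are assumptions for illustration; the real chat history format may differ, and KV cache invalidation is not shown.

```javascript
// Collapse one finished agentic run into: system + original user message + final response.
// Intermediate turns (injected tool feedback, continue instructions) are dropped.
function condenseHistory(chatHistory, originalUserMessage, finalResponse) {
  const condensed = [];
  const system = chatHistory.find((m) => m.role === 'system');
  if (system) condensed.push(system);
  condensed.push({ role: 'user', content: originalUserMessage });
  condensed.push({ role: 'assistant', content: finalResponse });
  return condensed;
}

const history = [
  { role: 'system', content: 'You are a coding assistant.' },
  { role: 'user', content: 'Create index.html' },
  { role: 'assistant', content: '{"tool":"write_file","path":"index.html"}' },
  { role: 'user', content: '[tool feedback] write_file succeeded' },
  { role: 'assistant', content: 'Created index.html.' },
];

console.log(condenseHistory(history, 'Create index.html', 'Created index.html.'));
```

After condensing, the `user: [tool feedback]` turns no longer exist in the history, so the next user message cannot reinforce that pattern.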
| 96 | + |
| 97 | +### FIXED — Thinking Model Gibberish (Llama-3.2-3B-thinking etc.) |
| 98 | +- **Root cause**: `thinkTokens.mode = 'none'` in llama profile suppressed thinking tokens |
| 99 | + for ALL llama models. Thinking-variant models (trained with chain-of-thought) NEED to |
| 100 | + generate `<think>...</think>` before answering. Without it, their output degrades into gibberish. |
| 101 | +- **Fix**: `_getModelSpecificParams()` now detects "thinking", "cot", "r1-distill", |
| 102 | + "reasoning" in the model name and overrides thinkTokens to budget mode. |
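A sketch of the name-based override, using the four markers listed above. The helper names and the `'budget'` mode string are assumptions; this is not the body of `_getModelSpecificParams()`.

```javascript
// Substrings that mark a thinking-variant model (from the fix description above).
const THINKING_MARKERS = ['thinking', 'cot', 'r1-distill', 'reasoning'];

function isThinkingVariant(modelName) {
  const name = modelName.toLowerCase();
  return THINKING_MARKERS.some((marker) => name.includes(marker));
}

// Return a copy of the profile with thinkTokens switched to budget mode
// when the model name indicates a thinking variant.
function applyThinkOverride(profile, modelName) {
  if (!isThinkingVariant(modelName)) return profile;
  return { ...profile, thinkTokens: { ...profile.thinkTokens, mode: 'budget' } };
}

const familyProfile = { thinkTokens: { mode: 'none' } };
console.log(applyThinkOverride(familyProfile, 'Llama-3.2-3B-thinking'));
console.log(applyThinkOverride(familyProfile, 'Llama-3.2-3B-Instruct'));
```

Plain substring matching is simple but can give false positives: `cot` would also match any model name that happens to contain those three letters.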
| 103 | + |
| 104 | +### FIXED — Phi-4-mini Stuck on "Thinking..." (Grammar Retry Cascade) |
| 105 | +- **Root cause**: Grammar-constrained generation hung (0 tokens in rejection sampling). |
| 106 | + After 2 grammar timeouts + 1 text-mode timeout, rollback budget exhaustion RESET |
| 107 | + `consecutiveEmptyGrammarRetries` to 0, re-enabling grammar for next iteration. |
| 108 | + With 3 nudges × (15s+15s+120s, at the old 15s grammar timeout) = 7.5+ minutes of dead time. |
| 109 | +- **Fix**: Don't reset `consecutiveEmptyGrammarRetries` on rollback budget exhaustion. |
| 110 | + Once grammar fails, it stays disabled. Grammar timeout reduced from 15s → 5s. |
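The "once grammar fails, it stays disabled" rule can be sketched as a small state holder. The class name `GrammarGate`, its methods, and the threshold of 2 are hypothetical; only the counter name comes from the description above.

```javascript
// Tracks empty grammar results and decides whether grammar may still be used.
class GrammarGate {
  constructor(maxEmptyRetries = 2) {
    this.maxEmptyRetries = maxEmptyRetries;
    this.consecutiveEmptyGrammarRetries = 0;
  }

  get grammarEnabled() {
    return this.consecutiveEmptyGrammarRetries < this.maxEmptyRetries;
  }

  // Called when grammar-constrained generation timed out with 0 tokens.
  recordEmptyGrammarResult() {
    this.consecutiveEmptyGrammarRetries += 1;
  }

  // Rollback budget exhaustion must NOT reset the counter. Resetting it
  // here is what re-enabled grammar and restarted the timeout cascade.
  onRollbackBudgetExhausted() {}
}

const gate = new GrammarGate();
gate.recordEmptyGrammarResult();
gate.recordEmptyGrammarResult();
gate.onRollbackBudgetExhausted();
console.log(gate.grammarEnabled);
```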
| 111 | + |
| 112 | +### FIXED — Model Switch Mid-Load Race Condition |
| 113 | +- **Root cause**: `initialize()` called `loadModel()` (180s timeout) but had no way to |
| 114 | + know it was superseded. Second `initialize()` call ran concurrently, both wrote to |
| 115 | + `this.model`/`this.context`, wrong model ended up loaded. |
| 116 | +- **Fix**: Added `_loadGeneration` monotonic counter. Each `initialize()` gets a unique ID |
| 117 | + and calls `checkSuperseded()` after every heavy await. Superseded loads throw immediately. |
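A minimal sketch of the generation-counter pattern. The `Engine` class, the injected `loadModel` function, and `SupersededError` are stand-ins for illustration; only `_loadGeneration` and `checkSuperseded()` are names from the fix above.

```javascript
class SupersededError extends Error {}

class Engine {
  constructor(loadModel) {
    this._loadModel = loadModel; // injected so the sketch runs without a real model
    this._loadGeneration = 0;    // monotonic counter, bumped on every initialize()
    this.model = null;
  }

  async initialize(modelPath) {
    const generation = ++this._loadGeneration;
    const checkSuperseded = () => {
      if (generation !== this._loadGeneration) {
        throw new SupersededError(`Load of ${modelPath} was superseded`);
      }
    };

    const model = await this._loadModel(modelPath);
    checkSuperseded(); // after every heavy await, before touching shared state
    this.model = model;
    return model;
  }
}

// Usage: the slower first load is superseded and never writes this.model.
const delays = { 'a.gguf': 30, 'b.gguf': 5 };
const fakeLoad = (p) => new Promise((resolve) => setTimeout(() => resolve({ path: p }), delays[p]));
const engine = new Engine(fakeLoad);
engine.initialize('a.gguf').catch((e) => console.log(e.message));
engine.initialize('b.gguf').then((m) => console.log('loaded', m.path));
```

Because the counter is incremented synchronously at the start of `initialize()`, the outcome does not depend on which load finishes first: only the most recent call passes the check.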
| 118 | + |
| 119 | +### NOT YET INVESTIGATED |
| 120 | +- File Explorer New Folder / New File buttons don't work |
| 121 | +- Tool call dropdowns expanding during streaming (code defaults to collapsed — may be |
| 122 | + streaming render issue where JSON isn't parsed as a tool call block) |
| 123 | +- System may be over-engineered: Brendan suspects too many moving parts are actively hindering the models. |
| 124 | +- When investigating issues, consider whether existing code is CAUSING the problem |
| 125 | + before adding more code on top. |
| 126 | +- Simplicity > cleverness. If a simpler approach works, use it. |
| 127 | + |
| 128 | +## HARD RULES — READ BEFORE DOING ANYTHING |
| 129 | + |
| 130 | +### NO FAKE FIXES |
| 131 | +- Only implement fixes you are CERTAIN will solve the problem. |
| 132 | +- If you cannot determine the root cause, say "I don't know" — this is always acceptable. |
| 133 | +- Never implement a guess and call it a fix. Bandaids waste Brendan's time. |
| 134 | +- If a fix requires testing you can't do (e.g., OAuth), SAY SO explicitly. |
| 135 | + |
| 136 | +### NO MANUFACTURED PROBLEMS |
| 137 | +- When asked to find problems, genuinely look. If there are none, say "I found nothing." |
| 138 | +- Do not fabricate issues to appear helpful. Brendan catches this every time. |
| 139 | + |
| 140 | +### HONESTY OVER HELPFULNESS |
| 141 | +- "I don't know" is always better than a wrong answer. |
| 142 | +- "There's nothing to fix" is always better than a fake fix. |
| 143 | +- "I can't test this" is always better than claiming something works when you haven't verified it. |
| 144 | +- Never claim a fix works unless you have proof (build output, test result, etc.). |
| 145 | + |
| 146 | +### LOGGING |
| 147 | +- Persistent file logs exist at %APPDATA%/guIDE/logs/guide-main.log |
| 148 | +- All info/warn/error logs are written to file automatically |
| 149 | +- Set LOG_LEVEL=debug for verbose output |
| 150 | +- Always check log files first when diagnosing issues |
| 151 | + |
| 152 | +## Technical Stack |
| 153 | + |
| 154 | +- Electron + Vite + React |
| 155 | +- node-llama-cpp for local inference |
| 156 | +- Main process: main/ directory (agenticChat.js, llmEngine.js, modelProfiles.js, etc.) |
| 157 | +- Frontend: src/ directory |
| 158 | +- Website: website/ directory (Next.js) |
| 159 | +- Models on D:\models |
| 160 | + |
| 161 | +## The Pipeline Difference From LM Studio |
| 162 | + |
| 163 | +LM Studio: simple prompt, no grammar constraining, default sampling → coherent output. |
| 164 | +guIDE: system prompt + tool definitions + few-shot examples + grammar constraining + |
| 165 | +custom sampling → potentially degraded output. |
| 166 | + |
| 167 | +The pipeline must HELP models, not fight them. |