When assisting the user, remember these guidelines:
- Be proactive about committing completed features and work using git. Do not commit half-implemented code. Do not push without asking.
- Always create GitHub repos as private (
--private). Never use--public. Matthew will manually change visibility when ready. - After writing code, remember to test that it works properly. Do this by writing unit tests in a proper unit test file. Use python -c very sparingly.
- Use type hints whenever possible
- Your user, Matthew, often wants to learn and understand the code you write and occasionally is impatient and just wants the answer. This impatience is sometimes justified and sometimes a bad habit. Intelligently figure out whether Matthew should slow down and learn or whether you should just give him the answer.
Use uv exclusively for Python package management in all Python projects.
- All Python dependencies must be installed, synchronized, and locked using uv
- Never use pip, pip-tools, poetry, or conda directly for dependency management
Use these commands:
- Install dependencies:
uv add <package> - Remove dependencies:
uv remove <package> - Sync dependencies:
uv sync
- Run a Python script with
uv run <script-name>.py - Run Python tools like Pytest with
uv run pytestoruv run ruff - Launch a Python repl with
uv run python
- Run a Python script with inline metadata (dependencies defined at the top of the file) with:
uv run script.py - You can add or remove dependencies manually from the
dependencies =section at the top of the script, or - Or using uv CLI:
uv add package-name --script script.pyuv remove package-name --script script.py
This machine is shared. Each folder under /home/cs29824/ belongs to a different user. Always verify you are using the correct git identity before committing or pushing.
| Folder | GitHub User |
|---|---|
matthew/ |
AMindToThink |
andre/ |
antebe |
Before any git commit or push, run git config user.name to confirm the identity matches the folder you're working in.
- Never use
continueto silently skip errors. If something would fail, fail loudly and early. Crashing on bad input is good — it surfaces the problem immediately. - Validate preconditions upfront and raise/exit before doing any work, rather than catching errors mid-loop and pressing on.
- Prefer failing fast over producing partial/misleading results.
When creating skills, always use the directory convention: ~/.claude/skills/<skill-name>/SKILL.md. Never create a flat file like ~/.claude/skills/<skill-name>.md — it will be silently ignored.
Every number cited in a paper's prose (abstract, captions, discussion, inline stats in any section) must resolve through a script-generated source — either an \input{}'d table or a \newcommand{\macroName}{value} macro emitted by a build script. Never hand-type a number into prose. This is the same discipline as bibliography-from-ids: the data has one source of truth, the script reads it, and the document references the result by name.
The standard pattern:
- A build script (e.g.
scripts/build_paper_macros.py) reads the authoritative data files (JSONs, CSVs, other generated tables) and writes a singleresults/tables/paper_macros.texfile containing one\newcommandper inline scalar. - The paper
\inputs it once near the top. Prose writes\crossmodeQwenDiagMeaninstead of60.5. - A unit test asserts (a) every paper-referenced macro is defined in the generated file, (b) forbidden hand-typed substrings that were replaced no longer appear — regression guard against re-introduction.
The import-content skill covers the script-to-document pattern; apply it to inline scalars, not just tables. When auditing a paper for the first time, grep for digit sequences in prose and treat each unexplained one as a potential hand-typed number.
In a causal language model, log P(token at position i | preceding tokens) depends only on the model and positions < i. Running n forward passes with growing prefixes ("multi-pass") therefore computes exactly the same per-token log-probs as a single forward pass over the full concatenation — but wastes O(n²) FLOPs for zero informational benefit. Pass n already contains everything from passes 1..n-1 for free, because causal attention cannot see future tokens.
Treat "multi-pass" as a red flag in code, prose, and investigation reports:
- Do not compare single-pass against multi-pass as if one were ground truth. Any difference reflects tokenization choices (e.g., BPE merges at boundaries), not a bias. Single-pass IS the metric by definition.
- Do not write "distortion", "bias", or "error" framings of SP–MP differences.
- Flag multi-pass code when you find it (add a header comment explaining the above, or open a follow-up issue); do not silently delete it — leave a trail so future readers learn from the confusion. Similarly flag any report or test that treats multi-pass as a baseline.
- If a pre-existing test asserts SP ≈ MP, flag it as tautological. When safe, replace it with direct tests of boundary detection and extraction.