Commit 14c9976
fix: resolve remaining CI failures
- ci.yml (Check formatting): exclude .gomodcache etc. to match the quality.yml fix from the previous commit, which didn't cover this second, identical check.
- Pattern-discovery race: samplingWorker read pde.stopChan without holding the mutex, so concurrent Start() calls that reassigned stopChan raced with a long-lived worker still listening on the previous channel. The worker now takes stopChan as a parameter and captures it at spawn time. Also added a thread-safe IsRunning() getter for the test, which previously read pde.running directly (itself a race).
- quality.yml (golangci-lint): pin to v1.64.8. The action's "latest" resolves to a binary built with Go 1.24, which fails against our go 1.25 module directive with "Go language version used to build golangci-lint is lower than the targeted Go version".
- release-please fails separately with "GitHub Actions is not permitted to create or approve pull requests". That is a repo Settings toggle (Settings → Actions → General → Workflow permissions → "Allow GitHub Actions to create and approve pull requests"); it cannot be fixed in code and is flagged for a manual change.

Tests: 49/49 packages pass, including -race on internal/pattern.
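The stopChan fix follows a standard Go pattern for restartable workers. A minimal sketch (type and method names here are illustrative, not the repo's actual internal/pattern code): the channel is created under the mutex and handed to the goroutine as a parameter, so a later Start() that reassigns the struct field can never race with an older worker.

```go
package main

import "sync"

// Engine is a stand-in for the pattern-discovery engine; the names are
// hypothetical, not the repo's actual types.
type Engine struct {
	mu       sync.Mutex
	running  bool
	stopChan chan struct{}
}

// Start spawns a worker, handing it this spawn's channel under the lock.
// The worker never re-reads e.stopChan, so a later Start() that reassigns
// the field cannot race with a long-lived worker from a previous spawn.
func (e *Engine) Start() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.running {
		return
	}
	e.running = true
	e.stopChan = make(chan struct{})
	go e.worker(e.stopChan) // channel captured at spawn time
}

// worker blocks on the channel it was handed, not on the struct field.
func (e *Engine) worker(stop <-chan struct{}) {
	<-stop
	// ... cleanup for this spawn only ...
}

// Stop closes the current spawn's channel.
func (e *Engine) Stop() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if !e.running {
		return
	}
	close(e.stopChan)
	e.running = false
}

// IsRunning is the thread-safe getter; tests use this instead of reading
// the running field directly, which the race detector flags.
func (e *Engine) IsRunning() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.running
}
```

Because each worker owns the channel it was given, restarting the engine is safe even while an old worker is still draining.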
1 parent caf77d3 commit 14c9976

86 files changed

Lines changed: 3026 additions & 12 deletions


.claude-plugin/plugin.json

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
{
  "name": "tok",
  "description": "Unified token-reduction CLI: transparent command output filtering, six compression modes (lite/full/ultra/wenyan-*), SQLite-backed savings analytics, and 69+ built-in filters.",
  "author": {
    "name": "Lakshman Patel",
    "url": "https://github.com/lakshmanpatel/tok"
  },
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "node \"${CLAUDE_PLUGIN_ROOT}/hooks/tok-mode-activate.js\"",
            "timeout": 5,
            "statusMessage": "Loading tok mode..."
          }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "node \"${CLAUDE_PLUGIN_ROOT}/hooks/tok-mode-tracker.js\"",
            "timeout": 5,
            "statusMessage": "Tracking tok mode..."
          }
        ]
      }
    ]
  }
}

.claude/scheduled_tasks.lock

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{"sessionId":"c62a742d-ec0a-47a3-887a-bbdbe70b0024","pid":1688229,"acquiredAt":1776659743194}

.github/workflows/ci.yml

Lines changed: 4 additions & 3 deletions
@@ -63,9 +63,10 @@ jobs:
       - name: Check formatting
         run: |
-          if [ "$(gofmt -s -l . | wc -l)" -gt 0 ]; then
-            echo "Please run 'gofmt -w .' to format the following files:"
-            gofmt -s -l .
+          bad=$(gofmt -s -l . 2>/dev/null | grep -v -E '^(\.gomodcache|\.gocache|\.gosrccache|vendor)/' || true)
+          if [ -n "$bad" ]; then
+            echo "Please run 'gofmt -s -w .' to format the following files:"
+            echo "$bad"
             exit 1
           fi

.github/workflows/quality.yml

Lines changed: 5 additions & 1 deletion
@@ -95,7 +95,11 @@ jobs:
       - name: Run golangci-lint
         uses: golangci/golangci-lint-action@v6
         with:
-          version: latest
+          # Pin to a version built with Go ≥ our module's go-directive (1.25).
+          # "latest" currently resolves to a v1.x built with Go 1.24 which
+          # errors with "Go language version used to build golangci-lint is
+          # lower than the targeted Go version".
+          version: v1.64.8
           args: --timeout=5m

       - name: Check go mod tidy

evals/llm_run.py

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
"""
Run each prompt through Claude Code in three conditions and snapshot the
real LLM outputs:

1. baseline — no extra system prompt at all
2. terse — system prompt: "Answer concisely."
3. terse+skill — system prompt: "Answer concisely.\n\n{SKILL.md}"

The honest delta is (3) vs (2): how much does the SKILL itself add on top
of a plain "be terse" instruction? Comparing (3) vs (1) conflates the
skill with the generic terseness ask, which is what the previous version
of this harness did.

This is the source-of-truth generator. It calls a real LLM and produces
evals/snapshots/results.json. Run it locally when SKILL.md files change.
The CI-side `measure.py` only reads the snapshot and counts tokens.

Requires:
- `claude` CLI on PATH (Claude Code), authenticated

Run: uv run python evals/llm_run.py

Environment:
    TOK_EVAL_MODEL   optional --model flag value passed through to claude
"""

from __future__ import annotations

import datetime as dt
import json
import os
import subprocess
from pathlib import Path

EVALS = Path(__file__).parent
SKILLS = EVALS.parent / "rules"
PROMPTS = EVALS / "prompts" / "en.txt"
SNAPSHOT = EVALS / "snapshots" / "results.json"

TERSE_PREFIX = "Answer concisely."


def run_claude(prompt: str, system: str | None = None) -> str:
    cmd = ["claude", "-p"]
    if system:
        cmd += ["--system-prompt", system]
    if model := os.environ.get("TOK_EVAL_MODEL"):
        cmd += ["--model", model]
    cmd.append(prompt)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.strip()


def claude_version() -> str:
    try:
        out = subprocess.run(
            ["claude", "--version"], capture_output=True, text=True, check=True
        )
        return out.stdout.strip()
    except Exception:
        return "unknown"


def main() -> None:
    prompts = [p.strip() for p in PROMPTS.read_text().splitlines() if p.strip()]
    skills = sorted(p.name for p in SKILLS.iterdir() if (p / "SKILL.md").exists())

    print(
        f"=== {len(prompts)} prompts × ({len(skills)} skills + 2 control arms) ===",
        flush=True,
    )

    snapshot: dict = {
        "metadata": {
            "generated_at": dt.datetime.now(dt.timezone.utc).isoformat(),
            "claude_cli_version": claude_version(),
            "model": os.environ.get("TOK_EVAL_MODEL", "default"),
            "n_prompts": len(prompts),
            "terse_prefix": TERSE_PREFIX,
        },
        "prompts": prompts,
        "arms": {},
    }

    print("baseline (no system prompt)", flush=True)
    snapshot["arms"]["__baseline__"] = [run_claude(p) for p in prompts]

    print("terse (control: terse instruction only, no skill)", flush=True)
    snapshot["arms"]["__terse__"] = [
        run_claude(p, system=TERSE_PREFIX) for p in prompts
    ]

    for skill in skills:
        skill_md = (SKILLS / skill / "SKILL.md").read_text()
        system = f"{TERSE_PREFIX}\n\n{skill_md}"
        print(f"  {skill}", flush=True)
        snapshot["arms"][skill] = [run_claude(p, system=system) for p in prompts]

    SNAPSHOT.parent.mkdir(parents=True, exist_ok=True)
    SNAPSHOT.write_text(json.dumps(snapshot, ensure_ascii=False, indent=2))
    print(f"\nWrote {SNAPSHOT}")


if __name__ == "__main__":
    main()

evals/plot.py

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
"""
Generate a boxplot showing the distribution of token compression per
skill, compared against a plain "Answer concisely." control.

Reads evals/snapshots/results.json and writes:
- evals/snapshots/results.html (interactive plotly)
- evals/snapshots/results.png (static export for README/PR embed)

Run: uv run --with tiktoken --with plotly --with kaleido python evals/plot.py
"""

from __future__ import annotations

import json
import statistics
from pathlib import Path

import plotly.graph_objects as go
import tiktoken

ENCODING = tiktoken.get_encoding("o200k_base")
SNAPSHOT = Path(__file__).parent / "snapshots" / "results.json"
HTML_OUT = Path(__file__).parent / "snapshots" / "results.html"
PNG_OUT = Path(__file__).parent / "snapshots" / "results.png"


def count(text: str) -> int:
    return len(ENCODING.encode(text))


def main() -> None:
    data = json.loads(SNAPSHOT.read_text())
    arms = data["arms"]
    meta = data.get("metadata", {})

    terse_tokens = [count(o) for o in arms["__terse__"]]

    rows = []
    for skill, outputs in arms.items():
        if skill in ("__baseline__", "__terse__"):
            continue
        skill_tokens = [count(o) for o in outputs]
        savings = [
            (1 - (s / t)) * 100 if t else 0.0
            for s, t in zip(skill_tokens, terse_tokens)
        ]
        rows.append(
            {"skill": skill, "savings": savings, "median": statistics.median(savings)}
        )

    rows.sort(key=lambda r: -r["median"])  # best first

    fig = go.Figure()

    for row in rows:
        fig.add_trace(
            go.Box(
                y=row["savings"],
                name=row["skill"],
                boxpoints="all",
                jitter=0.4,
                pointpos=0,
                marker=dict(color="#2ca02c", size=7, opacity=0.7),
                line=dict(color="#2c3e50", width=2),
                fillcolor="rgba(76, 120, 168, 0.25)",
                boxmean=True,
                hovertemplate="<b>%{x}</b><br>%{y:.1f}%<extra></extra>",
            )
        )

    # zero line — "no effect"
    fig.add_hline(
        y=0,
        line=dict(color="black", width=1.5, dash="dash"),
        annotation_text="no effect (= same length as control)",
        annotation_position="top right",
        annotation_font=dict(size=11, color="black"),
    )

    # median labels above each box
    for row in rows:
        fig.add_annotation(
            x=row["skill"],
            y=max(row["savings"]),
            text=f"<b>{row['median']:+.0f}%</b>",
            showarrow=False,
            yshift=22,
            font=dict(size=16, color="#2c3e50"),
        )

    fig.update_layout(
        title=dict(
            text=f"<b>How much shorter does each skill make Claude's answers?</b><br>"
            f"<sub>Distribution of per-prompt savings vs system prompt = "
            f"<i>'Answer concisely.'</i><br>"
            f"{meta.get('model', '?')} · n={meta.get('n_prompts', '?')} prompts · "
            f"single run per arm</sub>",
            x=0.5,
            xanchor="center",
        ),
        xaxis=dict(title="", automargin=True),
        yaxis=dict(
            title="↑ shorter · vs control · longer ↓",
            ticksuffix="%",
            zeroline=False,
            gridcolor="rgba(0,0,0,0.08)",
            range=[-30, 115],
        ),
        plot_bgcolor="white",
        height=560,
        width=980,
        margin=dict(l=140, r=80, t=120, b=120),
        showlegend=False,
        annotations=[
            dict(
                x=0.5,
                y=-0.22,
                xref="paper",
                yref="paper",
                showarrow=False,
                font=dict(size=11, color="#555"),
                text=(
                    "<b>box</b> = IQR (middle 50%) · "
                    "<b>line in box</b> = median · "
                    "<b>dashed line</b> = mean · "
                    "<b>green dots</b> = individual prompts"
                ),
            )
        ],
    )

    # re-add labels after update_layout (which would otherwise wipe them)
    for row in rows:
        fig.add_annotation(
            x=row["skill"],
            y=max(row["savings"]),
            text=f"<b>{row['median']:+.0f}%</b>",
            showarrow=False,
            yshift=22,
            font=dict(size=16, color="#2c3e50"),
        )

    fig.write_html(HTML_OUT)
    print(f"Wrote {HTML_OUT}")
    fig.write_image(PNG_OUT, scale=2)
    print(f"Wrote {PNG_OUT}")


if __name__ == "__main__":
    main()

evals/prompts/en.txt

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
Why does my React component re-render every time the parent updates?
Explain database connection pooling.
What's the difference between TCP and UDP?
How do I fix a memory leak in a long-running Node.js process?
What does the SQL EXPLAIN command tell me?
How does a hash table handle collisions?
Why am I getting CORS errors in my browser console?
What's the point of using a debouncer on a search input?
How does git rebase differ from git merge?
When should I use a queue vs a topic in messaging systems?

filters/ansible.toml

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
# Ansible playbook filter
schema_version = 1

[ansible_playbook]
match_command = "^ansible-playbook\\b"
strip_ansi = true
strip_lines_matching = [
    "^\\s*$",
    "^ok: \\[",
    "^skipping: \\[",
    "^\\s*Gathering Facts",
]
keep_lines_matching = [
    "^PLAY \\[",
    "^TASK \\[",
    "^changed: \\[",
    "^failed: \\[",
    "^fatal: \\[",
    "^unreachable: \\[",
    "^PLAY RECAP",
    "^\\S+\\s+:\\s+ok=\\d+",
    "Error:.*",
]
max_lines = 80
on_empty = "Playbook completed, no changes"

[ansible_inventory]
match_command = "^ansible-inventory\\b"
strip_ansi = true
keep_lines_matching = [
    "^\\[.*\\]",
    "^\\S+$",
]
max_lines = 40

filters/basedpyright.toml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# basedpyright filter
schema_version = 1

[basedpyright]
match_command = "^basedpyright\\b"
strip_ansi = true
strip_lines_matching = [
    "^\\s*$",
    "^Searching for source files",
    "^Found \\d+ source file",
    "^Pyright \\d+\\.\\d+",
    "^basedpyright \\d+\\.\\d+",
]
max_lines = 50
on_empty = "basedpyright: ok"
