Commit 8169930

v0.2.1: hardening + perf + DX + operational maturity (#11)
* chore(lint): clear rustc 1.95 clippy and fmt regressions

Pre-existing regressions on rustc 1.95 baseline (useless_vec, unnecessary_sort_by, rustfmt style adjustments) blocked the CI gate. Applied: cargo fmt --all + cargo clippy --fix + one manual sort_by_key migration in session/discovery.rs. No semantic changes; formatting and lint compliance only.

Closes claude-memory-iyn.12

* docs(plan): epic A — v0.1.4 hardening plan

Documents the 11-task hardening epic landing as 0.1.4: HTTP timeout, graceful JSONL skip, env-var classifier model, longer task_id, drop stub field, SCHEMA_VERSION const, CHANGELOG.md, cargo-audit CI, MSRV job, .editorconfig, JSONL file-lock. Backwards-compatible only. Breaking and perf changes deferred to epic B (v0.2.0). Quality/DX deferred to epic C.

Refs claude-memory-iyn

* fix(db): rebuild_state skips malformed JSONL lines instead of aborting

A single bad line in the events JSONL would abort the whole rebuild transaction, leaving SQLite empty and re-aborting on every retry. For an append-only journal this is too brittle.

Now: malformed lines are logged via tracing::warn and skipped; SQL errors still propagate (those indicate schema/integrity problems). Returned count reflects only successfully-indexed events. Adds tracing dep to tj-core (workspace dep already declared).

New test rebuild_state_skips_malformed_jsonl_lines covers both non-JSON garbage and valid-JSON-but-not-an-Event cases.

Closes claude-memory-iyn.2

* fix(http): add 15s timeout to AnthropicClassifier requests

The HTTP classifier built the request without any timeout, so on a stalled network or rate-limit lockup the call would hang indefinitely. Hooks wrap classifier calls in || true, but that protects against exit codes, not against blocked turns.

Adds AnthropicClassifier::timeout field (default 15s via DEFAULT_TIMEOUT const). Used in the ureq Request chain.
Test classifier_times_out_on_unresponsive_server binds a TCP socket that completes the handshake but never replies; with timeout=300ms the call must Err in well under 3s.

Closes claude-memory-iyn.1

* refactor(core): centralize SCHEMA_VERSION as single const

The schema-version string "1.0" was inlined at four production sites (event.rs, pack.rs x2, tj-mcp main.rs). Bumping the version required four search-replaces — one of them being in another crate.

Now: pub const SCHEMA_VERSION in tj-core::lib, referenced from all four sites. Test pack_assembler_does_not_inline_schema_version_literal guards against future regressions by scanning pack.rs source.

Closes claude-memory-iyn.6

* fix(id): extend task_id from 6 to 10 characters

Six base32 chars from a ULID give ~30 bits of entropy (5 bits per char), which is ~33k tasks before a 50% collision risk under the birthday paradox. For a long-lived project journal this is uncomfortably close.

Now: tj_core::new_task_id() helper produces "tj-" + 10 chars (~50 bits, ~33M threshold). Used in tj-cli, tj-mcp, and the session backfill extractor — replaces three slightly-different inline copies. Old 6-char IDs continue to work since storage keys are opaque strings; this only affects newly-generated tasks.

Tests: shape check + 10k uniqueness sweep.

Closes claude-memory-iyn.4

* chore(mcp): remove vestigial stub:bool from all responses

Phase-1 left every MCP result type with a stub:bool flag that was always false in production. The field was never read by any client and made every JSON payload look unfinished.

Removed from TaskPackResult, TaskPackMetadata, TaskSearchResult, TaskCreateResult, EventAddResult, TaskCloseResult and their eight in-place initializations. Regression test no_response_serializes_a_stub_field guards against re-introduction.

Technically a JSON shape change, but stub was a write-only field with no documented consumers — clients reading these payloads will see one fewer key, never an unexpected one.
Closes claude-memory-iyn.5

* chore: add .editorconfig at repo root

Standard OSS hygiene file. LF line endings, UTF-8, final newline, trim trailing whitespace, 4-space rust, 2-space yaml/toml/md/json, 2-space sh, tab Makefile. Cargo.lock and *.jsonl carve-outs.

Closes claude-memory-iyn.10

* ci: add MSRV job pinning rust 1.83

Cargo.toml declares rust-version = 1.83 but the existing CI only tested @stable, so an accidental new-feature use would slip in silently and break downstream consumers locked to MSRV.

New msrv job: ubuntu-latest with dtolnay/rust-toolchain@1.83, cargo build + cargo test on the full workspace. Separate cache key so it does not collide with the stable job.

Closes claude-memory-iyn.9

* feat(classifier): TJ_CLASSIFIER_MODEL env var overrides hardcoded model

Both the subscription (claude -p) and Anthropic API classifiers hardcoded their model alias. When Anthropic deprecates a model the classifier silently breaks until a release ships.

Now: each classifier reads TJ_CLASSIFIER_MODEL with backend-specific default (haiku alias for CLI, claude-haiku-4-5-20251001 for API). DEFAULT_MODEL constants exposed for tests and external override.

Test tj_classifier_model_env_var_overrides_defaults_for_both_backends combines both backends into one serialized read-set-restore flow to avoid env-var races between concurrent test threads. README documents the new env var.

Closes claude-memory-iyn.3

* ci: add cargo-audit job for security advisories

A published crate without supply-chain auditing is a rough edge. The existing CI ran fmt, clippy, test, and doc but had no advisory gate.

New audit job uses rustsec/audit-check@v2 against RUSTSEC. Marked continue-on-error initially so an existing transitive-dep advisory does not block unrelated PRs; once the first run is green we remove the flag and make audit blocking.
Closes claude-memory-iyn.8

* fix(storage): exclusive file lock around JsonlWriter append (race-safe Windows)

POSIX append on Linux is atomic for writes <= PIPE_BUF, but Windows makes no such guarantee. Concurrent writers (auto-capture hook + manual task-journal event + MCP server) racing on the same JSONL file could interleave bytes mid-line, corrupting the source of truth.

Now: JsonlWriter wraps the file in fd_lock::RwLock; append and flush_durable each acquire an exclusive advisory lock for the duration of the write/sync. Cross-platform: flock on Linux/macOS, LockFileEx on Windows. Removed the BufWriter wrapper — for a journal seeing a handful of events per minute, a syscall per write is unmeasurable, and buffering with locks added complexity without real benefit.

Test concurrent_appends_do_not_interleave_bytes spawns 8 threads each owning its own JsonlWriter (own File handle, own fd_lock instance) and writing 100 events. Asserts 800 well-formed Events. Closes the loop on race-free behavior on both platforms.

Closes claude-memory-iyn.11

* docs: add CHANGELOG.md (Keep-a-Changelog) covering 0.1.0..0.1.4

Backfills release notes for the prior four crates.io releases from git history and adds the full v0.1.4 entry summarizing this epic (11 tasks plus a baseline lint cleanup). Linked from the README. Compare links target the GitHub repo so they resolve once the v0.1.4 tag is pushed.

Closes claude-memory-iyn.7

* feat(db): migrations framework with schema_migrations table

Replaces the single MIGRATION_001 const + execute_batch() pattern with a forward-only migrations registry tracked in a schema_migrations table (version, applied_at). Each declared migration runs at most once per database; reopening an existing DB is a no-op for migrations.

Foundation for B2 (incremental indexing introduces a new index_state table via migration v002 — would require this table-of-versions contract anyway, so it lands first).
Backwards-compatible for existing 0.1.x databases: schema_migrations starts empty, v001 SQL re-runs against IF NOT EXISTS tables harmlessly, and the v=1 row is recorded on first 0.2.0 open.

Tests: fresh_db_runs_all_migrations + apply_migrations_is_idempotent_across_reopens cover both the fresh and upgrade paths.

Refs claude-memory-gyq.1

* perf(db): incremental indexing — ingest only the JSONL tail since last marker

Every MCP tool call (task_pack, task_search) re-read the entire JSONL log and replayed it through events_index/search_fts. At 10k events that is seconds per call; at 100k it is unworkable.

Schema: migration v002 adds index_state(project_hash PK, last_indexed_event_id, updated_at). rebuild_state and the new ingest_new_events both update this row to the most recent event_id they wrote.

Behavior: ingest_new_events scans to the marker and applies only the tail. Two safe fall-back paths to a full rebuild_state:
  • no marker yet (first call after migration v002)
  • marker not present in JSONL (file was rewritten or hand-edited)
The fallback path emits a tracing::warn so corruption is visible.

Switched five callers (mcp::task_pack, mcp::task_search, cli::pack, cli::ingest-hook, cli::search) to ingest_new_events. The explicit CLI command retains the full rebuild semantics — it is the recovery escape hatch.

Tests:
  • ingest_new_events_picks_up_only_new_lines (3 + 2 events; second pass reads only the 2 new lines).
  • ingest_new_events_falls_back_to_full_rebuild_when_marker_vanishes.
  • rebuild_state_and_ingest_new_events_produce_same_state (golden equivalence comparison).

Refs claude-memory-gyq.2

* perf(pack): regression test for working pack-cache after incremental ingest

Before B2 every MCP call ran a full rebuild_state which replayed every event through index_event(), and index_event() invalidates the pack cache for that task. So pack-cache rows lived for milliseconds at most — never reused.
After B2 ingest_new_events only processes the JSONL tail. When there are no new events at all, no index_event runs, no cache rows are DELETEd, and the next assemble() returns metadata.cache_hit = true.

The fix is implicit (it falls out of B2) — adding the test now so a future regression in either ingest_new_events or index_event will break this test rather than silently double our pack latency.

Refs claude-memory-gyq.3

* feat(mcp)!: structured RPC error envelope (BREAKING)

Tool handlers no longer mask failures as success-typed Json with task_id = literal [error] msg. They now return Result<Json<T>, McpError>, so a tj_core failure surfaces as a JSON-RPC error frame that the client can detect, log, and surface to the user.

BREAKING CHANGE: any client parsing the [error] string out of the task_id field will see a JSON-RPC error response instead. Update by checking for the rpc error frame before deserializing the result.

Before:
  task_pack   -> Json<TaskPackResult>
  task_search -> Json<TaskSearchResult>
  task_create -> Json<TaskCreateResult>
  event_add   -> Json<EventAddResult>
  task_close  -> Json<TaskCloseResult>

After:
  task_pack   -> Result<Json<...>, McpError>
  task_search -> Result<Json<...>, McpError>
  task_create -> Result<Json<...>, McpError>
  event_add   -> Result<Json<...>, McpError>
  task_close  -> Result<Json<...>, McpError>

Helper into_mcp_error formats the full anyhow chain (root cause + context wraps) into the RPC error message so the client sees the same diagnostic depth a Rust caller would.

Tests:
  - into_mcp_error_carries_full_anyhow_chain
  - task_pack_returns_rpc_error_when_state_dir_is_unusable (smoke: project_paths failure → into_mcp_error gives non-empty msg)

Refs claude-memory-gyq.4

* fix(mcp,cli): validate task_id exists before recording close event

Closing a non-existent task used to silently succeed: the close event would be appended to JSONL with a task_id that has no open event, leaving an unclosable orphan record.
Now: both the CLI Close subcommand and the MCP task_close tool run ingest_new_events first (to catch up the index), then assert task_exists() before writing the close event. Failure surfaces as anyhow::Error in CLI (non-zero exit + stderr) and as McpError in MCP (RPC error frame, thanks to B4).

New helpers:
  - tj_core::db::task_exists(conn, task_id) -> bool

Tests:
  - task_exists_returns_true_for_known_id_false_otherwise (unit)
  - close_unknown_task_id_returns_error (CLI integration; cargo bin runs in a temp XDG_DATA_HOME)

Refs claude-memory-gyq.5

* feat(mcp): --project-dir argument overrides cwd

The MCP server always derived the project_hash from the cwd at the moment a tool was invoked. Monorepo and parent-dir flows had no way to point the server at a sub-project without launching it from inside that directory.

Now: --project-dir <PATH> on the binary CLI sets a process-wide PROJECT_DIR_OVERRIDE (OnceLock) that every tool handler consults ahead of cwd. Default behaviour is unchanged when the flag is omitted. The path is canonicalized at startup so a relative arg or a symlink becomes a stable absolute hash key.

Tests:
  - resolve_project_paths_uses_provided_dir_for_hash: factor-out helper proves two dirs yield two hashes and one dir is stable.
  - cli_parses_project_dir_argument: clap parser smoke for both presence and absence of the flag.

Refs claude-memory-gyq.6

* perf(mcp): wrap blocking I/O in tokio::task::spawn_blocking

The tokio runtime hosts a small thread pool sized to the number of CPU cores. Synchronous SQLite + JSONL + filesystem work directly in async fn handlers monopolised a worker thread for the duration of each tool call, so two concurrent client requests serialised even on a multicore box.

Now: every tool body is moved into a closure passed to tokio::task::spawn_blocking via a small run_blocking() helper that also collapses JoinError + anyhow::Error into McpError.
Inside the closure we still own + open + drop SQLite connections normally — crucially never holding a Connection across an await, since rusqlite::Connection is Send but not Sync. The classifier-aware tools never directly call HTTP from the MCP server (only the CLI does), so the synchronous ureq stays on the blocking pool for free.

Test run_blocking_executes_two_tasks_concurrently: tokio::join! two 200ms sleep_in_blocking calls and assert wall clock < 350ms.

Refs claude-memory-gyq.7

* perf(bench): criterion benches for rebuild_state, pack assemble, FTS search

We claim B2 made hot paths O(new) instead of O(all), but every claim without a number is a wish. Adds a criterion harness that exercises the three paths the MCP server walks on every tool call.

Three benches, two sizes each (1k and 10k events spread across 100 synthetic tasks):
  - rebuild_state — full-rebuild baseline (the cost we used to pay on every MCP call before B2)
  - pack_assemble_cold — invalidates cache then recomputes
  - search_fts — FTS5 MATCH lookup

Wired into CI as a separate benches-compile job that runs cargo bench --no-run; full timing runs are best done locally on a quiet box, not on shared GitHub runners. Threshold gates (B2 promised <50ms pack / <100ms rebuild on 10k) are deferred until a real CI box exists or five baselines are collected.

Refs claude-memory-gyq.8

* release: bump workspace version to 0.2.0-rc.1

Last commit of epic B. Workspace version 0.1.3 -> 0.2.0-rc.1. Inner crate dependency declarations updated to match (tj-cli and tj-mcp both depend on tj-core). CHANGELOG.md gets a [0.2.0-rc.1] - 2026-05-06 section with the breaking change (MCP error contract) called out first, then Added / Changed / Performance subsections summarising the eight feature commits in this epic.

After dogfooding, 0.2.0 will be cut without further code changes — the rc tag is the gating signal that we want feedback on the new contract before it hits stable.
Closes claude-memory-gyq.9

* chore: OSS hygiene — CONTRIBUTING, CoC, issue and PR templates

Standard scaffolding so new contributors find the rules without asking. Five files:
  - CONTRIBUTING.md (one-thing-per-PR, conventional commits, CI gate expectations, what I will not merge)
  - CODE_OF_CONDUCT.md (Contributor Covenant 2.1 reference)
  - .github/ISSUE_TEMPLATE/bug.md, feature.md, question.md
  - .github/PULL_REQUEST_TEMPLATE.md (matches CONTRIBUTING checklist)

Plan landed in .docs/plans/2026-05-06-v0.2.0-epic-c-quality.md (epic C scope) — committed in the same change because it covers all eight C sub-tasks rather than just this one. README links to CONTRIBUTING / CoC / issue templates from a new Contributing section.

Refs claude-memory-1yc.1

* ci: cargo-llvm-cov coverage job + Codecov upload + README badge

Adds a coverage workflow job that runs cargo llvm-cov --workspace --lcov, then uploads via codecov-action@v4. Marked continue-on-error: true on first land — once we collect 5 baselines and agree on a floor the gate flips to blocking. CODECOV_TOKEN is read from GitHub secrets if present; for public repos Codecov v4 falls back to anonymous uploads, so the job is useful even before the secret is configured. README gets the codecov badge alongside the existing crates.io / CI / License badges.

Refs claude-memory-1yc.2

* test(classifier): cross-platform fake-claude shim, drop cfg(unix) gate

The two ClaudeCliClassifier tests were gated cfg(all(test, unix)) and silently skipped on Windows CI. Closes that platform gap.

The shim now writes the JSON envelope to a file and executes a tiny script that prints it back: cat "PATH" on Unix (.sh + chmod 0755), type "PATH" on Windows (.cmd batch). The type/cat form avoids the notoriously fragile cmd-batch escaping of the envelope JSON.

Result: classifier_parses_cli_envelope_and_returns_classified_output and classifier_surfaces_not_logged_in_with_friendly_hint now run on all three matrix OSes in CI.
Refs claude-memory-1yc.3

* feat(cli): task-journal doctor diagnostics command

Self-check command for users debugging install issues. Reports five groups of facts:
  1. claude binary in PATH (with version) — required for the subscription-mode classifier
  2. data dir + events/state/metrics sub-dir paths and writability
  3. known projects on this machine (count of state-dir SQLite stems)
  4. schema migrations applied for the current cwd project (if any)
  5. an issues[] list of human-readable problems

Exits 0 when issues is empty, 1 otherwise. Default output is human-readable; --json switches to a stable machine-parseable shape.

CLI integration tests:
  - doctor_exits_zero_on_fresh_install (no events/state files yet)
  - doctor_json_output_is_parseable_and_lists_paths

Refs claude-memory-1yc.4

* feat(cli): task-journal migrate-project --from PATH --to PATH

Project moved on disk -> canonical-path-derived hash changed -> data orphaned. New CLI command renames the JSONL + SQLite + metrics files from the old project_hash to the new one and updates the project_hash columns inside the SQLite (tasks, index_state).

Refuses when --from and --to resolve to the same hash (symlink, case-insensitive FS). Refuses to overwrite an existing destination file unless --force is set.

CLI integration tests:
  - migrate_project_round_trips_data_to_new_path: create task in project A, migrate-project A -> B, pack from B finds the task.
  - migrate_project_refuses_overwrite_without_force: both have data, migration aborts with destination already exists in stderr.

Refs claude-memory-1yc.5

* feat(export): HTML timeline output (export --format html)

Adds a third format to the existing export subcommand. Renders a self-contained HTML page (inline CSS, no external assets) showing the task timeline grouped by task_id. Useful as a PR-review attachment or sprint retro artefact.
Design notes:
  - All five HTML special chars (& < > " ') are escaped via html_escape() — no XSS surface even though we never render third-party HTML.
  - CSS uses prefers-color-scheme so light and dark mode both look sane without a toggle.
  - Event type pills get a colour class (decision/rejection/evidence/finding) so timelines are scannable at a glance.
  - Suggested events get a trailing ? marker matching the rest of the codebase.

CLI integration test export_html_emits_self_contained_document:
  - DOCTYPE html present
  - task title + event text present
  - no http:// or https:// — proves no external CSS/font/script leaked into the output.

Refs claude-memory-1yc.6

* feat(classifier): few-shot examples in prompt

Six worked Input/Output pairs covering the three boundary calls the classifier gets wrong most often:
  - hypothesis vs finding (2 examples)
  - finding vs evidence (2 examples)
  - decision vs hypothesis (2 examples)

Each pair pins one half of the boundary so the model sees the contrast inline rather than only as abstract definitions. The examples themselves are drawn from real boundary cases observed during this epic — keeps them representative.

The prompt budget guard (prompt_truncates_event_lines_to_keep_size_bounded) still passes after adding ~3KB of fixed prefix, because the recent_tasks block is the variable cost — the examples are a constant-size addition.

New test prompt_contains_few_shot_examples enforces the 6-example floor as a regression guard.

Refs claude-memory-1yc.7

* test(classifier): labeled eval dataset + opt-in accuracy gate

Adds tests/fixtures/classifier_eval.jsonl with 30 labeled chunks spanning all 12 event types, plus tests/classifier_eval.rs that runs in two modes:
  - Default (CI-safe): no model API call. Asserts
      • fixture has ≥ 30 rows
      • every expected event type is one of EventType::ALL
      • prompt builder emits each input verbatim
    Hermetic and fast — runs as part of plain cargo test.
  - Opt-in (TJ_CLASSIFIER_EVAL=on): runs ClaudeCliClassifier::default() against every row, computes accuracy, asserts the 0.70 floor and prints misses. Requires on PATH. Skipped silently otherwise.

Three new tests, all green by default; the real-classifier one is silent-pass without the env var.

Refs claude-memory-1yc.8

* docs: epic C PR body for review

* perf(mcp): cache SQLite connections per state-path

Every MCP tool handler used to call tj_core::db::open() which runs PRAGMA journal_mode + foreign_keys + apply_migrations + an empty schema_migrations SELECT on every invocation. At small N the open cost dominates the actual work — pack/search/close all paid this overhead even when the underlying state had not changed.

Now: a process-wide HashMap<PathBuf, Arc<Mutex<Connection>>> guarded by an outer OnceLock<Mutex<...>>. cached_open(path) does an O(1) lookup, falls back to db::open() on the cold path, and shares the Arc with future callers. Each tool handler takes the inner mutex for the duration of its work; the outer mutex is held only for the brief insert/lookup.
  - SQLite Connection is Send (single-threaded mode); safe to send across the spawn_blocking thread boundary inside an Arc.
  - Inner mutex serialises calls per project_hash. SQLite already serialises writes, so we accept a tiny concurrency loss in exchange for the open-cost saving.
  - Cache is keyed by PathBuf, so two MCPs running with different --project-dir do not stomp on each other.

Tests:
  - cached_open_returns_same_arc_for_same_path
  - cached_open_returns_distinct_arcs_for_distinct_paths

Refs claude-memory-yj1.1

* feat(export): task-journal export --format sqlite

Adds a fourth output format that produces a self-contained SQLite snapshot of the project's derived state. Useful for backups, sharing the state with another machine, or offline analysis with sqlite3 queries.

Pipeline:
  1. Rebuild from JSONL (source of truth) so the snapshot reflects every event ever appended.
  2. VACUUM INTO a temp file produces a clean, defragmented copy.
  3. Stream the bytes to stdout so the user can redirect to a file.

Test export_sqlite_round_trips_through_pack:
  - Create a task in xdg_a + proj_a, append a decision event.
  - export --format sqlite, capture stdout.
  - Confirm the magic bytes ("SQLite format 3\0") are present.
  - Drop the snapshot under xdg_b/task-journal/state/<hash>.sqlite (no JSONL on the destination side).
  - task-journal pack from xdg_b finds the same task with the same decision text — the snapshot is read-only-self-contained.

Refs claude-memory-yj1.2

* feat(cli): pending list and retry visibility

The auto-capture hook silently writes failed classifier results to pending/<id>.json. Until now they sat there forever — the user had no way to see what was queued or to flush them.

Two new subcommands:
  task-journal pending list
  task-journal pending retry [--mock-event-type X ...]

list prints id / queued_at / attempts / text-preview as a plain table; --json deferred to a future epic if anyone asks. retry walks the queue and re-feeds each entry through the classifier (currently only the mock path is wired — the real classifier roundtrip lives behind the install-hooks integration).

Schema adds an optional attempts counter; once it hits PENDING_MAX_ATTEMPTS (=3) the entry is renamed to <id>.dead.json so list still surfaces it but retry skips it.

Tests:
  - pending_list_shows_queued_entries
  - pending_retry_drains_with_mock_classifier (round-trips a fake queued entry into a real event in JSONL, visible in pack)
  - pending_retry_marks_dead_after_max_attempts

Refs claude-memory-yj1.3

* test(mcp): rmcp client + transport compile-and-shape integration test

Adds a real rmcp client integration test that verifies three boundary contracts:
  - rmcp 0.3 with the client feature compiles against this workspace and the pinned toolchain.
  - CallToolRequestParam round-trips through serde — the JSON-RPC envelope shape is the same shape we marshal in tj-cli tests.
  - tokio::io::DuplexStream still satisfies the AsyncRead + AsyncWrite + Send + 'static bounds rmcp expects from a transport.

A previous draft of this test spun up an in-process server + client over duplex and called task_create + event_add + task_pack + task_close end-to-end. That draft hung indefinitely because TaskJournalServer is defined in the binary crate (main.rs) and is not reachable from a black-box integration test. Driving the real handlers needs the server moved into a library target — tracked as a follow-up. Until then the CLI integration tests in tj-cli/tests/cli.rs cover the same code paths end-to-end through the same tj_core entry points the MCP handlers use.

Refs claude-memory-yj1.4

* feat(mcp): structured tracing with correlation_id per tool call

Today the MCP server emits no per-call telemetry — when a user reports slowness or a stuck tool, the only signal is whatever the client surfaces.

Adds two INFO log lines around every handler:
  tool_call start tool=task_pack correlation_id=01J...
  tool_call ok tool=task_pack correlation_id=01J... elapsed_ms=18
(The correlation_id is the same across both lines, so a grep on correlation_id=01J... isolates one client request.)

Choice notes:
  - traced_tool helper wraps the existing async-fn body so the tool macro signature stays exactly the same. No tool_router re-derivation.
  - ULID instead of UUID v4: ULID is already a transitive dep (used for event_id), and the embedded timestamp orders log lines naturally without parsing a separate field.
  - On error the exit line drops to WARN level and includes the McpError.message so the failure cause shows up at default RUST_LOG=info without enabling debug noise.

Tests:
  - new_correlation_id_is_unique_across_thousand_calls
  - traced_tool_transparently_returns_inner_result (Ok + Err paths preserve the inner Result)

Refs claude-memory-yj1.5

* feat(mcp): graceful SIGTERM and Ctrl-C shutdown

Today the MCP server runs the rmcp serve loop until the transport closes, then exits.
SIGTERM (e.g. from supervisord, systemd, docker stop) hard-kills the process mid-write — JSONL log can be left mid-line, tracing buffers are dropped, no shutdown ack ever lands in the supervisor logs.

Now: main wraps the serve loop in tokio::select! against a new wait_for_shutdown_signal() future:
  - On Unix: races Ctrl-C and SIGTERM, logs which one arrived.
  - On Windows: only Ctrl-C / Ctrl-Break is observable to a console binary; SIGTERM has no analogue, so we log only Ctrl-C.

Either branch logs an info line and returns 0. The drop of the tokio runtime flushes tracing buffers as a side effect. Adds the tokio signal feature to the workspace deps.

Test shutdown_signal_does_not_fire_spuriously races the shutdown future against an immediately-ready future and asserts the ready arm wins — i.e. nothing fires until a real signal arrives.

Refs claude-memory-yj1.6

* release: bump workspace version to 0.2.1

Last commit of epic D. Workspace 0.2.0-rc.1 -> 0.2.1; tj-cli and tj-mcp tj-core deps aligned. CHANGELOG gets a [0.2.1] section listing the additive features (export sqlite, pending list/retry, correlation_id tracing, graceful shutdown) and the internal Connection cache perf change. No breaking changes; this is a minor bump after 0.2.0 (the rc). After dogfooding I will tag and publish.

Closes claude-memory-yj1.7

* docs: epic D PR body for review
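
The entropy argument in the fix(id) commit can be checked numerically with the standard birthday approximation. This is a minimal sketch, assuming 5 bits per base32 character and IDs drawn uniformly at random; the function names are illustrative and not part of tj-core.

```rust
/// Approximate probability of at least one collision among `n` IDs drawn
/// uniformly from a space of 2^bits values: p ≈ 1 - exp(-n² / 2^(bits+1)).
fn collision_probability(n: f64, bits: i32) -> f64 {
    let space = 2f64.powi(bits);
    1.0 - (-(n * n) / (2.0 * space)).exp()
}

/// Number of IDs at which the collision probability reaches 50%:
/// n ≈ sqrt(2 · ln 2 · 2^bits). The plain sqrt(2^bits) rule of thumb
/// is the same order of magnitude, slightly lower.
fn fifty_percent_threshold(bits: i32) -> f64 {
    (2.0 * std::f64::consts::LN_2 * 2f64.powi(bits)).sqrt()
}
```

Calling `fifty_percent_threshold(50)` gives a value in the tens of millions, the same order as the ~33M figure quoted for 10-character IDs, and `collision_probability(10_000.0, 50)` shows why a 10k uniqueness sweep is expected to pass under these assumptions.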
1 parent 8c49785 commit 8169930

44 files changed

Lines changed: 4465 additions & 408 deletions

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Epic A — v0.1.4 hardening

**Date:** 2026-05-06
**Branch:** `claude/youthful-shaw-b96d78`
**Target release:** `0.1.4` (backwards-compatible patch)
**Bd epic:** see `bd list --type epic` (assigned id at runtime)

## Goal

Ship a backwards-compatible patch that closes the most acute correctness, robustness, and OSS-hygiene gaps identified in the 2026-05-06 audit, **without breaking the public CLI/MCP contract**. Anything that requires a breaking change is deferred to Epic B (v0.2.0).

## Success criteria

1. `cargo test --workspace --all-targets` green on `ubuntu-latest`, `macos-latest`, `windows-latest`.
2. `cargo audit` runs in CI and is clean (or vulnerabilities are accepted with a documented reason).
3. `cargo clippy --workspace --all-targets -- -D warnings` clean.
4. `cargo doc --workspace --no-deps` clean with `RUSTDOCFLAGS=-D warnings`.
5. New CI job pins MSRV (currently `1.83`) and verifies the build.
6. `CHANGELOG.md` exists and documents the `0.1.4` entry.
7. No removed/renamed CLI flags. No removed MCP tools or required parameters.
8. Branch `claude/youthful-shaw-b96d78` pushed; PR opened against `main`.

## Out of scope (deferred)

- Incremental indexing / pack-cache fix (Epic B — perf)
- MCP error contract redesign (Epic B — breaking)
- `--project-dir` argument for MCP (Epic B)
- Migrations framework (Epic B — coupled with incremental indexing)
- Few-shot prompting + eval datasets (Epic C — quality)
- `task-journal doctor`, `migrate-project` (Epic C — DX)

## Tasks (11)

Each task is one atomic commit. Test-first when behavior changes; doc/CI-only tasks may skip the failing-test step.

| # | Task | Touches | Test? | Notes |
37+
|---|------|---------|-------|-------|
38+
| A1 | HTTP timeout for `AnthropicClassifier` | `classifier/http.rs` | yes (mockito slow-server) | 15s connect+read timeout. Hardcoded — env-var override deferred. |
39+
| A2 | Graceful skip of malformed JSONL lines in `rebuild_state` | `db.rs` | yes (jsonl with bad line) | Log a `tracing::warn!` and continue; total parsed count returned. |
40+
| A3 | Classifier model overridable via env var | `classifier/http.rs`, `classifier/cli.rs` | yes (env unset → default; env set → override) | `TJ_CLASSIFIER_MODEL`; default unchanged. |
41+
| A4 | Extend task_id from 6 → 10 characters | `crates/tj-mcp/src/main.rs`, `crates/tj-cli/src/main.rs` | yes (collision-free over 10k synthetic ids) | Old 6-char ids remain valid (string compare). |
42+
| A5 | Remove `stub: bool` from MCP responses | `crates/tj-mcp/src/main.rs`, smoke tests | yes (smoke test asserts no `stub` field) | Field removal — but no client read it; documented in CHANGELOG. |
43+
| A6 | Centralize `SCHEMA_VERSION` const | `tj-core/src/lib.rs`, `pack.rs`, `tj-mcp/src/main.rs` | yes (single source) | `pub const SCHEMA_VERSION: &str = "1.0";` |
44+
| A7 | `CHANGELOG.md` with Keep-a-Changelog format | new file | n/a | Backfill `0.1.0``0.1.3` from `git log`. |
45+
| A8 | `cargo-audit` job in CI | `.github/workflows/ci.yml` | n/a | Non-blocking initially; flips to blocking once green. |
46+
| A9 | MSRV job in CI (`rust-version` = 1.83) | `.github/workflows/ci.yml` | n/a | Uses `dtolnay/rust-toolchain@1.83`. |
47+
| A10 | `.editorconfig` | new file | n/a | LF, UTF-8, 4-space rust, 2-space yaml. |
48+
| A11 | File-lock on JSONL append | `tj-core/src/storage.rs`, `Cargo.toml` | yes (two-writer race test) | Crate: `fd-lock` (cross-platform). Blocking lock. |
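As a rough illustration of why A4 lengthens the id, the birthday bound can be estimated in a few lines. This is a sketch under a stated assumption, not project code: it treats every id character as 5 bits of uniformly random base32 entropy, which likely overstates the entropy of ids derived from a ULID. The function names are hypothetical.

```rust
/// Entropy in bits of `chars` uniformly random base32 characters (5 bits each).
/// Assumption: real ULID-derived ids may carry less entropy than this.
fn base32_entropy_bits(chars: u32) -> u32 {
    chars * 5
}

/// Approximate number of ids that can be generated before the chance of at
/// least one collision reaches about 50%, using the birthday approximation
/// n ~ sqrt(2 * ln 2 * 2^bits).
fn ids_before_likely_collision(entropy_bits: u32) -> f64 {
    let space = 2f64.powi(entropy_bits as i32);
    (2.0 * std::f64::consts::LN_2 * space).sqrt()
}
```

Under this model, going from 6 to 10 characters adds 20 bits, so the threshold grows by a factor of 2^10 = 1024 regardless of the constant in front.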

## Sequencing

```
A6 ──┐
A1 ──┼─→ A7 (CHANGELOG references all done work)
A2 ──┤
A3 ──┤
A4 ──┤
A5 ──┤
A10 ─┤
A8 ──┤
A9 ──┘
A11 last (fd-lock dep + race test)
```

A11 lands last because it adds a runtime dependency and a flake-prone test; everything else lands first so green CI is the baseline before introducing the lock.

## Risks

- **A4 task_id length change:** new ids longer; nothing reads fixed-width. Verified by smart_read of CLI/MCP code paths.
- **A5 `stub` removal:** technically a schema change, but `stub` was always false post-Phase-1. Documented as non-breaking in CHANGELOG; if any downstream tool actually reads it, we revert in 0.1.5.
- **A11 fd-lock on Windows:** `fd-lock` uses `LockFileEx` on Windows; behavior differs from Linux `flock`. Test must cover both.
- **A2 swallowing real corruption:** mitigation — log at `warn!` level with line number and parse error.
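The A2 mitigation can be sketched roughly as follows. This is not the project's `rebuild_state`; it is a simplified, std-only stand-in in which a caller-supplied `parse` closure plays the role of JSON deserialization and `eprintln!` stands in for `tracing::warn!`. The names `index_lines` and `Skipped` are hypothetical.

```rust
/// A line that failed to parse, kept so callers can report or count it.
#[derive(Debug, PartialEq)]
struct Skipped {
    /// 1-based line number, as a human would read the file.
    line_number: usize,
    error: String,
}

/// Parses every non-empty line, collecting successes and recording failures
/// instead of aborting on the first bad line.
fn index_lines<T, E, F>(input: &str, parse: F) -> (Vec<T>, Vec<Skipped>)
where
    E: std::fmt::Display,
    F: Fn(&str) -> Result<T, E>,
{
    let mut parsed = Vec::new();
    let mut skipped = Vec::new();
    for (idx, line) in input.lines().enumerate() {
        if line.trim().is_empty() {
            continue;
        }
        match parse(line) {
            Ok(value) => parsed.push(value),
            Err(err) => {
                let entry = Skipped {
                    line_number: idx + 1,
                    error: err.to_string(),
                };
                // Stand-in for `tracing::warn!` in this dependency-free sketch.
                eprintln!(
                    "skipping malformed line {}: {}",
                    entry.line_number, entry.error
                );
                skipped.push(entry);
            }
        }
    }
    (parsed, skipped)
}
```

Returning the skipped entries alongside the parsed ones is one way to keep real corruption visible: a caller can compare `skipped.len()` against a threshold rather than relying on log output alone.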

## Verification (per task)

1. `cargo fmt --all --check`
2. `cargo clippy --workspace --all-targets -- -D warnings`
3. `cargo test --workspace --all-targets` (specific test for the touched module)
4. `git diff --stat` reviewed (no unintended line-ending or whitespace flips)
5. Commit with conventional-commit prefix (`fix:`, `chore:`, `docs:`, `ci:`, `feat:`)
6. `bd update <id> --status closed --reason "<one-line>"`

## Final verification (epic-level)

- `cargo test --workspace --all-targets` green
- `cargo audit` clean
- `bd list --parent <epic-id> --status open` returns empty
- `git log --oneline 8c49785..HEAD` matches the 11 tasks 1:1
- `gh pr create` opened against `main` with the CHANGELOG entry as body
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
## Summary

Epic C — quality / DX / community polish. **8 atomic commits** on `claude/v0.2.0-epic-c`, built off `claude/v0.2.0-epic-b` HEAD.

> **Merge order:** epic A → main, then epic B (rebased on main), then this branch (rebased on main).

Plan: [`.docs/plans/2026-05-06-v0.2.0-epic-c-quality.md`](./.docs/plans/2026-05-06-v0.2.0-epic-c-quality.md)

### What changed

**Classifier quality**
- `feat(classifier)` — six few-shot Input/Output examples in the prompt covering the harder boundary calls (hypothesis vs finding, finding vs evidence, decision vs hypothesis). Prompt-budget guard still passes.
- `test(classifier)` — 30-row labeled eval fixture + opt-in accuracy gate (`TJ_CLASSIFIER_EVAL=on`). Default mode runs hermetic shape tests; opt-in mode calls `ClaudeCliClassifier::default()` and asserts ≥ 0.70 accuracy. Floor will ratchet up after 100+ dogfood examples.

**User-facing DX**
- `feat(cli)` — `task-journal doctor` self-check command with human + `--json` output. Reports claude-on-PATH, data-dir writability, known projects, schema migrations.
- `feat(cli)` — `task-journal migrate-project --from PATH --to PATH [--force]`. Renames JSONL/SQLite/metrics from old project_hash to new; UPDATEs `tasks.project_hash` and `index_state.project_hash` columns in SQLite.
- `feat(export)` — `export --format html` produces a self-contained timeline page (inline CSS, no external assets, dark-mode aware via `prefers-color-scheme`).

**OSS / coverage / Windows**
- `chore` — `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md` (Contributor Covenant 2.1 ref), three `.github/ISSUE_TEMPLATE/*`, `.github/PULL_REQUEST_TEMPLATE.md`, README links.
- `ci` — `cargo-llvm-cov` job + Codecov upload + README badge. Non-blocking initially.
- `test(classifier)` — cross-platform fake-claude shim (`.sh`/`cat` on Unix, `.cmd`/`type` on Windows). The two `ClaudeCliClassifier` tests now run on all three OSes in the CI matrix instead of under `cfg(unix)` only.
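For readers unfamiliar with the shim approach, a minimal sketch of how a test helper might pick the script per platform is shown here. It only mirrors the `.sh`/`cat` and `.cmd`/`type` pairing described above; the function name, file names, and script bodies are assumptions, not the code in `cli.rs`.

```rust
/// Returns the file name and contents of a fake `claude` executable that
/// prints the fixture at `fixture_path` to stdout.
/// Hypothetical helper: names and script bodies are illustrative only.
fn fake_claude_shim(fixture_path: &str) -> (&'static str, String) {
    if cfg!(windows) {
        // `type` is the cmd.exe counterpart of `cat`; `@echo off` suppresses
        // echoing of the commands themselves.
        (
            "claude.cmd",
            format!("@echo off\r\ntype \"{}\"\r\n", fixture_path),
        )
    } else {
        ("claude.sh", format!("#!/bin/sh\ncat \"{}\"\n", fixture_path))
    }
}
```

On Unix the written file would additionally need its executable bit set before the test spawns it.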

### Verification

- `cargo fmt --all -- --check`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo test --workspace --all-targets` ✅ — **202 tests** (was 193 from epic B; +9 added by this PR)
- `cargo bench --workspace --no-run`

### New CLI surface

| Command | Purpose |
|---------|---------|
| `task-journal doctor [--json]` | Diagnostic check; non-zero exit on issues |
| `task-journal migrate-project --from PATH --to PATH [--force]` | Re-key on-disk data when project moves |
| `task-journal export --format html [--task ID]` | Self-contained HTML timeline |
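Two of the `doctor` checks (claude-on-PATH and data-dir writability) could be implemented along these lines. This is a std-only sketch, not the shipped command: `find_on_path`, `dir_is_writable`, and the probe file name are hypothetical, and the lookup ignores Windows `PATHEXT` resolution, so it would need the full file name (for example `claude.cmd`) there.

```rust
use std::env;
use std::ffi::OsStr;
use std::fs;
use std::path::{Path, PathBuf};

/// Returns the first `<dir>/<binary>` that exists as a file, walking the
/// directories in `path_var` in order (the value of `PATH`, passed in
/// explicitly so the function is easy to test).
fn find_on_path(binary: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(binary))
        .find(|candidate| candidate.is_file())
}

/// Checks writability by creating and then removing a small probe file.
fn dir_is_writable(dir: &Path) -> bool {
    let probe = dir.join(".tj-doctor-probe");
    match fs::write(&probe, b"probe") {
        Ok(()) => {
            // Best-effort cleanup; a leftover probe is harmless.
            let _ = fs::remove_file(&probe);
            true
        }
        Err(_) => false,
    }
}
```

A caller would typically feed it the live environment, for example `env::var_os("PATH").and_then(|p| find_on_path("claude", &p))`, and map any failed check to a non-zero exit code.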

### New env vars

| Var | Effect |
|-----|--------|
| `TJ_CLASSIFIER_EVAL=on` | Enables the real-classifier accuracy run in `cargo test`. Default OFF — CI stays hermetic. |
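The opt-in gate and the accuracy floor can be pictured with a small sketch. It assumes that only the exact value `on` enables the run (how other values are treated is not stated here) and that accuracy is plain exact-match over `(expected, predicted)` label pairs; the helper names are hypothetical.

```rust
/// Accuracy floor from this PR (0.70).
const ACCURACY_FLOOR: f64 = 0.70;

/// Returns true only when the opt-in variable is set to exactly "on".
/// Assumption: any other value, or an unset variable, keeps the run disabled.
fn eval_enabled(value: Option<&str>) -> bool {
    matches!(value, Some("on"))
}

/// Fraction of rows whose predicted label equals the expected label.
/// An empty dataset is reported as 0.0 so it can never pass the floor.
fn accuracy(rows: &[(&str, &str)]) -> f64 {
    if rows.is_empty() {
        return 0.0;
    }
    let correct = rows
        .iter()
        .filter(|(expected, predicted)| expected == predicted)
        .count();
    correct as f64 / rows.len() as f64
}

/// True when the measured accuracy is at or above the floor.
fn meets_floor(rows: &[(&str, &str)]) -> bool {
    accuracy(rows) >= ACCURACY_FLOOR
}
```

In a test, the gate might be read as `eval_enabled(std::env::var("TJ_CLASSIFIER_EVAL").ok().as_deref())`, returning early when it is false so the default run stays hermetic.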

### Test plan

- [ ] Branch CI green on three OS for `test`, `msrv`, `audit`, `benches-compile`, `coverage` (new).
- [ ] Try `task-journal doctor` on a clean VM — confirms claude-binary detection and dir-writability checks.
- [ ] Move a project on disk, run `migrate-project`, confirm `task_pack` works in the new location.
- [ ] `task-journal export --format html --task tj-X > timeline.html` and open in browser; verify dark mode + no broken layout.
- [ ] (Optional, manual) `TJ_CLASSIFIER_EVAL=on cargo test classifier_meets_accuracy_floor` against the real `claude` CLI; record baseline accuracy.

### After this lands

`v0.2.0` final tag. No further code changes expected — the dogfood window from `0.2.0-rc.1` already exercised epic B; this epic is additive and behind-the-scenes for almost every existing user.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
# Epic C — v0.2.0: quality, DX, community polish

**Date:** 2026-05-06
**Branch:** `claude/v0.2.0-epic-c` (off `claude/v0.2.0-epic-b` HEAD)
**Target release:** `0.2.0` final (after epic B's `rc.1` is dogfooded and this PR merges)
**Bd epic:** `claude-memory-1yc`

## Goal

Three thematic threads, deliberately bundled because none alone deserves a major version but together they raise the project from "works" to "feels finished":

1. **Classifier quality** — make the auto-capture hook actually trustworthy with few-shot prompting and a regression-gated accuracy floor.
2. **User-facing DX** — `doctor` (diagnostic), `migrate-project` (path moved), HTML timeline (PR review).
3. **Community / coverage / Windows** — OSS hygiene files, llvm-cov badge, Windows test parity for the CLI classifier.

## Success criteria

1. `cargo test --workspace --all-targets` green on three OS — including the previously-skipped `cfg(unix)`-only classifier tests.
2. New `tests/classifier_eval.rs` runs against a checked-in labeled dataset and enforces an accuracy floor; CI fails when the floor is broken.
3. `task-journal doctor` exits 0 on a healthy install and emits a machine-readable summary that flags missing `claude` CLI / unwritable data dirs / unknown migrations.
4. `task-journal migrate-project --from <old> --to <new>` re-keys the JSONL + SQLite + metrics for the new project hash; round-trips through `task_pack`.
5. `task-journal export --format html --task <id>` emits a self-contained HTML timeline.
6. Coverage report: `cargo llvm-cov --workspace` runs in CI and uploads to Codecov; README badge reflects the status.
7. `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `.github/ISSUE_TEMPLATE/*`, `.github/PULL_REQUEST_TEMPLATE.md` exist and link from README.
8. PR opened against `main` (after `0.2.0-rc.1` is in main).

## Non-goals (deferred)

- Opt-in telemetry endpoint (requires hosted backend — separate decision).
- C/C++/server-side LSP integration.
- Multi-language classifier prompts.

## Tasks (8)

| # | Task | Touches | Test? | Notes |
|---|------|---------|-------|-------|
| C1 | OSS hygiene files: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, issue + PR templates | repo root + `.github/` | n/a | Standard OSS scaffolding; not blocking other work. |
| C2 | `cargo-llvm-cov` job in CI + Codecov upload + README badge | `.github/workflows/ci.yml`, `README.md` | n/a | Non-blocking initially; flip threshold to blocking after 5 baselines. |
| C3 | Windows-compatible tests for `ClaudeCliClassifier` (currently `cfg(all(test, unix))`) | `crates/tj-core/src/classifier/cli.rs` | yes (port the two existing fake-claude tests to use `.cmd`/`.bat` shim on Windows) | Closes the platform gap noticed in the audit. |
| C4 | `task-journal doctor` command | `tj-cli/src/main.rs`, possibly small `tj-core::diagnostics` mod | yes (CLI integration test) | Checks: claude bin in PATH, data dirs writable, schema_migrations matches expected, last_indexed_event_id consistent. |
| C5 | `task-journal migrate-project --from <path> --to <path>` | `tj-cli/src/main.rs`, `tj-core::project_hash`, fs ops | yes | Renames `<old_hash>.jsonl`, `<old_hash>.sqlite`, `<old_hash>.jsonl` in metrics, etc. |
| C6 | `task-journal export --format html [--task <id>]` | `tj-cli/src/main.rs` (existing `export` command), new tiny `html_timeline` helper | yes | Self-contained: inline CSS, no external assets. |
| C7 | Few-shot prompting in classifier | `tj-core/src/classifier/prompt.rs` | yes (prompt contains 6 examples; size still bounded < 64KB) | 2 examples per harder pair: hypothesis vs finding, finding vs evidence, decision vs hypothesis. |
| C8 | Classifier eval dataset + accuracy gate | `tj-core/tests/classifier_eval.rs`, `tj-core/tests/fixtures/classifier_eval.jsonl` | yes (eval test enforces ≥ 70% baseline) | Hand-label ~30 chunks; uses `MockClassifier` + golden expected outputs to keep deterministic; real-classifier path stays opt-in via env var so CI does not need API access. |
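The C7 test condition (six examples, prompt under 64KB) might be checked with something like the sketch below. The prompt layout, the `FewShot` struct, and the function names are assumptions made for illustration; only the Input/Output pairing and the 64KB bound come from this plan.

```rust
/// Upper bound on prompt size from the C7 row (64KB).
const MAX_PROMPT_BYTES: usize = 64 * 1024;

/// One few-shot example. Hypothetical shape, not the type in `prompt.rs`.
struct FewShot {
    input: &'static str,
    output: &'static str,
}

/// Assembles instructions, the few-shot pairs, and the chunk to classify.
fn build_prompt(instructions: &str, examples: &[FewShot], chunk: &str) -> String {
    let mut prompt = String::from(instructions);
    for example in examples {
        prompt.push_str("\n\nInput: ");
        prompt.push_str(example.input);
        prompt.push_str("\nOutput: ");
        prompt.push_str(example.output);
    }
    // The final pair is left open so the model completes the output.
    prompt.push_str("\n\nInput: ");
    prompt.push_str(chunk);
    prompt.push_str("\nOutput: ");
    prompt
}

/// True when the prompt stays strictly under the byte budget.
fn within_budget(prompt: &str) -> bool {
    prompt.len() < MAX_PROMPT_BYTES
}
```

Measuring in bytes rather than characters keeps the check cheap and conservative, since `String::len` counts UTF-8 bytes.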

## Sequencing

```
C1 ─┐
C2 ─┤
C3 ─┼─→ (independent)
C4 ─┤
C5 ─┤
C6 ─┘

C7 ─→ C8 (eval validates the new prompt against the dataset)
```

C1/C2/C3 can land in any order. C4/C5/C6 are independent CLI features. C7 unlocks C8 (the eval dataset is the way to *measure* that few-shot improved precision rather than degraded it).

## Risks

- **C7 prompt regression:** few-shot can over-fit examples and degrade on out-of-distribution chunks. Mitigation: eval set in C8 covers boundary cases (`hypothesis-not-finding`, etc).
- **C8 false confidence:** ≥70% on 30 examples is a noisy estimate. Mitigation: ratchet floor up only after collecting 100+ labeled examples in dogfooding.
- **C5 destructive migration:** if `--from` and `--to` resolve to the same hash (symlink, case-insensitive FS), we'd corrupt data. Mitigation: refuse when `from_hash == to_hash`; require `--force` to overwrite an existing destination.
- **C3 Windows shim:** the fake-claude test could be rewritten in PowerShell, `.cmd`, or Python; pick `.cmd` for minimal surface. It is acceptable for some Windows tests to skip when `cmd.exe` is unavailable.
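The C5 mitigation reduces to two ordered checks, sketched here as a pure function so it can be tested without touching the filesystem. The enum and function names are hypothetical; only the two rules (refuse identical hashes, require `--force` for an existing destination) come from the risk item above.

```rust
/// Reasons a migration is refused before any file is touched.
#[derive(Debug, PartialEq)]
enum MigrateError {
    /// `--from` and `--to` resolve to the same project hash.
    SameProject,
    /// Destination data already exists and `--force` was not given.
    DestinationExists,
}

/// Pre-flight check for `migrate-project`. The same-hash rule is evaluated
/// first, so `--force` can never override it.
fn check_migration(
    from_hash: &str,
    to_hash: &str,
    dest_exists: bool,
    force: bool,
) -> Result<(), MigrateError> {
    if from_hash == to_hash {
        return Err(MigrateError::SameProject);
    }
    if dest_exists && !force {
        return Err(MigrateError::DestinationExists);
    }
    Ok(())
}
```

Comparing hashes rather than paths is what covers the symlink and case-insensitive-filesystem cases, provided the hash is computed from a canonicalized path.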

## Verification gate (per task)

Same as Epic A/B:
1. `cargo fmt --all -- --check`
2. `cargo clippy --workspace --all-targets -- -D warnings`
3. `cargo test --workspace --all-targets`
4. `git diff --stat` review
5. Conventional-commit prefix
6. `bd close <id> --reason "..."`

## Final verification (epic-level)

- All 8 sub-tasks closed in bd
- `cargo bench --workspace --no-run` clean
- `cargo llvm-cov --workspace --summary-only` reports a number
- `task-journal doctor` runs locally and prints the diagnostics
- PR body lists which features changed user-facing CLI surface
