Merged
40 commits
d3470a4
feat(014): Planner research guards + Tasker per-round inspection capture
jeremymanning May 21, 2026
6d275d6
feat(014): scripts/validate_phase4.py driver (preflight/reset/run/ver…
jeremymanning May 21, 2026
5b1db6e
test(014): Phase-4 regression + schema tests; self-guard destructive …
jeremymanning May 21, 2026
54b6246
docs(014): mark T003-T011,T016-T023,T025-T028 done in tasks.md
jeremymanning May 21, 2026
81c1dfc
fix(audit): body-density no longer mis-flags table/diagram-heavy arti…
jeremymanning May 21, 2026
b76bc88
fix(014): FR-007 = robust structural check, not fragile 1:1 name match
jeremymanning May 21, 2026
3c5f75f
fix(audit): don't learn structural task labels ([US1],[Story]) as tem…
jeremymanning May 21, 2026
f465f6c
feat(014): add --force re-validation rollback to validate_phase4 driver
jeremymanning May 21, 2026
b1fef59
fix(audit): bracket-density rule ignores fenced/diagram/link content
jeremymanning May 21, 2026
1743bc4
fix(014): step driver to 'analyzed'; align spec to best-effort cap-hit
jeremymanning May 21, 2026
ae193f7
fix(014): raise per-step timeout to 3600s; allow resume from mid-Phas…
jeremymanning May 21, 2026
e1197dd
fix(014): FR-006 URL extraction strips wrapping backticks (false-404 …
jeremymanning May 21, 2026
9dfa32b
fix(PROJ-262): correct dead QM9 dataset DOI in Phase-3 spec.md
jeremymanning May 21, 2026
b87da27
design(dataset-resolver): deterministic web-search dataset resolution…
jeremymanning May 22, 2026
a0f2e4b
plan(dataset-resolver): bite-sized TDD implementation plan
jeremymanning May 22, 2026
60c75fb
feat(dataset-resolver): DatasetCandidate + HuggingFace Hub source
jeremymanning May 22, 2026
8a19fdb
feat(dataset-resolver): figshare/Zenodo/DataCite sources
jeremymanning May 22, 2026
6671a90
feat(dataset-resolver): sample-stream format sniff
jeremymanning May 22, 2026
8c9892f
feat(dataset-resolver): verify_candidate (reachability + sniff, reuse…
jeremymanning May 22, 2026
11f4a72
feat(dataset-resolver): intent extraction + resolve_datasets top-N or…
jeremymanning May 22, 2026
dfc1431
feat(dataset-resolver): manifest write + planner block + unresolved h…
jeremymanning May 22, 2026
67d957a
fix(dataset-resolver): xyz/sdf/tar sniffers + granular candidates_tri…
jeremymanning May 22, 2026
7ecde90
test(dataset-resolver): de-vacuify DataCite test with a real DataCite…
jeremymanning May 22, 2026
4b33b33
feat(dataset-resolver): wire resolver into Planner (inject verified U…
jeremymanning May 22, 2026
650ee98
fix(planner): remove contradictory dataset-substitution rule (NAB URL)
jeremymanning May 22, 2026
bfddeea
fix(dataset-resolver): store stable URL, not expiring presigned redir…
jeremymanning May 22, 2026
07984c2
fix(planner): strip wrapping/stray markdown code fences from multi-fi…
jeremymanning May 22, 2026
c3aba10
fix(tasker): FR-012 guard - refuse Mode-B spec.md patch that deletes …
jeremymanning May 22, 2026
6f148ff
fix(audit): bracket-density counts only multi-word placeholders
jeremymanning May 22, 2026
db94303
validate(014): both canonicals reach analyzed via dataset resolver; w…
jeremymanning May 22, 2026
0147f8a
chore(014): commit Phase-4 validation run-log entries (FR-014 audit t…
jeremymanning May 22, 2026
40f0b06
fix: publish_blocked schema gap (publisher crash) + stale spec-012 sc…
jeremymanning May 22, 2026
e73fbdf
fix(submission-intake): VALID_FIELDS reuses canonical LIBRARIAN_DEFAU…
jeremymanning May 22, 2026
cfa6e76
chore: run-log entries from final regression runs
jeremymanning May 22, 2026
73653a7
test(phase1): make citation-resolver timeout test deterministic (drop…
jeremymanning May 22, 2026
ee9672a
test(013): accept HTTP 202 from just-minted Zenodo sandbox DOIs
jeremymanning May 22, 2026
9231cd6
fix(backends): bound LLM calls with a daemon-thread deadline (fixes 5…
jeremymanning May 22, 2026
01dddb3
Merge remote-tracking branch 'origin/main' into 014-phase4-plan-tasks…
jeremymanning May 27, 2026
4ac9ffd
fix(backends): enforce free-only Dartmouth models; correct registry m…
jeremymanning May 27, 2026
b273c8d
test(implementer-e2e): raise SC-001 wall-clock budget 1200s -> 2400s
jeremymanning May 27, 2026
2 changes: 1 addition & 1 deletion .specify/feature.json
@@ -1 +1 @@
-{"feature_directory": "specs/013-paper-revision-implementer"}
+{"feature_directory": "specs/014-phase4-plan-tasks-testing"}
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -70,5 +70,5 @@ Since this is primarily a research documentation repository without traditional
 <!-- SPECKIT START -->
 For additional context about technologies to be used, project structure,
 shell commands, and other important information, read the current plan:
-[specs/013-paper-revision-implementer/plan.md](specs/013-paper-revision-implementer/plan.md).
+[specs/014-phase4-plan-tasks-testing/plan.md](specs/014-phase4-plan-tasks-testing/plan.md).
 <!-- SPECKIT END -->
25 changes: 13 additions & 12 deletions agents/prompts/planner.md
@@ -57,19 +57,20 @@ $schema: ...
 - For computational projects, `contracts/` MUST include at least one
 schema (e.g., dataset schema, output schema) that the
 Implementer's tests can validate against.
-- NEVER invent URLs or citations. If the spec/idea has cited URLs,
-copy them verbatim; do not add new ones, do not fabricate
-`(verified YYYY-MM-DD)` annotations. The Reference-Validator
-fetches every cited URL — fabricated URLs flip the verdict to
-mismatch.
+- For dataset/code/paper references in research.md, cite ONLY the URLs listed in
+the "# Verified datasets" block of the user message (these have been
+web-searched and reachability/format-verified for you). NEVER invent or guess
+a dataset URL. If the block says a dataset has NO verified source, describe the
+dataset by name but do NOT fabricate a URL.
 - For DATASETS specifically: `research.md`'s "Dataset Strategy"
-table MUST name only real, programmatically-fetchable sources.
-If the spec calls for "UCI Electricity" but the canonical UCI
-endpoint requires browser navigation, plan for the `ucimlrepo`
-Python package OR substitute a comparable open dataset that has
-a known-stable raw URL (e.g., NAB benchmark CSVs at
-`https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/`,
-or HuggingFace `datasets.load_dataset(...)`).
+table MUST reference ONLY the sources in the "# Verified datasets"
+block above — cite each dataset by its verified URL, or load that
+SAME dataset via a well-known programmatic loader (e.g.
+`datasets.load_dataset(...)` for a verified HuggingFace dataset, or
+`ucimlrepo` for a UCI dataset). Do NOT substitute a different dataset
+and do NOT invent or guess a raw URL. If a dataset the spec needs has
+NO verified source in the block, state that explicitly rather than
+fabricating one.
 - For COMPUTATIONAL TASK ORDERING: the plan MUST order phases so
 data is downloaded BEFORE any task that consumes it, models are
 fitted BEFORE any task that evaluates them, and figures are
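The revised prompt rule (cite only URLs from the "# Verified datasets" block) could also be checked mechanically after the Planner runs. The following is a minimal sketch, not the repository's implementation: the function names `extract_urls` and `unverified_urls` and the `example.org` URLs are hypothetical, and it assumes the verified URLs are available as a plain set of strings. Backticks are excluded from the URL pattern so that a URL wrapped in inline code is not extracted with a trailing backtick.

```python
import re

# Hypothetical pattern: stop at whitespace, brackets, quotes, and backticks so
# that markdown wrappers such as `url` or [text](url) do not leak into the URL.
URL_RE = re.compile(r'''https?://[^\s<>()\[\]"'`]+''')


def extract_urls(markdown: str) -> set[str]:
    """Return every http(s) URL in the text, minus trailing punctuation."""
    return {match.rstrip(".,;:") for match in URL_RE.findall(markdown)}


def unverified_urls(research_md: str, verified: set[str]) -> set[str]:
    """Return URLs cited in research_md that are not in the verified set."""
    return extract_urls(research_md) - verified


if __name__ == "__main__":
    verified = {"https://example.org/data.csv"}  # illustrative only
    text = "Use `https://example.org/data.csv`, not https://example.org/guess.zip."
    print(sorted(unverified_urls(text, verified)))
```

An empty result would mean every cited URL came from the verified block; any non-empty result lists the URLs the Planner appears to have introduced on its own.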
16 changes: 8 additions & 8 deletions agents/registry.yaml
@@ -29,7 +29,7 @@ agents:
 fallback_backends:
 - huggingface
 - local
-default_model: google.gemma-3-27b-it
+default_model: google.gemma-4-31B-it
 wall_clock_budget_seconds: 300
 paid_opt_in: false
 - name: flesh_out
@@ -218,7 +218,7 @@ agents:
 fallback_backends:
 - huggingface
 - local
-default_model: google.gemma-3-27b-it
+default_model: google.gemma-4-31B-it
 tools:
 - citation_fetcher
 wall_clock_budget_seconds: 300
@@ -316,7 +316,7 @@ agents:
 fallback_backends:
 - huggingface
 - local
-default_model: google.gemma-3-27b-it
+default_model: qwen.qwen3.5-122b
 wall_clock_budget_seconds: 300
 paid_opt_in: false
 - name: paper_writing
@@ -399,7 +399,7 @@ agents:
 fallback_backends:
 - huggingface
 - local
-default_model: google.gemma-3-27b-it
+default_model: google.gemma-4-31B-it
 wall_clock_budget_seconds: 600
 paid_opt_in: false
 - name: latex_fix
@@ -445,7 +445,7 @@ agents:
 fallback_backends:
 - huggingface
 - local
-default_model: google.gemma-3-27b-it
+default_model: google.gemma-4-31B-it
 wall_clock_budget_seconds: 300
 paid_opt_in: false
 - name: repository_hygiene
@@ -461,7 +461,7 @@ agents:
 fallback_backends:
 - huggingface
 - local
-default_model: google.gemma-3-27b-it
+default_model: google.gemma-4-31B-it
 wall_clock_budget_seconds: 300
 paid_opt_in: false
 - name: task_atomizer
@@ -496,7 +496,7 @@ agents:
 fallback_backends:
 - huggingface
 - local
-default_model: google.gemma-3-27b-it
+default_model: google.gemma-4-31B-it
 wall_clock_budget_seconds: 300
 paid_opt_in: false
 - name: paper_reviewer_writing_quality
@@ -818,7 +818,7 @@ agents:
 fallback_backends:
 - huggingface
 - local
-default_model: google.gemma-3-27b-it
+default_model: google.gemma-4-31B-it
 tools: []
 wall_clock_budget_seconds: 300
 paid_opt_in: false
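Since the registry change swaps the same `default_model` value across many agents, a small lint could catch entries that were missed. The sketch below is illustrative only: the function name `disallowed_default_models` is hypothetical, and the allowlist simply contains the two model IDs that appear on the added lines of this diff, which is an assumption and not the project's actual free-model list. It scans the YAML as plain text so that it needs no third-party parser.

```python
import re

# Assumption: an allowlist built only from the model IDs added in this diff.
ALLOWED_MODELS = frozenset({"google.gemma-4-31B-it", "qwen.qwen3.5-122b"})

DEFAULT_MODEL_RE = re.compile(r"\s*default_model:\s*(\S+)")


def disallowed_default_models(
    registry_text: str, allowed: frozenset[str] = ALLOWED_MODELS
) -> list[tuple[int, str]]:
    """Return (line number, model) for each default_model outside the allowlist."""
    hits = []
    for lineno, line in enumerate(registry_text.splitlines(), start=1):
        match = DEFAULT_MODEL_RE.match(line)
        if match and match.group(1) not in allowed:
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = (
        "agents:\n"
        "  - name: a\n"
        "    default_model: google.gemma-3-27b-it\n"
        "  - name: b\n"
        "    default_model: qwen.qwen3.5-122b\n"
    )
    print(disallowed_default_models(sample))
```

A line-based scan like this ignores YAML structure, so it would also match a `default_model:` key nested somewhere unexpected; a real check would more likely load the file with a YAML parser and walk the `agents` list.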