Commit df968f0

Merge pull request #250 from ContextLab/015-pipeline-convergence-protocol
spec 015: pipeline convergence protocol (closes #239)
2 parents 790f2e0 + 18d282e commit df968f0

694 files changed

Lines changed: 64340 additions & 3873 deletions

.github/workflows/llmxive-pipeline.yml

Lines changed: 18 additions & 0 deletions
@@ -78,3 +78,21 @@ jobs:
7878
if [[ -n "${PROJECT_ID:-}" ]]; then ARGS+=(--project "$PROJECT_ID"); fi
7979
if [[ -n "${STAGE:-}" ]]; then ARGS+=(--stage "$STAGE"); fi
8080
python -m llmxive "${ARGS[@]}"
81+
- name: Persist pipeline progress
82+
# Commit + push whatever the pipeline produced (advanced stage, new
83+
# artifacts, run-log telemetry) back to the branch this run checked out,
84+
# so progress survives the ephemeral runner. always(): even a transient
85+
# endpoint failure leaves the project parked at its stage (no partial
86+
# artifacts — the stage guards unlink those) and we still keep the
87+
# run-log. [skip ci] on the message avoids retriggering the pipeline.
88+
if: always()
89+
run: |
90+
git config user.name "llmxive-pipeline-bot"
91+
git config user.email "noreply@anthropic.com"
92+
git add -A
93+
if git diff --cached --quiet; then
94+
echo "no pipeline changes to commit"
95+
else
96+
git commit -m "chore(pipeline): persist run progress [skip ci]"
97+
git push origin "HEAD:${GITHUB_REF_NAME}"
98+
fi

.github/workflows/spec015-calibration.yml

Lines changed: 49 additions & 21 deletions
@@ -35,9 +35,9 @@ on:
3535
default: '(unspecified)'
3636
type: string
3737
max_tokens:
38-
description: 'Per-call max_tokens for the reasoning model (default 8192)'
38+
description: 'Per-call max_tokens for the reasoning model (default 131072 = 128K; qwen3.5-122b has a 256K context window so this leaves ample room for input + reasoning)'
3939
required: false
40-
default: '8192'
40+
default: '131072'
4141
type: string
4242
# Uncomment to run weekly once the workflow is trusted:
4343
# schedule:
@@ -83,6 +83,18 @@ jobs:
8383
--max-tokens "$MAX_TOKENS" \
8484
2>&1 | tee calibration-run.log
8585
86+
# Upload the produced report (+ run log) as an artifact BEFORE
87+
# attempting any git commit. Calibration runs are expensive (~25 min);
88+
# a race-condition push failure shouldn't lose the output.
89+
- name: Upload calibration outputs as artifact
90+
if: always()
91+
uses: actions/upload-artifact@v4
92+
with:
93+
name: spec015-calibration-output
94+
path: |
95+
calibration-run.log
96+
specs/015-pipeline-convergence-protocol/calibration/reports/
97+
8698
- name: Commit + push the report
8799
if: always()
88100
env:
@@ -96,26 +108,42 @@ jobs:
96108
calibration-run.log || true
97109
if git diff --cached --quiet; then
98110
echo "No new report to commit."
99-
else
100-
TIMESTAMP="$(date -u +%Y%m%dT%H%M%SZ)"
101-
git commit -m "calib(015): ${STAGE} run (${TIMESTAMP}) (#239)
111+
exit 0
112+
fi
113+
TIMESTAMP="$(date -u +%Y%m%dT%H%M%SZ)"
114+
git commit -m "calib(015): ${STAGE} run (${TIMESTAMP}) (#239)
102115
103-
Triggered via workflow_dispatch with:
104-
stage=${STAGE}
105-
domain=${DOMAIN}
106-
max_tokens=${MAX_TOKENS}
116+
Triggered via workflow_dispatch with:
117+
stage=${STAGE}
118+
domain=${DOMAIN}
119+
max_tokens=${MAX_TOKENS}
107120
108-
Maintainer: review the produced report under
109-
specs/015-pipeline-convergence-protocol/calibration/reports/
110-
and fill in the adjudication checklist per FR-046.
121+
Maintainer: review the produced report under
122+
specs/015-pipeline-convergence-protocol/calibration/reports/
123+
and fill in the adjudication checklist per FR-046.
111124
112-
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>"
113-
git push
114-
fi
125+
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>"
115126
116-
- name: Upload run log as artifact
117-
if: always()
118-
uses: actions/upload-artifact@v4
119-
with:
120-
name: calibration-run-log
121-
path: calibration-run.log
127+
# Race-condition handling: the calibration step takes ~25 min,
128+
# during which other commits may have landed on the branch. Pull
129+
# --rebase to replay our single commit on top, then push. Retry
130+
# up to 3 times in case multiple concurrent runs are competing.
131+
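# GITHUB_REF_NAME keeps slashes in branch names (e.g. feature/foo),
# whereas "${GITHUB_REF##*/}" would truncate to the last path segment.
BRANCH="${GITHUB_REF_NAME}"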
132+
for attempt in 1 2 3; do
133+
echo "::group::Push attempt ${attempt}"
134+
git fetch origin "${BRANCH}"
135+
if git pull --rebase origin "${BRANCH}"; then
136+
if git push origin "HEAD:${BRANCH}"; then
137+
echo "::endgroup::"
138+
echo "Pushed on attempt ${attempt}."
139+
exit 0
140+
fi
141+
fi
142+
echo "::endgroup::"
143+
echo "Attempt ${attempt} failed; sleeping before retry."
144+
sleep $((attempt * 5))
145+
done
146+
echo "::error::Could not push the calibration report after 3 attempts."
147+
echo "The report artifact has been uploaded above; download from"
148+
echo "the workflow's Artifacts section."
149+
exit 1

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -243,6 +243,7 @@ Temporary Items
243243
# transient in-progress sentinels and any local runtime caches.
244244
state/run-log/*/in-progress/
245245
state/run-log/*/.invalid/
246+
state/grounding-cache/
246247
.specify/cache/
247248

248249
# Multi-secret env variants used by Dartmouth + HF
@@ -296,3 +297,8 @@ state/audit/pdf/*/screenshots/
296297
# demand keyed by sha256 of chunk bytes.
297298
projects/*/paper/.chunk_summaries/
298299

300+
301+
# Local agent/runtime state (not part of the repo)
302+
.omc/
303+
.summaries/
304+
.claude/scheduled_tasks.lock
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
1+
# Feature Specification: Quantifying the Complexity of Knot Diagrams via Crossing Number and Braid Index
2+
3+
**Feature Branch**: `001-knot-complexity-analysis`
4+
**Created**: 2026-05-29
5+
**Status**: Draft
6+
**Input**: User description: "Quantifying the Complexity of Knot Diagrams via Crossing Number and Braid Index"
7+
8+
**Research Question (Phase 1)**: How does the relationship between crossing number and braid index vary across prime knots with crossing number ≤13, and what patterns emerge when stratifying by alternating/non-alternating classification?
9+
10+
**Scope Boundary (Phase 1)**: This spec implements analysis stratified by alternating/non-alternating classification only. Multi-class prime knot exploration (torus, satellite, hyperbolic) is deferred to Phase 2+ as documented in Assumptions. This scope boundary is the implementation default for this iteration.
11+
12+
**Validation Scope (Phase 1)**: Dataset completeness validation focuses on crossing numbers ≤10 as the Phase 1 benchmarking scope. Data collection covers all knots with crossing number ≤13, but full validation across all crossing numbers ≤13 is deferred to future iterations. This is a deliberate scope decision for practical verification purposes in exploratory analysis.
13+
14+
**Multi-Phase Framing**: The project is structured as a multi-phase research program. Phase 1 establishes foundational analysis on the alternating/non-alternating dichotomy. Phase 2+ will incorporate additional knot classes (torus, satellite, hyperbolic) as data extraction pipelines and classification logic are developed. This phased approach ensures incremental validation.
15+
16+
## User Scenarios & Testing *(mandatory)*
17+
18+
<!--
19+
IMPORTANT: User stories should be PRIORITIZED as user journeys ordered by importance.
20+
Each user story/journey must be INDEPENDENTLY TESTABLE - meaning if you implement just ONE of them,
21+
you should still have a viable MVP (Minimum Viable Product) that delivers value.
22+
23+
Assign priorities (P1, P2, P3, etc.) to each story, where P1 is the most critical.
24+
Think of each story as a standalone slice of functionality that can be:
25+
- Developed independently
26+
- Tested independently
27+
- Deployed independently
28+
- Demonstrated to users independently
29+
-->
30+
31+
### User Story 1 - Download and Parse Knot Data from Knot Atlas (Priority: P1)
32+
33+
As a researcher, I need to download knot data from Knot Atlas including crossing numbers, braid indices, and prime knot classifications for all prime knots with crossing number ≤13 so that I have a testable dataset for correlation analysis.
34+
35+
**Why this priority**: This is the foundational step without which no analysis can proceed. The dataset quality and completeness directly determine the validity of all downstream findings.
36+
37+
**Independent Test**: Can be fully tested by executing the data download script and verifying the output contains all prime knots with crossing number ≤13 with consistent representation of crossing number and braid index fields. A validation against standard knot tables (KnotInfo, Hoste-Thistlethwaite-Weeks enumeration) confirms dataset completeness for the highest crossing number in scope.
38+
39+
**Acceptance Scenarios**:
40+
41+
1. **Given** the Knot Atlas is accessible, **When** the download script executes, **Then** the dataset contains all prime knots with crossing number ≤13 with crossing number, braid index, and alternating/non-alternating classification fields populated
42+
2. **Given** the dataset is downloaded, **When** a data quality check runs, **Then** at least 95% of records have both crossing number and braid index values present (no nulls in required invariant fields)
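The 95% completeness criterion in scenario 2 could be checked along the following lines. This is a minimal sketch, not the project's implementation: the field names `crossing_number` and `braid_index` and both function names are hypothetical, and records are assumed to be plain dictionaries.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical field names; the real column mapping is not given in the spec.
REQUIRED_FIELDS = ("crossing_number", "braid_index")


def completeness_ratio(records: Iterable[Mapping[str, Optional[int]]]) -> float:
    """Return the fraction of records in which every required field is non-null."""
    rows = list(records)
    if not rows:
        return 0.0
    complete = sum(
        1 for row in rows
        if all(row.get(field) is not None for field in REQUIRED_FIELDS)
    )
    return complete / len(rows)


def passes_quality_check(records, threshold: float = 0.95) -> bool:
    """True when at least `threshold` of the records are complete."""
    return completeness_ratio(records) >= threshold
```

A record that lacks either field counts against the ratio but is kept in the dataset, which matches the flag-rather-than-exclude behavior described in the edge cases.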
43+
44+
---
45+
46+
### User Story 2 - Compute Additional Invariants and Perform Exploratory Analysis (Priority: P2)
47+
48+
As a researcher, I need to compute additional invariants (arc index, Seifert circle count, bridge number) from available diagram representations and perform exploratory data analysis including scatter plots of crossing number vs. braid index stratified by alternating/non-alternating classification so that I can identify correlation patterns before fitting models.
49+
50+
**Why this priority**: Exploratory analysis informs model selection and reveals whether the hypothesized non-linear relationship exists. This step validates the research direction before committing to regression modeling.
51+
52+
**Independent Test**: Can be fully tested by generating scatter plots and summary statistics showing the crossing number vs. braid index relationship for alternating knots separately from non-alternating knots, with at least 3 additional invariants computed per knot.
53+
54+
**Acceptance Scenarios**:
55+
56+
1. **Given** the parsed dataset, **When** the invariant computation module runs, **Then** each knot record includes arc index, Seifert circle count, and bridge number values where computable from available diagram representations (minimal crossing diagrams, braid words, or Dowker-Thistlethwaite codes)
57+
2. **Given** the computed invariants, **When** exploratory plots are generated, **Then** scatter plots show crossing number vs. braid index with distinct stratification for alternating and non-alternating prime knots
58+
59+
---
60+
61+
### User Story 3 - Fit Regression Models and Validate Composite Complexity Score (Priority: P3)
62+
63+
As a researcher, I need to fit multiple regression models to test linear vs. non-linear relationships and construct a composite complexity score as a weighted combination of crossing number and braid index, then validate it against a held-out test set by testing its correlation with arc index and Seifert circle count so that I can determine whether the composite measure shows predictive power.
64+
65+
**Why this priority**: This is the core analytical output that answers the research question. It builds on the data foundation and exploratory analysis to produce the predictive model and validation results.
66+
67+
**Independent Test**: Can be fully tested by executing the regression and validation pipeline on a held-out test set (e.g., 20% of knots) and producing correlation coefficients between the composite complexity score and arc index/Seifert circle count. Results are considered valid if correlation coefficients and effect sizes are reported with appropriate statistical context, regardless of whether thresholds are met.
68+
69+
**Acceptance Scenarios**:
70+
71+
1. **Given** the exploratory analysis results, **When** regression models are fitted, **Then** at least two model types (linear and non-linear) are compared with goodness-of-fit metrics (R², AIC/BIC) documented for each
72+
2. **Given** a composite complexity score is constructed, **When** validation is performed on a held-out test set, **Then** Pearson and Spearman correlations with arc index and Seifert circle count are computed and reported with statistical significance testing (ANOVA for group differences where applicable), effect sizes (Cohen's d or r), and comparison against individual invariants to demonstrate composite performance
73+
3. **Given** alternating and non-alternating knot classifications, **When** ANOVA testing runs, **Then** group difference analysis is performed with p-values and effect sizes (Cohen's d) reported for the crossing number vs. braid index relationship between groups
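To make scenario 2 concrete, the sketch below computes a weighted composite score and its Pearson and Spearman correlations with another invariant, using only the standard library. The equal weights are placeholders (the spec does not fix them), all function names are hypothetical, and significance testing is left out.

```python
import math
from typing import List, Sequence


def composite_score(crossing: float, braid: float,
                    w_crossing: float = 0.5, w_braid: float = 0.5) -> float:
    """Weighted combination of crossing number and braid index (placeholder weights)."""
    return w_crossing * crossing + w_braid * braid


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Average ranks matter here because invariants such as braid index take few distinct values, so ties are the norm rather than the exception.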
74+
75+
---
76+
77+
### User Story 4 - Edge Case Handling, Data Quality, and Reproducibility Documentation (Priority: P4)
78+
79+
As a researcher, I need the system to handle edge cases (API unavailability, missing invariants, ambiguous classifications, crossing number ties) with documented fallback behaviors, AND produce complete reproducibility documentation for all code and data transformations, so that analysis can proceed robustly and results can be independently verified.
80+
81+
**Why this priority**: Edge case handling ensures reproducibility and robustness. Without explicit handling, silent failures or inconsistent behavior could invalidate downstream results. Reproducibility documentation is essential for scientific validation and community verification.
82+
83+
**Independent Test**: Can be fully tested by (1) simulating edge cases (API failures, missing data fields, ambiguous classifications) and verifying that the system produces appropriate flags, logs, and partial results rather than crashing or silently excluding data, AND (2) verifying that all reproducibility artifacts (checksums, logs, derivation notes, random seeds) are present and complete according to FR-009.
84+
85+
**Acceptance Scenarios**:
86+
87+
1. **Given** the Knot Atlas is unavailable, **When** retry logic executes, **Then** exponential backoff is applied and partial results are cached to disk after 3 consecutive failures
88+
2. **Given** a knot record has missing invariant data, **When** the computation module processes it, **Then** the record is flagged with missing_invariant_flags rather than being silently excluded
89+
3. **Given** a knot has ambiguous alternating/non-alternating classification, **When** stratified analysis runs, **Then** the record is either excluded (with count logged) or marked as "unclassifiable"
90+
4. **Given** crossing number ties exist, **When** invariant computations run, **Then** documented tie-breaking rules are applied consistently across all records
91+
5. **Given** all data transformations complete, **When** the reproducibility check runs, **Then** all required artifacts (SHA-256 checksums, derivation notes, random seeds, timestamped logs) are present in the docs/reproducibility/ directory
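Scenario 1 could be realized roughly as follows. This is a sketch under stated assumptions: `fetch` is a hypothetical callable that returns one knot's record and raises `OSError` (or a subclass such as `ConnectionError`) on failure, the cache is a single JSON file, and the base delay of one second is arbitrary.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def download_with_backoff(
    knot_ids: Iterable[str],
    fetch: Callable[[str], Any],          # hypothetical per-knot downloader
    cache_path: str,
    max_failures: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Fetch each knot, doubling the wait after every consecutive failure.

    After `max_failures` consecutive failures on one knot, the results
    gathered so far are written to `cache_path` and the error is re-raised.
    """
    results: Dict[str, Any] = {}
    for knot_id in knot_ids:
        failures = 0
        while True:
            try:
                results[knot_id] = fetch(knot_id)
                break
            except OSError:
                failures += 1
                if failures >= max_failures:
                    # Persist partial results before giving up.
                    Path(cache_path).write_text(json.dumps(results))
                    raise
                # Delays grow as base_delay * 1, 2, 4, ...
                sleep(base_delay * 2 ** (failures - 1))
    return results
```

Passing `sleep` as a parameter keeps the function testable without real waiting; a caller that resumes from the cache would skip the knot IDs already present in the file.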
92+
93+
---
94+
95+
### Edge Cases
96+
97+
- What happens when Knot Atlas is unavailable or rate-limited during download? (System should implement retry logic with exponential backoff and cache partial results)
98+
- How does the system handle knots where braid index or other invariants are not computable from available diagram representations? (Records should be flagged with missing_invariant_flags rather than silently excluded)
99+
- What happens when alternating vs. non-alternating classification is ambiguous or missing for a knot? (System should either exclude from stratified analysis or mark as unclassifiable)
100+
- How does the system handle ties or near-ties in crossing number when determining "minimal" representations? (Document tie-breaking rules and ensure consistency across all invariant computations)
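The flagging and classification behaviors listed above might look like this in practice. The sketch assumes dictionary records, a boolean-or-missing `alternating` field, and hypothetical invariant field names; only `missing_invariant_flags` and the label "unclassifiable" come from the spec itself.

```python
from typing import Any, Dict, Mapping

# Hypothetical field names for the invariants named in the spec.
INVARIANT_FIELDS = ("braid_index", "arc_index", "seifert_circle_count", "bridge_number")


def flag_missing_invariants(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with a list of the invariants it lacks."""
    flagged = dict(record)
    flagged["missing_invariant_flags"] = [
        field for field in INVARIANT_FIELDS if record.get(field) is None
    ]
    return flagged


def stratum(record: Mapping[str, Any]) -> str:
    """Map a record to its analysis group, keeping ambiguous cases visible."""
    value = record.get("alternating")
    if value is True:
        return "alternating"
    if value is False:
        return "non-alternating"
    return "unclassifiable"
```

Because the record is copied and annotated rather than dropped, downstream steps can count flagged or unclassifiable knots and log those counts instead of losing them silently.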
101+
102+
## Requirements *(mandatory)*
103+
104+
### Functional Requirements
105+
106+
- **FR-001**: System MUST download knot data from Knot Atlas (https://katlas.org/wiki/Main_Page) including crossing numbers, braid indices, and alternating/non-alternating classification for all prime knots with crossing number ≤13. Data format follows Knot Atlas JSON schema as documented at or CSV export with documented column mapping.
