Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
c995606
MOTO v1.0.5: autonomy efficiency upgrade, title exploration, token tr…
Apr 8, 2026
20910d1
Fix CodeQL alerts: secure backend secret storage, safe path resolution
Apr 9, 2026
36259f9
Fix CodeQL alerts: secure backend secret storage, safe path resolution
Apr 9, 2026
bfec33f
Fix CodeQL alerts: secure backend secret storage, safe path resolution
Apr 9, 2026
1b6429a
v1.0.6: security hardening, empirical-provenance guardrails, boost UX…
Apr 19, 2026
22f06e4
Merge remote-tracking branch 'origin/main' into v1.0.5-development
Apr 19, 2026
d352090
MOTO v1.0.6
Apr 20, 2026
76f8622
# Version 1.0.7
Apr 30, 2026
2d0d964
# Version 1.0.7
Apr 30, 2026
0231700
# Version 1.0.7
May 1, 2026
f3e664d
# Version 1.0.7
May 2, 2026
2183450
# Version 1.0.7
May 2, 2026
bebc497
# Version 1.0.7
May 2, 2026
c30193b
# Version 1.0.7
May 2, 2026
76be08d
# Version 1.0.7
May 4, 2026
bf002ab
# Version 1.0.7
May 4, 2026
8710879
# Version 1.0.7
May 4, 2026
b6197b8
# Version 1.0.7
May 4, 2026
02b77be
# Version 1.0.7
May 4, 2026
f5a94b4
# Version 1.0.7
May 5, 2026
b67f4c0
# Version 1.0.7
May 5, 2026
76a442b
# Version 1.0.7
May 5, 2026
29cc9f3
# Version 1.0.7
May 5, 2026
50fb43e
# Version 1.0.7
May 5, 2026
eb77229
Resolve merge conflicts: keep launcher files
May 5, 2026
f2791b4
# Version 1.0.7
May 5, 2026
85907a5
# Version 1.0.7
May 5, 2026
be951f4
# Version 1.0.7
May 5, 2026
dec7b7d
# Version 1.0.7 Bug Fix
May 5, 2026
377c127
# Version 1.0.7 Bug Fix
May 5, 2026
501e8cb
# Version 1.0.7 Bug Fix
May 5, 2026
cb8eca8
# Version 1.0.7 Bug Fix
May 5, 2026
06298fc
# MOTO v1.0.8
May 17, 2026
b71d66c
MOTO v1.0.8
May 18, 2026
86136ab
MOTO v1.0.8
May 18, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 46 additions & 16 deletions .cursor/rules/api-key-controls.mdc

Large diffs are not rendered by default.

193 changes: 193 additions & 0 deletions .cursor/rules/hosted-web-contract.mdc

Large diffs are not rendered by default.

331 changes: 116 additions & 215 deletions .cursor/rules/json-prompt-design.mdc

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ alwaysApply: true

# Code and Rule Interaction Rules

1.) Never introduce a new wait to hault the program unless specifically directed by the user. The program is designed to run until its goal completion or the operator presses stop. Infinite loops are probabalistically avoided due to the feedback mechanics.
1.) Never introduce a new hidden wait/halt, automatic stop, or loop-disabling cap unless specifically directed by the user or already defined by these rules as an explicit safety valve/user-configurable checkpoint. RALPH/MOTO is designed to run until goal completion or until the operator presses stop; infinite loops are intentional and are probabilistically steered by feedback mechanics, not disabled by agent edits.

2.) Always remove and cleanup old code, do not comment out code or leave broken/unused code in this program unless specifically directed by the user.

Expand All @@ -18,6 +18,20 @@ alwaysApply: true

7.) Any REST shape, auth contract, WebSocket event, or `/api/features` capability change that affects the web wrapper must update **code, the relevant rule(s), and `api_contract_version` in `/api/features`** in the same approved merge. The live backend's `GET /openapi.json` is the machine-readable REST schema contract.

8.) Only ONE workflow mode may be active at a time (Aggregator, Compiler, or Autonomous Research). This constraint applies identically in both default mode and generic mode.
8.) Only ONE workflow mode may be active at a time (Aggregator, Compiler, Autonomous Research, or LeanOJ Proof Solver). This constraint applies identically in both default mode and generic mode. Start conflict checks must be serialized and include pending/background-task activity flags such as `autonomous_coordinator.is_active`, not only persisted `state.is_running` fields.

9.) Lean 4 and SMT features are always gated on `lean4_enabled`, `lean4_lsp_enabled`, and `smt_enabled` runtime flags. All three default false, must stay silent and side-effect-free when disabled, and must never ship Lean or Z3 toolchains or Python wheels into `requirements-generic.txt`, `Dockerfile`, or `docker/entrypoint.sh` (hosted image stays Lean-free and Z3-free). Lean 4 is authoritative for every stored proof; SMT contributes hints only.
9.) Lean 4 and SMT features are always gated on `lean4_enabled`, `lean4_lsp_enabled`, and `smt_enabled` runtime flags. All three default false, must stay silent and side-effect-free when disabled, and must never ship Lean or Z3 toolchains or Python wheels into `requirements-generic.txt`, `Dockerfile`, or `docker/entrypoint.sh` (hosted image stays Lean-free and Z3-free). Lean 4 is authoritative formal checking for every stored proof and is necessary for LeanOJ final solutions; SMT contributes hints only. Z3 executable paths are trusted startup/operator configuration only, must be rejected as runtime API input, and must resolve to a `z3`/`z3.exe` executable. Automated proof candidates must directly serve the user prompt, not merely be non-trivial or novel.

10.) LeanOJ initial topic generation and brainstorm submitters always run in parallel and feed one validator that batch-validates up to 3 topics/submissions. Initial topic candidates/selection must be broad locked foundation questions covering the whole LeanOJ solution route, not narrow sublemma/tactic/local-repair topics. Recursive brainstorming has no separate recursive-topic prepass and must not re-inject the initial selected topic as active steering context; it uses the shared accepted proof-memory database plus the current proof/failure context. Accepted brainstorm memory must preserve occurrence-specific chronological metadata even for duplicate idea text. Never implement active LeanOJ topic or brainstorm phases as round-robin/serial submitter calls; one hung submitter must not halt the phase.

11.) LeanOJ stop/crash/restart is resumable by default. `Clear Progress` / `/api/leanoj/clear?confirm=true` is the only intentional reset path. Start/restart should choose the best matching/resumable persisted session by proof context, not blindly create a new session or pick the latest file.

12.) LeanOJ OpenRouter credit exhaustion or no-fallback provider configuration errors are non-retryable pauses, not proof-attempt failures. Do not let API credit/config failures inflate final proof attempt loops.

13.) LeanOJ/RALPH final-proof loop checkpoints may only be user-configurable feedback checkpoints, not hidden loop shutdowns. The durable `master_proof.lean` is the authoritative working draft, and every accepted master-proof edit must pass an in-memory Lean gate before persistence: `needs_more_time=true` runs Lean with `sorry`/`admit` placeholders allowed but still requires parse/typecheck, template preservation, and no fake proof devices; `needs_more_time=false` runs Lean placeholder-free and then final semantic review against the user prompt/template before the run stops as verified. Final-proof mode is edit-only: it must not be offered, shown, or taught `stuck_needs_brainstorm`, raw `need_more_brainstorming`, failed-attempt counts, or any path transition. It may see the most recent 5 final attempts as compact execution feedback (Lean errors, stale edit rejections, JSON truncation, watchdog/no-progress notices) so it can avoid repeating failed edits. Lean/template rejection, semantic-review rejection, conservative no-progress/stale-edit watchdog feedback, and validator rejection of non-progressive shortening edits must preserve the master proof and persist structured continuation feedback; non-user-forced no-progress handoffs should gather recursive brainstorm context before re-entering final mode.

14.) LeanOJ/RALPH final verification must remain placeholder-free, but Lean-accepted scaffolds containing `sorry`/`admit` should be saved as partial proofs for future context. Partial proofs are citeable incomplete references only; never count them as verified solutions and never accept fake `axiom`/`constant`/`opaque` proof devices.

15.) Parent/user-selected phases have hierarchy precedence over child branches. When a parent phase starts (LeanOJ forced final loop, autonomous paper writing, Tier 3 final answer/final selection), lower-tier brainstorm/topic/path child tasks must stop or be ignored. LeanOJ `Skip Brainstorm` locks the run into the final loop until the configured final-attempt cycle is exhausted; model/path requests for more brainstorming cannot override that user action early. `Force Brainstorm` is a separate explicit user override that returns to recursive brainstorming while preserving proof progress.

16.) LeanOJ prompt flows must guard formal/informal mismatches: treat the Lean template as the formal source of truth, do not silently reinterpret operations such as `Nat` subtraction, sanity-check exact-template formulas on small cases when feasible, and never claim Lean acceptance alone proves the informal problem unless that correspondence is justified.
11 changes: 8 additions & 3 deletions .cursor/rules/part-1-aggregator-tool-design-specifications.mdc
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ Validator processes 1, 2, or 3 submissions simultaneously using batch-specific p

**Submission context injection**: Direct inject if fits. If too large: RAG the submission as file, keep user prompt direct. If user prompt + RAG'd submission still too large: RAG all user-prompt files. If user prompt itself too large after all RAG: halt with error + diagnostic.

**Hosted upload enforcement (generic mode)**: Server-side validation of `.txt` only, 5 MB max, filename sanitization, path traversal rejection. Applied in both modes but critical for hosted sandboxes where the control plane proxies uploads.
**Upload/path enforcement**: Server-side validation of `.txt` only, 5 MB max, filename sanitization, path traversal rejection. Upload responses return logical filenames, not absolute host paths. Public Aggregator starts resolve `uploaded_files` only under `user_uploads`; internal autonomous reference-paper context may opt into trusted data-root file references via an explicit coordinator flag.

## Context Allocation

Expand All @@ -66,16 +66,21 @@ No context carryover between prompts (only system-intended DB/submission transfe

User selects model per role. Multiple roles can share a model. Models load with user-set context sizes.

Per-role Supercharge is optional. When enabled for a submitter or validator, `api_client_manager.generate_completion()` runs 4 parallel full answer attempts for that role call, then a 5th same-model synthesis call and returns only the synthesis result. Supercharge candidate attempts intentionally use temperatures `[0.0, 0.2, 0.4, 0.8]` to diversify parallel outputs; synthesis remains `0.0`. Candidate attempts are sanitized to reusable visible answer text before synthesis; private thought/channel/control transcript text must never be fed back as feedback, brainstorm memory, or synthesis context. The synthesis prompt frames candidates as optional working material: the model may use one, combine several, ignore all, or write a stronger new answer, while preserving the original role output contract. If Boost applies to that role/task, all internal Supercharge calls use the Boost config first. Tool-call requests bypass Supercharge.

Parallel brainstorm submitter lanes intentionally use temperatures `[0.0, 0.1, ..., 0.9]` by submitter index so every parallel set includes a deterministic lane and increasing exploration lanes. This applies only to parallel submitter generation. Validators, compiler roles, JSON retries, and single-model sequential submitters remain `0.0`.

## Single-Model Mode
When ALL submitters AND validator use the same model → single-model mode:
- Submitters run SEQUENTIALLY (S1 → S2 → ... → Sn)
- Validator processes all queued submissions after each full submitter round
- Prevents queue overflow from parallel tasks flooding when LLM completes
- Exception: if LM Studio reports multiple loaded same-base numeric `:#` instances for that model, submitters may still run in parallel while the LM Studio client routes independent calls to idle sibling instances.
- Boost does NOT affect single-model detection (routing only, not model config)

## Multi-Submitter Configuration

Per-submitter: provider (LM Studio / OpenRouter in default mode; OpenRouter only in generic mode), model, OpenRouter host provider, LM Studio fallback (default mode only), context window, max output tokens. UI: "Number of Submitters" selector (1-10), "Copy Main to All" button.
Per-submitter: provider (LM Studio / OpenRouter in default mode; OpenRouter only in generic mode), model, OpenRouter host provider, LM Studio fallback (default mode only), context window, max output tokens, and Supercharge checkbox. UI: "Number of Submitters" selector (1-10), "Copy Main to All" button.

OpenRouter auto-fill rule: selecting an OpenRouter model auto-fills from endpoint metadata only. Context window uses the smallest relevant host `context_length`; max output tokens use `min(20% of that host context, smallest relevant host max_completion_tokens)`. If `max_prompt_tokens` is available, shrink usable context to respect it. If endpoint caps are incomplete, preserve current values (no guessing).

Expand All @@ -89,7 +94,7 @@ Accepted submissions database: never truncated. Live preview shows exact non-tru

Every 7th acceptance (`total_acceptances % 7 == 0`, minimum 7 before first review):

**Phase 1**: Validator reviews ALL accepted submissions, identifies AT MOST ONE for removal (redundant, contradicted, superseded, or provides no unique value).
**Phase 1**: Validator reviews the accepted-submissions database and identifies AT MOST ONE for removal (redundant, contradicted, superseded, or provides no unique value). If the complete database fits, it is direct-injected in full. If it does not fit, cleanup must use the normal direct-first/RAG fallback path instead of skipping or truncating; the review is then evidence-bounded by retrieved context.

**Phase 2** (only if removal proposed): Validator self-validates its removal proposal. Conservative default: if uncertain, reject removal. If validated: execute removal + full RAG rebuild (all shared-training sources are dropped and re-indexed from the post-removal file so deleted content is no longer retrievable).

Expand Down
23 changes: 18 additions & 5 deletions .cursor/rules/part-1-and-part-2-cointeraction-architecture.mdc
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,12 @@ alwaysApply: true

This describes additional architecture for the synergy between the part 1 database aggregator tool and part 2 aggregator-compiler tool. Both modes operate identically in default and generic deployment — the only difference is provider availability (see `hosted-web-contract.mdc` for details on generic mode).

NOTE: This is a continuously-running program that does not stop itself, the user selects the aggregator to start, then starts the compiler when they desire, and then the user choses when to turn off each selective mode by turning the off switch. There is no "solution stop token" as in normal AI solution generation.
NOTE: This is a continuously-running program that does not stop itself; the user selects one top-level workflow mode to run and turns it off when desired. There is no "solution stop token" as in normal AI solution generation.


## Aggregator start-up workflow
1.) The aggregator runs initially with no compiler running.
2.) The compiler does not begin running until the user starts it manually. Aggregator can run on its own for a head-start for as long as the operator would like. If the operator desires the aggregator can also run by itself without any compilation.
2.) The compiler does not begin running until the user starts it manually, and current code enforces that Aggregator, Compiler, Autonomous Research, and LeanOJ Proof Solver are mutually exclusive top-level modes. Aggregator can run on its own without any compilation.

## GUI Design

Expand Down Expand Up @@ -40,22 +40,34 @@ The live-constructing compiler-written paper should be viewable in one tab and a

**Use Case**: User may have domain knowledge that the brainstorm has explored sufficient territory before the automatic 10-acceptance interval, saving time and allowing manual control over the autonomous workflow.

## Multi-Submitter Architecture (Aggregator Only)
## Hierarchy / Parent Action Precedence

**Distinction**: Multiple submitters are only available for the Aggregator (Part 1 and Part 3 brainstorm aggregation). The Compiler (Part 2) uses a fixed single-submitter sequential Markov chain workflow.
Parent workflow actions override child agents immediately. Manual paper writing, forced Tier 3, LeanOJ forced final solving, and final selection phases must stop or fence off any lower-tier brainstorm/topic/path workers before continuing. Never allow stale child outputs to change phase after a parent action has taken ownership.

## Multi-Submitter Architecture (Aggregator and LeanOJ)

**Distinction**: Multiple submitters are available for the Aggregator (Part 1 and Part 3 brainstorm aggregation) and LeanOJ topic/brainstorm phases. The Compiler (Part 2) uses a fixed single-submitter sequential Markov chain workflow.

### Aggregator Multi-Submitter (Part 1 & Part 3)
- Configurable 1-10 parallel submitters (default: 3)
- Each submitter can have its own model, context window, and max output tokens
- Enables multi-model exploration of different solution basins simultaneously
- Parallel submitter generation uses the shared temperature ladder `[0.0, 0.1, ..., 0.9]` by submitter index; single-model sequential submitters and validators stay `0.0`.
- If all submitters and the validator are configured with the same LM Studio model ID, the Aggregator normally uses single-model sequential mode. Exception: when LM Studio reports multiple loaded same-base numeric `:#` instances for that model, submitters may run in parallel and `lm_studio_client` routes independent calls to idle sibling instances while the validator remains ordered.
- Single validator maintains coherent Markov chain evolution for database alignment
- UI labels: "Submitter 1 (Main Submitter)", "Submitter 2", "Submitter 3", etc.
- "Copy Main to All" button for quick configuration

### LeanOJ Topic/Brainstorm Multi-Submitter
- Configurable 1-10 parallel submitters generate initial topics and brainstorm ideas
- One validator batch-validates up to 3 completed topics or submissions at a time; initial topics must be broad locked foundation questions for the whole LeanOJ solution route, not narrow lemma/tactic/repair targets
- Parallel topic/brainstorm submitters use the shared temperature ladder `[0.0, 0.1, ..., 0.9]` by submitter index; LeanOJ validators, final solver, semantic review, and retry/repair calls stay `0.0`.
- No round-robin/serial submitter awaiting; a hung submitter must not block other submitters or validation

### Compiler Single-Submitter (Part 2)
- Fixed 2-submitter architecture (NOT configurable):
- **High-Context Submitter**: Handles outline_create, outline_update, construction, review modes. During construction, may invoke the Wolfram Alpha tool up to 20 times per submission when `system_config.wolfram_alpha_enabled=true`.
- **High-Parameter Submitter**: Handles rigor mode. Rigor is the **Lean-4-verified-theorem flow**: discovery → up to 5 Lean 4 formalization attempts (with error feedback) → novelty classification → placement (2 attempts, validator uses `rigor_lean_placement` mode forcing `rigor_check=True`) Theorems Appendix fallback. The compiler writes verified proofs directly into the shared `proof_database` (same database used by autonomous mode); novel proofs automatically enter the highest-priority direct-injection block on the next submitter instantiation.
- **High-Parameter Submitter**: Handles rigor mode. Rigor is the **Lean-4-verified-theorem flow**: user-prompt-relevant discovery (including explicit extension theorems from partial paper work / outline / supporting context / user prompt when helpful) → up to 5 Lean 4 formalization attempts (with error feedback) → novelty classification → placement routing. Existing-paper-claim theorems may go through inline placement (2 attempts, validator uses `rigor_lean_placement` mode forcing `rigor_check=True`); extension-derived theorems are forced to `placement_preference="appendix_only"` and appended directly to the Theorems Appendix (`placement_outcome="appendix_requested"`). Inline failures still use Theorems Appendix fallback. The compiler writes verified proofs directly into the shared `proof_database` (same database used by autonomous mode); novel proofs automatically enter the highest-priority direct-injection block on the next submitter instantiation.
- Sequential Markov chain workflow (only one submission at a time)
- Each compiler submitter has its own model, context, and max token settings (separate from aggregator)
- UI shows these as separate "High-Context Submitter" and "High-Parameter Submitter" sections
Expand All @@ -65,6 +77,7 @@ The live-constructing compiler-written paper should be viewable in one tab and a
## Additional Traits Shared Between Aggregator-Submitters and Compiler-Submitters

- The JSON of aggregator-subbmiters and compiler-submitters should include a "reasoning:" request below its "submission:" line. (This forces the submitter to explain the thoughts behind there reasoning and can also reveal deception for additional context for the validator.)
- MOTO conversational retries must preserve useful failed-output context, but never raw provider/model transcript text. Any assistant replay or reusable feedback/memory excerpt must be sanitized to remove known private thought/channel/control transport scaffolding first while preserving visible mathematical/Lean syntax such as `<|` and literal marker text inside JSON/string content. Parser error strings sent back to models must not include raw output excerpts. Exact tool-call assistant/tool protocol turns are the only exception.

## API Call Output Notes (User-Configurable)
- **All `max_tokens` limits are user-configurable via GUI settings** (like context window sizes). Users can adjust these per model role based on their specific models' capabilities.
Expand Down
Loading
Loading