Commit 069730f

Merge pull request #4 from StewAlexander-com/integrate-tutor-codelab
Inline code lab: lesson + run + evaluate
2 parents 70b25a7 + 291f5df commit 069730f

15 files changed

Lines changed: 1605 additions & 12 deletions

README.md

Lines changed: 42 additions & 0 deletions
@@ -164,6 +164,48 @@ The chat module reads (in order): `window.TUTOR_BACKEND_URL` → `<meta
 name="tutor-backend">` → `localStorage["tutor-backend"]` → port heuristic →
 same origin.
 
+## UX Workflow — read · run · evaluate
+
+Every section view now ends with an inline **code lab**: an editor seeded
+with the section's example snippet, a **Run** button that executes the code
+locally, and an **Evaluate** button that sends the code + the actual run
+output to the tutor for evidence-based feedback. The floating chat panel
+stays available for free-form questions.
+
+```mermaid
+flowchart LR
+    Lesson["Lesson view"] --> Lab["Code lab editor"]
+    Lab -- "Run" --> RunApi["/api/run<br/>subprocess + timeout"]
+    RunApi --> Lab
+    Lab -- "Evaluate" --> EvalApi["/api/evaluate<br/>evidence packet → LLM"]
+    EvalApi --> Lab
+    Lab -. "free-form" .-> Chat["Floating chat → /api/chat"]
+```
+
+The five candidate workflows considered, the trade-offs, and the chosen
+blend (lesson-first spine + inline code-lab + evidence-packet evaluation)
+are written up in [`docs/ux-workflow.md`](docs/ux-workflow.md).
+
+### New backend endpoints
+
+- `POST /api/run` — runs the submitted code in an isolated Python subprocess
+  (`python -I`, empty env, temp cwd, hard wall-clock timeout, size-limited
+  output). Returns `{stdout, stderr, exit_code, duration_ms, timed_out,
+  truncated}`. This is **prototype safety only** — subprocess + timeout +
+  restricted env. Not a real sandbox. See
+  [`docs/safety-and-sandboxing.md`](docs/safety-and-sandboxing.md) for the
+  controls a serious deployment would add (containers, seccomp, network
+  namespaces, CPU/memory limits).
+- `POST /api/evaluate` — accepts `{code, section?, question?, run_output?}`,
+  runs the code if `run_output` is missing, builds an evidence packet, and
+  asks the LLM for a hint-first assessment. Returns
+  `{assessment, feedback, next_step, run, model}` where `assessment` is one
+  of `passed | needs_work | error`.
+
+Configurable via env: `TUTOR_RUN_TIMEOUT` (default 5s, clamped 0.5–30s),
+`TUTOR_RUN_MAX_CODE_BYTES` (default 50 000), `TUTOR_RUN_MAX_OUTPUT_BYTES`
+(default 32 000).
+
 ## Core Components
 
 - **Tutor UI**: A local web app, terminal interface, or desktop shell where the student reads lessons, submits code, and receives feedback.
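
Note on the `POST /api/run` description above: the controls it lists (isolated interpreter, near-empty environment, temporary working directory, wall-clock timeout, size-limited output) can be illustrated with a short standard-library sketch. This is not the project's `runner.py`, which this diff does not show. The name `run_sketch`, the `-c` invocation, the `-1` exit code on timeout, and the POSIX-style environment are all assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
import time


def run_sketch(code: str, timeout: float = 5.0, max_output_bytes: int = 32_000) -> dict:
    """Hypothetical stand-in for the runner described in the README hunk."""
    start = time.monotonic()
    timed_out = False
    with tempfile.TemporaryDirectory() as workdir:  # throwaway cwd
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=workdir,
                env={"LC_ALL": "C.UTF-8"},  # near-empty env (assumes POSIX)
                capture_output=True,
                timeout=timeout,  # child is killed when this expires
            )
            out, err, exit_code = proc.stdout, proc.stderr, proc.returncode
        except subprocess.TimeoutExpired as exc:
            timed_out = True
            # Assumption: report -1 when the process had to be killed.
            out, err, exit_code = exc.stdout or b"", exc.stderr or b"", -1
    duration_ms = int((time.monotonic() - start) * 1000)
    truncated = len(out) > max_output_bytes or len(err) > max_output_bytes
    return {
        "stdout": out[:max_output_bytes].decode("utf-8", errors="replace"),
        "stderr": err[:max_output_bytes].decode("utf-8", errors="replace"),
        "exit_code": exit_code,
        "duration_ms": duration_ms,
        "timed_out": timed_out,
        "truncated": truncated,
    }


if __name__ == "__main__":
    print(run_sketch("print(2 + 2)\n"))
```

As the README itself stresses, a subprocess with a timeout only limits accidents. It does not stop hostile code from reading files or opening sockets.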

backend/README.md

Lines changed: 92 additions & 9 deletions
@@ -4,9 +4,12 @@ A small, local-first FastAPI service that proxies an [Ollama](https://ollama.com
 LLM (default: `gemma3:4b`) and exposes a tutor-shaped HTTP API for the
 [`frontend/`](../frontend/) PWA and other clients.
 
-The backend is intentionally minimal. It does not yet execute student code; the
-sandboxed runner described in [`docs/safety-and-sandboxing.md`](../docs/safety-and-sandboxing.md)
-is a separate milestone.
+The backend now also exposes a *prototype-grade* Python runner
+(`POST /api/run`) and an LLM evaluator (`POST /api/evaluate`) used by the
+frontend's inline code lab. The runner uses subprocess isolation with a
+hard wall-clock timeout and a restricted env — see
+[`docs/safety-and-sandboxing.md`](../docs/safety-and-sandboxing.md) for the
+controls a real deployment would still need to add.
 
 ## Layout
 
@@ -16,6 +19,7 @@ backend/
 │   ├── config.py         # env-driven Settings + tutor system prompt loader
 │   ├── main.py           # FastAPI app factory and routes
 │   ├── ollama_client.py  # async client for /api/tags and /api/chat
+│   ├── runner.py         # prototype Python subprocess runner (timeout + restricted env)
 │   └── schemas.py        # pydantic request/response models
 ├── tests/
 │   └── test_api.py       # mocked Ollama tests via respx
@@ -75,6 +79,9 @@ before launching uvicorn — the static frontend will be mounted at `/`.
 | `TUTOR_SERVE_FRONTEND` | `0` | Set to `1` to mount `frontend/` at `/`. |
 | `TUTOR_FRONTEND_DIR` | `../frontend` | Override the directory used when serving the frontend. |
 | `TUTOR_SYSTEM_PROMPT_PATH` | `../prompts/tutor-system-prompt.md` | Markdown file whose first fenced block is used as the default system prompt. |
+| `TUTOR_RUN_TIMEOUT` | `5` | Wall-clock seconds for `/api/run` and `/api/evaluate` code execution. Clamped to 0.5–30s. |
+| `TUTOR_RUN_MAX_CODE_BYTES` | `50000` | Max UTF-8 bytes accepted for a single submission. Clamped to 1 000–200 000. |
+| `TUTOR_RUN_MAX_OUTPUT_BYTES` | `32000` | Each of stdout/stderr is truncated past this. Clamped to 1 000–200 000. |
 
 ## Endpoints
 
@@ -145,6 +152,81 @@ curl -N http://localhost:8001/api/chat \
 
 Each streamed line is a JSON object forwarded from Ollama's `/api/chat` stream.
 
+### `POST /api/run`
+
+Executes student code in an isolated Python subprocess. **Prototype safety
+only** — subprocess + hard timeout + restricted env (`python -I`, empty env
+except `LC_ALL`/`PYTHONIOENCODING`, temp cwd). This is *not* a real sandbox.
+
+Request:
+
+```jsonc
+{
+  "code": "print(2 + 2)\n",
+  "stdin": "",        // optional
+  "timeout": 3.0      // optional, default 5s, clamped 0.5–30s
+}
+```
+
+Response:
+
+```json
+{
+  "stdout": "4\n",
+  "stderr": "",
+  "exit_code": 0,
+  "duration_ms": 16,
+  "timed_out": false,
+  "truncated": false
+}
+```
+
+Errors:
+
+- `400` if `code` exceeds `TUTOR_RUN_MAX_CODE_BYTES`.
+- `422` for malformed bodies.
+- Student-side failures (syntax errors, non-zero exits, timeouts) are
+  **not** errors — they come back in the normal response with
+  `exit_code != 0` and/or `timed_out: true`.
+
+### `POST /api/evaluate`
+
+Wraps a `/api/run` + LLM call into one request. Builds a compact evidence
+packet (code + actual runtime output + optional section context and
+learner question) and asks the tutor model for a hint-first assessment.
+
+Request:
+
+```jsonc
+{
+  "code": "for n in [1,2,3]: print(n)\n",
+  "section": "10 — Loops",            // optional
+  "question": "Is this idiomatic?",   // optional
+  "run_output": {                     // optional — if present, /api/run is skipped
+    "stdout": "...", "stderr": "", "exit_code": 0,
+    "duration_ms": 5, "timed_out": false, "truncated": false
+  },
+  "model": "gemma3:4b",               // optional
+  "temperature": 0.2                  // optional
+}
+```
+
+Response:
+
+```json
+{
+  "assessment": "passed",
+  "feedback": "Your loop iterates correctly and prints each item...",
+  "next_step": "Try the same with a list comprehension.",
+  "run": { "stdout": "1\n2\n3\n", "stderr": "", "exit_code": 0, "duration_ms": 14, "timed_out": false, "truncated": false },
+  "model": "gemma3:4b"
+}
+```
+
+`assessment` is one of `passed | needs_work | error`. `next_step` is a
+best-effort extraction from the model's reply; it may be `null` if the
+tutor's response did not include a recognisable next-step line.
+
 ## Tests
 
 ```bash
@@ -154,13 +236,14 @@ cd backend
 
 Tests use `respx` to mock the Ollama HTTP API, so they run without a real model
 server. The suite covers health (reachable + degraded), config, default and
-custom system prompt injection, and upstream error handling.
+custom system prompt injection, upstream error handling, the frontend chat
+wiring, and the `/api/run` + `/api/evaluate` endpoints (including the runner
+module's timeout, isolation, and output-truncation behaviour).
 
 ## Roadmap
 
-- Add a `/api/run` endpoint that wraps the sandboxed Python runner described in
-  [`docs/safety-and-sandboxing.md`](../docs/safety-and-sandboxing.md).
-- Add a `/api/tutor/turn` endpoint that orchestrates: run code → collect
-  evidence → call LLM with the structured context template from
-  [`prompts/tutor-system-prompt.md`](../prompts/tutor-system-prompt.md).
+- Tighten `/api/run` isolation: container or microVM, network namespace,
+  CPU/memory limits, seccomp/AppArmor where available.
+- Stream `/api/evaluate` responses (the LLM call already streams; the
+  evidence-packet shape just needs an NDJSON variant).
 - Persist learner state (see roadmap M4).
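
Note on the configuration table in this file: each `TUTOR_RUN_*` variable has a default and is clamped to a range (for example `TUTOR_RUN_TIMEOUT` defaults to 5 and is clamped to 0.5–30s). The sketch below shows one plausible way to read such a value. The helper `clamped_env_float` is hypothetical, and falling back to the default on an unparsable value is an assumption, since the diff does not show how `config.py` handles that case.

```python
import os
from typing import Mapping, Optional


def clamped_env_float(
    name: str,
    default: float,
    low: float,
    high: float,
    environ: Optional[Mapping[str, str]] = None,
) -> float:
    """Read a float from the environment and clamp it into [low, high]."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    try:
        value = float(raw) if raw is not None else default
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        value = default
    return max(low, min(high, value))


if __name__ == "__main__":
    # Uses the documented default and range for TUTOR_RUN_TIMEOUT.
    print(clamped_env_float("TUTOR_RUN_TIMEOUT", 5.0, 0.5, 30.0))
```

Passing `environ` explicitly keeps the helper easy to test without mutating the real process environment.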

backend/app/main.py

Lines changed: 146 additions & 0 deletions
@@ -12,12 +12,22 @@
 
 from .config import Settings, get_settings
 from .ollama_client import OllamaClient, OllamaError
+from .runner import (
+    DEFAULT_TIMEOUT_SEC,
+    MAX_CODE_BYTES,
+    RunnerError,
+    run_python,
+)
 from .schemas import (
     ChatMessage,
     ChatRequest,
     ChatResponse,
     ConfigResponse,
+    EvaluateRequest,
+    EvaluateResponse,
     HealthResponse,
+    RunRequest,
+    RunResponse,
 )
 
 
@@ -76,6 +86,142 @@ async def config() -> ConfigResponse:
         ollama_url=settings.ollama_url,
         default_model=settings.model,
         request_timeout=settings.request_timeout,
+        run_timeout_default=DEFAULT_TIMEOUT_SEC,
+        run_max_code_bytes=MAX_CODE_BYTES,
+    )
+
+def _result_to_response(result) -> RunResponse:
+    return RunResponse(
+        stdout=result.stdout,
+        stderr=result.stderr,
+        exit_code=result.exit_code,
+        duration_ms=result.duration_ms,
+        timed_out=result.timed_out,
+        truncated=result.truncated,
+    )
+
+@app.post("/api/run", response_model=RunResponse)
+async def run(req: RunRequest) -> RunResponse:
+    try:
+        result = await run_python(
+            req.code, stdin=req.stdin, timeout=req.timeout
+        )
+    except RunnerError as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+    return _result_to_response(result)
+
+def _build_evaluation_prompt(
+    code: str,
+    run_resp: RunResponse,
+    section: str | None,
+    question: str | None,
+) -> str:
+    # Build a compact, factual evidence packet. The LLM is told to act
+    # on these facts and not to invent runtime behaviour.
+    lines: list[str] = []
+    lines.append(
+        "You are reviewing a student's Python attempt. Use only the runtime"
+        " evidence below — do not claim outputs or behaviour you can't see."
+        " Reply in three short parts:"
+    )
+    lines.append(" 1. Assessment — one line: passed | needs_work | error.")
+    lines.append(
+        " 2. Feedback — 2-4 sentences, hint-first. If the code errored,"
+        " explain the error in beginner terms. If it ran cleanly, judge"
+        " whether the approach is right; otherwise give a hint, not a fix."
+    )
+    lines.append(
+        " 3. Next step — one short concrete suggestion (a small change to"
+        " try, or a follow-up exercise)."
+    )
+    lines.append("")
+    if section:
+        lines.append(f'Section context: "{section}".')
+    if question:
+        lines.append(f"Student question: {question}")
+    lines.append("")
+    lines.append("Student code:")
+    lines.append("```python")
+    lines.append(code)
+    lines.append("```")
+    lines.append("")
+    lines.append(f"Exit code: {run_resp.exit_code}")
+    lines.append(f"Duration: {run_resp.duration_ms} ms")
+    if run_resp.timed_out:
+        lines.append("NOTE: execution hit the runner's timeout.")
+    lines.append("Stdout:")
+    lines.append("```")
+    lines.append(run_resp.stdout or "(empty)")
+    lines.append("```")
+    lines.append("Stderr:")
+    lines.append("```")
+    lines.append(run_resp.stderr or "(empty)")
+    lines.append("```")
+    return "\n".join(lines)
+
+def _classify_assessment(text: str, run_resp: RunResponse) -> str:
+    """Best-effort parse of the model's first line; fall back to evidence."""
+    first = next(iter((text or "").strip().splitlines()), "").lower()
+    for label in ("passed", "needs_work", "needs work", "error"):
+        if label in first:
+            return "needs_work" if label == "needs work" else label
+    if run_resp.timed_out or run_resp.exit_code != 0:
+        return "error" if run_resp.stderr else "needs_work"
+    return "needs_work"
+
+def _extract_next_step(text: str) -> str | None:
+    if not text:
+        return None
+    for line in text.splitlines():
+        stripped = line.strip().lstrip("-*0123456789. ").strip()
+        low = stripped.lower()
+        if low.startswith("next step"):
+            # "Next step: ..." or "Next step — ..."
+            for sep in (":", "—", "-"):
+                if sep in stripped:
+                    return stripped.split(sep, 1)[1].strip() or None
+            return stripped
+    return None
+
+@app.post("/api/evaluate", response_model=EvaluateResponse)
+async def evaluate(req: EvaluateRequest) -> EvaluateResponse:
+    if req.run_output is not None:
+        run_resp = req.run_output
+    else:
+        try:
+            result = await run_python(
+                req.code, stdin=req.stdin, timeout=None
+            )
+        except RunnerError as exc:
+            raise HTTPException(status_code=400, detail=str(exc)) from exc
+        run_resp = _result_to_response(result)
+
+    prompt = _build_evaluation_prompt(
+        req.code, run_resp, req.section, req.question
+    )
+    model = req.model or settings.model
+    messages = [
+        ChatMessage(role="system", content=settings.system_prompt),
+        ChatMessage(role="user", content=prompt),
+    ]
+    client = make_client()
+    try:
+        raw = await client.chat(
+            model=model,
+            messages=messages,
+            temperature=req.temperature,
+        )
+    except OllamaError as exc:
+        raise HTTPException(status_code=502, detail=str(exc)) from exc
+
+    msg = raw.get("message") or {}
+    feedback = msg.get("content", "") or ""
+    return EvaluateResponse(
+        assessment=_classify_assessment(feedback, run_resp),
+        feedback=feedback,
+        next_step=_extract_next_step(feedback),
+        run=run_resp,
+        model=raw.get("model", model),
     )
 
 @app.post("/api/chat", response_model=ChatResponse)
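
Note on the `evaluate` handler above: it reuses `req.run_output` when the client supplies one and only runs the code itself when that field is absent. A client that has already called `/api/run` can therefore forward that result and avoid a second execution. The sketch below builds such a request body from the fields documented in `backend/README.md`. The helper `build_evaluate_payload` is hypothetical, and nothing is sent over the network.

```python
import json
from typing import Optional


def build_evaluate_payload(
    code: str,
    run_output: Optional[dict] = None,
    section: Optional[str] = None,
    question: Optional[str] = None,
) -> bytes:
    """Encode a POST /api/evaluate body, leaving out unset optional fields."""
    payload: dict = {"code": code}
    optional = (
        ("section", section),
        ("question", question),
        ("run_output", run_output),  # when present, the server skips its own run
    )
    for key, value in optional:
        if value is not None:
            payload[key] = value
    return json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    # Shape copied from the documented /api/run response.
    previous_run = {
        "stdout": "4\n", "stderr": "", "exit_code": 0,
        "duration_ms": 16, "timed_out": False, "truncated": False,
    }
    print(build_evaluate_payload("print(2 + 2)\n", run_output=previous_run).decode())
```

Omitting unset keys, instead of sending `null`, keeps the body within the optional-field shape shown in the README request example.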
