Commit 3ab13be

Add HOW_IT_WORKS & TECH_REF; update README
Add high-level HOW_IT_WORKS.md and comprehensive TECHNICAL_REFERENCE.md documentation. Overhaul README to reorganize prerequisites, quick-start (Docker recommended), local vs Docker instructions, phase/command examples, and detailed configuration/CLI/ENV docs. Update defaults and examples (AI model -> gpt-5.2, email/Discord enabled, smtp/defaults, example tutorial URL), clarify output files and running individual pipeline phases.
1 parent b6eb4b4 commit 3ab13be

3 files changed: +1061 -158 lines changed

HOW_IT_WORKS.md

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
# How TutorialValidator Works

This document explains what TutorialValidator does and how its pieces fit together, without going into the low-level implementation details. If you want the full technical picture, see [TECHNICAL_REFERENCE.md](TECHNICAL_REFERENCE.md).

---

## The Big Picture

TutorialValidator answers one question: **does this tutorial actually work?**

It does this by treating the tutorial as a test suite. It reads the tutorial, extracts every instruction as a structured step, and then has an AI agent carry out those steps, exactly as a real developer would, in a clean, isolated environment. If anything fails, the tutorial has a documentation bug.

The system was built to catch problems like:
- Commands that reference files not yet created at that point in the tutorial
- Code snippets with syntax errors or missing imports
- Steps that depend on a previous step that was never documented
- HTTP endpoints that are supposed to be reachable but aren't

---

## The Pipeline

Every run goes through three phases in sequence.

```
             Tutorial URL
                   │
                   ▼
┌─────────────────────────────────────┐
│  Phase 1: Analyst                   │
│  Scrape the tutorial pages          │
│  AI extracts structured steps       │
│  Output: testplan.json              │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  Phase 2: Executor                  │
│  AI agent follows each step         │
│  Runs commands, writes files,       │
│  makes HTTP calls, checks results   │
│  Output: results/ + summary.json    │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  Phase 3: Reporter                  │
│  Sends HTML email report            │
│  Posts Discord notification         │
└─────────────────────────────────────┘
```

The **Orchestrator** is the coordinator that drives all three phases, manages the Docker environment, and produces the final `summary.json`.

---

## Phase 1: The Analyst

The Analyst's job is to read the tutorial and produce a machine-readable test plan.

**Scraping**: The Analyst fetches the tutorial URL and parses the HTML. It follows navigation links within the same tutorial series, collecting up to a configurable maximum number of pages. Each page is converted to clean Markdown.
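
The page-collection behavior described above can be pictured as a bounded breadth-first walk over the series' navigation links. The sketch below is only an illustration under stated assumptions: `collect_pages`, the `get_links` callback, and the in-memory link map are hypothetical names, not TutorialValidator's actual code, and no real HTTP fetching or HTML parsing is performed.

```python
from collections import deque
from typing import Callable, Iterable


def collect_pages(start_url: str,
                  get_links: Callable[[str], Iterable[str]],
                  max_pages: int) -> list[str]:
    """Walk tutorial pages breadth-first, visiting at most max_pages pages."""
    seen = {start_url}
    queue = deque([start_url])
    pages: list[str] = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        pages.append(url)
        for link in get_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical link structure of a three-part tutorial series.
links = {
    "part-1": ["part-2", "part-3"],
    "part-2": ["part-1", "part-3"],
    "part-3": [],
}
print(collect_pages("part-1", lambda u: links.get(u, []), max_pages=2))
# → ['part-1', 'part-2']
```

The `seen` set prevents loops when pages link back to each other, and the page cap bounds the crawl even if the series is longer than expected.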

**Analysis**: The Analyst sends the Markdown content to an AI model (OpenAI or Azure OpenAI) with a prompt that instructs it to extract every action a developer must take. The result is a list of structured steps in JSON format, called the **test plan** (`testplan.json`).

**Compaction**: Long tutorials can produce hundreds of raw steps. To keep execution time and AI cost reasonable, the Analyst merges adjacent steps of the same type (e.g., two consecutive file edits become one step with two modifications). This is controlled by the `--target-steps` and `--max-steps` arguments.
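
As a rough illustration of merging adjacent steps of the same type, consider the minimal sketch below. The `Step` class, its `kind` and `actions` fields, and the `compact` function are hypothetical; the sketch also ignores the `--target-steps` and `--max-steps` limits, whose exact semantics are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str  # e.g. "command" or "file_edit" (hypothetical labels)
    actions: list[str] = field(default_factory=list)


def compact(steps: list[Step]) -> list[Step]:
    """Merge each run of adjacent steps that share the same kind."""
    merged: list[Step] = []
    for step in steps:
        if merged and merged[-1].kind == step.kind:
            merged[-1].actions.extend(step.actions)
        else:
            # Copy so the caller's steps are not mutated.
            merged.append(Step(step.kind, list(step.actions)))
    return merged


raw = [
    Step("file_edit", ["edit Program.cs"]),
    Step("file_edit", ["edit Startup.cs"]),
    Step("command", ["dotnet build"]),
]
plan = compact(raw)
print([(s.kind, len(s.actions)) for s in plan])
# → [('file_edit', 2), ('command', 1)]
```

Only neighbors are merged, so the relative order of commands and edits is preserved.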

The test plan is the handoff between Phase 1 and Phase 2. You can inspect it, edit it, or even write one by hand and feed it directly to the Executor.

---

## Phase 2: The Executor

The Executor is where the actual testing happens. It loads the test plan and walks through every step, using an AI agent to perform each one.

**The AI agent** is powered by [Microsoft Semantic Kernel](https://learn.microsoft.com/en-us/semantic-kernel/overview/) and has access to four tools:

| Tool | What it can do |
|---|---|
| Command | Execute shell commands, start background processes (e.g., web servers) |
| File Operations | Read, write, create, delete, and list files and directories |
| HTTP | Make GET and POST requests, check if a URL is reachable |
| Environment | Check installed tool versions, read environment variables, get directory info |

For each step, the agent receives a description of what needs to be done and uses these tools to carry it out. It then reports whether the step succeeded or failed.

**Fail-fast behavior**: If any step fails, the Executor stops. All remaining steps are marked as `Skipped`. This is intentional: a failed step usually means the environment is in an unexpected state, so continuing would produce misleading results.
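
The fail-fast rule can be sketched in a few lines. This is an illustration, not the Executor's real code: `run_fail_fast` and the `execute` callback are hypothetical, and the `Passed` and `Failed` status names are assumptions (only `Skipped` is named above).

```python
from typing import Callable


def run_fail_fast(steps: list[str], execute: Callable[[str], bool]) -> dict[str, str]:
    """Run steps in order; after the first failure, mark the rest as Skipped."""
    statuses: dict[str, str] = {}
    failed = False
    for step in steps:
        if failed:
            statuses[step] = "Skipped"  # never executed
        elif execute(step):
            statuses[step] = "Passed"
        else:
            statuses[step] = "Failed"
            failed = True
    return statuses


steps = ["create project", "add package", "build"]
result = run_fail_fast(steps, lambda s: s != "add package")
print(result)
# → {'create project': 'Passed', 'add package': 'Failed', 'build': 'Skipped'}
```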

**Long-running processes**: Some tutorial steps involve starting a web server or other background process. The Executor handles these specially: it starts the process in the background, watches its output for a readiness signal (like `Now listening on`), and keeps it running for later HTTP assertion steps.
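
Watching output for a readiness signal amounts to scanning lines until a marker appears. The sketch below shows the idea on simulated output; `wait_for_ready` is a hypothetical helper, and a real implementation would also need a timeout so that a server that never starts does not block the run.

```python
from typing import Iterable, Optional


def wait_for_ready(output_lines: Iterable[str],
                   signal: str = "Now listening on") -> Optional[str]:
    """Return the first output line containing the readiness signal, or None."""
    for line in output_lines:
        if signal in line:
            return line.strip()
    return None


# Simulated server output; a real run would read the process's stdout instead.
sample = [
    "info: Building...\n",
    "info: Now listening on: http://localhost:5000\n",
    "info: Application started.\n",
]
print(wait_for_ready(sample))
# → info: Now listening on: http://localhost:5000
```

Because the function accepts any iterable of lines, the same logic would work on the `stdout` stream of a process started with Python's `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)`.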

---

## Step Types

The test plan uses four types of steps, each representing a different kind of instruction a tutorial can contain.

**Command**: A terminal command to run. Can include an expected exit code and file creation expectations. Long-running commands (like `dotnet run`) are flagged separately so the Executor knows not to wait for them to exit.

**File Operation**: Create, modify, or delete a file or directory. When creating a file, the step includes the full file content.

**Code Change**: Apply a specific code modification to an existing file. This can be a targeted find-and-replace (when you know the exact code to look for) or a full file replacement (when the tutorial shows a complete updated version of the file).
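
The two Code Change modes can be contrasted with a small sketch. `apply_code_change` and its parameters are hypothetical, and the choice to reject a missing or ambiguous match is an assumption made for illustration rather than documented behavior.

```python
from typing import Optional


def apply_code_change(source: str, new_code: str, find: Optional[str] = None) -> str:
    """Apply a targeted find-and-replace, or a full replacement when find is None."""
    if find is None:
        return new_code  # full file replacement
    matches = source.count(find)
    if matches != 1:
        # Fail loudly rather than guess when the target is missing or ambiguous.
        raise ValueError(f"expected exactly one match for {find!r}, found {matches}")
    return source.replace(find, new_code)


original = 'var app = builder.Build();\napp.Run();\n'
patched = apply_code_change(original, "app.MapControllers();\napp.Run();", find="app.Run();")
print(patched)
```

Requiring exactly one match means a snippet that does not appear in the file surfaces as an error instead of silently doing nothing.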

**Expectation**: Assert that something is true about the current state. There are three assertion types:
- **Build assertion**: verifies that `dotnet build` succeeds
- **HTTP assertion**: makes a request to a URL and checks the response (status code, body content)
- **Database assertion**: queries the database and checks the result
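
To make the HTTP assertion concrete, here is a minimal sketch of the check applied once a response has been received. The `check_http` function and its parameter names are hypothetical, and the request itself is omitted so that the example stays self-contained.

```python
from typing import Optional


def check_http(status: int, body: str,
               expected_status: int = 200,
               must_contain: Optional[str] = None) -> list[str]:
    """Return a list of assertion failures; an empty list means the check passed."""
    failures: list[str] = []
    if status != expected_status:
        failures.append(f"expected status {expected_status}, got {status}")
    if must_contain is not None and must_contain not in body:
        failures.append(f"body does not contain {must_contain!r}")
    return failures


print(check_http(200, "<h1>Welcome</h1>", must_contain="Welcome"))
# → []
print(check_http(500, "error", must_contain="Welcome"))
# → ['expected status 200, got 500', "body does not contain 'Welcome'"]
```

Collecting every failure, rather than stopping at the first, gives a report more context about what went wrong with the response.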

---

## Developer Personas

The `--persona` flag changes how the AI agent behaves when it runs into problems. This lets you test your tutorial at different levels of strictness.

**Junior**: Follows instructions literally. Adds `using` statements when a type is unrecognized, but makes no other assumptions. If the tutorial is ambiguous or incomplete, it fails.

**Mid** (default): Has solid knowledge of the tech stack (C#, .NET, etc.) but no framework-specific knowledge (e.g., no knowledge of ABP internals). Understands patterns well enough to match code correctly and handle edge cases in find-and-replace operations, but won't try to fix things the tutorial doesn't address.

**Senior**: Expert in the full stack including the framework. When something fails, it diagnoses the problem, attempts a fix, and retries up to 3 times. Every fix it makes is documented in the results; these represent potential improvements to the tutorial.

The persona system exists because different questions call for different levels of strictness. To find documentation gaps, use `junior` or `mid`. To validate the overall flow and check what an expert can work around, use `senior`.
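
The Senior persona's diagnose, fix, and retry loop can be sketched as follows. `run_with_fixes` and its callbacks are hypothetical, and the sketch assumes that "retries up to 3 times" means one initial attempt followed by at most three retries.

```python
from typing import Callable

MAX_RETRIES = 3  # assumption: three retries after the initial attempt


def run_with_fixes(attempt: Callable[[], bool],
                   try_fix: Callable[[], str]) -> tuple[bool, list[str]]:
    """Run a step; on failure apply a fix and retry, recording every fix made."""
    fixes: list[str] = []
    if attempt():
        return True, fixes
    for _ in range(MAX_RETRIES):
        fixes.append(try_fix())  # keep a record of the fix for the results
        if attempt():
            return True, fixes
    return False, fixes


# Simulated step that fails twice and then succeeds.
outcomes = iter([False, False, True])
ok, fixes = run_with_fixes(lambda: next(outcomes),
                           lambda: "added missing package reference")
print(ok, len(fixes))
# → True 2
```

The returned list of fixes is the part that matters for tutorial authors: each entry is something the tutorial could have stated explicitly.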

---

## Output Files

After a run, the output directory contains:

| File | What it contains |
|---|---|
| `scraped/` | The raw Markdown converted from the tutorial pages |
| `testplan.json` | The structured test plan extracted from the tutorial |
| `results/validation-result.json` | Per-step pass/fail status with timing and error details |
| `results/validation-report.json` | Human-readable report with diagnostics and failure context |
| `logs/` | Full console output from the Executor |
| `summary.json` | Top-level summary: overall status, tutorial name, duration, paths to all files |

---

## Docker vs Local Mode

**Docker mode** (the default) runs the Executor inside a container. This means:
- The tutorial's commands run in a clean, reproducible environment
- Any mess created during execution stays inside the container
- The container comes pre-installed with .NET 10, Node.js 20, the ABP CLI, and the EF Core CLI
- A SQL Server instance runs as a sidecar container for tutorials that need a database

**Local mode** (`--local` flag) runs the Executor directly on your machine. This is faster to start because there's no container build step, but it means tutorial commands run against your local environment. Use this if you already have the required tools installed and want a quicker feedback loop, or if Docker is not available.

---

## Notifications

After execution, the Reporter can send notifications through two channels:

**Email**: An HTML report is formatted and sent via SMTP. The report includes the overall result, a table of all steps with their status, and detailed diagnostics for any failures.

**Discord**: A summary message is posted to a Discord webhook. Useful for team monitoring in CI/CD pipelines.

Both channels are configured in `appsettings.json` and can be disabled independently.
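
For a sense of what posting to a Discord webhook involves, the sketch below builds (but does not send) the HTTP request. Discord webhooks accept a JSON body with a `content` field; the message wording, the `build_discord_request` function, and the placeholder URL are assumptions for illustration, not the Reporter's actual format.

```python
import json
import urllib.request


def build_discord_request(webhook_url: str, tutorial: str,
                          passed: bool) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Discord webhook."""
    status = "PASSED" if passed else "FAILED"
    payload = {"content": f"Tutorial validation {status}: {tutorial}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real webhook URL comes from the Discord channel settings.
req = build_discord_request("https://discord.com/api/webhooks/ID/TOKEN",
                            "Getting Started", passed=False)
print(req.get_method(), json.loads(req.data)["content"])
# → POST Tutorial validation FAILED: Getting Started
```

Sending the notification would be a matter of passing `req` to `urllib.request.urlopen`.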
