Apply suggestions from code review

yiouli · Copilot · web-flow · commit 8018c4b0cf11 · 2026-04-05T12:31:27.000-07:00
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
diff --git a/skills/eval-driven-dev/SKILL.md b/skills/eval-driven-dev/SKILL.md
@@ -167,7 +167,7 @@ Run `pixie test` (without a path argument) to execute the full evaluation pipeli
 
 ## Web Server Management
 
-pixie-qa runs a web server in the background for displaying context, traces, and eval results to the user. It's automatically started by the setup script, and need to be explicitly cleaned up when display is no longer needed.
+pixie-qa runs a web server in the background for displaying context, traces, and eval results to the user. It's automatically started by the setup script, and needs to be explicitly cleaned up when display is no longer needed.
 
 When the user is done with the eval-driven-dev workflow, inform them the web server is still running and you can clean it up with the following command:
 
diff --git a/skills/eval-driven-dev/references/2-instrument-and-observe.md b/skills/eval-driven-dev/references/2-instrument-and-observe.md
@@ -1,6 +1,6 @@
 # Step 2: Instrument and observe a real run
 
-> For a quick lookup of imports, CLI commands, and key concepts, see `quick-reference.md`.
+> For a quick lookup of imports, CLI commands, and key concepts, see `instrumentation-api.md`.
 
 **Why this step**: You need to see the actual data flowing through the app before you can build anything. This step produces a reference trace that shows the exact data shapes you'll use for datasets and evaluators.
 
diff --git a/skills/eval-driven-dev/references/3-run-harness.md b/skills/eval-driven-dev/references/3-run-harness.md
@@ -21,9 +21,9 @@ run_app(eval_input) → eval_output
 
    **Starting web servers**: If you need to start a server process (for the subprocess approach), always use `run-with-timeout.sh` to start it in the background — never use bare `&` or `nohup` directly. See the FastAPI example file for the pattern.
 
-   **TestClient + database gotcha**: If the app manages DB connections in its FastAPI lifespan (common pattern: `_conn = get_connection()` in startup, `_conn.close()` in shutdown), the TestClient's lifespan teardown will close your mock connection. Read the "Gotcha: FastAPI TestClient + Database Connections" section below for the fix (wrap the connection to prevent lifespan from closing it).
+   **TestClient + database gotcha**: If the app manages DB connections in its FastAPI lifespan (common pattern: `_conn = get_connection()` in startup, `_conn.close()` in shutdown), the TestClient's lifespan teardown will close your mock connection. Read the "Gotcha: FastAPI TestClient + Database Connections" section in `references/run-harness-examples/fastapi-web-server.md` for the fix (wrap the connection to prevent lifespan from closing it).
 
-   **Concurrency — critical**: `assert_dataset_pass` calls `run_app` concurrently for multiple dataset items. Your harness **must be concurrency-safe**. Do NOT wrap the entire function in a `threading.Lock()` — this serializes all runs and makes tests extremely slow. Instead, initialize the app (TestClient, DB, services) **once at module level** and let each `run_app` call reuse the shared client. The app's per-session state (keyed by call_sid, session_id, etc.) provides natural isolation. Read the "Concurrency-safe harness" section below for the pattern.
+   **Concurrency — critical**: `assert_dataset_pass` calls `run_app` concurrently for multiple dataset items. Your harness **must be concurrency-safe**. Do NOT wrap the entire function in a `threading.Lock()` — this serializes all runs and makes tests extremely slow. Instead, initialize the app (TestClient, DB, services) **once at module level** and let each `run_app` call reuse the shared client. The app's per-session state (keyed by call_sid, session_id, etc.) provides natural isolation. Read the "Concurrency-safe harness" section in `references/run-harness-examples/fastapi-web-server.md` for the pattern.
 
 3. **Collect the response** — the app's output becomes eval_output, along with any side-effects captured by mock objects.
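
Reviewer note on the concurrency paragraph in `3-run-harness.md`: the pattern it describes (initialize once at module level, reuse the shared client, rely on per-session keys for isolation) can be sketched as below. This is a minimal illustration using only the standard library, not the code from `fastapi-web-server.md`. `FakeApp`, `APP`, and `run_concurrently` are hypothetical stand-ins for the real app, its TestClient, and whatever runner invokes `run_app` in parallel. The `session_id` and `text` input fields are likewise assumed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeApp:
    """Hypothetical stand-in for the real app (for example, a shared TestClient)."""

    def __init__(self):
        self._sessions = {}            # per-session state, keyed by session_id
        self._lock = threading.Lock()  # guards only the dict lookup, not whole requests

    def handle(self, session_id, text):
        # Hold the lock just long enough to fetch or create this session's state.
        with self._lock:
            history = self._sessions.setdefault(session_id, [])
        # Work outside the lock: only this session's own list is touched.
        history.append(text)
        return {"session_id": session_id, "reply": text.upper(), "turns": len(history)}


# Built once at import time and reused by every run_app call.
APP = FakeApp()


def run_app(eval_input):
    """No function-wide lock; isolation comes from the per-session key."""
    return APP.handle(eval_input["session_id"], eval_input["text"])


def run_concurrently(items, max_workers=4):
    # Assumption: approximates a runner that calls run_app for many items in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_app, items))
```

Each dataset item should carry its own session key. Two items sharing a key would share state, and the harness would then need finer-grained synchronization for that session rather than a lock around the whole of `run_app`.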