Skip to content

Commit 8018c4b

Browse files
yiouliCopilot
andauthored
Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 77407e0 commit 8018c4b

File tree

3 files changed

+4
-4
lines changed

3 files changed

+4
-4
lines changed

skills/eval-driven-dev/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -167,7 +167,7 @@ Run `pixie test` (without a path argument) to execute the full evaluation pipeli
167167

168168
## Web Server Management
169169

170-
pixie-qa runs a web server in the background for displaying context, traces, and eval results to the user. It's automatically started by the setup script, and need to be explicitly cleaned up when display is no longer needed.
170+
pixie-qa runs a web server in the background for displaying context, traces, and eval results to the user. It's automatically started by the setup script, and needs to be explicitly cleaned up when display is no longer needed.
171171

172172
When the user is done with the eval-driven-dev workflow, inform them the web server is still running and you can clean it up with the following command:
173173

skills/eval-driven-dev/references/2-instrument-and-observe.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Step 2: Instrument and observe a real run
22

3-
> For a quick lookup of imports, CLI commands, and key concepts, see `quick-reference.md`.
3+
> For a quick lookup of imports, CLI commands, and key concepts, see `instrumentation-api.md`.
44
55
**Why this step**: You need to see the actual data flowing through the app before you can build anything. This step produces a reference trace that shows the exact data shapes you'll use for datasets and evaluators.
66

skills/eval-driven-dev/references/3-run-harness.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,9 +21,9 @@ run_app(eval_input) → eval_output
2121

2222
**Starting web servers**: If you need to start a server process (for the subprocess approach), always use `run-with-timeout.sh` to start it in the background — never use bare `&` or `nohup` directly. See the FastAPI example file for the pattern.
2323

24-
**TestClient + database gotcha**: If the app manages DB connections in its FastAPI lifespan (common pattern: `_conn = get_connection()` in startup, `_conn.close()` in shutdown), the TestClient's lifespan teardown will close your mock connection. Read the "Gotcha: FastAPI TestClient + Database Connections" section below for the fix (wrap the connection to prevent lifespan from closing it).
24+
**TestClient + database gotcha**: If the app manages DB connections in its FastAPI lifespan (common pattern: `_conn = get_connection()` in startup, `_conn.close()` in shutdown), the TestClient's lifespan teardown will close your mock connection. Read the "Gotcha: FastAPI TestClient + Database Connections" section in `references/run-harness-examples/fastapi-web-server.md` for the fix (wrap the connection to prevent lifespan from closing it).
2525

26-
**Concurrency — critical**: `assert_dataset_pass` calls `run_app` concurrently for multiple dataset items. Your harness **must be concurrency-safe**. Do NOT wrap the entire function in a `threading.Lock()` — this serializes all runs and makes tests extremely slow. Instead, initialize the app (TestClient, DB, services) **once at module level** and let each `run_app` call reuse the shared client. The app's per-session state (keyed by call_sid, session_id, etc.) provides natural isolation. Read the "Concurrency-safe harness" section below for the pattern.
26+
**Concurrency — critical**: `assert_dataset_pass` calls `run_app` concurrently for multiple dataset items. Your harness **must be concurrency-safe**. Do NOT wrap the entire function in a `threading.Lock()` — this serializes all runs and makes tests extremely slow. Instead, initialize the app (TestClient, DB, services) **once at module level** and let each `run_app` call reuse the shared client. The app's per-session state (keyed by call_sid, session_id, etc.) provides natural isolation. Read the "Concurrency-safe harness" section in `references/run-harness-examples/fastapi-web-server.md` for the pattern.
2727

2828
3. **Collect the response** — the app's output becomes eval_output, along with any side-effects captured by mock objects.
2929

0 commit comments

Comments
 (0)