# Step 1b: Processing Stack & Data Flow — DAG Artifact

Map the complete data flow through the application by producing a **structured DAG JSON file** that represents every important node in the processing pipeline.

---

## What to investigate

### 1. Find where the LLM provider client is called

Locate every place in the codebase where an LLM provider client is invoked (e.g., `openai.ChatCompletion.create()`, `client.chat.completions.create()`, `anthropic.messages.create()`). These are the anchor points for your analysis. For each LLM call site, record:

- The file and function where the call lives
- Which LLM provider/client is used
- The exact arguments being passed (model, messages, tools, etc.)
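A crude textual scan is often enough to enumerate candidate anchor points. The sketch below is one way to do it; the regex patterns cover only the providers named above and are assumptions about your codebase — extend them as needed:

```python
import re
from pathlib import Path

# Regexes for common LLM client call sites (assumptions -- extend for
# whatever providers the codebase actually uses).
LLM_CALL_PATTERNS = [
    re.compile(r"chat\.completions\.create"),  # openai >= 1.x client
    re.compile(r"ChatCompletion\.create"),     # legacy openai module
    re.compile(r"messages\.create"),           # anthropic client
]


def find_llm_call_sites(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, line_text) for every candidate call site."""
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if any(p.search(line) for p in LLM_CALL_PATTERNS):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Treat each hit as a starting point for manual inspection — wrappers and re-exports mean the real call site may sit one layer deeper than the grep hit.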

### 2. Find the common ancestor entry point

Identify the single function that is the common ancestor of all LLM calls — the application's entry point for a single user request. This becomes the **root** of your DAG.

### 3. Track backwards: external data dependencies flowing IN

Starting from each LLM call site, trace **backwards** through the code to find every piece of data that feeds into the LLM prompt:

- **Application inputs**: user messages, queries, uploaded files, config
- **External dependency data**: database lookups (Redis, Postgres), retrieved context (RAG), cache reads, third-party API responses
- **In-code data**: system prompts, tool definitions, prompt-building logic
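
The three categories above often meet in a single prompt-assembly function. This hypothetical sketch shows what to look for — every name here is an assumption, not code from your app:

```python
def build_messages(user_message, history, retrieved_docs):
    """Assemble the messages list sent to the LLM (hypothetical example)."""
    system_prompt = "You are a helpful support agent."        # in-code data
    context = "\n".join(d["text"] for d in retrieved_docs)    # external (RAG)
    return [
        {"role": "system", "content": system_prompt + "\n\n" + context},
        *history,                                    # external (Redis/DB read)
        {"role": "user", "content": user_message},   # application input
    ]
```

Each argument to a function like this is a backward edge in your DAG: trace where `history` and `retrieved_docs` are produced and you have your external dependency nodes.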

### 4. Track forwards: external side-effects flowing OUT

Starting from each LLM call site, trace **forwards** to find every side-effect: database writes, API calls, messages sent, file writes.

### 5. Identify intermediate states

Along the paths between input and output, identify intermediate states needed for evaluation: tool-call decisions, routing/handoff decisions, retrieval results, branching logic.

### 6. Identify testability seams

Look for abstract base classes, protocols, or constructor-injected backends. These are testability seams — you'll create mock implementations of these interfaces. If there's no clean interface, you'll use `unittest.mock.patch` at the module boundary.

---

## Output: `pixie_qa/02-data-flow.json`

**Write a JSON file** (not markdown) containing a flat array of DAG nodes. Each node represents a significant point in the processing pipeline.

### Node schema

Each node is a JSON object with these fields:

| Field | Type | Required | Description |
| -------------- | -------------- | -------- | ------------------------------------------------------------------------------------------------------------------ |
| `name` | string | Yes | Unique, meaningful lower_snake_case node name (for example, `handle_turn`). This is the node identity. |
| `code_pointer` | string | Yes | **Absolute** file path with function/method name, optionally with a line range. See format below. |
| `description` | string | Yes | What this node does and why it matters for evaluation. |
| `parent` | string or null | No | Parent node name (`null` or omitted for the root). |
| `is_llm_call` | boolean | No | Set `true` only if the node represents an LLM provider call. Defaults to `false` when omitted. |
| `metadata` | object | No | Additional info: `mock_strategy`, `data_shape`, `credentials_needed`, `eval_relevant`, external system notes, etc. |

### About `is_llm_call`

- Use `is_llm_call: true` for nodes that represent real LLM provider spans.
- Leave it omitted (or `false`) for all other nodes.

### `code_pointer` format

The `code_pointer` field uses an **absolute file path** with a symbol name and an optional line range:

- `<absolute_file_path>:<symbol>` — points to a whole function or method. Use this when the entire function represents a single node in the DAG (the most common case).
- `<absolute_file_path>:<symbol>:<start_line>:<end_line>` — points to a specific line range within a function. Use this when the function contains an **important intermediate state** — a code fragment that transforms some input into an output that matters for evaluation, but the fragment is embedded inside a larger function rather than being its own function.

**When to use a line range (intermediate states):**

Some functions do multiple important things sequentially. If one of those things produces an intermediate state that your evaluators need to see (e.g., a routing decision, a context assembly step, a tool-call dispatch), but it's not factored into its own function, use a line range to identify that specific fragment. The line range marks the input → output boundary of that intermediate state within the larger function.

Examples of intermediate states that warrant a line range:

- **Routing decision**: lines 51–71 of `main()` decide which agent to hand off to based on user intent — the input is the user message, the output is the selected agent
- **Context assembly**: lines 30–45 of `handle_request()` gather documents from a vector store and format them into a prompt — the input is the query, the output is the assembled context
- **Tool dispatch**: lines 80–95 of `process_turn()` parse the LLM's tool-call response and execute the selected tool — the input is the tool-call JSON, the output is the tool result

If the intermediate state is already its own function, just use the function-level `code_pointer` (no line range needed).

Examples:

- `/home/user/myproject/app.py:handle_turn` — whole function
- `/home/user/myproject/src/agents/llm/openai_llm.py:run_ai_response` — whole function
- `/home/user/myproject/src/agents/agent.py:main:51:71` — lines 51–71 of `main()`, where a routing decision happens

The symbol can be:

- A function name: `my_func` → matches `def my_func` in the file
- A class.method: `MyClass.func` → matches `def func` inside `class MyClass`

### Example

```json
[
  {
    "name": "handle_turn",
    "code_pointer": "/home/user/myproject/src/agents/agent.py:handle_turn",
    "description": "Entry point for a single user request. Takes user message + conversation history, returns agent response.",
    "parent": null,
    "metadata": {
      "data_shape": {
        "input": "str (user message)",
        "output": "str (response text)"
      }
    }
  },
  {
    "name": "load_conversation_history",
    "code_pointer": "/home/user/myproject/src/services/redis_client.py:get_history",
    "description": "Reads conversation history from Redis. Returns list of message dicts.",
    "parent": "handle_turn",
    "metadata": {
      "system": "Redis",
      "data_shape": "list[dict] with role/content keys",
      "mock_strategy": "Provide canned history list",
      "credentials_needed": true
    }
  },
  {
    "name": "run_ai_response",
    "code_pointer": "/home/user/myproject/src/agents/llm/openai_llm.py:run_ai_response",
    "description": "Calls OpenAI API with system prompt + history + user message. Auto-captured by OpenInference.",
    "parent": "handle_turn",
    "is_llm_call": true,
    "metadata": {
      "provider": "OpenAI",
      "model": "gpt-4o-mini"
    }
  },
  {
    "name": "save_conversation_to_redis",
    "code_pointer": "/home/user/myproject/src/services/redis_client.py:save_history",
    "description": "Writes updated conversation history back to Redis after LLM responds.",
    "parent": "handle_turn",
    "metadata": {
      "system": "Redis",
      "eval_relevant": false,
      "mock_strategy": "Capture written data for assertions"
    }
  }
]
```

### Conditional / optional branches

Some apps have conditional code paths where only one branch executes per request — e.g., `transfer_call` vs `end_call` depending on the outcome. `pixie dag check-trace` (Step 2) validates against a **single** trace, so every DAG node must appear in that trace.

**Rule**: If two or more functions are mutually exclusive (only one runs per request), model them as a **single dispatcher node** that covers the branching logic, not as separate DAG nodes. For example, instead of `end_call_tool` + `transfer_call_tool` as separate nodes, use `execute_tool` pointing at the dispatch function.

If a function only runs under certain conditions but is the sole branch (not mutually exclusive), include it in the DAG — just ensure your reference trace (Step 2) exercises that code path.
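
To illustrate the rule, here is a hypothetical dispatcher that would be modeled as one DAG node covering both mutually exclusive branches (all names are assumptions):

```python
def execute_tool(tool_call: dict) -> dict:
    """Single dispatcher node covering mutually exclusive tool branches.

    Point the DAG's code_pointer at THIS function, not at the individual
    end_call / transfer_call implementations -- only one of them runs per
    request, so separate nodes could never all appear in a single trace.
    """
    name, args = tool_call["name"], tool_call.get("arguments", {})
    if name == "end_call":
        return {"status": "ended"}
    if name == "transfer_call":
        return {"status": "transferred", "target": args.get("target")}
    raise ValueError(f"unknown tool: {name}")
```

Whatever branch the reference trace exercises, the dispatcher node itself is always present, so `check-trace` can match it.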

### Validation checkpoint

After writing `pixie_qa/02-data-flow.json`, validate the DAG:

```bash
uv run pixie dag validate pixie_qa/02-data-flow.json
```

This command:

1. Checks the JSON structure is valid
2. Verifies node names use lower_snake_case
3. Verifies all node names are unique
4. Verifies all parent references exist
5. Checks exactly one root node exists (`parent` is null/omitted)
6. Detects cycles
7. Verifies `code_pointer` files exist on disk
8. Verifies symbols exist in the referenced files
9. Verifies line number ranges are valid (if present)
10. **Generates a Mermaid diagram** at `pixie_qa/02-data-flow.md` if validation passes

If validation fails, fix the errors and re-run. The error messages are specific — they tell you exactly which node has the problem and what's wrong.
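
Conceptually, the structural checks (names, uniqueness, single root, parent references, cycles) boil down to something like this sketch — it is not the actual `pixie` implementation, just an illustration of what the validator enforces:

```python
import re


def check_dag(nodes: list[dict]) -> list[str]:
    """Return a list of structural errors; empty means the DAG passes."""
    errors = []
    names = [n["name"] for n in nodes]
    for name in names:
        if not re.fullmatch(r"[a-z][a-z0-9_]*", name):
            errors.append(f"{name}: not lower_snake_case")
    if len(set(names)) != len(names):
        errors.append("duplicate node names")
    roots = [n for n in nodes if n.get("parent") is None]
    if len(roots) != 1:
        errors.append(f"expected exactly 1 root, found {len(roots)}")
    for n in nodes:
        parent = n.get("parent")
        if parent is not None and parent not in names:
            errors.append(f"{n['name']}: unknown parent {parent!r}")
    # Cycle check: walk the parent chain upward from every node.
    by_name = {n["name"]: n for n in nodes}
    for n in nodes:
        seen, cur = set(), n["name"]
        while cur is not None and cur in by_name:
            if cur in seen:
                errors.append(f"cycle involving {cur}")
                break
            seen.add(cur)
            cur = by_name[cur].get("parent")
    return errors
```

Running a mental version of these checks before invoking the CLI saves a validate/fix round-trip; the file-existence and symbol checks still require the real command.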

### Also document testability seams

After the DAG JSON is validated, add a brief **testability seams** section at the bottom of the generated `pixie_qa/02-data-flow.md` (the Mermaid file). For each node that reads from or writes to an external system, note the mock interface:

| Dependency node | Interface / module boundary | Mock strategy |
| --------------- | --------------------------- | ------------- |
| ... | ... | ... |

This section supplements the DAG — the DAG captures _what_ the dependencies are, and this table captures _how_ to mock them.