|
52 | 52 | * **Negative (Non-Critical):** Behaviors which are likely not desired but do not directly lead to failure of the task as described by the initial prompt instructions. These could include things like inefficiencies, formatting slips, or partial errors that were rectified later that do not cause complete failure. |
53 | 53 |
|
54 | 54 | **IMPORTANT:** Extract ALL notable behaviors you observe in the trace. Do not artificially limit the number of properties. A typical trace may have 3-10 distinct behaviors worth noting across all behavior types. Focus on what makes this conversation interesting or distinctive, not just failures. |
| 55 | +
|
| 56 | +**CRITICAL: AVOID REDUNDANT PROPERTIES:** Before adding a property, check if it's truly distinct from properties you've already identified. Different phrasings of the same underlying behavior should be consolidated into ONE property. For example, "uses markdown formatting" and "structures response with headers" are often the same behavior and should be one property, not two. |
55 | 57 | * **Positive:** Uncommon but effective strategies, self-correction, exceptional safety handling, or notable conversation patterns that work well. Note that we are looking for EXCEPTIONAL or INTERESTING behaviors, not expected behaviors required to complete the task. Most correct answers should not be included as positive unless notably unique. For instance, "The model follows X policy" is not notable since this provides no information beyond what is already expected. |
56 | 58 | * **Style:** Behaviors which are independent of the task but may differentiate this conversation from others or affect user experience. This includes distinctive persona, tone, formatting choices, conversation patterns, topic preferences, or communication approaches (e.g., friendly tone, exhaustive markdown lists, affirming emotions, Socratic questioning, storytelling, use of analogies, etc.). Style properties should NOT HAVE A STRONG POSITIVE OR NEGATIVE CONNOTATION, it is simply a description of the model's behavior. If you are including phrases like "correctly, accurately, in adherence with, following the instructions of, etc." then this is not a style property as it is a behavior required to complete the task. Below are some examples of good and bad style properties: |
57 | 59 | * Bad style property: "uses tables which is in line with the user's instructions" would not be considered a style property because it is an expected behavior for a model that is able to follow instructions. |
|
108 | 110 | ### CRITICAL CONSTRAINTS |
109 | 111 | * **NO HALLUCINATIONS:** Do not infer agent thoughts or intentions based solely on the final output. Only describe observable behaviors. Do not fabricate or exaggerate evidence or quotes. |
110 | 112 | * **INTERNAL VS EXTERNAL:** Do not state the agent "said" something if it appeared only in internal thoughts. Use "reasoned" or "thought" for internal traces. |
111 | | -* **DISTINCT PROPERTIES:** Each property should be unique, not a mix of others. If a behavior fits multiple categories (e.g., is both Negative (critical) and a part could be Negative (non-critical)), list only the property in the category that is more severe or specific (except for cases involving both the cause and correction of an error, where both can be listed separately). |
| 113 | +* **DISTINCT PROPERTIES - NO DUPLICATES:** Each property must describe a genuinely different behavior. Before finalizing your list: |
| 114 | + 1. Review all properties to identify any that describe the same underlying behavior with different wording |
| 115 | + 2. Consolidate redundant properties into a single, well-written property |
| 116 | + 3. Ask yourself: "Could these two properties be merged without losing important information?" If yes, merge them. |
| 117 | + 4. Examples of redundant properties that should be ONE property: |
| 118 | + - "uses numbered lists" + "structures content with bullet points" → "uses structured lists and bullet points to organize information" |
| 119 | + - "explains technical concepts clearly" + "breaks down complex ideas" → "breaks down complex technical concepts into clear explanations" |
| 120 | + - "maintains friendly tone" + "uses warm language" → "maintains a friendly, warm tone throughout" |
| 121 | + 5. If a behavior fits multiple categories (e.g., is both Negative (critical) and a part could be Negative (non-critical)), list only the property in the category that is more severe or specific (except for cases involving both the cause and correction of an error, where both can be listed separately). |
112 | 122 |
|
113 | 123 | ### OUTPUT FORMAT |
114 | 124 | First, output a brief **<reasoning>** block summarizing your analysis {reasoning_suffix}. |
|
142 | 152 |
|
143 | 153 | "analysis_process": """1. **Scan the Trace:** Read the user input, the model's internal thoughts (if available), the model's interaction with the user, the system of tools the model has access to, the environment, and the final output. |
144 | 154 | 2. **Filter:** Ignore generic behaviors (e.g., "Agent answered correctly"). Focus on behaviors that are **High Leverage** (critical success/failure), **Distinctive** (persona/style), or **Structural** (looping, adherence to format). |
145 | | -3. **Draft:** Write the behavior descriptions following the **Definitions & Rubric** section.""", |
| 155 | +3. **Draft:** Write the behavior descriptions following the **Definitions & Rubric** section. |
| 156 | +4. **Deduplicate:** Review your list for redundant properties. Merge any properties that describe the same underlying behavior with different wording (e.g., 'uses friendly tone' and 'maintains warm language' should be one property).""", |
146 | 157 |
|
147 | 158 | "model_naming_rule": "", # Empty string for Single Model |
148 | 159 |
|
|
171 | 182 |
|
172 | 183 | "analysis_process": """1. **Scan the Traces:** Read the user input, each model's internal thoughts (if available), each model's interaction with the user, the system of tools the models have access to, the environment, and the final output. Compare and consider differences between the models' responses. |
173 | 184 | 2. **Filter:** Ignore generic behaviors (e.g., "Agent answered correctly"). Focus on differentiating behaviors that are **High Leverage** (critical success/failure), **Distinctive** (persona/style), or **Structural** (looping, adherence to format). |
174 | | -3. **Draft:** Write the behavior descriptions following the **Definitions & Rubric** section.""", |
| 185 | +3. **Draft:** Write the behavior descriptions following the **Definitions & Rubric** section. |
| 186 | +4. **Deduplicate:** Review your list for redundant properties. Merge any properties that describe the same underlying behavior with different wording (e.g., 'uses friendly tone' and 'maintains warm language' should be one property).""", |
175 | 187 |
|
176 | 188 | "model_naming_rule": """0. MODEL NAMING RULES: |
177 | 189 | * Respond with either "Model A" or "Model B" depending on which model exhibits the behavior. Remember to include distinct properties from each model and do not let the ordering of the model responses influence the properties you include. |
|
201 | 213 |
|
202 | 214 | "analysis_process": """1. **Scan the Trace:** Read the user input, the agent's internal thoughts (if available), the agent's interaction with the user, the system of tools the agent has access to, the environment, and the final output. |
203 | 215 | 2. **Filter:** Ignore generic behaviors (e.g., "Agent answered correctly"). Look for behaviors that are **High Leverage** (critical success/failure), **Distinctive** (persona/style), or **Structural** (looping, format adherence). |
204 | | -3. **Draft:** Formulate the behavior descriptions following the **Definitions & Rubric** section.""", |
| 216 | +3. **Draft:** Formulate the behavior descriptions following the **Definitions & Rubric** section. |
| 217 | +4. **Deduplicate:** Review your list for redundant properties. Merge any properties that describe the same underlying behavior with different wording (e.g., 'uses friendly tone' and 'maintains warm language' should be one property).""", |
205 | 218 |
|
206 | 219 | "model_naming_rule": "", # Empty string for Single Model |
207 | 220 |
|
|
229 | 242 |
|
230 | 243 | "analysis_process": """1. **Scan the Trace:** Read the user input, each agent's internal thoughts (if available), each agent's interaction with the user, the system of tools the agents have access to, the environment, and the final output. |
231 | 244 | 2. **Filter:** Ignore generic behaviors (e.g., "Agent answered correctly", "The agent adhered to the system policy", "The agent thought step by step"). Look for behaviors that are **High Leverage** (critical success/failure), **Distinctive** (persona/style), or **Structural** (looping, format adherence). |
232 | | -3. **Draft:** Formulate the behavior descriptions following the **Definitions & Rubric** section.""", |
| 245 | +3. **Draft:** Formulate the behavior descriptions following the **Definitions & Rubric** section. |
| 246 | +4. **Deduplicate:** Review your list for redundant properties. Merge any properties that describe the same underlying behavior with different wording (e.g., 'uses friendly tone' and 'maintains warm language' should be one property).""", |
233 | 247 |
|
234 | 248 | "model_naming_rule": """0. MODEL NAMING RULES: |
235 | 249 | * Respond with either "Model A" or "Model B" depending on which agent exhibits the behavior. Remember to include distinct properties from each agent and do not let the ordering of the agent responses influence the properties you include. |
|
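The **Deduplicate** step added in this change asks the model to merge redundant properties itself. A downstream check on the returned list could complement that instruction. The following is a minimal sketch, not part of this commit: the function name `find_redundant_pairs`, the `threshold` value, and the assumption that properties arrive as plain strings are all hypothetical. It uses lexical similarity from the standard library's `difflib`, so it only flags near-identical wording; paraphrases such as "maintains friendly tone" and "uses warm language" would likely need an embedding-based comparison instead.

```python
from difflib import SequenceMatcher


def find_redundant_pairs(properties, threshold=0.6):
    """Return (i, j, score) for pairs of property descriptions that look similar.

    Hypothetical helper: `properties` is assumed to be a list of plain strings,
    and `threshold` is an arbitrary cutoff that would need tuning on real data.
    """
    pairs = []
    for i in range(len(properties)):
        for j in range(i + 1, len(properties)):
            # Compare case-insensitively so capitalization alone is not a difference.
            a = properties[i].lower()
            b = properties[j].lower()
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    extracted = [
        "Uses numbered lists",
        "uses numbered lists",
        "Maintains a friendly, warm tone throughout",
    ]
    for i, j, score in find_redundant_pairs(extracted):
        print(f"Possible duplicate: {extracted[i]!r} and {extracted[j]!r} ({score:.2f})")
```

Flagged pairs are candidates for review rather than automatic merges, since two properties can share wording while describing different behaviors.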