feat: improve tip generation prompt with richer guidance (#124)

jayaramkr · JAYARAM RADHAKRISHNAN · web-flow · commit d373e7ebb00c · 2026-04-03T10:22:28.000-04:00
* feat: improve tip generation prompt with richer guidance

Restructures the prompt to produce more specific, actionable tips by
adding explicit categories (strategy/recovery/optimization), step
instructions, trigger conditions, and domain-specific pattern hints
(API discovery, pagination, auth, error handling).

* fix: add optional implementation_steps to Tip model and prompt

Aligns the Tip schema with the generate_tips prompt, which already
instructed the LLM to produce implementation steps. The field is
optional (default empty list) for backward compatibility with existing
stored entities and consumers.

* feat: propagate implementation_steps through tip storage and clustering

Store implementation_steps in entity metadata across all three storage
paths (phoenix sync, MCP save_trajectory, consolidation). Pass it
through combine_cluster so the LLM sees prior steps when consolidating,
and update combine_tips.jinja2 to render and emit the field.
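
As a rough illustration of the rendering rule added to combine_tips.jinja2,
the sketch below reproduces it in plain Python rather than Jinja2. The
function name `render_tip` and the sample tip are hypothetical and are not
part of this change; only the field names and the `'; '` separator come from
the diff.

```python
def render_tip(tip: dict) -> str:
    """Render one tip the way the template's per-tip block does (approximation)."""
    lines = [
        f"- **Rationale:** {tip['rationale']}",
        f"- **Category:** {tip['category']}",
        f"- **Trigger:** {tip['trigger']}",
    ]
    # The template only emits the steps line when the list is non-empty,
    # so tips stored before this change render exactly as they did before.
    steps = tip.get("implementation_steps") or []
    if steps:
        lines.append("- **Implementation Steps:** " + "; ".join(steps))
    return "\n".join(lines)


# Hypothetical sample data for illustration only.
with_steps = render_tip({
    "rationale": "Avoids missing results",
    "category": "strategy",
    "trigger": "When a list endpoint returns a page token",
    "implementation_steps": ["read the page token", "request the next page"],
})
without_steps = render_tip({
    "rationale": "Avoids missing results",
    "category": "strategy",
    "trigger": "When a list endpoint returns a page token",
})
```

Tips without steps produce no extra line, which keeps the consolidation
prompt unchanged for legacy entities.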

* fix: normalize implementation_steps to list[str] in combine_cluster

Handles legacy entities where implementation_steps may be None, a
bare string, or a non-list type stored in metadata.
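
The cases above can be sketched as follows. This is a standalone copy of the
`_normalize_steps` helper from the diff, lifted to module level under the
name `normalize_steps` for illustration; the sample metadata dicts are made
up and do not come from real stored entities.

```python
def normalize_steps(raw: object) -> list[str]:
    """Coerce a stored implementation_steps value to list[str]."""
    if raw is None or raw == []:
        return []
    if isinstance(raw, str):
        # A bare string is treated as a single step, not split into characters.
        return [raw]
    if isinstance(raw, list):
        return [str(x) for x in raw]
    # Any other type (tuple, dict, number, ...) is stringified as one step.
    return [str(raw)]


# Hypothetical legacy metadata shapes for illustration only.
legacy_metadata = [
    {},                                               # field missing
    {"implementation_steps": None},                   # explicit None
    {"implementation_steps": "retry with backoff"},   # bare string
    {"implementation_steps": ["list endpoints", 2]},  # mixed-type list
    {"implementation_steps": ("a", "b")},             # non-list iterable
]
normalized = [normalize_steps(m.get("implementation_steps")) for m in legacy_metadata]
```

Note that a non-list iterable such as a tuple is stringified whole rather than
expanded element by element; that follows from the final fallback branch.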

* fix: clarify task status context in tip generation prompt

Replace placeholder 'Status: UNKNOWN' with an accurate description of
the evaluation context — no ground truth or user feedback, only the
agent's self-evaluation in the trajectory. Soften success/failure
conditionals to reflect this uncertainty.

---------

Co-authored-by: JAYARAM RADHAKRISHNAN <jayaramkr@us.ibm.com>
diff --git a/evolve/frontend/client/evolve_client.py b/evolve/frontend/client/evolve_client.py
@@ -143,6 +143,7 @@ def consolidate_tips(self, namespace_id: str, threshold: float | None = None) ->
                             "rationale": tip.rationale,
                             "category": tip.category,
                             "trigger": tip.trigger,
+                            "implementation_steps": tip.implementation_steps,
                         },
                     )
                     for tip in consolidated_tips
diff --git a/evolve/frontend/mcp/mcp_server.py b/evolve/frontend/mcp/mcp_server.py
@@ -215,6 +215,7 @@ def save_trajectory(trajectory_data: str, task_id: str | None = None) -> list[Re
                         "category": tip.category,
                         "rationale": tip.rationale,
                         "trigger": tip.trigger,
+                        "implementation_steps": tip.implementation_steps,
                         "task_description": result.task_description,
                         "source_task_id": task_id,
                         "creation_mode": "auto-mcp",
diff --git a/evolve/llm/tips/clustering.py b/evolve/llm/tips/clustering.py
@@ -157,12 +157,22 @@ def combine_cluster(entities: list[RecordedEntity]) -> list[Tip]:
         dict.fromkeys((e.metadata or {}).get("task_description", "") for e in entities if (e.metadata or {}).get("task_description"))
     )
 
+    def _normalize_steps(raw: object) -> list[str]:
+        if raw is None or raw == []:
+            return []
+        if isinstance(raw, str):
+            return [raw]
+        if isinstance(raw, list):
+            return [str(x) for x in raw]
+        return [str(raw)]
+
     tips = [
         {
             "content": str(e.content),
             "rationale": (e.metadata or {}).get("rationale", ""),
             "category": (e.metadata or {}).get("category", "strategy"),
             "trigger": (e.metadata or {}).get("trigger", ""),
+            "implementation_steps": _normalize_steps((e.metadata or {}).get("implementation_steps")),
         }
         for e in entities
     ]
diff --git a/evolve/llm/tips/prompts/combine_tips.jinja2 b/evolve/llm/tips/prompts/combine_tips.jinja2
@@ -13,6 +13,7 @@ These guidelines came from tasks like:
 - **Rationale:** {{ tip.rationale }}
 - **Category:** {{ tip.category }}
 - **Trigger:** {{ tip.trigger }}
+{% if tip.implementation_steps %}- **Implementation Steps:** {{ tip.implementation_steps | join('; ') }}{% endif %}
 
 {% endfor %}
 
@@ -35,7 +36,8 @@ Combine the above guidelines into a smaller set of HIGH-QUALITY, CONSOLIDATED, N
             "content": "Clear, actionable tip",
             "rationale": "Why this tip helps",
             "category": "strategy|recovery|optimization",
-            "trigger": "When to apply this tip"
+            "trigger": "When to apply this tip",
+            "implementation_steps": ["step 1", "step 2"]
         }
     ]
 }
diff --git a/evolve/llm/tips/prompts/generate_tips.jinja2 b/evolve/llm/tips/prompts/generate_tips.jinja2
@@ -1,23 +1,33 @@
-You are analyzing an AI agent's execution trajectory to extract actionable tips.
+Extract actionable, relevant tips from this trajectory that would help an AI agent perform similar tasks better in the future.
 
 # Task Information
 **Task:** {{task_instruction}}
-**Status:** UNKNOWN
+**Task Status:** There is no evaluation of the task's trajectory or output against any ground truth. There is also no user feedback to the AI agent. But the trajectory may contain the agent's self-evaluation of whether the task succeeded or failed.
 **Steps Taken:** {{num_steps}}
 
 # Agent Trajectory
 {{trajectory_summary}}
 
-# Your Task
-Extract 3-5 actionable tips from this trajectory that would help AI agents perform similar tasks better.
+**IMPORTANT TO REMEMBER:**
+1. Only generate tips if they are truly relevant and actionable
+2. Tips should be specific to patterns observed in this trajectory
+3. Include both positive patterns (what worked) and negative patterns (what to avoid)
+4. Each tip should have:
+   - A clear, concise description (content)
+   - The purpose/benefit of following it
+   - The category: "strategy", "recovery", or "optimization"
+   - Specific steps to implement the tip
+   - A trigger condition (when to apply this tip)
 
-**Guidelines:**
-1. Focus on patterns that worked or mistakes that were made
-2. Be specific to what you observed in this trajectory
-3. Each tip should have:
-   - Clear description of what to do (or avoid)
-   - Why it matters
-   - When to apply it
+5. If the task seems to have succeeded, focus on the successful strategies used
+6. If the task seems to have failed, focus on what went wrong and how to prevent/recover from it
+7. Do not generate generic tips - be specific to this task execution
+8. Look for patterns in how the agent:
+   - Discovered and used APIs
+   - Handled authentication and credentials
+   - Iterated through results (pagination)
+   - Structured its approach to the problem
+   - Handled errors or unexpected responses
 
 {% if not constrained_decoding_supported %}
 **Output Format (JSON):**
@@ -28,11 +38,15 @@ Extract 3-5 actionable tips from this trajectory that would help AI agents perfo
             "content": "Clear, actionable tip",
             "rationale": "Why this tip helps",
             "category": "strategy|recovery|optimization",
-            "trigger": "When to apply this tip"
+            "trigger": "When to apply this tip",
+            "implementation_steps": ["step 1", "step 2"]
         }
     ]
 }
 ```
 
 Generate tips now. Return ONLY the JSON, no other text.
-{% endif %}
+{% endif %}
+
+
+
diff --git a/evolve/schema/tips.py b/evolve/schema/tips.py
@@ -10,6 +10,7 @@ class Tip(BaseModel):
     rationale: str = Field(description="Why this tip helps")
     category: Literal["strategy", "recovery", "optimization"]
     trigger: str = Field(description="When to apply this tip")
+    implementation_steps: list[str] = Field(default_factory=list, description="Specific steps to implement this tip")
 
 
 class TipGenerationResponse(BaseModel):
diff --git a/evolve/sync/phoenix_sync.py b/evolve/sync/phoenix_sync.py
@@ -480,6 +480,7 @@ def _process_trajectory(self, trajectory: dict) -> int:
                         "category": tip.category,
                         "rationale": tip.rationale,
                         "trigger": tip.trigger,
+                        "implementation_steps": tip.implementation_steps,
                         "source_task_id": trajectory["trace_id"],
                         "source_span_id": trajectory["span_id"],
                         "task_description": result.task_description,
