Commit 4089c45
Update test workflow to use custom test runner for 'fast' pattern and refine guardrail functionality in Task class
- Changed the test command in the GitHub Actions workflow to run a specific test runner script with a 'fast' pattern.
- Introduced guardrail functionality in the Task class to validate task outputs, including error handling and retry logic for guardrail validation failures.
- Enhanced the initialization of guardrail parameters and ensured proper type checking for guardrail functions.
1 parent 13503f0 commit 4089c45

14 files changed

Lines changed: 1155 additions & 148 deletions
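The guardrail behavior the commit message describes — validate a task's output and retry on validation failure up to a limit — can be sketched roughly as follows. This is a minimal illustration only; `Task`, `TaskOutput`, and `run_with_guardrail` here are stand-ins, not the actual praisonai implementation.

```python
# Hedged sketch of guardrail validation with retry, as described in the
# commit message. All names here are illustrative stand-ins.
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

@dataclass
class TaskOutput:
    raw: str

# A guardrail returns (passed, result): on success, result may be a
# (possibly cleaned-up) TaskOutput; on failure, it is feedback.
GuardrailResult = Tuple[bool, Any]

@dataclass
class Task:
    guardrail: Optional[Callable[[TaskOutput], GuardrailResult]] = None
    max_retries: int = 3

    def run_with_guardrail(self, execute: Callable[[], TaskOutput]) -> TaskOutput:
        """Run `execute`, re-running up to max_retries times if the guardrail rejects."""
        last_feedback = None
        for _attempt in range(self.max_retries + 1):
            output = execute()
            if self.guardrail is None:
                return output
            ok, result = self.guardrail(output)
            if ok:
                # The guardrail may replace the output with a modified version.
                return result if isinstance(result, TaskOutput) else output
            last_feedback = result
        raise ValueError(f"Guardrail failed after {self.max_retries} retries: {last_feedback}")
```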


.github/workflows/python-package.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -65,4 +65,4 @@ jobs:
         # flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
       - name: Test with pytest
         run: |
-          cd src/praisonai && python -m pytest
+          cd src/praisonai && python tests/test_runner.py --pattern fast
```
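The repository's actual `tests/test_runner.py` is not shown in this diff. As a rough sketch only, a pattern-based runner like the one the new command invokes might map a named pattern to a subset of test paths and hand them to pytest; the `PATTERNS` mapping and directory names below are assumptions, not the real runner's contents.

```python
#!/usr/bin/env python3
# Hypothetical sketch of a pattern-based test runner invoked as
# `python tests/test_runner.py --pattern fast`. Paths are illustrative.
import argparse
import subprocess
import sys

PATTERNS = {
    "fast": ["tests/unit"],  # quick unit tests only (assumed layout)
    "full": ["tests"],       # everything, including slow integration tests
}

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--pattern", default="fast", choices=sorted(PATTERNS))
    args = parser.parse_args(argv)
    # Delegate to pytest on the selected subset of paths.
    cmd = [sys.executable, "-m", "pytest", *PATTERNS[args.pattern]]
    return subprocess.call(cmd)

if __name__ == "__main__":
    raise SystemExit(main())
```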

src/praisonai-agents/.cursorrules

Lines changed: 1 addition & 143 deletions
@@ -5,146 +5,4 @@
 5. Make it minimal change as possible
 6. Firstly try not to make any modification to the existing code as possible
 7. Only modify the existing code if it's highly required; if it can't be done without that, then add a new code section.
-8. If you are adding new code, make sure to add it in a way that it can be easily integrated with the existing codebase.
-
-Below is a **detailed technical overview** of the issues that have been coming up in the workflow execution (specifically around loops, resetting tasks, and continuing on to subsequent tasks). It includes:
-
-1. **How the workflow currently flows**
-2. **What logic marks a task as "completed"**
-3. **What issues arose in `loop` tasks**
-4. **Why the workflow can end up "stuck" or "looping indefinitely"**
-
----
-
-## 1. Overall Workflow Flow
-
-### a. Building relationships among tasks
-
-- At startup, the code iterates through all `tasks`.
-- For each `task`:
-  - It looks at `task.next_tasks`, and for each `next_task_name`:
-    - Finds the corresponding `Task` object
-    - Appends the current task's name to the found `next_task`'s `previous_tasks` list.
-- This means if Task A has `next_tasks=["B"]`, then Task B's `previous_tasks` will include `"A"`.
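The relationship-building step described in the removed overview can be sketched in a few lines. This is an illustration only; the `Task` class here is a minimal stand-in, not the framework's real class.

```python
# Sketch of linking tasks: every name in a task's next_tasks gets this
# task recorded in that successor's previous_tasks list.
class Task:
    def __init__(self, name, next_tasks=None):
        self.name = name
        self.next_tasks = next_tasks or []
        self.previous_tasks = []

def link_tasks(tasks):
    """tasks: dict mapping task name -> Task. Fills in previous_tasks in place."""
    for task in tasks.values():
        for next_task_name in task.next_tasks:
            next_task = tasks.get(next_task_name)
            if next_task is not None:
                next_task.previous_tasks.append(task.name)
```

So if Task A has `next_tasks=["B"]`, linking leaves `"A"` in Task B's `previous_tasks`, exactly as the overview states.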
-### b. Finding and starting with a "start task"
-
-- The workflow code tries to locate a task with `is_start=True`.
-- If no such task is found, it uses the first item in the tasks dictionary instead.
-- That "start task" is what the workflow tries to run first.
-
-### c. Execution loop in the method (e.g., `workflow()` or `aworkflow()`)
-
-- There is a `while current_task:` loop that processes tasks in sequence, or conditionally, based on `task.condition`.
-- Each time a task runs (if non-loop), it yields the `task_id` or triggers an agent to run. Once the agent finishes (with or without a final result), the workflow picks up again to see what to do next:
-  - If the `task` is a `loop` type, the code tries to create or manage sub-tasks for each row/line in an "input_file".
-  - If the `task` is a normal task ("decision", "task", or some other type), it just executes once, sets `status="completed"`, and the code moves on.
-
-### d. Condition-based branching
-
-- If a task has a result that includes a "decision" (like `{"decision": "more"}` or `"done"`), the code checks `task.condition`. For example:
-  ```python
-  condition = {
-      "more": "generate_task",
-      "done": "evaluate_total_questions"
-  }
-  ```
-- If the result's decision is `"done"`, it jumps to the task named `"evaluate_total_questions"`.
-- If the result's decision is `"more"`, it jumps right back to `"generate_task"`.
-- If no condition matches, it can fall back to the first item in `task.next_tasks`.
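The branching rule just described — check the condition map first, then fall back to the first declared next task — condenses into a single dispatch helper. Illustrative only; the real workflow code differs.

```python
# Sketch of condition-based branching: map a finished task's decision
# string to the next task name, falling back to task.next_tasks[0].
def pick_next_task_name(task, decision):
    """Return the name of the next task to run, or None when done."""
    condition = getattr(task, "condition", None) or {}
    if decision in condition:
        return condition[decision]
    # No condition matched: fall back to the first declared next task.
    if task.next_tasks:
        return task.next_tasks[0]
    return None
```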
-### e. Marking tasks as "completed"
-
-- After a task's execution (like a typical non-loop task), the code sets `task.status = "completed"`.
-- Then the code has a snippet that says:
-  ```python
-  if self.tasks[task_id].status == "completed":
-      # Possibly reset to "not started" so we can re-run if needed
-  ```
-- By default, the system tries to "reset" tasks to `"not started"`, **unless** it is a loop task or a subtask of a loop.
-
----
-
-## 2. How a "completed" task is marked
-
-Generally, tasks are marked `status="completed"` in two primary ways:
-
-1. **Non-Loop Execution**
-   A normal task (like a "decision" or "task") is executed once the code calls the agent, the agent returns a final result, and the system sets `task.status = "completed"`.
-2. **Loop Execution**
-   A loop-type task is *programmatically* set to `status="completed"` when all of its sub-tasks have finished. That is:
-   - The code checks: "Have all child tasks of this loop finished?"
-   - If `True`, the loop task is set to `completed`.
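That loop-completion rule can be sketched as a small helper: a loop task becomes `"completed"` only when every one of its subtasks has finished. An illustrative helper, not the actual code.

```python
# Sketch: mark the parent loop task completed once all subtasks finish.
def update_loop_status(loop_task, subtasks):
    if subtasks and all(t.status == "completed" for t in subtasks):
        loop_task.status = "completed"
```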
----
-
-## 3. Issues Specifically in `loop` Tasks
-
-### a. Re-Entering the Loop
-
-- Before, the same snippet that "resets completed tasks to 'not started' so they can re-run if needed" **also** tried to reset loop tasks or their subtasks.
-- If a loop task got reset to `"not started"`, the code would eventually pick it back up again, leading to repeated creation of sub-tasks (or repeated attempts to re-run them).
-- This caused an **infinite loop**, repeating the same steps in the workflow and never truly exiting the loop stage.
-
-### b. Subtasks Not Marked or Overwritten
-
-- Another tricky scenario: if sub-tasks themselves got reset, the parent loop would see them as "not started" again, and might wait for them to "complete", or might re-run them. That can lead to indefinite re-running of sub-tasks.
-
-### c. Not Proceeding to the Next Task (e.g., "upload_to_huggingface")
-
-- If the loop kept "restarting", the workflow never ended up hitting the next tasks. For example, if your workflow is:
-  1. `generate_task`
-  2. `evaluate_total_questions`
-  3. `generate_cot` (loop)
-  4. `upload_to_huggingface`
-  the system might get stuck in step #3 indefinitely (the sub-tasks keep getting reset, so it never actually transitions to step #4).
-
-### d. Condition Logic vs. Next Tasks
-
-- Another subtlety: if the loop tasks had a `condition` that pointed them back to a prior step, it might cause unintentional re-entry. Typically, you only want loop tasks to proceed once, **unless** you explicitly want to re-visit. For a data-ingestion process, you usually want to run it once, then move on to the next step.
-
----
-
-## 4. Why the System Can End Up Stuck or "Looping Indefinitely"
-
-1. **Reset Mechanism**
-   - The code tries to reset tasks to "not started" once they complete, so they can be re-run in some dynamic multi-run scenario.
-   - But that same logic can cause loop tasks to revert to "not started" the moment they end. The system sees "Oh, a task is 'not started'? Let's run it!" and you're in a cycle.
-2. **No Condition for Exit**
-   - If the loop has a condition that leads back to a prior step (like `"more" -> generate_cot`), it can keep re-running.
-3. **Subtasks Not Marked**
-   - If the subtask and the loop keep "reactivating" each other, the workflow never exits.
-
----
-
-## Summary of the "Core Problem"
-
-1. We want to keep the resetting mechanism for **non-loop tasks**, because in some advanced workflows we like re-running them from a different path or after some condition.
-2. But we want **loop tasks** to remain `"completed"` once all sub-tasks are done, so the code can seamlessly proceed to the next major step.
-3. Before the fix, loop tasks or their sub-tasks got reset. This triggered the system to re-enter the loop, re-run the sub-tasks, etc., causing an infinite loop and preventing the workflow from reaching tasks like "upload_to_huggingface".
-
----
-
-## Technical Highlights to Pass On
-
-- **In the reset snippet**:
-  ```python
-  if self.tasks[task_id].status == "completed":
-      # never reset if loop or subtask-of-loop
-      # else reset to "not started"
-  ```
-  This is crucial to skipping re-runs on loop tasks.
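A fuller, runnable sketch of that reset guard follows. The attribute names (`task_type`, `is_loop_subtask`) are illustrative assumptions, not the framework's actual fields.

```python
# Sketch of the reset guard: completed tasks go back to "not started" so
# they can be re-run, EXCEPT loop tasks and subtasks of loops, which must
# stay completed so the workflow can move on instead of re-entering the loop.
def maybe_reset(task):
    if task.status != "completed":
        return
    if task.task_type == "loop" or getattr(task, "is_loop_subtask", False):
        return  # keep loop work completed; resetting it re-enters the loop
    task.status = "not started"
```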
-- **Ensure** that once a loop's sub-tasks are all "completed", the loop's status is set to "completed" and it transitions to the next major tasks (like `upload_to_huggingface`).
-- **Check** that the loop's condition is correct. If you want a single pass, do not add a condition that leads back to the same loop.
-- Also check that you do not have "overlapping conditions" that cause re-entry.
-
----
-
-### Conclusion
-
-**Hence,** the main challenge is that the reset logic (meant to let normal tasks be re-run) conflicts with a loop task's one-pass usage. Once you avoid resetting the loop tasks or sub-tasks, you can finish them once, mark them "completed", and properly proceed to the next stage.
-
-Don't remove any logging or debug statements, as they will help you understand the flow of the code.
+8. If you are adding new code, make sure to add it in a way that it can be easily integrated with the existing codebase.

src/praisonai-agents/CLAUDE.md

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

PraisonAI Agents is a hierarchical AI agent framework for completing complex tasks with self-reflection capabilities. It supports multi-agent collaboration, tool integration, and various execution patterns (sequential, hierarchical, parallel).

## Development Commands

### Installation and Setup
```bash
# Install core package
pip install -e .

# Install with specific features
pip install -e .[all]        # All features
pip install -e .[memory]     # Memory capabilities
pip install -e .[knowledge]  # Document processing
pip install -e .[mcp]        # MCP server support
pip install -e .[llm]        # Extended LLM support
pip install -e .[api]        # API server capabilities
```

### Testing
```bash
# Run individual test examples (no formal test runner configured)
python tests/basic-agents.py
python tests/async_example.py
python tests/knowledge-agents.py

# Test specific features
python tests/mcp-agents.py      # MCP integration
python tests/memory_example.py  # Memory functionality
python tests/tools_example.py   # Tool system
```

### Running Examples
```bash
# Basic agent usage
python tests/single-agent.py

# Multi-agent workflows
python tests/multi-agents-api.py

# Async operations
python tests/async_example_full.py

# MCP server examples
python tests/mcp-sse-direct-server.py  # Start MCP server
python tests/mcp-sse-direct-client.py  # Connect to server
```

## Core Architecture

### Agent System (`praisonaiagents/agent/`)
- **Agent**: Core agent class with LLM integration, tool calling, and self-reflection
- **ImageAgent**: Specialized multimodal agent for image processing
- Self-reflection with configurable min/max iterations (default: 1-3)
- Delegation support for hierarchical agent structures

### Multi-Agent Orchestration (`praisonaiagents/agents/`)
- **PraisonAIAgents**: Main orchestrator for managing multiple agents and tasks
- **AutoAgents**: Automatic agent creation and management
- Process types: `sequential`, `hierarchical`, `parallel`
- Context passing between agents and task dependency management

### Task System (`praisonaiagents/task/`)
- **Task**: Core task definition with context, callbacks, and output specifications
- Supports file output, JSON/Pydantic structured output, async execution
- Conditional logic with `condition` parameter for task flow control
- Context passing via `context` parameter for task dependencies
- **Guardrails**: Built-in validation and safety mechanisms for task outputs
  - Function-based guardrails for custom validation logic
  - LLM-based guardrails using natural language descriptions
  - Automatic retry with configurable `max_retries` parameter
  - Compatible with CrewAI guardrail patterns

### LLM Integration (`praisonaiagents/llm/`)
- Unified wrapper for multiple LLM providers via LiteLLM
- Supports OpenAI, Anthropic, Gemini, DeepSeek, local models (Ollama)
- Context length management and tool calling capabilities
- Set via `llm` parameter on agents or global `OPENAI_API_KEY`/`ANTHROPIC_API_KEY`

### Tool System (`praisonaiagents/tools/`)
Two implementation patterns:
1. **Function-based**: Simple tools using `@tool` decorator
2. **Class-based**: Complex tools inheriting from `BaseTool`

Built-in tools include: DuckDuckGo search, file operations, calculator, Wikipedia, arXiv, data analysis tools, shell execution.
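The two tool patterns can be sketched generically as follows. The `tool` decorator and `BaseTool` class below are illustrative stand-ins whose exact signatures may differ from the actual `praisonaiagents.tools` API.

```python
# Generic sketch of the two tool patterns: a decorator for plain functions
# and an abstract base class for stateful tools. Names are illustrative.
from abc import ABC, abstractmethod

def tool(fn):
    """Function-based pattern: mark a plain function as a tool."""
    fn.is_tool = True
    return fn

@tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

class BaseTool(ABC):
    name: str = "base"

    @abstractmethod
    def run(self, *args, **kwargs): ...

class Calculator(BaseTool):
    """Class-based pattern: complex tools carry state and a run() method."""
    name = "calculator"

    def run(self, expression: str) -> float:
        # Deliberately tiny: supports "x + y" only, for illustration.
        left, right = expression.split("+")
        return float(left) + float(right)
```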
### Memory & Knowledge Systems
- **Memory** (`praisonaiagents/memory/`): Multi-layered memory with RAG support
  - Types: short-term, long-term, entity, user memory
  - Providers: ChromaDB, Mem0, custom implementations
- **Knowledge** (`praisonaiagents/knowledge/`): Document processing with chunking
  - Chunking strategies via `chonkie` library
  - Embedding and retrieval capabilities

### MCP (Model Context Protocol) Integration
- **MCP Server**: Server-side tool protocol for distributed execution
- **SSE Support**: Server-sent events for real-time communication
- Tool discovery and dynamic registration

## Development Patterns

### Agent Creation
```python
agent = Agent(
    name="Agent Name",
    role="Agent Role",
    goal="Agent Goal",
    backstory="Agent Background",
    llm="gpt-4o-mini",    # or other LLM
    self_reflect=True,    # Enable self-reflection
    min_reflect=1,        # Minimum reflection iterations
    max_reflect=3,        # Maximum reflection iterations
    tools=[tool1, tool2]  # Optional tools
)
```

### Task Definition
```python
task = Task(
    name="task_name",
    description="Task description",
    expected_output="Expected output format",
    agent=agent,
    context=[previous_task],        # Task dependencies
    output_pydantic=ResponseModel,  # Structured output
    condition="condition_function"  # Conditional execution
)
```

### Guardrails Usage
```python
from typing import Tuple, Any

# Function-based guardrail
def validate_output(task_output: TaskOutput) -> Tuple[bool, Any]:
    """Custom validation function."""
    if "error" in task_output.raw.lower():
        return False, "Output contains errors"
    if len(task_output.raw) < 10:
        return False, "Output is too short"
    return True, task_output

task = Task(
    description="Write a professional email",
    expected_output="A well-formatted email",
    agent=agent,
    guardrail=validate_output,  # Function-based guardrail
    max_retries=3               # Retry up to 3 times if guardrail fails
)

# LLM-based guardrail
task = Task(
    description="Generate marketing copy",
    expected_output="Professional marketing content",
    agent=agent,
    guardrail="Ensure the content is professional, engaging, and free of errors",  # String description
    max_retries=2
)
```

### Multi-Agent Workflow
```python
workflow = PraisonAIAgents(
    agents=[agent1, agent2],
    tasks=[task1, task2],
    process="sequential",        # or "hierarchical", "parallel"
    verbose=True,
    manager_agent=manager_agent  # For hierarchical process
)
result = workflow.start()
```

### Async Support
All major components support async execution:
```python
result = await workflow.astart()
result = await agent.aexecute(task)
```

## Key Dependencies

- **Core**: `pydantic`, `rich`, `openai`, `mcp`
- **Memory**: `chromadb`, `mem0ai`
- **Knowledge**: `markitdown`, `chonkie`
- **LLM**: `litellm` for unified provider access
- **API**: `fastapi`, `uvicorn` for server capabilities

## Error Handling

- Global error logging via `error_logs` list
- Callback system for real-time error reporting
- Context length exception handling with automatic retry
- Graceful degradation for optional dependencies

## Testing Strategy

The project uses example-driven testing with 100+ test files in the `tests/` directory. Each test file demonstrates specific usage patterns and serves as both test and documentation. Run individual examples to test functionality rather than using a formal test runner.

Use `conda activate praisonai-agents` to activate the environment.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
```python
#!/usr/bin/env python3

from typing import Tuple, Any
from praisonaiagents import Agent, Task, TaskOutput


def validate_content(task_output: TaskOutput) -> Tuple[bool, Any]:
    if len(task_output.raw) < 50:
        return False, "Content too short"
    return True, task_output


def main():
    agent = Agent(
        name="Writer",
        guardrail=validate_content
    )

    result = agent.start("Write a welcome message with 4 words")
    print(result)


if __name__ == "__main__":
    main()
```
