Merged
21 commits
bea8ac7
refactor: decompose engine into modules with Pydantic models and Type…
anticomputer Mar 11, 2026
915939a
refactor: wire Pydantic models into parser, runner, and tests
anticomputer Mar 11, 2026
58c39c9
refactor: decompose modules, add type hints and docstrings
anticomputer Mar 11, 2026
8fa7fb4
feat: add responses API support via model_config api_type
anticomputer Mar 11, 2026
7fe48e1
feat: per-model api_type, endpoint, and token overrides
anticomputer Mar 11, 2026
54d5d28
docs: update README and GRAMMAR for responses API and per-model config
anticomputer Mar 11, 2026
6d5f19e
refactor: fix mutable defaults, add __all__ exports and docstrings
anticomputer Mar 11, 2026
85d9e14
fix: lint errors in mcp_servers (missing import, bare except, type co…
anticomputer Mar 11, 2026
476967e
feat: concise error messages, full tracebacks only with --debug
anticomputer Mar 11, 2026
30c9694
feat: session checkpoint/resume with auto-retry
anticomputer Mar 11, 2026
5eaf29c
docs: add session recovery and error output sections to README
anticomputer Mar 11, 2026
2dd674b
test: add comprehensive taskflow exercising all grammar features
anticomputer Mar 11, 2026
336f83d
fix: address code scanning findings from PR review
anticomputer Mar 11, 2026
8efcc25
fix: restore ruff baseline ignores for hatch fmt CI compatibility
anticomputer Mar 11, 2026
f03443c
feat: TASKFLOW_ENV_DENYLIST to filter env vars from MCP subprocesses
anticomputer Mar 11, 2026
3765891
fix: address PR review findings
anticomputer Mar 12, 2026
468f97b
fix: session resume, resource loading, and error path consistency
anticomputer Mar 12, 2026
69fc1e3
fix: address remaining PR feedback, expand test coverage
anticomputer Mar 20, 2026
a147e9e
Address second round of review feedback
anticomputer Mar 20, 2026
8b48991
fix: address human review feedback on PR #166
anticomputer Mar 28, 2026
ed9f9e0
Merge branch 'main' into anticomputer/refactor
anticomputer Apr 1, 2026
116 changes: 114 additions & 2 deletions README.md
@@ -1,8 +1,11 @@
# GitHub Security Lab Taskflow Agent

The Security Lab Taskflow Agent is an MCP enabled multi-Agent framework.
The Security Lab Taskflow Agent is an MCP-enabled multi-Agent framework for
declarative, YAML-driven agentic workflows.

The Taskflow Agent is built on top of the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/).
Built on top of the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/),
it uses [Pydantic](https://docs.pydantic.dev/) for grammar validation and
[Jinja2](https://jinja.palletsprojects.com/) for template rendering.

## Core Concepts

@@ -16,6 +19,115 @@ Agents can cooperate to complete sequences of tasks through so-called [taskflows

You can find a detailed overview of the taskflow grammar [here](doc/GRAMMAR.md) and example taskflows [here](examples/taskflows/).

## Architecture

```
┌─────────────────────────────────────────────────────┐
│ CLI (cli.py) │
│ Typer-based entry point: -p, -t, -l, -g, --resume │
└─────────────────────┬───────────────────────────────┘
┌─────────────────────▼───────────────────────────────┐
│ Runner (runner.py) │
│ Taskflow execution loop, model resolution, │
│ template rendering, session checkpointing │
└─────────────────────┬───────────────────────────────┘
┌─────────────────────▼───────────────────────────────┐
│ MCP Lifecycle (mcp_lifecycle.py) │
│ Server connection, cleanup, process management │
└─────────────────────┬───────────────────────────────┘
┌─────────────────────▼───────────────────────────────┐
│ Agent (agent.py) │
│ TaskAgent wrapper, hooks, OpenAI Agents SDK bridge │
└─────────────────────────────────────────────────────┘

Supporting modules:
models.py — Pydantic v2 grammar models (validation)
session.py — Task-level checkpoint / resume
available_tools.py — YAML resource loader with caching
template_utils.py — Jinja2 template environment
mcp_utils.py — MCP client parameter resolution
mcp_transport.py — MCP transport implementations (stdio, streamable)
mcp_prompt.py — System prompt construction
prompt_parser.py — Legacy prompt argument parser
capi.py — AI API endpoint and token management
path_utils.py — Platform-aware data/log directories
```

### API Types

The agent supports both the **Chat Completions** and **Responses** OpenAI APIs.
The API type can be configured globally or per model in a `model_config` file:

```yaml
seclab-taskflow-agent:
version: "1.0"
filetype: model_config
api_type: chat_completions # default for all models
models:
gpt_default: gpt-4.1
gpt_responses: gpt-5.1
model_settings:
gpt_responses:
api_type: responses # override for this model
endpoint: https://api.githubcopilot.com
token: CAPI_TOKEN # env var name containing the API key
```

Per-model `model_settings` can include:
- **`api_type`** — `"chat_completions"` (default) or `"responses"`
- **`endpoint`** — API base URL override for this model
- **`token`** — name of an environment variable containing the API key
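The resolution order implied by these settings can be sketched as follows. This is an illustrative helper, not the engine's actual code; the function name and the returned dict shape are assumptions, but the fallback chain (per-model setting, then global `api_type`, then the `AI_API_ENDPOINT` / `AI_API_TOKEN` environment variables) mirrors the behavior described above:

```python
import os

def resolve_model(alias: str, config: dict) -> dict:
    """Resolve effective API settings for a model alias (illustrative sketch)."""
    settings = config.get("model_settings", {}).get(alias, {})
    return {
        "model": config["models"][alias],
        # per-model api_type overrides the top-level one; default is chat_completions
        "api_type": settings.get("api_type", config.get("api_type", "chat_completions")),
        # endpoint falls back to the global AI_API_ENDPOINT env var
        "endpoint": settings.get("endpoint", os.environ.get("AI_API_ENDPOINT")),
        # `token` names an env var; the actual key is read from the environment
        "api_key": os.environ.get(settings.get("token", "AI_API_TOKEN"), ""),
    }
```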

### Session Recovery

Taskflow runs are automatically checkpointed at the task level. If a task
fails after exhausting retries, the session is saved and can be resumed:

```
** 🤖💾 Session saved: abc123def456
** 🤖💡 Resume with: --resume abc123def456
```

Resume from the last successful checkpoint:

```bash
python -m seclab_taskflow_agent --resume abc123def456
```

Failed tasks are automatically retried up to 3 times with increasing backoff
before the session is saved. Session checkpoints are stored in the
platform-specific application data directory.
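The retry-then-checkpoint flow can be sketched like this. It is a minimal illustration, not the runner's actual implementation; the function name is hypothetical, and the exact backoff schedule is an assumption (the README only says "increasing backoff"):

```python
import time

def run_with_retry(task, max_attempts=3, base_delay=1.0):
    """Retry a failing task with increasing backoff (illustrative sketch)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # caller saves the session checkpoint at this point
            time.sleep(base_delay * attempt)  # increasing delay: 1s, 2s, ...
```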

### Error Output

By default, errors are shown as concise one-line messages. Use `--debug` (or
set `TASK_AGENT_DEBUG=1`) for full tracebacks:

```bash
# Concise (default)
Error: [BadRequestError] model 'foo' not found
(use --debug for full traceback)

# Full traceback
python -m seclab_taskflow_agent --debug -t examples.taskflows.echo
```

### MCP Environment Denylist

By default, MCP server subprocesses inherit the parent environment. To prevent
specific variables from leaking to MCP servers, set `TASKFLOW_ENV_DENYLIST` to
a comma-separated list of variable names:

```bash
export TASKFLOW_ENV_DENYLIST="MY_SECRET_TOKEN,PRIVATE_KEY,OTHER_CREDENTIAL"
```

Toolbox-level `env:` declarations in YAML still inject exactly what each server
needs, so explicitly configured variables are unaffected.

## Use Cases and Examples

The Seclab Taskflow Agent framework was primarily designed to fit the iterative, feedback-loop-driven work involved in agentic security research workflows and vulnerability triage tasks.
37 changes: 36 additions & 1 deletion doc/GRAMMAR.md
@@ -509,4 +509,39 @@ When `gpt_latest` is used in the taskflow to specify a model, the value `gpt-5`

```

This provides a easy way to update model versions in a taskflow.
This provides an easy way to update model versions in a taskflow.

#### Per-model settings

A `model_config` file can include per-model settings via `model_settings` and a
global `api_type` that applies to all models unless overridden:

```yaml
seclab-taskflow-agent:
version: "1.0"
filetype: model_config
api_type: chat_completions # default for all models
models:
gpt_default: gpt-4.1
gpt_responses: gpt-5.1
model_settings:
gpt_default:
temperature: 0.7
gpt_responses:
api_type: responses # use the Responses API for this model
endpoint: https://api.githubcopilot.com
token: CAPI_TOKEN # env var name containing the API key
temperature: 0.5
```

The following keys in `model_settings` are handled by the engine and are not
passed to the underlying model provider:

| Key | Description | Default |
|-----|-------------|---------|
| `api_type` | `"chat_completions"` or `"responses"` | Inherited from top-level `api_type`, or `"chat_completions"` |
| `endpoint` | API base URL for this model | The global `AI_API_ENDPOINT` env var |
| `token` | Name of an environment variable containing the API key | Uses `AI_API_TOKEN` / `COPILOT_TOKEN` |

All other keys (e.g. `temperature`, `top_p`) are passed through as model
parameters to the OpenAI SDK.
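The split between engine-handled keys and pass-through parameters can be sketched as (illustrative helper, not the engine's actual code):

```python
# Keys consumed by the engine, per the table above; everything else is a model parameter.
ENGINE_KEYS = {"api_type", "endpoint", "token"}

def split_settings(model_settings: dict) -> tuple[dict, dict]:
    """Separate engine-handled keys from pass-through model parameters (sketch)."""
    engine = {k: v for k, v in model_settings.items() if k in ENGINE_KEYS}
    params = {k: v for k, v in model_settings.items() if k not in ENGINE_KEYS}
    return engine, params
```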
17 changes: 17 additions & 0 deletions examples/model_configs/responses_api.yaml
@@ -0,0 +1,17 @@
# SPDX-FileCopyrightText: GitHub, Inc.
# SPDX-License-Identifier: MIT

# Example: per-model API type and endpoint configuration.
# gpt_responses uses the Responses API on the CAPI endpoint,
# reading its token from the CAPI_TOKEN env var.

seclab-taskflow-agent:
version: "1.0"
filetype: model_config
models:
gpt_responses: gpt-5.1
model_settings:
gpt_responses:
api_type: responses
endpoint: https://api.githubcopilot.com
token: CAPI_TOKEN
125 changes: 125 additions & 0 deletions examples/taskflows/comprehensive_test.yaml
@@ -0,0 +1,125 @@
# SPDX-FileCopyrightText: GitHub, Inc.
# SPDX-License-Identifier: MIT

# Comprehensive test taskflow that exercises every grammar feature:
# - model_config reference with model aliases
# - globals (with CLI override via -g)
# - inputs (task-level template variables)
# - env (task-scoped environment variables)
# - must_complete
# - exclude_from_context
# - max_steps
# - MCP toolboxes (echo)
# - shell task (run)
# - repeat_prompt + async iteration
# - reusable tasks (uses)
# - reusable prompts ({% include %})
# - agent handoffs (multi-agent)
# - headless mode
# - blocked_tools

seclab-taskflow-agent:
version: "1.0"
filetype: taskflow

model_config: examples.model_configs.model_config

globals:
topic: fruit
detail_level: brief

taskflow:
# ---------------------------------------------------------------
# Task 1: Shell task — produces a JSON array for repeat_prompt
# Features: run, must_complete
# ---------------------------------------------------------------
- task:
name: generate-items
must_complete: true
run: |
echo '[{"name": "apple", "color": "red"}, {"name": "banana", "color": "yellow"}, {"name": "orange", "color": "orange"}]'

# ---------------------------------------------------------------
# Task 2: Repeat prompt over shell output, async iteration
# Features: repeat_prompt, async, async_limit, exclude_from_context,
# model (alias), inputs, globals, env, max_steps
# ---------------------------------------------------------------
- task:
name: describe-items
repeat_prompt: true
async: true
async_limit: 3
exclude_from_context: true
must_complete: true
model: gpt_default
max_steps: 10
agents:
- examples.personalities.fruit_expert
inputs:
format: one-sentence
env:
FRUIT_MODE: "analysis"
user_prompt: |
The topic is {{ globals.topic }} at {{ globals.detail_level }} detail level.
Describe the {{ result.name }} (which is {{ result.color }}) in {{ inputs.format }} format.

# ---------------------------------------------------------------
# Task 3: MCP tool call with echo server
# Features: toolboxes, headless, blocked_tools
# ---------------------------------------------------------------
- task:
name: echo-test
must_complete: true
headless: true
agents:
- examples.personalities.echo
user_prompt: |
Echo the following message: "All {{ globals.topic }} items processed successfully"
blocked_tools:
- nonexistent_tool_to_test_filtering

# ---------------------------------------------------------------
# Task 4: Reusable task via `uses`
# Features: uses (inherits from single_step_taskflow)
# ---------------------------------------------------------------
- task:
name: reusable-task
uses: examples.taskflows.single_step_taskflow
model: gpt_default

# ---------------------------------------------------------------
# Task 5: Reusable prompt via {% include %}
# Features: Jinja2 include directive, reusable prompts
# ---------------------------------------------------------------
- task:
name: include-prompt
agents:
- examples.personalities.fruit_expert
model: gpt_default
max_steps: 5
user_prompt: |
Tell me about apples.

{% include 'examples.prompts.example_prompt' %}

Keep your answer to two sentences per fruit.

# ---------------------------------------------------------------
# Task 6: Agent handoffs (multi-agent)
# Features: multiple agents (first=primary, rest=handoff targets)
# ---------------------------------------------------------------
- task:
name: handoff-test
model: gpt_default
max_steps: 15
agents:
- examples.personalities.fruit_expert
- examples.personalities.apple_expert
- examples.personalities.banana_expert
- examples.personalities.orange_expert
user_prompt: |
You are a fruit coordinator. I need specific expert advice on each fruit.
Please hand off to the apple expert for a one-sentence fact about apples,
then to the banana expert for a one-sentence fact about bananas,
then to the orange expert for a one-sentence fact about oranges.
Each expert should provide exactly one interesting fact.
20 changes: 20 additions & 0 deletions examples/taskflows/echo_responses_api.yaml
@@ -0,0 +1,20 @@
# SPDX-FileCopyrightText: GitHub, Inc.
# SPDX-License-Identifier: MIT

# Echo taskflow using the Responses API with MCP tool calls.

seclab-taskflow-agent:
version: "1.0"
filetype: taskflow

model_config: examples.model_configs.responses_api

taskflow:
- task:
max_steps: 5
must_complete: true
agents:
- examples.personalities.echo
model: gpt_responses
user_prompt: |
Hello from the Responses API