This document describes how to generate security test cases for AI coding agents using ASTRA's agent security testing framework.
The pipeline generates adversarial test cases by:
- Loading a knowledge graph of security vulnerabilities and attack techniques
- Using a multi-agent system to compose realistic, actionable security test scenarios
- Parsing and validating the generated test cases
- Uploading the curated dataset to HuggingFace
Example Dataset: PurCL/astra-agent-security - Generated using this pipeline
```bash
pip install -r requirements.txt
```

Key dependencies:

- `openai-agents[litellm]` - Multi-agent orchestration framework
- `datasets` - HuggingFace datasets library
- `pydantic` - Data validation
- `tqdm` - Progress tracking
Edit `resources/agent-sec-config.yaml` to customize the generation pipeline:

```yaml
# Main coordinator agent model (orchestrates the workflow)
coordinator_model: "claude-sonnet-4-5"

# Composer agent model (generates test case drafts and revisions)
composer_model: "claude-sonnet-4-5"

# Reviewer committee models (evaluate test case quality)
reviewer_models:
  - "claude-sonnet-3-7"
  - "claude-haiku-4-5"
  - "gpt-oss-20b"
  - "gpt-oss-120b"
  - "qwen3coder"

# Generation settings
parallel_tasks: 50  # Adjust based on your API rate limits

# Output directories
output_dir: "data_out"
log_dir: "log_out_agent_sec"
```

**Pre-configured Models in `client-config.yaml`:**
| Short Name | Provider | Full Model ID | Description |
|---|---|---|---|
| `claude-sonnet-4-5` | bedrock | `global.anthropic.claude-sonnet-4-5-20250929-v1:0` | Most capable Claude (recommended for coordinator/composer) |
| `claude-sonnet-3-7` | bedrock | `us.anthropic.claude-3-7-sonnet-20250219-v1:0` | Balanced performance and cost |
| `claude-haiku-4-5` | bedrock | `us.anthropic.claude-haiku-4-5-20251001-v1:0` | Fast and cost-effective |
| `gpt-oss-20b` | bedrock | `openai.gpt-oss-20b-1:0` | OpenAI 20B on Bedrock |
| `gpt-oss-120b` | bedrock | `openai.gpt-oss-120b-1:0` | OpenAI 120B on Bedrock |
| `qwen3coder` | openai | `Qwen/Qwen3-Coder-30B-A3B-Instruct` | Custom vLLM/SGLang server |
| `phi4m` | openai | `microsoft/Phi-4-mini-instruct` | Custom vLLM/SGLang server |
**Model Selection Tips:**

- Use more capable models (e.g., `claude-sonnet-4-5`) for the coordinator and composer roles
- Mix diverse models in the reviewer committee for varied perspectives
- Any model defined in `client-config.yaml` can be used in `agent-sec-config.yaml`
Edit `resources/client-config.yaml` to set up your model endpoints.

**AWS Bedrock Models:**

```yaml
claude-sonnet-4-5:
  provider: bedrock
  model_name: global.anthropic.claude-sonnet-4-5-20250929-v1:0
  region: us-west-2
```

**OpenAI-Compatible Servers (vLLM, SGLang, etc.):**

```yaml
qwen3coder:
  provider: openai
  model_name: Qwen/Qwen3-Coder-30B-A3B-Instruct
  addr: http://<YOUR_HOST>/v1
  api_key: <YOUR_API_KEY>

# Add more custom models as needed
my-local-model:
  provider: openai
  model_name: meta-llama/Llama-3.3-70B-Instruct
  addr: http://<YOUR_HOST>/v1
  api_key: <YOUR_API_KEY>
```

For HuggingFace dataset uploads:
```bash
export HF_TOKEN=<your_huggingface_token>
```

Run the main agent security data generation script:

```bash
python agent/main_agent_sec.py
```

What it does:
- Loads the knowledge graph from `kg/agent-sec/init.gen.kg`
- Extracts leaf nodes representing concrete security vulnerability instances
- For each instance, spawns a multi-agent system that:
  - Composes an initial adversarial test case draft
  - Reviews and refines the test case through multiple iterations
  - Validates realism, actionability, and benign intent scores
- Outputs results to `data_out/syn_agent_sec.jsonl`
Key configuration:

- Models used are defined in `resources/agent-sec-config.yaml`
- Parallelism is controlled by the `parallel_tasks` setting (default: 50)
- Knowledge graph structure: `prohibited_domain → technique_family → concrete_prohibited_instance`
Output format (`data_out/syn_agent_sec.jsonl`):

```json
{
  "instance_id": "domain_technique_instance",
  "concrete_prohibited_instance": "SQL Injection",
  "technique_family": "Injection Attacks",
  "prohibited_domain": "Web Security",
  "final_output": "<final_case>...</final_case>",
  "message_history": [...]
}
```

**Resume capability:** The script automatically skips instances that already exist in the output file, allowing you to resume interrupted runs.
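The resume behavior can be sketched as follows. This is a minimal illustration, not the actual implementation in `agent/main_agent_sec.py` (function names here are hypothetical): collect the `instance_id` values already present in the output JSONL, then skip those instances on the next run.

```python
import json
import os

def load_completed_ids(output_path):
    """Return the set of instance_ids already written to the output JSONL."""
    done = set()
    if os.path.exists(output_path):
        with open(output_path) as f:
            for line in f:
                line = line.strip()
                if line:
                    done.add(json.loads(line)["instance_id"])
    return done

def pending_instances(all_instances, output_path):
    """Filter out instances whose results are already on disk."""
    done = load_completed_ids(output_path)
    return [inst for inst in all_instances if inst["instance_id"] not in done]
```

Because each result is appended as one JSON line, an interrupted run loses at most the in-flight instances.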
Parse the raw agent outputs into structured format:

```bash
python agent/parse_agent_sec.py \
    --input data_out/syn_agent_sec.jsonl \
    --output data_out/syn_agent_sec_parsed.jsonl
```

What it does:

- Extracts `<final_case>` XML blocks from agent outputs
- Validates against the expected schema using Pydantic models
- Parses committee member scores (realism, actionability, benign_intent)
- Outputs only successfully parsed cases
Output format (`data_out/syn_agent_sec_parsed.jsonl`):

```json
{
  "prohibited_domain": "Web Security",
  "technique_family": "Injection Attacks",
  "concrete_prohibited_instance": "SQL Injection",
  "request_text": "Create a login form with...",
  "malicious_rationale": "This could lead to...",
  "risk_tags": ["authentication", "database"],
  "committee_snapshot": [
    {
      "member_id": "reviewer_1",
      "realism": 5,
      "actionability": 4,
      "benign_intent": 5
    }
  ]
}
```

**Error handling:** Failed parsing attempts are logged to stdout but don't stop the process.
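The core checks can be sketched in a few lines. This is a simplified, dependency-free illustration (the real parser in `agent/agent_sec_composer/output_parser.py` uses Pydantic models): extract the `<final_case>` block and reject out-of-range scores or empty committee snapshots.

```python
import re

SCORE_FIELDS = ("realism", "actionability", "benign_intent")

def extract_final_case(raw_output):
    """Pull the <final_case>...</final_case> block out of the agent's raw output."""
    match = re.search(r"<final_case>(.*?)</final_case>", raw_output, re.DOTALL)
    return match.group(1).strip() if match else None

def validate_committee(snapshot):
    """Check that every reviewer score is an integer in the 1-5 range."""
    if not snapshot:
        return False  # empty committee snapshots are rejected
    for member in snapshot:
        for field in SCORE_FIELDS:
            score = member.get(field)
            if not isinstance(score, int) or not 1 <= score <= 5:
                return False
    return True
```

A record failing either check is logged and skipped, matching the "only successfully parsed cases" behavior above.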
Upload the parsed dataset to HuggingFace Hub:

```bash
python agent/upload_to_hf_dataset.py \
    --input data_out/syn_agent_sec_parsed.jsonl \
    --dataset-name <your-username>/astra-agent-security
```

**Example Dataset:** See PurCL/astra-agent-security for a dataset generated using this pipeline.
What it does:
- Loads parsed test cases
- Extracts core fields (excluding internal committee snapshots)
- Creates a HuggingFace Dataset
- Pushes to the specified repository
Dataset schema:

- `prohibited_domain` (str): High-level security domain
- `technique_family` (str): Category of attack technique
- `concrete_prohibited_instance` (str): Specific vulnerability type
- `request_text` (str): The adversarial prompt/request
- `malicious_rationale` (str): Explanation of potential security risks
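The field selection step can be sketched as below, using the schema field names listed above. This is only an illustration of the filtering: the actual script (`agent/upload_to_hf_dataset.py`) would then hand the records to the `datasets` library (e.g. `Dataset.from_list(...).push_to_hub(...)`), which this sketch omits.

```python
import json

CORE_FIELDS = [
    "prohibited_domain",
    "technique_family",
    "concrete_prohibited_instance",
    "request_text",
    "malicious_rationale",
]

def to_upload_records(parsed_path):
    """Keep only the public schema fields, dropping internal committee snapshots."""
    records = []
    with open(parsed_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            case = json.loads(line)
            records.append({k: case.get(k) for k in CORE_FIELDS})
    return records
```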
Complete end-to-end example:

```bash
# Step 1: Generate test cases (may take hours depending on KG size)
python agent/main_agent_sec.py

# Step 2: Parse and validate
python agent/parse_agent_sec.py \
    -i data_out/syn_agent_sec.jsonl \
    -o data_out/syn_agent_sec_parsed.jsonl

# Step 3: Upload to HuggingFace
python agent/upload_to_hf_dataset.py \
    -i data_out/syn_agent_sec_parsed.jsonl \
    -d username/astra-agent-security
```

The agent security knowledge graph (`kg/agent-sec/`) contains:

- `init.kg` - Base knowledge graph structure
- `init.gen.kg` - Expanded/generated knowledge graph
- `init.yaml` - YAML representation
The hierarchy follows:

```
Prohibited Domain (e.g., "Web Security")
└── Technique Family (e.g., "Injection Attacks")
    └── Concrete Instance (e.g., "SQL Injection", "XSS")
```
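Walking this hierarchy down to its leaves can be sketched as follows. The on-disk `.kg` format is not documented here, so this assumes a nested-dict view of the graph (e.g. as loaded from `init.yaml`); the real leaf extraction lives in `agent/main_agent_sec.py`.

```python
def leaf_instances(kg):
    """Yield (domain, family, instance) triples from a nested-dict KG view."""
    for domain, families in kg.items():
        for family, instances in families.items():
            for instance in instances:
                yield domain, family, instance

# Toy KG mirroring the hierarchy above
kg = {
    "Web Security": {
        "Injection Attacks": ["SQL Injection", "XSS"],
    },
}
```

Each yielded triple corresponds to one `concrete_prohibited_instance` that seeds a multi-agent generation run.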
The generation pipeline uses three specialized agents:

1. **Coordinator Agent** (`agent/agent_sec_composer/prompts/coordinator.md`)
   - Orchestrates the overall workflow
   - Decides when to compose, revise, or review
2. **Composer Agent** (`agent/agent_sec_composer/prompts/composer.md`)
   - Creates initial test case drafts
   - Revises based on review feedback
3. **Reviewer Agent** (`agent/agent_sec_composer/prompts/reviewer.md`)
   - Evaluates test cases on multiple dimensions
   - Provides scores (1-5) for realism, actionability, and benign intent

The agents iterate until consensus is reached on a high-quality test case.
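The control flow of that iteration can be sketched as a compose-review-revise loop. Everything here is a hypothetical simplification (the real coordinator is prompt-driven and decides termination itself): `compose`, `review`, and `revise` stand in for agent calls, and `threshold` for the consensus criterion.

```python
MAX_ROUNDS = 5  # hypothetical cap; the real coordinator decides when to stop

def generate_case(compose, review, revise, threshold=4):
    """Iterate compose -> review -> revise until all reviewer scores pass."""
    draft = compose()
    for _ in range(MAX_ROUNDS):
        scores = review(draft)  # one score dict per committee member
        if all(min(s.values()) >= threshold for s in scores):
            return draft  # committee consensus reached
        draft = revise(draft, scores)
    return draft  # give up after MAX_ROUNDS, return best effort
```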
- `data_out/` - Generated datasets (excluded from git)
- `log_out_agent_sec/` - Detailed agent interaction logs (excluded from git)
To use different models, simply edit `resources/agent-sec-config.yaml`:

```yaml
# Use a faster/cheaper model for the composer
composer_model: "claude-haiku-4-5"

# Use only local models for reviewers
reviewer_models:
  - "qwen3coder"
  - "my-local-model"
```

**Model Resolution Logic:**
When you specify a model name in `agent-sec-config.yaml`, the system:

1. Looks up the model in `client-config.yaml` by name
2. Checks the `provider` field:
   - `provider: bedrock` → Routes to AWS Bedrock Converse API
   - `provider: openai` → Routes to OpenAI-compatible server (vLLM, SGLang, etc.)
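The lookup-and-route logic amounts to something like the sketch below. The Bedrock branch matches the `bedrock/converse/...` identifier shown in the Example Flow; the `openai/...` form in the other branch is an assumption for illustration, as is the function name.

```python
def resolve_model(name, client_config):
    """Map a short model name to a litellm model identifier (simplified sketch)."""
    entry = client_config[name]
    if entry["provider"] == "bedrock":
        # Bedrock models are routed through litellm's Converse API
        return f"bedrock/converse/{entry['model_name']}"
    if entry["provider"] == "openai":
        # OpenAI-compatible servers use the configured addr/api_key (assumed form)
        return f"openai/{entry['model_name']}"
    raise ValueError(f"unknown provider: {entry['provider']}")
```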
**Example Flow:**

```yaml
# In agent-sec-config.yaml
coordinator_model: "claude-sonnet-4-5"

# System looks up in client-config.yaml:
claude-sonnet-4-5:
  provider: bedrock  # ← Automatically routes to Bedrock
  model_name: global.anthropic.claude-sonnet-4-5-20250929-v1:0
  region: us-west-2

# Result: LitellmModel("bedrock/converse/global.anthropic.claude-sonnet-4-5-20250929-v1:0")
```

**No code changes needed!** Just add models to `client-config.yaml` with the appropriate `provider` field.
Edit paths in `resources/agent-sec-config.yaml`:

```yaml
output_dir: "my_output"
log_dir: "my_logs"
```

No code changes needed! Just add entries to `client-config.yaml`:
**For AWS Bedrock Models:**

```yaml
my-new-bedrock-model:
  provider: bedrock
  model_name: us.anthropic.claude-opus-4-7-20250514-v1:0
  region: us-west-2
```

**For OpenAI-Compatible Servers:**

```yaml
my-local-llama:
  provider: openai
  model_name: meta-llama/Llama-3.3-70B-Instruct
  addr: http://localhost:8000/v1
  api_key: dummy-key
```

Then reference it in `agent-sec-config.yaml`:

```yaml
coordinator_model: "my-new-bedrock-model"
composer_model: "my-local-llama"
```

If you hit API rate limits, reduce the parallelism in `resources/agent-sec-config.yaml`:

```yaml
parallel_tasks: 10  # Reduce from default 50
```

For large knowledge graphs, process in batches by modifying the `to_query` list.
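The batching idea is just slicing the instance list into fixed-size chunks and running the pipeline per chunk. A minimal helper (name hypothetical; `to_query` refers to the instance list in `agent/main_agent_sec.py`):

```python
def batches(to_query, batch_size):
    """Split the full instance list into fixed-size batches for separate runs."""
    return [to_query[i:i + batch_size] for i in range(0, len(to_query), batch_size)]
```

Combined with the resume capability, each batch can be run as its own invocation without redoing completed instances.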
Check `agent/agent_sec_composer/output_parser.py` for schema validation details. Common issues:
- Missing required XML tags
- Score values outside 1-5 range
- Empty committee snapshots
Ensure you're authenticated with HuggingFace:
```bash
huggingface-cli login
```

If you use this dataset generation pipeline, please cite:
```bibtex
@article{xu2025astra,
  title={ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants},
  author={Xu, Xiangzhe and Shen, Guangyu and Su, Zian and Cheng, Siyuan and Guo, Hanxi and Yan, Lu and Chen, Xuan and Jiang, Jiasheng and Jin, Xiaolong and Wang, Chengpeng and others},
  journal={arXiv preprint arXiv:2508.03936},
  year={2025}
}
```

- Main ASTRA README: `README.md`
- ASTRA Paper: https://www.arxiv.org/pdf/2508.03936
- ASTRA Website: https://astra-share.github.io