
Commit 4e11632

Add high-performance Python agent instructions and lightweight task compatibility tests
- Created AGENTS.md with guidelines for Python projects managed by `uv`, including dependency management, engineering principles, and performance considerations.
- Added CLAUDE.md to reference AGENTS.md.
- Introduced lightweight_tasks_compatibility.feature to define scenarios for lightweight tasks and prompt-based instructions.
- Implemented step definitions in template_steps.py to support the new feature scenarios, ensuring compatibility with pb-build and validating task structures.
1 parent 91bcbdb commit 4e11632

10 files changed

Lines changed: 413 additions & 26 deletions

AGENTS.md

Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
# High-Performance Python Agent Instructions

## Scope

- This template targets Python projects managed by `uv`.
- `src/` contains the main application package (src layout).
- `tests/` contains all tests.
- No web-framework-specific assumptions.

## uv Project Rules (Critical)

1. Never manually edit dependency versions in `pyproject.toml`; use `uv add`.
2. Add runtime dependencies with:

   ```bash
   uv add <package>
   ```

3. Add development dependencies with:

   ```bash
   uv add --group dev <package>
   ```

4. Always run code through `uv run` to ensure the correct virtual environment.
5. Lock dependencies with `uv lock`; commit `uv.lock` to version control.
6. Use `uv sync --all-groups` to install all dependency groups.

## Preferred Dependencies and Versions

When introducing new dependencies, prefer these unless compatibility requires a change:

- `orjson >= 3.10` — fast JSON serialization/deserialization
- `msgspec >= 0.19` — high-performance struct-based serialization
- `httpx >= 0.28` — async-first HTTP client
- `uvloop >= 0.21` — drop-in asyncio event loop replacement (Linux/macOS)
- `structlog >= 25.1` — structured logging
- `pydantic >= 2.11` — data validation (only when full validation is needed)
- `click >= 8.1` — CLI framework (for complex CLIs; `argparse` for simple ones)
- `polars >= 1.30` — DataFrame operations
- `anyio >= 4.9` — structured concurrency
- `grpcio >= 1.71` — gRPC
- `sqlalchemy >= 2.0` — database ORM (async mode preferred)
- `aiosqlite >= 0.21` — async SQLite
- `asyncpg >= 0.30` — async PostgreSQL
- `redis >= 6.0` — async Redis client
- `prometheus-client >= 0.22` — metrics
- `opentelemetry-api >= 1.33` — OpenTelemetry tracing
- `opentelemetry-sdk >= 1.33` — OpenTelemetry SDK
- `cachetools >= 5.5` — in-process caching utilities

## Dependency Priority and Forbidden Choices

- JSON preference: `orjson` over `json` (stdlib) and `ujson`.
- Serialization preference: `msgspec` over `pydantic` for pure serialization (no validation needed).
- HTTP client preference: `httpx` (with `h2` for HTTP/2) over `requests` and `aiohttp`.
- DataFrame preference: `polars` over `pandas` for new code.
- Event loop preference: `uvloop` over default asyncio loop.
- Logging preference: `structlog` over `logging` (stdlib).
- Forbidden by default: `requests` (sync-only), `pandas` (use `polars`), `print()` for logging.

## Engineering Principles

### Python Implementation Guidelines

1. Type annotations:
   - All public functions and methods must have full type annotations.
   - Use `from __future__ import annotations` at the top of every module.
   - Use modern union syntax (`X | None`) instead of `Optional[X]`.
   - Use `ty` for type checking; code must pass `ty check` with zero errors.
2. Error handling:
   - Define explicit exception hierarchies per module/package.
   - Never use bare `except:` or `except Exception:` without re-raising.
   - Use `contextlib.suppress()` for intentional exception swallowing.
   - Prefer returning typed result objects over raising exceptions in hot paths.
3. Async/Await:
   - Default to `async` for I/O-bound code.
   - Use `asyncio.TaskGroup` (Python 3.11+) for structured concurrency.
   - Never call `asyncio.run()` while an event loop is already running in the same thread (for example, from inside a coroutine).
4. Observability:
   - Logging: `structlog` with JSON output in production.
   - Metrics/traces: OpenTelemetry, exported via OTLP over gRPC.
   - Never use `print()` for logging or diagnostics in library code.
5. Configuration:
   - Use environment variables with `pydantic-settings` or `msgspec` for config parsing.
   - Prefer TOML configuration files.
6. Security:
   - Never hardcode secrets; use environment variables or secret managers.
   - Validate all external input at system boundaries.

### Key Design Principles

- Modularity: Design each module so it can be imported independently with clear boundaries and minimal hidden coupling.
- Performance: Prefer zero-copy patterns, memory-mapped I/O when appropriate, vectorized operations, and pre-allocated buffers.
- Extensibility: Use Protocols (`typing.Protocol`) and abstract base classes for pluggable implementations.
- Type Safety: Maintain strong static typing across interfaces and internals; minimize use of `Any`.
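A minimal sketch of the Extensibility principle with `typing.Protocol`: callers depend on a structural interface, and any class with matching methods can be plugged in without inheriting from it. `Compressor`, `ZlibCompressor`, `IdentityCompressor`, and `round_trip` are illustrative names only.

```python
from __future__ import annotations

import zlib
from typing import Protocol


class Compressor(Protocol):
    """Hypothetical plug-in interface; any class with these methods conforms."""

    def compress(self, data: bytes) -> bytes: ...

    def decompress(self, data: bytes) -> bytes: ...


class ZlibCompressor:
    # No inheritance from Compressor: conformance is checked structurally by the type checker.
    def __init__(self, level: int = 6) -> None:
        self._level = level

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data, self._level)

    def decompress(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class IdentityCompressor:
    def compress(self, data: bytes) -> bytes:
        return data

    def decompress(self, data: bytes) -> bytes:
        return data


def round_trip(codec: Compressor, payload: bytes) -> bytes:
    # Depends only on the Protocol, so implementations can be swapped freely.
    return codec.decompress(codec.compress(payload))
```

Swapping `ZlibCompressor()` for `IdentityCompressor()` at the call site requires no change to `round_trip`.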

### Performance Considerations

- Avoid allocations in hot loops; prefer pre-allocated lists, `array.array`, or NumPy/Polars for bulk data.
- Use `__slots__` on data-heavy classes to reduce per-instance memory overhead.
- Prefer `struct.pack`/`struct.unpack` or `memoryview` for binary protocol parsing.
- Use generator expressions and `itertools` to avoid materializing large intermediate lists.
- Profile before optimizing; use `py-spy`, `scalene`, or `cProfile` to identify real bottlenecks.
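For the binary-parsing advice, here is a sketch that assumes a hypothetical length-prefixed frame format (1-byte type, 2-byte big-endian length, then the payload). A precompiled `struct.Struct` decodes each header with `unpack_from`, and payloads are returned as `memoryview` slices, so no payload bytes are copied during parsing.

```python
from __future__ import annotations

import struct
from collections.abc import Iterator

# Hypothetical wire format: 1-byte message type, 2-byte big-endian payload length, payload.
HEADER = struct.Struct(">BH")


def iter_frames(buffer: bytes) -> Iterator[tuple[int, memoryview]]:
    """Yield (message_type, payload) pairs; each payload is a zero-copy memoryview slice."""
    view = memoryview(buffer)
    offset = 0
    while offset < len(view):
        if offset + HEADER.size > len(view):
            raise ValueError(f"truncated header at offset {offset}")
        msg_type, length = HEADER.unpack_from(view, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(view):
            raise ValueError(f"truncated payload at offset {offset}")
        yield msg_type, view[start:end]
        offset = end


data = HEADER.pack(1, 3) + b"abc" + HEADER.pack(2, 0)
for msg_type, payload in iter_frames(data):
    # Convert with bytes() only when a copy is actually needed.
    print(msg_type, bytes(payload))
```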

### Concurrency and Async Execution

- Use `uvloop` as the event loop for production servers (for example, by starting the application with `uvloop.run()`).
- Use `asyncio.TaskGroup` for structured concurrent I/O.
- Use `concurrent.futures.ProcessPoolExecutor` for CPU-bound parallelism.
- Use `asyncio.to_thread()` to offload blocking calls from the event loop.
- Prefer `asyncio.Queue` for async producer-consumer patterns.
- Never perform blocking I/O (file reads, DNS, HTTP) directly in async coroutines.
- Use `anyio` when portability across asyncio/trio is required.
- Limit concurrent connections with `asyncio.Semaphore` to prevent resource exhaustion.
- Channel selection:
  - Async-to-Async: `asyncio.Queue` / `anyio.create_memory_object_stream`
  - CPU parallelism: `multiprocessing` or `concurrent.futures.ProcessPoolExecutor`
- Avoid threading for CPU-bound work due to the GIL (use `multiprocessing` or native extensions).

### Memory and Allocation

- Use `__slots__` on frequently instantiated classes; removing the per-instance `__dict__` can substantially reduce memory, so measure the effect for your own classes.
- Use `msgspec.Struct` over `dataclasses` / `pydantic.BaseModel` for high-throughput data objects.
- Prefer `bytes` / `bytearray` / `memoryview` over `str` for binary data; avoid repeated encode/decode.
- Use `orjson` for JSON serialization — it returns `bytes` directly, avoiding intermediate string allocation.
- For large datasets, use memory-mapped files (`mmap`) or Arrow-backed DataFrames (`polars`).
- Use `tracemalloc` to profile memory usage, and `sys.getsizeof()` for a single object's shallow size (it does not include referenced objects).
- Prefer `array.array` over `list` for homogeneous numeric data.
- Use weak references (`weakref`) for caches that should not prevent garbage collection.
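A small sketch of `__slots__` together with a weak-reference cache. `PointDict`, `PointSlots`, `Document`, and `DocumentCache` are hypothetical. A slotted class that should support weak references must list `__weakref__` in its slots; `weakref.WeakValueDictionary` then drops a cache entry once nothing else holds the object.

```python
from __future__ import annotations

import weakref


class PointDict:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


class PointSlots:
    __slots__ = ("x", "y")  # fixed attribute storage, no per-instance __dict__

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


class Document:
    # A slotted class must list "__weakref__" to support weak references.
    __slots__ = ("name", "body", "__weakref__")

    def __init__(self, name: str, body: bytes) -> None:
        self.name = name
        self.body = body


class DocumentCache:
    """Entries vanish once no strong reference to the Document remains."""

    def __init__(self) -> None:
        self._entries: weakref.WeakValueDictionary[str, Document] = weakref.WeakValueDictionary()

    def load(self, name: str) -> Document:
        doc = self._entries.get(name)
        if doc is None:
            doc = Document(name, name.encode() * 100)  # stand-in for an expensive load
            self._entries[name] = doc
        return doc

    def __contains__(self, name: str) -> bool:
        return name in self._entries
```

The cache is an instance rather than a module-level dictionary, which keeps it in line with the "no module-level mutable state in library code" pitfall listed later.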

### Type and Data Layout

- Use `msgspec.Struct` with `frozen=True` for immutable data transfer objects.
- Use `dataclasses` with `slots=True, frozen=True` for simple value types.
- Use `enum.IntEnum` over `enum.Enum` for performance-critical flag/state types.
- Prefer `typing.NamedTuple` over plain tuples for self-documenting return types.
- Keep error types lightweight; avoid attaching large payloads to exception instances.
- Use `typing.TypeAlias` for complex type expressions to improve readability.

### C Extension and Native Interop

- Use `cffi` or `ctypes` for calling C libraries; prefer `cffi` for new code.
- Use `pyo3` / `maturin` for writing performance-critical modules in Rust.
- Use `Cython` only when interfacing with existing C/C++ codebases.
- Keep the Python ↔ native boundary coarse-grained; minimize per-call overhead.
- Always release the GIL (`nogil` in Cython, `py.allow_threads` in PyO3) for long-running native computations.

### Tooling and Quality

- Lint with `ruff` — all code must pass `ruff check` with zero errors.
- Format with `ruff format` — consistent style, no debates.
- Type check with `ty` — all code must pass `ty check` with zero errors.
- Use Gherkin + `behave` for outer-loop acceptance tests.
- Use `pytest` for inner-loop TDD — tests must pass before claiming completion.
- Use `pytest-benchmark` for performance-sensitive code.
- Use `pytest-cov` to track test coverage.

### Common Pitfalls

- Do not block the event loop with synchronous I/O.
- Do not use mutable default arguments (`def f(x=[]):`).
- Do not catch broad exceptions without re-raising.
- Do not use `global` or module-level mutable state in library code.
- Do not import inside functions unless lazy loading is intentional and documented.
- Handle `KeyboardInterrupt` and `SystemExit` separately from `Exception`; both derive from `BaseException`, so `except Exception:` does not catch them.
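Two of these pitfalls in concrete form. The default list in `append_bad` is evaluated once, when the function is defined, so every call that omits `bucket` mutates the same list; `append_good` uses `None` as a sentinel instead. The last line shows why `KeyboardInterrupt` and `SystemExit` need their own handling. Function names are illustrative.

```python
from __future__ import annotations


def append_bad(item: int, bucket: list[int] = []) -> list[int]:
    # Anti-pattern: the default list is created once, at definition time,
    # and shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_good(item: int, bucket: list[int] | None = None) -> list[int]:
    # Use None as a sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


# KeyboardInterrupt and SystemExit derive from BaseException, not Exception,
# so an `except Exception:` clause never sees them.
print(issubclass(KeyboardInterrupt, Exception), issubclass(SystemExit, Exception))
```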

### What to Avoid

- Incomplete implementations: finish features before submitting.
- Large, sweeping changes: keep changes focused and reviewable.
- Mixing unrelated changes: keep one logical change per commit.
- Using `# type: ignore` without a specific error code and justification comment.

## Development Workflow

When fixing failures, identify the root cause first, then apply idiomatic fixes instead of suppressing warnings or patching symptoms.

Use outside-in development for behavior changes:

- start with a failing Gherkin scenario under `features/`,
- drive implementation with failing `pytest` tests,
- keep step definitions thin and reuse Python domain modules.

After each feature or bug fix, run:

```bash
just format
just lint
just test
just bdd
just test-all
```

If any command fails, report the failure and do not claim completion.

## Testing Requirements

- BDD scenarios: place Gherkin features under `features/` and step definitions under `features/steps/`.
- Use BDD to define acceptance behavior first, then use `pytest` for the inner TDD loop.
- Unit tests: place in `tests/` mirroring the source structure.
- Integration tests: place in `tests/integration/`.
- Performance tests: use `pytest-benchmark` with `@pytest.mark.benchmark`.
- Add tests for behavioral changes and public API changes.
- Use `pytest` fixtures for setup/teardown; avoid `setUp`/`tearDown` methods.
- Use `pytest.raises` for exception testing; `pytest.approx` for floating-point comparisons.

## Language Requirement

- Documentation, comments, and commit messages must be English only.

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
@AGENTS.md

Justfile

Lines changed: 9 additions & 0 deletions
```diff
@@ -33,6 +33,15 @@ check: format lint type-check
 test:
     uv run pytest
 
+# Run BDD acceptance tests
+bdd:
+    uv run behave
+
+# Run both pytest and behave suites
+test-all:
+    uv run pytest
+    uv run behave
+
 # Run all checks and tests
 all: format lint type-check test
 
```

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
```gherkin
Feature: Lightweight pb-plan tasks stay compatible with pb-build

  Scenario: Lightweight task examples use pb-build compatible identifiers and status markers
    Given the pb-plan templates are loaded
    When I inspect the lightweight tasks instructions
    Then the lightweight tasks use Task X.Y headings
    And each lightweight task includes a status marker

  Scenario: Prompt-based pb-build instructions are self-contained
    Given the pb-build prompt template is loaded
    When I inspect the prompt-only build instructions
    Then the embedded implementer prompt is present
    And the prompt does not require references/implementer_prompt.md
```

features/steps/template_steps.py

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any, cast

from behave import given, then, when

from pb_spec.templates import load_prompt, load_skill_content

type BehaveContext = Any
type StepFunction = Callable[..., object]
type StepDecoratorFactory = Callable[[str], Callable[[StepFunction], StepFunction]]

given_step = cast(StepDecoratorFactory, given)
when_step = cast(StepDecoratorFactory, when)
then_step = cast(StepDecoratorFactory, then)


@given_step("the pb-plan templates are loaded")
def step_given_pb_plan_templates(context: BehaveContext) -> None:
    context.pb_plan_templates = [load_skill_content("pb-plan"), load_prompt("pb-plan")]


@when_step("I inspect the lightweight tasks instructions")
def step_when_inspect_lightweight_tasks(context: BehaveContext) -> None:
    context.lightweight_sections = [
        content.split("## Step 5b:", maxsplit=1)[0] for content in context.pb_plan_templates
    ]


@then_step("the lightweight tasks use Task X.Y headings")
def step_then_lightweight_tasks_use_task_ids(context: BehaveContext) -> None:
    for section in context.lightweight_sections:
        assert "### Task 1.1:" in section
        assert "### Task 1:" not in section


@then_step("each lightweight task includes a status marker")
def step_then_lightweight_tasks_include_status(context: BehaveContext) -> None:
    for section in context.lightweight_sections:
        assert "- **Status:** 🔴 TODO" in section


@given_step("the pb-build prompt template is loaded")
def step_given_pb_build_prompt_template(context: BehaveContext) -> None:
    context.pb_build_prompt = load_prompt("pb-build")


@when_step("I inspect the prompt-only build instructions")
def step_when_inspect_prompt_build_instructions(context: BehaveContext) -> None:
    context.pb_build_prompt_text = context.pb_build_prompt


@then_step("the embedded implementer prompt is present")
def step_then_embedded_implementer_prompt_present(context: BehaveContext) -> None:
    assert "## IMPLEMENTER PROMPT TEMPLATE" in context.pb_build_prompt_text
    assert "Task {{TASK_NUMBER}}: {{TASK_NAME}}" in context.pb_build_prompt_text


@then_step("the prompt does not require references/implementer_prompt.md")
def step_then_prompt_does_not_require_reference_file(context: BehaveContext) -> None:
    assert "Read `references/implementer_prompt.md`" not in context.pb_build_prompt_text
```
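The `When` step keeps only the text before the first `## Step 5b:` marker. Because `"## Step 5b:"` is also a substring of `"### Step 5b:"`, one marker can cover both the prompt file (which uses `##` step headings in the hunk shown below) and `SKILL.md` (which uses `###`). The sketch replays that logic on hypothetical excerpts; it assumes the Step 5b headings follow the same pattern as the Step 5a headings visible in the diffs, which this commit does not show. `lightweight_section` and `uses_task_xy_ids` are illustrative helpers, not part of `pb_spec`.

```python
from __future__ import annotations

# Same marker the step definition splits on.
STEP_5B_MARKER = "## Step 5b:"


def lightweight_section(template: str) -> str:
    """Return the text before the first Step 5b marker (the whole text if absent)."""
    return template.split(STEP_5B_MARKER, maxsplit=1)[0]


def uses_task_xy_ids(section: str) -> bool:
    """Mirror the Then-step checks: a `Task 1.1` heading is present, a bare `Task 1` heading is not."""
    return "### Task 1.1:" in section and "### Task 1:" not in section


# Hypothetical excerpts: the prompt uses "##" step headings, SKILL.md uses "###".
prompt_like = "## Step 5a: Lightweight\n### Task 1.1: Add flag\n## Step 5b: Full\n### Task 1: Old\n"
skill_like = "### Step 5a: Lightweight\n### Task 1.1: Add flag\n### Step 5b: Full\n### Task 1: Old\n"

for text in (prompt_like, skill_like):
    print(uses_task_xy_ids(lightweight_section(text)))

# Without the split, a later "### Task 1:" heading would make the check fail.
print(uses_task_xy_ids(skill_like))
```

For the `###` variant the slice keeps one stray leading `#` from the Step 5b heading, which is harmless for these substring checks.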

pyproject.toml

Lines changed: 3 additions & 2 deletions
```diff
@@ -4,7 +4,7 @@ build-backend = "uv_build"
 
 [project]
 name = "pb-spec"
-version = "0.7.0"
+version = "0.7.1"
 description = "Plan-Build Spec (pb-spec): A CLI tool for managing AI coding assistant skills"
 readme = "README.md"
 license = "Apache-2.0"
@@ -40,9 +40,10 @@ testpaths = ["tests"]
 
 [dependency-groups]
 dev = [
+    "behave>=1.3.3",
     "pytest>=9.0.2",
     "ruff>=0.15.5",
-    "ty>=0.0.20",
+    "ty>=0.0.21",
 ]
 
 [tool.ruff]
```

src/pb_spec/templates/prompts/pb-plan.prompt.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -166,7 +166,7 @@ Remove all instructional placeholder text (such as bracket examples) in the fina
 
 ## Step 5a: Output tasks.md — Lightweight Mode (< 50 words)
 
-Write a **flat task list** to `specs/<spec-dir>/tasks.md`:
+Write a **flat task list** to `specs/<spec-dir>/tasks.md`. Even in lightweight mode, task IDs must remain in `Task X.Y` format so `pb-build` can track state reliably:
 
 ```markdown
 # [Feature Name] — Tasks
@@ -178,17 +178,19 @@ Write a **flat task list** to `specs/<spec-dir>/tasks.md`:
 
 ## Tasks
 
-### Task 1: [Task Name]
+### Task 1.1: [Task Name]
 
 > **Context:** ...
 > **Verification:** ...
 > **Scenario Coverage:** [Feature/scenario names]
 
 - **Loop Type:** `BDD+TDD` / `TDD-only`
+- **Status:** 🔴 TODO
 - [ ] Step 1: ...
 - [ ] Step 2: ...
 - [ ] BDD Verification: ...
 - [ ] Verification: ...
+- [ ] Runtime Verification (if applicable): [Logs + probe result, or `N/A` with reason]
 ```
 
 For lightweight tasks that introduce or change runtime behavior (service startup, UI runtime flow, API availability, performance-critical paths), include runtime observability checks in `Verification`:
````
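Why the `Task X.Y` format matters can be illustrated with a hypothetical parser. The pb-build parser itself is not part of this commit, so the regular expressions below are assumptions derived only from the lightweight template shape in this diff: a `### Task X.Y: Name` heading followed by a `- **Status:** ...` line. Under that assumption, a bare `### Task 1:` heading is simply not recognized, which is the kind of drift the new feature scenario guards against. `TaskState` and `parse_tasks` are illustrative names.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Assumed shapes, taken from the lightweight tasks.md template in this diff.
TASK_HEADING = re.compile(r"^### Task (\d+)\.(\d+): (.+)$", re.MULTILINE)
STATUS_LINE = re.compile(r"^- \*\*Status:\*\* (.+)$", re.MULTILINE)


class TaskState(NamedTuple):
    task_id: str
    name: str
    status: str | None


def parse_tasks(markdown: str) -> list[TaskState]:
    """Pair each `### Task X.Y:` heading with the first status line before the next heading."""
    headings = list(TASK_HEADING.finditer(markdown))
    tasks: list[TaskState] = []
    for index, heading in enumerate(headings):
        # The task body runs until the next task heading (or the end of the text).
        end = headings[index + 1].start() if index + 1 < len(headings) else len(markdown)
        status = STATUS_LINE.search(markdown, heading.end(), end)
        tasks.append(
            TaskState(
                task_id=f"{heading.group(1)}.{heading.group(2)}",
                name=heading.group(3).strip(),
                status=status.group(1).strip() if status else None,
            )
        )
    return tasks


sample = """## Tasks

### Task 1.1: Add config flag

> **Context:** ...

- **Loop Type:** `TDD-only`
- **Status:** 🔴 TODO
- [ ] Step 1: ...
"""

for task in parse_tasks(sample):
    print(task.task_id, task.name, task.status)

# A legacy "### Task 1:" heading does not match TASK_HEADING at all.
print(parse_tasks("### Task 1: Old style\n- **Status:** 🔴 TODO\n"))
```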

src/pb_spec/templates/skills/pb-plan/SKILL.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -183,7 +183,7 @@ Remove all instructional placeholder text (such as bracket examples) in the fina
 
 ### Step 5a: Output `tasks.md` — Lightweight Mode (< 50 words)
 
-Write a **flat task list** to `specs/<spec-dir>/tasks.md`. No phases — just ordered tasks:
+Write a **flat task list** to `specs/<spec-dir>/tasks.md`. No phases — just ordered tasks. Even in lightweight mode, task IDs must remain in `Task X.Y` format so `pb-build` can track state reliably:
 
 ```markdown
 # [Feature Name] — Tasks
@@ -195,17 +195,19 @@ Write a **flat task list** to `specs/<spec-dir>/tasks.md`. No phases — just or
 
 ## Tasks
 
-### Task 1: [Task Name]
+### Task 1.1: [Task Name]
 
 > **Context:** ...
 > **Verification:** ...
 > **Scenario Coverage:** [Feature/scenario names]
 
 - **Loop Type:** `BDD+TDD` / `TDD-only`
+- **Status:** 🔴 TODO
 - [ ] Step 1: ...
 - [ ] Step 2: ...
 - [ ] BDD Verification: ...
 - [ ] Verification: ...
+- [ ] Runtime Verification (if applicable): [Logs + probe result, or `N/A` with reason]
 ```
 
 For lightweight tasks that introduce or change runtime behavior (service startup, UI runtime flow, API availability, performance-critical paths), include runtime observability checks in `Verification`:
````
