This document codifies the best practices and standard phases for implementing new LangSmith or LangGraph API features as CLI commands in Langstar. These patterns have been established through successful implementations including:
- #402 ls-prompt-structured-outputs - Structured output prompts (with scout phase)
- #298 ls-runs-query - Runs query and filtering
- #334 ls-annotation-queues - Annotation queue management
- #201 devcontainer-feature - Infrastructure milestone (different pattern)
Each API → CLI feature follows a 12-phase process (plus optional scouting phase 0.0):
| Phase | Name | Goal | Deliverable |
|---|---|---|---|
| 0.0 | Pre-Epic Scouting (Optional) | Gather research and technical context | Scout research report |
| 0 | Epic Setup | Establish tracking structure | Parent issue, milestone, sub-issues |
| 1 | Research | Understand SDK precedent | Research report in reference/research/ |
| 2 | Design | Ensure DX consistency and integration | Design decisions documented in research report |
| 3 | OpenAPI Validation | Verify design against spec | Validation report + extracted schemas |
| 4 | SDK Types | Implement Rust types | sdk/src/{feature}.rs types |
| 5 | SDK Client | Implement client methods | Client methods in SDK |
| 6 | CLI Commands | Implement CLI commands | cli/src/commands/{feature}.rs |
| 7 | Test Planning | Generate comprehensive test plan | Test plan document via /gh-milestones:test-plan |
| 8 | Testing | Ensure quality | Unit tests (mocked) + integration tests |
| 9 | Test Audit | Verify test compliance | Audit report via /gh-milestones:test-audit |
| 10 | Documentation | Document usage | README updates, implementation docs |
| 11 | Milestone Release | Mark milestone as shipped | Closed milestone linked to GitHub release |
Note: Phase 0.0 (Pre-Epic Scouting) and Phase 11 (Milestone Release) are recent additions based on lessons learned from milestone #7 (ls-prompt-structured-outputs). See Issue #448 for detailed analysis. Phase 7 (Test Planning) and Phase 9 (Test Audit) were added to formalize comprehensive test planning and compliance verification (Issue #634).
For new API features where you need preliminary research and technical context, create a scout issue to gather knowledge before authoring the milestone's Phase 0 parent issue.
Use scout issues when:
- ✅ Adding support for a new LangSmith/LangGraph API feature
- ✅ Need to understand API patterns and SDK precedents before writing tickets
- ✅ Want to explore the solution space through experimentation
- ✅ Gathering technical context to inform milestone structure and ticket authoring
Skip scout issues when:
- ❌ Fixing a bug in existing functionality (scope is already clear)
- ❌ Small enhancements to existing commands (patterns already established)
- ❌ Infrastructure changes (devcontainer, CI/CD)
- ❌ Documentation-only changes
Create an exploratory research issue using this pattern:
Title Format: [Scout] Research {feature-name} API patterns and technical context
Required Sections:
- Purpose: Gather research and knowledge for milestone planning
- Scope: What to research (NOT implementation)
- Deliverables: Research report, SDK notes, optional experiments
- Success Criteria: Technical insights gathered, milestone structure recommended
Example: Issue #398 - Scout for structured output prompts
Focus on research and knowledge gathering; do not implement. Activities include:
- Search existing langstar code in `./cli` and `./sdk` for related implementations
- Analyze Python SDK precedent using the `setup-remote-repo-notes-dir` skill
- Identify relevant API endpoints and request/response shapes
- Run experiments to explore API behavior and validate assumptions
- Document technical patterns, conventions, and integration points
- Provide insights for authoring the milestone's first ticket(s)
Deliverables:

- Research report at `docs/research/{issue-num}-{slug}-scout.md`:
  - Existing langstar implementation analysis
  - API endpoint identification
  - SDK precedent analysis (Python SDK)
  - Technical patterns and conventions discovered
  - Experimentation findings (if applicable)
  - Insights for milestone planning
  - Recommended structure for Phase 0 parent issue
  - Suggested initial sub-issues
  - Open questions for implementation
- Updated reference notes at `reference/repo/langchain-ai/langsmith-sdk/notes/README.md`:
  - Document key SDK patterns and method signatures
- Optional: experiment scripts at `reference/experiments/{issue-num}-{slug}/`:
  - Python scripts to explore API behavior
  - Validate assumptions through hands-on testing
Key: Scout issues exist before the milestone is created.
Workflow:
- Create scout issue (no milestone yet)
- Complete scout research → PR directly to main
- Review findings and technical insights
- Use research to create milestone and author Phase 0 parent issue
- Optional: Retroactively attach scout issue to milestone for historical tracking
Knowledge Foundation:
- Gather technical context before authoring milestone tickets
- Document API patterns and SDK precedents
- Create reusable research artifacts
- Understand the problem domain through experimentation
Better Milestone Planning:
- Parent issue scope is informed by actual research, not assumptions
- Sub-issue breakdown reflects discovered patterns
- Initial tickets target the right technical approach
- Open questions are identified upfront
Reduced Uncertainty:
- Experimentation validates assumptions early
- API behavior is understood before implementation
- Technical integration points are documented
- Implementation challenges are anticipated
Create a milestone-level issue following the naming convention:
Format: {api-name} milestone - {description}
Example: ls-runs-query milestone - Be able to list and filter runs using langstar CLI
Required sections:
- Overview/TL;DR
- Goals / Success Criteria
- User Stories (epic-level and concrete)
- API Endpoints & Examples
- Design & Implementation Plan (high-level)
- Testing strategy
- Documentation plan
Create a matching milestone linking to the parent issue:
```bash
# Via GitHub UI or API
gh api repos/:owner/:repo/milestones -f title="ls-runs-query" \
  -f description="Parent issue: #298"
```

Use the `gh-sub-issue` skill (`.claude/skills/gh-sub-issue/SKILL.md`) to create the standard phase sub-issues:

```bash
# Create all phase sub-issues
gh sub-issue create --parent 298 --title "298.1-research Research langsmith-sdk runs query precedent"
gh sub-issue create --parent 298 --title "298.2-design Design DX consistency and configuration integration"
gh sub-issue create --parent 298 --title "298.3-openapi-validation Validate runs query design against LangSmith OpenAPI spec"
gh sub-issue create --parent 298 --title "298.4-sdk-runs-types Implement Run types and QueryRunsRequest in SDK"
gh sub-issue create --parent 298 --title "298.5-sdk-runs-client Implement query_runs client method with pagination"
gh sub-issue create --parent 298 --title "298.6-cli-runs-command Implement langstar runs query CLI command"
gh sub-issue create --parent 298 --title "298.7-test-plan Generate comprehensive test plan for runs query"
gh sub-issue create --parent 298 --title "298.8-runs-testing Add comprehensive tests for runs query"
gh sub-issue create --parent 298 --title "298.9-runs-test-audit Audit test compliance for runs query"
gh sub-issue create --parent 298 --title "298.10-runs-docs Documentation for runs query feature"

# Verify hierarchy
gh sub-issue list 298 --relation children
```

CRITICAL: Every issue (the epic AND all sub-issues) MUST have the milestone attached for accurate progress tracking.
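The sub-issue titles above follow a `{parent}.{phase}-{slug} {description}` pattern. If you script their creation, a small Rust sketch like the one below can keep titles consistent. `PhaseSpec`, `sub_issue_title`, and `gh_sub_issue_args` are hypothetical helpers, not part of Langstar, and the sketch uses only the `--parent` and `--title` flags shown above:

```rust
/// One standard phase sub-issue (hypothetical type, for illustration only).
pub struct PhaseSpec {
    /// Phase number from the phase table (phases 1-10 get sub-issues).
    pub phase: u8,
    /// Short hyphenated slug, e.g. "research" or "sdk-runs-types".
    pub slug: &'static str,
    /// Human-readable description appended after the slug.
    pub description: &'static str,
}

/// Builds a title in the `{parent}.{phase}-{slug} {description}` form.
pub fn sub_issue_title(parent: u32, spec: &PhaseSpec) -> String {
    format!("{}.{}-{} {}", parent, spec.phase, spec.slug, spec.description)
}

/// Builds the arguments for `gh sub-issue create --parent N --title "..."`.
pub fn gh_sub_issue_args(parent: u32, spec: &PhaseSpec) -> Vec<String> {
    vec![
        "sub-issue".to_string(),
        "create".to_string(),
        "--parent".to_string(),
        parent.to_string(),
        "--title".to_string(),
        sub_issue_title(parent, spec),
    ]
}
```

You can pass the argument vector to `std::process::Command::new("gh").args(...)`. Each element is passed as a separate argument, so the title needs no shell quoting. The milestone rule above still applies to every issue created this way.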
Use the `setup-remote-repo-notes-dir` skill (`.claude/skills/setup-remote-repo-notes-dir/SKILL.md`) to clone the relevant SDK:

```bash
# For LangSmith features
.claude/skills/setup-remote-repo-notes-dir/scripts/setup_repo_notes.sh https://github.com/langchain-ai/langsmith-sdk

# For LangGraph features
.claude/skills/setup-remote-repo-notes-dir/scripts/setup_repo_notes.sh https://github.com/langchain-ai/langgraph
```

Result structure:

```text
reference/repo/langchain-ai/langsmith-sdk/
├── notes/   # Your research notes (committed)
└── code/    # Cloned SDK (gitignored)
```

Key files to examine:
- `code/python/langsmith/client.py` - Main client implementation
- `code/python/langsmith/schemas.py` - Data models
- `code/python/langsmith/_internal/` - Internal utilities
- `code/python/tests/` - Test patterns and examples
Questions to answer:
- What is the method signature?
- What parameters are supported?
- How is pagination handled?
- What are the request/response shapes?
- How are errors handled?
- What conveniences does the SDK provide?
Create report at reference/research/{issue-num}-{slug}-precedent.md:
Required sections:
- Executive Summary
- Method Signature Analysis
- Parameter Documentation
- Request/Response Shapes
- Pagination Strategy
- Error Handling
- Recommendations for Rust Implementation
Example: reference/research/298-ls-runs-query-precedent.md
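For the "Pagination Strategy" and "Recommendations for Rust Implementation" sections, it helps to sketch the cursor loop the Rust SDK will need. The std-only sketch below assumes each page holds a list of items plus an optional next cursor, mirroring the `runs` and `cursors` fields used in Phase 4. `Page` and `Paginated` are hypothetical names, and the `fetch` closure stands in for the real HTTP call:

```rust
/// One page of results plus the cursor for the next page (`None` = last page).
pub struct Page<T> {
    pub items: Vec<T>,
    pub next_cursor: Option<String>,
}

/// Iterator that fetches pages on demand and yields individual items.
pub struct Paginated<T, F> {
    fetch: F,
    buffer: std::vec::IntoIter<T>,
    next_cursor: Option<String>,
    done: bool,
}

impl<T, F> Paginated<T, F>
where
    F: FnMut(Option<&str>) -> Result<Page<T>, String>,
{
    pub fn new(fetch: F) -> Self {
        Paginated {
            fetch,
            buffer: Vec::new().into_iter(),
            next_cursor: None,
            done: false,
        }
    }
}

impl<T, F> Iterator for Paginated<T, F>
where
    F: FnMut(Option<&str>) -> Result<Page<T>, String>,
{
    type Item = Result<T, String>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Drain the current page before fetching another one.
            if let Some(item) = self.buffer.next() {
                return Some(Ok(item));
            }
            if self.done {
                return None;
            }
            // The first request sends no cursor; later ones send the previous `next`.
            match (self.fetch)(self.next_cursor.as_deref()) {
                Ok(page) => {
                    self.done = page.next_cursor.is_none();
                    self.next_cursor = page.next_cursor;
                    self.buffer = page.items.into_iter();
                }
                Err(err) => {
                    // Report the error once, then end the iteration.
                    self.done = true;
                    return Some(Err(err));
                }
            }
        }
    }
}
```

A production version would wrap an async HTTP call rather than a synchronous closure. It should also guard against a server that keeps returning the same cursor.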
Before diving into implementation, analyze how the new feature will integrate with Langstar's existing architecture and user experience. This phase ensures consistency across the CLI and surfaces design decisions early.
Evaluate how the feature aligns with existing Langstar commands and patterns:
Questions to answer:
- Which existing commands have similar functionality? (e.g., `runs query` vs `deployments list`)
- What flag naming conventions are already established? (e.g., `-p/--project`, `-o/--output`)
- What output formats are supported and how should this feature use them?
- How do similar commands handle pagination, filtering, and sorting?
- What error messages and exit codes are used for similar error conditions?
Review existing patterns in:
- `cli/src/commands/` - Command structure and arguments
- `cli/src/config.rs` - Configuration loading patterns
- Existing command help text (`langstar <command> --help`)
Document in research report:
- Consistency decisions (which patterns to follow)
- Intentional deviations (with rationale)
- New patterns being introduced (if any)
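A consistency decision such as "json when piped, table on a terminal" can be captured in a small, testable type. The std-only sketch below illustrates this. `OutputFormat` and `choose_format` are hypothetical names, and a real command would more likely derive clap's `ValueEnum` for parsing:

```rust
use std::fmt;
use std::str::FromStr;

/// Output formats shared by all commands (`-o/--output`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Json,
    Table,
}

impl FromStr for OutputFormat {
    type Err = String;

    /// Parses the flag value case-insensitively and lists valid choices on error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "json" => Ok(OutputFormat::Json),
            "table" => Ok(OutputFormat::Table),
            other => Err(format!("unknown output format '{other}' (expected: json, table)")),
        }
    }
}

impl fmt::Display for OutputFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            OutputFormat::Json => "json",
            OutputFormat::Table => "table",
        };
        f.write_str(name)
    }
}

/// An explicit flag wins; otherwise use table on a terminal and json when piped.
pub fn choose_format(flag: Option<OutputFormat>, stdout_is_terminal: bool) -> OutputFormat {
    match flag {
        Some(chosen) => chosen,
        None if stdout_is_terminal => OutputFormat::Table,
        None => OutputFormat::Json,
    }
}
```

In a binary, `stdout_is_terminal` can come from `std::io::stdout().is_terminal()`. This requires `use std::io::IsTerminal;` and Rust 1.70 or later.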
Analyze how the feature integrates with Langstar's configuration system:
Questions to answer:
- Which environment variables does this feature need? (existing vs new)
- Does it need workspace/organization scoping like other features?
- What's the precedence order? (CLI flags > env vars > config file > defaults)
- Are there sensible defaults that match the UI behavior?
Review configuration precedents in:
- `cli/src/config.rs` - Existing configuration patterns
- Environment variable documentation in README
- How similar features handle missing configuration
Configuration checklist:
- Uses existing env vars where applicable (`LANGSMITH_API_KEY`, `LANGSMITH_WORKSPACE_ID`)
- New env vars follow naming convention (`LANGSMITH_*` or `LANGGRAPH_*`)
- Defaults match reasonable expectations
- Error messages guide users to configure missing values
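The precedence order and the "guide users to configure missing values" item can be sketched together. This std-only sketch takes environment and config-file values as maps so it is easy to test. `resolve_setting`, `Setting`, `Source`, and the example key names are hypothetical and do not describe the actual `cli/src/config.rs`:

```rust
use std::collections::HashMap;

/// Where a resolved value came from, in precedence order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    CliFlag,
    EnvVar,
    ConfigFile,
    Default,
}

/// Describes one setting and every place it may be supplied.
pub struct Setting<'a> {
    pub flag: &'a str,       // e.g. "--project"
    pub env_key: &'a str,    // e.g. "LANGSMITH_PROJECT_NAME"
    pub config_key: &'a str, // e.g. "project_name"
    pub default: Option<&'a str>,
}

/// Resolves a value using CLI flags > env vars > config file > defaults.
/// A real implementation would read `std::env::var` and the parsed config file.
pub fn resolve_setting(
    setting: &Setting<'_>,
    cli_value: Option<&str>,
    env: &HashMap<String, String>,
    config: &HashMap<String, String>,
) -> Result<(String, Source), String> {
    if let Some(v) = cli_value {
        return Ok((v.to_string(), Source::CliFlag));
    }
    if let Some(v) = env.get(setting.env_key) {
        return Ok((v.clone(), Source::EnvVar));
    }
    if let Some(v) = config.get(setting.config_key) {
        return Ok((v.clone(), Source::ConfigFile));
    }
    if let Some(v) = setting.default {
        return Ok((v.to_string(), Source::Default));
    }
    // The error names every way the user can fix the problem.
    Err(format!(
        "missing value: pass {}, set {}, or add `{}` to the config file",
        setting.flag, setting.env_key, setting.config_key
    ))
}
```

Returning the `Source` along with the value makes it easy to explain in verbose output where a setting came from.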
Understand what this feature accomplishes from a user's perspective in the LangSmith/LangGraph UI:
Questions to answer:
- What workflow does this feature support in the UI?
- What business problem does it solve for users?
- How do users currently accomplish this task? (UI clicks, existing CLI, API calls)
- What would be the ideal CLI experience for this workflow?
Research methods:
- Explore the feature in LangSmith/LangGraph UI
- Review official documentation for the feature
- Consider common user scenarios and edge cases
Document in research report:
- UI workflow description (what users do in the web interface)
- Key user scenarios (the "jobs to be done")
- How the CLI can improve or complement the UI workflow
Add a "Design Decisions" section to your research report:
```markdown
## Design Decisions

### DX Consistency
- Following `runs query` pattern for [reason]
- Using `-f/--filter` flag consistent with [existing command]
- Output formats: json (default for piping), table (default for terminal)

### Configuration
- Requires: LANGSMITH_API_KEY (existing), LANGSMITH_PROJECT_NAME (existing)
- New env var: [none / LANGSMITH_NEW_VAR for reason]
- Defaults: [list sensible defaults]

### Business Purpose
- Supports workflow: [describe UI workflow]
- Key scenarios: [list 2-3 primary use cases]
- CLI advantage: [why CLI is better than UI for this]
```

Langstar uses a canonical source + derived fragments pattern for managing OpenAPI specs, inspired by the `setup-remote-repo-notes-dir` skill:
```text
reference/
├── openapi/langchain/              # Canonical full specs (source of truth)
│   ├── langsmith/
│   │   ├── openapi.json            # Full spec (635K)
│   │   └── MANIFEST.md             # Provenance metadata
│   └── control-plane/
│       ├── openapi.json            # Full spec (70K)
│       └── MANIFEST.md
│
└── api-specs/                      # Extracted fragments + documentation
    ├── README.md                   # Index and usage guide
    ├── LANGSMITH_API_OVERVIEW.md   # Quick reference (4 APIs)
    ├── LANGSMITH_APIS_DETAILS.md   # Detailed catalog
    ├── langsmith/
    │   ├── FRAGMENTS.md            # jq extraction queries (reproducible)
    │   └── *.json                  # Extracted fragments
    └── control-plane/
        └── FRAGMENTS.md
```
Benefits:
- Separation: Canonical specs vs AI-friendly fragments
- Reproducibility: jq queries documented in `FRAGMENTS.md`
- Provenance: `MANIFEST.md` tracks when/how specs were fetched
- AI-friendly: Small fragments fit context windows for grounding
```bash
# LangSmith API - fetch to canonical location
curl -o reference/openapi/langchain/langsmith/openapi.json \
  https://api.smith.langchain.com/openapi.json

# LangGraph Cloud API (Control Plane)
curl -o reference/openapi/langchain/control-plane/openapi.json \
  https://api.host.langchain.com/openapi.json

# Update MANIFEST.md with provenance
echo "| $(date +%Y-%m-%d) | Refresh | $(du -h reference/openapi/langchain/langsmith/openapi.json | cut -f1) | Updated from remote |" \
  >> reference/openapi/langchain/langsmith/MANIFEST.md
```

Extract fragments to `reference/api-specs/langsmith/` and document them in `FRAGMENTS.md`:
```bash
# Navigate to canonical spec
cd reference/openapi/langchain/langsmith

# From here, ../../../ resolves to reference/
# Extract endpoint definition
jq '.paths["/api/v1/runs/query"]' openapi.json \
  > ../../../api-specs/langsmith/runs-query-endpoint.json

# Extract request schema
jq '.components.schemas.BodyParamsForRunsQuerySchema' openapi.json \
  > ../../../api-specs/langsmith/runs-query-request-schema.json

# Extract response schema
jq '.components.schemas.ListRunsResponse' openapi.json \
  > ../../../api-specs/langsmith/runs-query-response-schema.json

# Extract entity schema (e.g., Run)
jq '.components.schemas.Run' openapi.json \
  > ../../../api-specs/langsmith/run-schema.json
```

IMPORTANT: After extracting, update `reference/api-specs/langsmith/FRAGMENTS.md`:
| File | Size | Purpose | jq Query | Last Updated |
|------|------|---------|----------|--------------|
| `runs-query-endpoint.json` | 1.0K | POST /runs/query endpoint | `.paths["/api/v1/runs/query"]` | YYYY-MM-DD |

Create validation report at `reference/research/{issue-num}-openapi-validation.md`:
Required validations:
- HTTP method matches
- Path matches (with version prefix)
- Request body schema matches research
- Response schema matches research
- Field types are correctly identified
- Required vs optional fields
Example jq queries for validation (run from `reference/openapi/langchain/langsmith/`):

```bash
# Check endpoint method
jq '.paths["/api/v1/annotation-queues/{queue_id}/runs"].post' openapi.json

# Check request body schema
jq '.paths["/api/v1/annotation-queues/{queue_id}/runs"].post.requestBody.content["application/json"].schema' \
  openapi.json

# List all paths for a feature
jq '.paths | keys | map(select(contains("annotation-queue")))' openapi.json
```

Any differences between research and the OpenAPI spec MUST be documented:
- Corrections to research findings
- Discoveries not in research
- Confirmations of research
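The "required vs optional" and "field types" checks feed directly into Phase 4. Required properties become plain fields, and optional ones become `Option<T>`. The hypothetical helper below turns one validated OpenAPI property into a field declaration. It uses the same `Uuid` and `DateTime<Utc>` types as the Phase 4 example. The type mapping is an assumption that covers only common JSON Schema `type`/`format` pairs:

```rust
/// Maps an OpenAPI `type` (plus optional `format`) to a Rust type name.
/// Unknown combinations fall back to `serde_json::Value` for manual review.
pub fn rust_type(openapi_type: &str, format_hint: Option<&str>) -> &'static str {
    match (openapi_type, format_hint) {
        ("string", Some("uuid")) => "Uuid",
        ("string", Some("date-time")) => "DateTime<Utc>",
        ("string", _) => "String",
        ("integer", _) => "i64",
        ("number", _) => "f64",
        ("boolean", _) => "bool",
        _ => "serde_json::Value",
    }
}

/// Renders a struct field. Properties missing from the schema's
/// `required` list are wrapped in `Option<T>`.
pub fn rust_field(name: &str, openapi_type: &str, format_hint: Option<&str>, required: bool) -> String {
    let ty = rust_type(openapi_type, format_hint);
    if required {
        format!("pub {name}: {ty},")
    } else {
        format!("pub {name}: Option<{ty}>,")
    }
}
```

Treat the output as a starting point to review, not generated code. Nested objects, arrays, and enums still need hand-written types.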
File: sdk/src/{feature}.rs
Pattern from existing code (see sdk/src/runs.rs, sdk/src/deployments.rs):
```rust
// Requires the `serde` features of the `chrono` and `uuid` crates.
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

/// Enum types (match OpenAPI enum values exactly)
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum RunType {
    Tool,
    Chain,
    Llm,
    // ...
}

/// Main entity struct (based on OpenAPI schema)
#[derive(Debug, Clone, Deserialize)]
pub struct Run {
    // Required fields (non-optional)
    pub id: Uuid,
    pub name: String,
    // Optional fields
    pub description: Option<String>,
    pub start_time: Option<DateTime<Utc>>,
}

/// Request struct for queries
#[derive(Debug, Clone, Default, Serialize)]
pub struct QueryRunsRequest {
    #[serde(skip_serializing_if = "Option::is_none")]
    pub project_name: Option<String>,
    // ...
}

/// Pagination cursors (only `next` is used in this guide; check the spec for others)
#[derive(Debug, Clone, Deserialize)]
pub struct Cursors {
    pub next: Option<String>,
}

/// Paginated response
#[derive(Debug, Clone, Deserialize)]
pub struct ListRunsResponse {
    pub runs: Vec<Run>,
    pub cursors: Option<Cursors>,
}
```

```rust
// sdk/src/lib.rs
pub mod runs;
pub use runs::{Cursors, ListRunsResponse, QueryRunsRequest, Run, RunType};
```

Pattern: Methods in `sdk/src/client.rs` or feature-specific modules
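```rust
impl LangchainClient {
    /// Query runs with filtering and pagination.
    /// Assumes the crate's `Error` implements `From<reqwest::Error>` so `?` converts.
    pub async fn query_runs(&self, request: QueryRunsRequest) -> Result<ListRunsResponse, Error> {
        let url = format!("{}/api/v1/runs/query", self.langsmith_base_url);
        let response = self
            .http_client
            .post(&url)
            .header("X-Api-Key", &self.api_key)
            .json(&request)
            .send()
            .await?
            // Turn 4xx/5xx statuses into errors before parsing the body
            // (one option; follow the crate's existing error mapping if it differs)
            .error_for_status()?;
        let body = response.json::<ListRunsResponse>().await?;
        Ok(body)
    }
}
```

Follow the cursor-based pagination pattern: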
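```rust
/// Stream all runs with automatic pagination (signature sketch; body omitted).
/// Requires `futures::Stream` in scope; `+ '_` ties the stream's lifetime to `&self`.
pub fn query_runs_stream(&self, request: QueryRunsRequest) -> impl Stream<Item = Result<Run, Error>> + '_ {
    // Implementation using cursors.next
}
```

File: `cli/src/commands/{feature}.rs`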
Pattern from existing commands (see cli/src/commands/runs.rs):
```rust
use clap::{Args, Subcommand};

#[derive(Debug, Subcommand)]
pub enum RunsCommand {
    /// Query runs with filtering
    Query(QueryArgs),
}

#[derive(Debug, Args)]
pub struct QueryArgs {
    /// Project name or ID to query runs from (-p/--project)
    #[arg(short, long)]
    pub project: Option<String>,

    /// Filter expression (-f/--filter)
    #[arg(short, long)]
    pub filter: Option<String>,

    /// Output format (json, table); exposed as -o/--output like other commands
    #[arg(short = 'o', long = "output", default_value = "table")]
    pub format: String,
}
```

```rust
// cli/src/commands/mod.rs
pub mod runs;

// cli/src/main.rs
mod commands;

use clap::Subcommand;
use commands::runs;

#[derive(Debug, Subcommand)]
enum Commands {
    /// Manage runs
    #[command(subcommand)]
    Runs(runs::RunsCommand),
}
```

Use the established config pattern from `cli/src/config.rs`:
- Environment variables take precedence over config-file values (CLI flags override both)
- Support both `LANGSMITH_API_KEY` and `LANGGRAPH_API_KEY`
- Support organization/workspace scoping
Before implementing tests, generate a comprehensive test plan that ensures complete coverage of the feature's functionality, error conditions, and edge cases. This phase uses the /gh-milestones:test-plan command to automate test plan generation.
Before generating the test plan, review all deliverables from previous phases:
Required review:
- Research reports from Phase 1
- Design decisions from Phase 2
- OpenAPI validation from Phase 3
- SDK types implementation (Phase 4)
- SDK client methods (Phase 5)
- CLI commands implementation (Phase 6)
- All merged PRs and their discussions
Why this matters:
- Test plans must cover all features documented in prior phases
- Design decisions inform test scenarios
- OpenAPI validation identifies edge cases
- Implementation details reveal error conditions to test
Use the test planning command to generate a comprehensive test plan:
```text
/gh-milestones:test-plan <milestone-name-or-number>
```

Examples:

```text
# Using milestone name
/gh-milestones:test-plan ls-runs-query

# Using milestone number
/gh-milestones:test-plan 8

# Using milestone URL
/gh-milestones:test-plan https://github.com/codekiln/langstar/milestone/8
```

What the command does:
- Loads relevant testing documentation (progressive disclosure)
- Reviews all issues and PRs in the milestone
- Analyzes implementation from merged PRs
- Generates comprehensive test plan document
- Identifies gaps in test coverage
The generated test plan should be added to the testing phase issue and should include:
Required sections:
- Feature Overview: Summary of what's being tested
- Test Scope: What's in scope and out of scope
- SDK Unit Tests: Mocked tests for SDK methods
- SDK Integration Tests: Real API tests with CRUD lifecycle
- CLI Integration Tests: End-to-end CLI command tests
- Error Conditions: All error scenarios to test
- Edge Cases: Boundary conditions and unusual inputs
- Pre-commit Validation: Checklist before implementation
Example structure:
```markdown
# Test Plan: Runs Query Feature (Milestone ls-runs-query)

## Feature Overview
[Summary of runs query functionality]

## Test Scope
**In scope:**
- SDK query_runs method with all parameters
- CLI runs query command
- Pagination handling
- Error responses

**Out of scope:**
- [Features explicitly not covered]

## SDK Unit Tests (Mocked)
### 8.1.1 test_query_runs_success
- Mock POST /api/v1/runs/query
- Verify request structure
- Verify response parsing

[Additional test cases...]

## SDK Integration Tests (Real API)
### 8.2.1 test_query_runs_crud_lifecycle
- Create test project
- Create test runs
- Query runs with filters
- Verify results
- Clean up resources

[Additional test cases...]

## CLI Integration Tests
### 8.3.1 test_cli_runs_query_basic
- Run: `langstar runs query --project test-project`
- Verify output format
- Verify exit code

[Additional test cases...]

## Error Conditions
- Invalid API key
- Malformed filter expression
- Non-existent project
[Additional scenarios...]

## Edge Cases
- Empty result set
- Very large result set
- Special characters in filters
[Additional scenarios...]

## Pre-commit Validation
- [ ] All unit tests pass
- [ ] All integration tests pass
- [ ] cargo fmt --check passes
- [ ] cargo clippy passes
```

After generating the test plan:
- Post test plan to the testing phase issue (e.g., `298.8-runs-testing`)
- Link test plan in issue description
- Use test plan as implementation guide in Phase 8
Example issue update:
```markdown
## Test Plan
See generated test plan below:
[Generated test plan content]

## Implementation Checklist
- [ ] SDK unit tests implemented
- [ ] SDK integration tests implemented
- [ ] CLI integration tests implemented
- [ ] All error conditions covered
- [ ] All edge cases covered
- [ ] Pre-commit checks passing
```

Before test implementation:
- Comprehensive test coverage plan before writing code
- Identifies missing test scenarios early
- Ensures alignment between tests and requirements
- Provides clear success criteria for Phase 8
Quality assurance:
- Test plans are reviewed before implementation begins
- Gaps in test coverage are identified before code is written
- Testing phase has clear deliverables and acceptance criteria
Efficiency:
- Automated test plan generation saves 1-2 hours of manual planning
- Progressive disclosure loads only relevant testing docs (~4K tokens vs ~24K for all docs)
- Test plan serves as implementation checklist
Location: In-module tests or sdk/tests/
Use httpmock for API mocking:
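```rust
#[cfg(test)]
mod tests {
    use httpmock::prelude::*;
    use serde_json::json;

    #[tokio::test]
    async fn test_query_runs_success() {
        // httpmock's async API fits naturally inside a #[tokio::test]
        let server = MockServer::start_async().await;
        let mock = server
            .mock_async(|when, then| {
                when.method(POST).path("/api/v1/runs/query");
                then.status(200).json_body(json!({
                    "runs": [],
                    "cursors": null
                }));
            })
            .await;

        // Test client against mock server: point the client's base URL at
        // server.base_url(), call query_runs, and assert on the parsed response.

        // Verifies the client actually hit the mocked endpoint exactly once
        mock.assert_async().await;
    }
}
```

Location: `sdk/tests/{feature}_test.rs` or `cli/tests/{feature}_command_test.rs`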
Requirements:
- Mark with `#[cfg_attr(not(feature = "integration-tests"), ignore)]`
- Use `LANGSMITH_API_KEY` and `LANGSMITH_WORKSPACE_ID` env vars
- Clean up any created resources
- Document prerequisites in test file header
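When the `integration-tests` feature is enabled but the prerequisites are missing, a test that fails with a clear message is more useful than one that passes or skips silently. The minimal sketch below shows a hypothetical helper for this. It reports which required variables are unset or blank, and the lookup is passed in as a closure so the helper can be tested without touching the real environment:

```rust
/// Environment variables the integration tests in this guide rely on.
pub const REQUIRED_ENV: [&str; 2] = ["LANGSMITH_API_KEY", "LANGSMITH_WORKSPACE_ID"];

/// Returns the names of required variables that are unset or blank.
pub fn missing_env<F>(lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    REQUIRED_ENV
        .iter()
        .copied()
        .filter(|&name| lookup(name).map_or(true, |v| v.trim().is_empty()))
        .collect()
}

/// Panics with an actionable message when prerequisites are missing.
/// Call this at the top of each integration test.
pub fn require_integration_env() {
    let missing = missing_env(|name| std::env::var(name).ok());
    assert!(
        missing.is_empty(),
        "integration test prerequisites missing: set {}",
        missing.join(", ")
    );
}
```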
Pattern (from cli/tests/README.md):
```rust
/// Integration test for runs query
///
/// Prerequisites:
/// - LANGSMITH_API_KEY set
/// - LANGSMITH_WORKSPACE_ID set
///
/// Run with: cargo test --features integration-tests
#[cfg_attr(not(feature = "integration-tests"), ignore)]
#[tokio::test]
async fn test_query_runs_integration() {
    // ...
}
```

ALWAYS run before committing:
```bash
cargo fmt && \
  cargo check --workspace --all-features && \
  cargo clippy --workspace --all-features -- -D warnings && \
  cargo test --workspace --all-features && \
  cargo fmt --check
```

After implementing tests, verify that the implementation complies with both the test plan (Phase 7) and the project's testing guidelines. This phase catches common issues that slip through even well-intentioned test implementations.
Experience has shown that test implementations often deviate from test plans in problematic ways (see Issue #637 post-mortem):
Common problems caught by audit:
- Integration tests marked `#[ignore]` instead of being properly conditional
- CI not configured with required environment variables
- Anemic tests that only verify exit codes, not actual behavior
- Missing CRUD lifecycle verification (SDK → CLI → SDK)
- Tests that don't clean up resources
- Missing error condition coverage
Use the /gh-milestones:test-audit command to verify test compliance:
```text
/gh-milestones:test-audit <milestone-name-or-number>
```

Examples:

```text
# Using milestone name
/gh-milestones:test-audit ls-runs-query

# Using milestone number
/gh-milestones:test-audit 8
```

What the command does:
- Loads the test plan from Phase 7
- Loads project testing guidelines (HIGH_LEVEL_TESTING_GUIDELINES.md)
- Analyzes implemented tests against the plan
- Checks for common anti-patterns
- Verifies CI configuration includes required environment variables
- Generates compliance report with specific remediation steps
The audit verifies compliance with these requirements:
Test Structure:
- Unit tests use the `#[cfg(test)]` module pattern
- Integration tests use the proper feature flag: `#[cfg_attr(not(feature = "integration-tests"), ignore)]`
- Tests are NOT unconditionally ignored with `#[ignore]`
- Test files follow naming conventions (`*_test.rs` or `*_command_test.rs`)
Test Quality (Toyota Andon Cord):
- Tests verify actual behavior, not just exit codes
- SDK operations are verified through round-trip assertions
- CLI tests verify output content, not just success/failure
- Error conditions are tested with specific error type verification
- Edge cases from test plan are covered
CRUD Lifecycle Pattern:
- Integration tests create resources via SDK
- Tests operate on resources via CLI or SDK under test
- Tests verify results using SDK (not just CLI output)
- Tests clean up created resources (even on failure)
CI Configuration:
- Required environment variables listed in CI workflow
- Integration test job has access to `LANGSMITH_API_KEY`
- Integration test job has access to `LANGSMITH_WORKSPACE_ID`
- Feature flag `integration-tests` is enabled in CI
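For the "clean up created resources (even on failure)" requirement, one common Rust technique is a drop guard. The cleanup runs in `Drop`, which also executes while a failing test unwinds from a panic under the default `panic = "unwind"` setting. The minimal std-only sketch below uses hypothetical names; in a real test, the closure would call an SDK delete method:

```rust
/// Runs a cleanup closure when dropped, including while a panic unwinds.
pub struct CleanupGuard<F: FnOnce()> {
    cleanup: Option<F>,
}

impl<F: FnOnce()> CleanupGuard<F> {
    /// Registers `cleanup` to run when the guard goes out of scope.
    pub fn new(cleanup: F) -> Self {
        CleanupGuard { cleanup: Some(cleanup) }
    }

    /// Cancels cleanup, e.g. when the test already deleted the resource itself.
    pub fn disarm(mut self) {
        self.cleanup = None;
    }
}

impl<F: FnOnce()> Drop for CleanupGuard<F> {
    fn drop(&mut self) {
        // `take` leaves None behind, so cleanup runs at most once.
        if let Some(cleanup) = self.cleanup.take() {
            cleanup();
        }
    }
}
```

Bind the guard to a named variable such as `let _guard = ...`. Writing `let _ = ...` would drop it, and run the cleanup, immediately. Because `Drop` cannot `.await`, async SDK deletes need either a blocking call inside the guard or an explicit cleanup step at the end of the test.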
The audit produces a compliance report with:
Report Structure:
```markdown
# Test Audit Report: [Milestone Name]

## Summary
- Tests Planned: [count from test plan]
- Tests Implemented: [count found]
- Compliance Rate: [percentage]
- Critical Issues: [count]
- Warnings: [count]

## Critical Issues (Must Fix)
### Issue 1: [Description]
- Location: [file:line]
- Problem: [specific issue]
- Remediation: [exact fix needed]

## Warnings (Should Fix)
### Warning 1: [Description]
...

## Test Plan Coverage Matrix
| Test Case (from plan) | Implemented? | File:Line | Notes |
|----------------------|--------------|-----------|-------|
| test_create_run | ✅ Yes | sdk/tests/runs_test.rs:45 | |
| test_query_runs_empty| ❌ No | - | Missing |

## CI Configuration Status
- [ ] Environment variables configured
- [ ] Feature flags enabled
- [ ] Job dependencies correct

## Recommendations
1. [Specific action item]
2. [Specific action item]
```

If the audit finds issues:
- Critical issues must be fixed before merge
- Warnings should be addressed unless explicitly justified
- Re-run audit after fixes: `/gh-milestones:test-audit <milestone>`
- Update test plan if new test cases were discovered
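The report's Summary and Test Plan Coverage Matrix reduce to comparing planned test names against implemented ones. The sketch below assumes that "Compliance Rate" means the share of planned tests that were implemented, since the report template does not define it. `Coverage` and `coverage` are hypothetical names:

```rust
use std::collections::HashSet;

/// Coverage of a test plan by the implemented tests.
#[derive(Debug, PartialEq)]
pub struct Coverage {
    /// Percentage of planned tests that were found (0.0 to 100.0).
    pub compliance_rate: f64,
    /// Planned tests with no implementation, in plan order.
    pub missing: Vec<String>,
}

pub fn coverage(planned: &[&str], implemented: &[&str]) -> Coverage {
    let found: HashSet<&str> = implemented.iter().copied().collect();
    let missing: Vec<String> = planned
        .iter()
        .filter(|name| !found.contains(*name))
        .map(|name| name.to_string())
        .collect();
    // An empty plan counts as fully covered rather than dividing by zero.
    let compliance_rate = if planned.is_empty() {
        100.0
    } else {
        (planned.len() - missing.len()) as f64 / planned.len() as f64 * 100.0
    };
    Coverage { compliance_rate, missing }
}
```

Extra implemented tests that are not in the plan do not affect the rate. They are a hint to update the plan, as the last remediation step suggests.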
Quality Assurance:
- Catches gaps between plan and implementation
- Enforces Toyota Andon Cord principle
- Prevents "tests that don't test anything" anti-pattern
Process Improvement:
- Creates feedback loop to improve test planning
- Documents common issues for future reference
- Builds institutional knowledge about testing patterns
CI Reliability:
- Ensures tests actually run in CI (not skipped)
- Verifies environment configuration
- Prevents "works locally, fails in CI" surprises
Create docs/implementation/{issue-num}-{slug}-implementation-plan.md:
- Executive summary
- Research sources with links
- Implementation phases with code snippets
- Testing plan
- Future enhancements
Add new commands to main README:
- Command syntax
- Example usage
- Environment variables
- Rustdoc comments on all public items
- Examples in doc comments where helpful
- Link to research reports for complex decisions
When the milestone's features ship in a GitHub release, use the /gh-milestones:release slash command to automate milestone cleanup.
Before running milestone release:
- All milestone PRs merged to main
- GitHub release created and published
- All sub-issues closed (or explicitly force release with `FORCE_RELEASE=true`)
- CI/CD passing on main branch
- CHANGELOG.md updated (if manual versioning)
```text
/gh-milestones:release <milestone> <version>
```

Examples:

```text
# Using milestone name
/gh-milestones:release ls-prompt-structured-outputs v0.10.0

# Using milestone URL
/gh-milestones:release https://github.com/codekiln/langstar/milestone/7 v0.10.0
```

The `/gh-milestones:release` command performs the following actions:
- Validates Release Exists: Confirms GitHub release is published
- Checks Sub-Issue Completion: Warns if any sub-issues are still open (requires `gh-sub-issue` extension)
- Updates Milestone Description: Prepends release link to milestone description
- Closes Milestone: Marks milestone as closed
- Adds Release Comment: Comments on parent issue with release link
- Closes Parent Issue: Marks parent issue as closed
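The validation and description-update steps can be sketched as pure functions. This is a hypothetical illustration of the behaviour described above, not the command's actual source. It assumes that a missing release is a hard error and that open sub-issues block the release unless `FORCE_RELEASE` is set, which matches the prerequisite list above. The release-line wording is also an assumption:

```rust
/// Outcome of the pre-release checks.
#[derive(Debug, PartialEq, Eq)]
pub enum Check {
    /// Safe to proceed; may carry warnings to print.
    Proceed { warnings: Vec<String> },
    /// Stop with an explanation.
    Abort(String),
}

pub fn check_release(release_published: bool, open_sub_issues: &[u32], force: bool) -> Check {
    if !release_published {
        return Check::Abort("GitHub release not found or not published".to_string());
    }
    if open_sub_issues.is_empty() {
        return Check::Proceed { warnings: Vec::new() };
    }
    let list: Vec<String> = open_sub_issues.iter().map(|n| format!("#{n}")).collect();
    let msg = format!("open sub-issues: {}", list.join(", "));
    if force {
        // Forced releases still surface the open issues as a warning.
        Check::Proceed { warnings: vec![msg] }
    } else {
        Check::Abort(format!("{msg} (set FORCE_RELEASE=true to override)"))
    }
}

/// Prepends a release line to the existing milestone description.
pub fn prepend_release_link(description: &str, version: &str, url: &str) -> String {
    let line = format!("Released in [{version}]({url})");
    if description.trim().is_empty() {
        line
    } else {
        format!("{line}\n\n{description}")
    }
}
```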
Example Output:
```markdown
✅ **Milestone Release Tracking Complete**

📍 Milestone: ls-prompt-structured-outputs (#7)
🔗 Parent Issue: #402 - Add structured output prompt support
📦 Release: v0.10.0
🔗 Release URL: https://github.com/codekiln/langstar/releases/tag/v0.10.0

**Actions Completed:**
✅ Verified release v0.10.0 exists
✅ Validated sub-issue completion
✅ Milestone marked as closed
✅ Milestone description updated with release information
✅ Parent issue #402 closed with release comment
```
If sub-issues are intentionally still open, force the release:
```text
FORCE_RELEASE=true /gh-milestones:release <milestone> <version>
```

Note: Not recommended. Best practice is to close all sub-issues before releasing.
Typical release workflow:
```bash
# 1. Merge final PR for milestone
gh pr merge 385 --squash

# 2. Create GitHub release (or automated via CI)
gh release create v0.10.0 --generate-notes

# 3. Mark milestone as released (slash command, not a shell command)
/gh-milestones:release "ls-prompt-structured-outputs" v0.10.0
```

Consistency: Every milestone follows the same release tracking pattern
Efficiency: Manual milestone updates take 5-10 minutes, automation completes in <10 seconds
Traceability: Clear link from milestone → release → parent issue
Validation: Enforces sub-issue completion, validates release exists
- PR #442: `/gh-milestones:release` command implementation
- Command Documentation: `.claude/commands/gh-milestones:release.md`
- Example: Milestone #7 (ls-prompt-structured-outputs) released in v0.10.0
- Uses same env var names as existing features (`LANGSMITH_API_KEY`, etc.)
- Follows same precedence: CLI flags > env vars > config file > defaults
- Supports organization/workspace scoping consistently
- Uses same output format options (`json`, `table`)
- Follows existing module structure
- Uses same error handling patterns
- Uses same serde patterns (rename_all, skip_serializing_if)
- Follows same test organization
- Unit tests use httpmock
- Integration tests use feature flag
- Tests document prerequisites
- Tests clean up resources
- Research reports in `reference/research/`
- Canonical OpenAPI specs in `reference/openapi/langchain/{api}/`
- MANIFEST.md updated with provenance for spec fetches
- Extracted fragments in `reference/api-specs/{api}/`
- FRAGMENTS.md updated with jq queries for extractions
- Implementation plans in `docs/implementation/`
- Same markdown structure
| Tool/Skill | Purpose | Location |
|---|---|---|
| gh-sub-issue | Manage issue hierarchies | .claude/skills/gh-sub-issue/SKILL.md |
| setup-remote-repo-notes-dir | Research SDK codebases | .claude/skills/setup-remote-repo-notes-dir/SKILL.md |
| jq | OpenAPI spec validation | System tool |
| httpmock | Rust HTTP mocking | Cargo dependency |
| langgraph-docs MCP | LangGraph/LangSmith docs | MCP server |
When creating sub-issues, use these templates for scope.

Research sub-issue (Phase 1):

```markdown
## Scope
1. Set up research workspace with setup-remote-repo-notes-dir
2. Analyze Python SDK implementation
3. Document method signatures and parameters
4. Identify pagination patterns
5. Review tests for usage examples

## Deliverable
Research report at `reference/research/{num}-{slug}-precedent.md`
```

OpenAPI validation sub-issue (Phase 3):

```markdown
## Scope
1. Fetch OpenAPI spec to reference/openapi/langchain/{api}/
2. Extract relevant endpoint and schema definitions with jq into reference/api-specs/{api}/
3. Validate research findings against spec
4. Document confirmations, corrections, discoveries

## Deliverable
Validation report at `reference/research/{num}-openapi-validation.md`
```

The complete milestone lifecycle spans from preliminary research through GitHub release:
| Phase | Name | When | Typical Duration |
|---|---|---|---|
| 0.0 | Pre-Epic Scouting | Before milestone (optional) | 1-3 days |
| 0 | Epic Setup | Start of milestone | 1 day |
| 1-10 | Standard Development | Implementation | 1-4 weeks |
| 11 | Milestone Release | After merge + GitHub release | <1 hour (automated) |
```text
Is this a new API feature needing research and technical context?
├── No  → Skip to Phase 0 (Epic Setup)
└── Yes → Start with Phase 0.0 (Scout)
            ↓
          Scout gathers knowledge and insights
            ↓
          Use findings to author Phase 0 parent issue
```
- Pre-Milestone (Phase 0.0): Scout issue exists, no milestone yet
  - Research is exploratory
  - No commitment to full implementation
  - Scout PR merges directly to main
- Milestone Created (Phase 0): Parent issue + milestone + sub-issues created
  - Milestone attached to ALL issues
  - Sub-issues link to parent via `gh-sub-issue`
  - Development waves may be parallelized
- Active Development (Phases 1-10): Sub-issues progress through standard phases
  - PRs typically merge directly to main (not hierarchical)
  - Milestone description updated with progress
  - Sub-issues closed as PRs merge
- Released (Phase 11): Milestone closed, linked to GitHub release
  - `/gh-milestones:release` automates cleanup
  - Parent issue closed with release comment
  - Milestone description shows release link
  - Audit trail: issue → milestone → release
Pattern: Use short, hyphenated names for milestone titles
- ✅ `ls-prompt-structured-outputs` (clear, grep-able)
- ✅ `ls-evals-basic` (scoped)
- ❌ `Structured Output Prompts Feature` (spaces, verbose)
Benefits:
- Easy to reference in commands: `/gh-milestones:release ls-evals-basic v0.10.0`
- Grep-able in code and documentation
- Works well with GitHub API and CLI tools
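The naming pattern is easy to check before creating a milestone. The small sketch below assumes that "short, hyphenated" means lowercase ASCII letters and digits separated by single hyphens. `validate_milestone_name` is a hypothetical helper, and its 50-character cap is an arbitrary illustrative choice:

```rust
/// Checks a milestone title against the short, hyphenated convention.
/// Assumed rules: lowercase ASCII letters/digits, single hyphens between
/// words, no leading/trailing hyphen, at most 50 characters.
pub fn validate_milestone_name(name: &str) -> Result<(), String> {
    if name.is_empty() || name.len() > 50 {
        return Err(format!("'{name}' must be 1-50 characters"));
    }
    if name.starts_with('-') || name.ends_with('-') || name.contains("--") {
        return Err(format!("'{name}' must use single hyphens between words"));
    }
    // Report the first offending character so the message is actionable.
    if let Some(bad) = name
        .chars()
        .find(|c| !(c.is_ascii_lowercase() || c.is_ascii_digit() || *c == '-'))
    {
        return Err(format!(
            "'{name}' contains '{bad}'; use lowercase letters, digits, and hyphens"
        ));
    }
    Ok(())
}
```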
Avoid:
- ❌ Creating milestone without parent issue
- ❌ Attaching milestone only to parent (not sub-issues)
- ❌ Manually closing milestone without release link
- ❌ Leaving parent issue open after release ships
- ❌ Skipping Phase 0.0 scout for unclear API features
- ❌ Closing milestone before all sub-issues are done
- GitHub Workflow - Issue-driven development
- Git SCM Conventions - Commit message format
- Code Style Principles - Explicit over implicit
- Procedures - Pre-commit checklist
A feature is complete when:
- All sub-issues closed
- Research and validation reports committed
- SDK types and methods implemented
- CLI commands functional
- Unit tests passing (100% of new code)
- Integration tests passing
- Documentation updated
- GitHub release published
- Milestone closed via `/gh-milestones:release` (Phase 11)
- Parent issue closed with release link