Standard Feature Development Process

This document codifies the best practices and standard phases for implementing new LangSmith or LangGraph API features as CLI commands in Langstar. These patterns have been established through successful implementations including:

#402 ls-prompt-structured-outputs - Structured output prompts (with scout phase)
#298 ls-runs-query - Runs query and filtering
#334 ls-annotation-queues - Annotation queue management
#201 devcontainer-feature - Infrastructure milestone (different pattern)

Overview

Each API → CLI feature follows a 12-phase process (plus optional scouting phase 0.0):

| Phase | Name | Goal | Deliverable |
|-------|------|------|-------------|
| 0.0 | Pre-Epic Scouting (Optional) | Gather research and technical context | Scout research report |
| 0 | Epic Setup | Establish tracking structure | Parent issue, milestone, sub-issues |
| 1 | Research | Understand SDK precedent | Research report in `reference/research/` |
| 2 | Design | Ensure DX consistency and integration | Design decisions documented in research report |
| 3 | OpenAPI Validation | Verify design against spec | Validation report + extracted schemas |
| 4 | SDK Types | Implement Rust types | `sdk/src/{feature}.rs` types |
| 5 | SDK Client | Implement client methods | Client methods in SDK |
| 6 | CLI Commands | Implement CLI commands | `cli/src/commands/{feature}.rs` |
| 7 | Test Planning | Generate comprehensive test plan | Test plan document via `/gh-milestones:test-plan` |
| 8 | Testing | Ensure quality | Unit tests (mocked) + integration tests |
| 9 | Test Audit | Verify test compliance | Audit report via `/gh-milestones:test-audit` |
| 10 | Documentation | Document usage | README updates, implementation docs |
| 11 | Milestone Release | Mark milestone as shipped | Closed milestone linked to GitHub release |

Note: Phase 0.0 (Pre-Epic Scouting) and Phase 11 (Milestone Release) are recent additions based on lessons learned from milestone #7 (ls-prompt-structured-outputs). See Issue #448 for detailed analysis. Phase 7 (Test Planning) and Phase 9 (Test Audit) were added to formalize comprehensive test planning and compliance verification (Issue #634).

Phase 0.0: Pre-Epic Scouting (Optional)

When to Use Pre-Epic Scouting

For new API features where you need preliminary research and technical context, create a scout issue to gather knowledge before authoring the milestone's Phase 0 parent issue.

Use scout issues when:

✅ You are adding support for a new LangSmith/LangGraph API feature
✅ You need to understand API patterns and SDK precedents before writing tickets
✅ You want to explore the solution space through experimentation
✅ You are gathering technical context to inform milestone structure and ticket authoring

Skip scout issues when:

❌ Fixing a bug in existing functionality (scope is already clear)
❌ Small enhancements to existing commands (patterns already established)
❌ Infrastructure changes (devcontainer, CI/CD)
❌ Documentation-only changes

Scout Issue Template

Create an exploratory research issue using this pattern:

Title Format: [Scout] Research {feature-name} API patterns and technical context

Required Sections:

Purpose: Gather research and knowledge for milestone planning
Scope: What to research (NOT implementation)
Deliverables: Research report, SDK notes, optional experiments
Success Criteria: Technical insights gathered, milestone structure recommended

Example: Issue #398 - Scout for structured output prompts

Scout Issue Scope

Focus on research and knowledge gathering; do not implement. Activities include:

Search existing langstar code in ./cli and ./sdk for related implementations
Analyze Python SDK precedent using the setup-remote-repo-notes-dir skill
Identify relevant API endpoints and request/response shapes
Run experiments to explore API behavior and validate assumptions
Document technical patterns, conventions, and integration points
Provide insights for authoring the milestone's first ticket(s)

Scout Issue Deliverables

Research Report at docs/research/{issue-num}-{slug}-scout.md:
- Existing langstar implementation analysis
- API endpoint identification
- SDK precedent analysis (Python SDK)
- Technical patterns and conventions discovered
- Experimentation findings (if applicable)
- Insights for milestone planning
- Recommended structure for Phase 0 parent issue
- Suggested initial sub-issues
- Open questions for implementation
Updated Reference Notes:
- reference/repo/langchain-ai/langsmith-sdk/notes/README.md
- Document key SDK patterns and method signatures
Optional: Experiment Scripts:
- reference/experiments/{issue-num}-{slug}/
- Python scripts to explore API behavior
- Validate assumptions through hands-on testing

Relationship to Milestone

Key: Scout issues exist before the milestone is created.

Workflow:

Create scout issue (no milestone yet)
Complete scout research → PR directly to main
Review findings and technical insights
Use research to create milestone and author Phase 0 parent issue
Optional: Retroactively attach scout issue to milestone for historical tracking

Benefits of Pre-Epic Scouting

Knowledge Foundation:

Gather technical context before authoring milestone tickets
Document API patterns and SDK precedents
Create reusable research artifacts
Understand the problem domain through experimentation

Better Milestone Planning:

Parent issue scope is informed by actual research, not assumptions
Sub-issue breakdown reflects discovered patterns
Initial tickets target the right technical approach
Open questions are identified upfront

Reduced Uncertainty:

Experimentation validates assumptions early
API behavior is understood before implementation
Technical integration points are documented
Implementation challenges are anticipated

Phase 0: Epic Setup

0.1 Create Parent Issue (Epic)

Create a milestone-level issue following the naming convention:

Format: {api-name} milestone - {description}

Example: ls-runs-query milestone - Be able to list and filter runs using langstar CLI

Required sections:

Overview/TL;DR
Goals / Success Criteria
User Stories (epic-level and concrete)
API Endpoints & Examples
Design & Implementation Plan (high-level)
Testing strategy
Documentation plan

0.2 Create GitHub Milestone

Create a matching milestone linking to the parent issue:

# Via GitHub UI or API
gh api repos/:owner/:repo/milestones -f title="ls-runs-query" \
  -f description="Parent issue: #298"

0.3 Create Sub-Issues Using gh-sub-issue Skill

Use the gh-sub-issue skill (.claude/skills/gh-sub-issue/SKILL.md) to create the standard phase sub-issues:

# Create all phase sub-issues
gh sub-issue create --parent 298 --title "298.1-research Research langsmith-sdk runs query precedent"
gh sub-issue create --parent 298 --title "298.2-design Design DX consistency and configuration integration"
gh sub-issue create --parent 298 --title "298.3-openapi-validation Validate runs query design against LangSmith OpenAPI spec"
gh sub-issue create --parent 298 --title "298.4-sdk-runs-types Implement Run types and QueryRunsRequest in SDK"
gh sub-issue create --parent 298 --title "298.5-sdk-runs-client Implement query_runs client method with pagination"
gh sub-issue create --parent 298 --title "298.6-cli-runs-command Implement langstar runs query CLI command"
gh sub-issue create --parent 298 --title "298.7-test-plan Generate comprehensive test plan for runs query"
gh sub-issue create --parent 298 --title "298.8-runs-testing Add comprehensive tests for runs query"
gh sub-issue create --parent 298 --title "298.9-runs-test-audit Audit test compliance for runs query"
gh sub-issue create --parent 298 --title "298.10-runs-docs Documentation for runs query feature"

# Verify hierarchy
gh sub-issue list 298 --relation children
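The titles above share a `{parent}.{phase}-{slug} {description}` shape. As a minimal sketch, assuming you want to check titles before creating them, the hypothetical helper below (not part of langstar or the gh-sub-issue extension) parses and validates that shape:

```rust
/// Parts of a phase sub-issue title such as
/// "298.4-sdk-runs-types Implement Run types and QueryRunsRequest in SDK".
/// Hypothetical helper for illustration only.
#[derive(Debug, PartialEq)]
struct SubIssueTitle {
    parent: u32,
    phase: u32,
    slug: String,
    description: String,
}

/// Parse "{parent}.{phase}-{slug} {description}", rejecting titles that miss a part.
fn parse_sub_issue_title(title: &str) -> Result<SubIssueTitle, String> {
    // "{parent}.{phase}-{slug}" ends at the first space; the rest is the description.
    let (prefix, description) = title
        .split_once(' ')
        .ok_or_else(|| format!("missing description in {title:?}"))?;
    // The numbers come before the first '-', the slug after it.
    let (numbers, slug) = prefix
        .split_once('-')
        .ok_or_else(|| format!("missing slug in {prefix:?}"))?;
    let (parent, phase) = numbers
        .split_once('.')
        .ok_or_else(|| format!("expected {{parent}}.{{phase}} in {numbers:?}"))?;
    let parent = parent.parse::<u32>().map_err(|e| format!("bad parent number: {e}"))?;
    let phase = phase.parse::<u32>().map_err(|e| format!("bad phase number: {e}"))?;
    if slug.is_empty() || description.trim().is_empty() {
        return Err(format!("empty slug or description in {title:?}"));
    }
    Ok(SubIssueTitle {
        parent,
        phase,
        slug: slug.to_string(),
        description: description.trim().to_string(),
    })
}

fn main() {
    let title = "298.4-sdk-runs-types Implement Run types and QueryRunsRequest in SDK";
    match parse_sub_issue_title(title) {
        Ok(t) => println!("parent #{} phase {} slug {} ({})", t.parent, t.phase, t.slug, t.description),
        Err(e) => eprintln!("invalid title: {e}"),
    }
}
```

Note that two-digit phases such as `298.10-runs-docs` parse the same way, since the phase is everything between the `.` and the first `-`.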

0.4 Attach Milestone to ALL Issues

CRITICAL: Every issue (epic AND all sub-issues) MUST have the milestone attached for accurate progress tracking.
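One way to apply this in bulk is `gh issue edit <number> --milestone <title>`, run once for the epic and once per sub-issue. The sketch below only builds and prints those commands; the sub-issue numbers are hypothetical placeholders, and the call that would actually run `gh` is left commented out.

```rust
/// Build the `gh` arguments that attach `milestone` to one issue.
fn milestone_edit_args(issue: u32, milestone: &str) -> Vec<String> {
    vec![
        "issue".to_string(),
        "edit".to_string(),
        issue.to_string(),
        "--milestone".to_string(),
        milestone.to_string(),
    ]
}

fn main() {
    let milestone = "ls-runs-query";
    // The epic (#298) plus hypothetical sub-issue numbers; use the real ones
    // reported by `gh sub-issue list 298 --relation children`.
    let issues = [298, 299, 300, 301];
    for issue in issues {
        let args = milestone_edit_args(issue, milestone);
        println!("gh {}", args.join(" "));
        // To apply the change instead of printing it:
        // let status = std::process::Command::new("gh").args(&args).status().expect("failed to run gh");
        // assert!(status.success());
    }
}
```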

Phase 1: Research SDK Precedent

1.1 Set Up Research Workspace

Use the setup-remote-repo-notes-dir skill (.claude/skills/setup-remote-repo-notes-dir/SKILL.md) to clone the relevant SDK:

# For LangSmith features
.claude/skills/setup-remote-repo-notes-dir/scripts/setup_repo_notes.sh https://github.com/langchain-ai/langsmith-sdk

# For LangGraph features
.claude/skills/setup-remote-repo-notes-dir/scripts/setup_repo_notes.sh https://github.com/langchain-ai/langgraph

Result structure:

reference/repo/langchain-ai/langsmith-sdk/
├── notes/          # Your research notes (committed)
└── code/           # Cloned SDK (gitignored)

1.2 Analyze Python SDK

Key files to examine:

code/python/langsmith/client.py - Main client implementation
code/python/langsmith/schemas.py - Data models
code/python/langsmith/_internal/ - Internal utilities
code/python/tests/ - Test patterns and examples

Questions to answer:

What is the method signature?
What parameters are supported?
How is pagination handled?
What are the request/response shapes?
How are errors handled?
What conveniences does the SDK provide?

1.3 Write Research Report

Create report at reference/research/{issue-num}-{slug}-precedent.md:

Required sections:

Executive Summary
Method Signature Analysis
Parameter Documentation
Request/Response Shapes
Pagination Strategy
Error Handling
Recommendations for Rust Implementation

Example: reference/research/298-ls-runs-query-precedent.md

Phase 2: Design

Before diving into implementation, analyze how the new feature will integrate with Langstar's existing architecture and user experience. This phase ensures consistency across the CLI and surfaces design decisions early.

2.1 DX Consistency Analysis

Evaluate how the feature aligns with existing Langstar commands and patterns:

Questions to answer:

Which existing commands have similar functionality? (e.g., runs query vs deployments list)
What flag naming conventions are already established? (e.g., -p/--project, -o/--output)
What output formats are supported and how should this feature use them?
How do similar commands handle pagination, filtering, and sorting?
What error messages and exit codes are used for similar error conditions?

Review existing patterns in:

cli/src/commands/ - Command structure and arguments
cli/src/config.rs - Configuration loading patterns
Existing command help text (langstar <command> --help)

Document in research report:

Consistency decisions (which patterns to follow)
Intentional deviations (with rationale)
New patterns being introduced (if any)

2.2 Configuration Integration

Analyze how the feature integrates with Langstar's configuration system:

Questions to answer:

Which environment variables does this feature need? (existing vs new)
Does it need workspace/organization scoping like other features?
What's the precedence order? (CLI flags > env vars > config file > defaults)
Are there sensible defaults that match the UI behavior?

Review configuration precedents in:

cli/src/config.rs - Existing configuration patterns
Environment variable documentation in README
How similar features handle missing configuration

Configuration checklist:

Uses existing env vars where applicable (LANGSMITH_API_KEY, LANGSMITH_WORKSPACE_ID)
New env vars follow naming convention (LANGSMITH_* or LANGGRAPH_*)
Defaults match reasonable expectations
Error messages guide users to configure missing values
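The precedence order and the "guide users" rule fit in a small resolver. This is a hypothetical sketch, not the code in `cli/src/config.rs`: each source is passed in as an `Option` so the function stays pure and testable, config-file loading is assumed to happen elsewhere, and treating empty values as unset is an added assumption.

```rust
/// Resolve one setting with precedence: CLI flag > env var > config file > default.
/// Hypothetical helper; langstar's real config loading lives in cli/src/config.rs.
fn resolve_setting(
    env_var: &str,
    cli_flag: Option<&str>,
    env_value: Option<&str>,
    config_value: Option<&str>,
    default: Option<&str>,
) -> Result<String, String> {
    cli_flag
        .into_iter()
        .chain(env_value)
        .chain(config_value)
        .chain(default)
        // Treat empty values (e.g. `export LANGSMITH_API_KEY=`) as unset.
        .find(|value| !value.is_empty())
        .map(|value| value.to_string())
        // No source supplied a value: tell the user how to fix it.
        .ok_or_else(|| {
            format!("{env_var} is not set; export {env_var} or pass the corresponding CLI flag")
        })
}

fn main() {
    let from_env = std::env::var("LANGSMITH_API_KEY").ok();
    match resolve_setting("LANGSMITH_API_KEY", None, from_env.as_deref(), None, None) {
        Ok(_) => println!("LANGSMITH_API_KEY resolved"),
        Err(msg) => eprintln!("error: {msg}"),
    }
}
```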

2.3 Business Purpose Research

Understand what this feature accomplishes from a user's perspective in the LangSmith/LangGraph UI:

Questions to answer:

What workflow does this feature support in the UI?
What business problem does it solve for users?
How do users currently accomplish this task? (UI clicks, existing CLI, API calls)
What would be the ideal CLI experience for this workflow?

Research methods:

Explore the feature in LangSmith/LangGraph UI
Review official documentation for the feature
Consider common user scenarios and edge cases

Document in research report:

UI workflow description (what users do in the web interface)
Key user scenarios (the "jobs to be done")
How the CLI can improve or complement the UI workflow

2.4 Design Decisions Summary

Add a "Design Decisions" section to your research report:

## Design Decisions

### DX Consistency
- Following `runs query` pattern for [reason]
- Using `-f/--filter` flag consistent with [existing command]
- Output formats: json (default for piping), table (default for terminal)

### Configuration
- Requires: LANGSMITH_API_KEY (existing), LANGSMITH_PROJECT_NAME (existing)
- New env var: [none / LANGSMITH_NEW_VAR for reason]
- Defaults: [list sensible defaults]

### Business Purpose
- Supports workflow: [describe UI workflow]
- Key scenarios: [list 2-3 primary use cases]
- CLI advantage: [why CLI is better than UI for this]

Phase 3: OpenAPI Validation

3.1 OpenAPI Spec Management Pattern

Langstar uses a canonical source + derived fragments pattern for managing OpenAPI specs, inspired by the setup-remote-repo-notes-dir skill:

reference/
├── openapi/langchain/              # Canonical full specs (source of truth)
│   ├── langsmith/
│   │   ├── openapi.json            # Full spec (635K)
│   │   └── MANIFEST.md             # Provenance metadata
│   └── control-plane/
│       ├── openapi.json            # Full spec (70K)
│       └── MANIFEST.md
│
└── api-specs/                      # Extracted fragments + documentation
    ├── README.md                   # Index and usage guide
    ├── LANGSMITH_API_OVERVIEW.md   # Quick reference (4 APIs)
    ├── LANGSMITH_APIS_DETAILS.md   # Detailed catalog
    ├── langsmith/
    │   ├── FRAGMENTS.md            # jq extraction queries (reproducible)
    │   └── *.json                  # Extracted fragments
    └── control-plane/
        └── FRAGMENTS.md

Benefits:

Separation: Canonical specs vs AI-friendly fragments
Reproducibility: jq queries documented in FRAGMENTS.md
Provenance: MANIFEST.md tracks when/how specs were fetched
AI-friendly: Small fragments fit context windows for grounding

3.2 Fetch or Update OpenAPI Specification

# LangSmith API - fetch to canonical location
curl -o reference/openapi/langchain/langsmith/openapi.json \
  https://api.smith.langchain.com/openapi.json

# LangGraph Cloud API (Control Plane)
curl -o reference/openapi/langchain/control-plane/openapi.json \
  https://api.host.langchain.com/openapi.json

# Update MANIFEST.md with provenance
echo "| $(date +%Y-%m-%d) | Refresh | $(du -h reference/openapi/langchain/langsmith/openapi.json | cut -f1) | Updated from remote |" \
  >> reference/openapi/langchain/langsmith/MANIFEST.md

3.3 Extract Relevant Schemas with jq

Extract fragments to reference/api-specs/langsmith/ and document in FRAGMENTS.md:

# Navigate to canonical spec
cd reference/openapi/langchain/langsmith

# Extract endpoint definition (../../../ climbs from langsmith/ back up to reference/)
jq '.paths["/api/v1/runs/query"]' openapi.json \
  > ../../../api-specs/langsmith/runs-query-endpoint.json

# Extract request schema
jq '.components.schemas.BodyParamsForRunsQuerySchema' openapi.json \
  > ../../../api-specs/langsmith/runs-query-request-schema.json

# Extract response schema
jq '.components.schemas.ListRunsResponse' openapi.json \
  > ../../../api-specs/langsmith/runs-query-response-schema.json

# Extract entity schema (e.g., Run)
jq '.components.schemas.Run' openapi.json \
  > ../../../api-specs/langsmith/run-schema.json

IMPORTANT: After extracting, update reference/api-specs/langsmith/FRAGMENTS.md:

| File | Size | Purpose | jq Query | Last Updated |
|------|------|---------|----------|--------------|
| `runs-query-endpoint.json` | 1.0K | POST /runs/query endpoint | `.paths["/api/v1/runs/query"]` | YYYY-MM-DD |

3.4 Validate Research Against Spec

Create validation report at reference/research/{issue-num}-openapi-validation.md:

Required validations:

HTTP method matches
Path matches (with version prefix)
Request body schema matches research
Response schema matches research
Field types are correctly identified
Required vs optional fields

Example jq queries for validation (run from reference/openapi/langchain/langsmith/):

# Check endpoint method
jq '.paths["/api/v1/annotation-queues/{queue_id}/runs"].post' openapi.json

# Check request body schema
jq '.paths["/api/v1/annotation-queues/{queue_id}/runs"].post.requestBody.content["application/json"].schema' \
  openapi.json

# List all paths for a feature
jq '.paths | keys | map(select(contains("annotation-queue")))' openapi.json

3.5 Document Discrepancies

Any differences between research and OpenAPI spec MUST be documented:

Corrections to research findings
Discoveries not in research
Confirmations of research

Phase 4: SDK Types

4.1 Create Types Module

File: sdk/src/{feature}.rs

Pattern from existing code (see sdk/src/runs.rs, sdk/src/deployments.rs):

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

/// Enum types (match OpenAPI enum values exactly)
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum RunType {
    Tool,
    Chain,
    Llm,
    // ...
}

/// Main entity struct (based on OpenAPI schema)
#[derive(Debug, Clone, Deserialize)]
pub struct Run {
    // Required fields (non-optional)
    pub id: Uuid,
    pub name: String,

    // Optional fields
    pub description: Option<String>,
    pub start_time: Option<DateTime<Utc>>,
}

/// Request struct for queries
#[derive(Debug, Clone, Default, Serialize)]
pub struct QueryRunsRequest {
    #[serde(skip_serializing_if = "Option::is_none")]
    pub project_name: Option<String>,
    // ...
}

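/// Paginated response
#[derive(Debug, Clone, Deserialize)]
pub struct ListRunsResponse {
    pub runs: Vec<Run>,
    pub cursors: Option<Cursors>,
}

/// Pagination cursors; `next` (when present) requests the following page.
/// Verify the full field list against the extracted OpenAPI response schema.
#[derive(Debug, Clone, Deserialize)]
pub struct Cursors {
    pub next: Option<String>,
}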

4.2 Register in lib.rs

// sdk/src/lib.rs
pub mod runs;
pub use runs::{Run, RunType, QueryRunsRequest, ListRunsResponse};

Phase 5: SDK Client Methods

5.1 Add Client Methods

Pattern: Methods in sdk/src/client.rs or feature-specific modules

impl LangchainClient {
    /// Query runs with filtering and pagination
    pub async fn query_runs(&self, request: QueryRunsRequest) -> Result<ListRunsResponse, Error> {
        let url = format!("{}/api/v1/runs/query", self.langsmith_base_url);

        let response = self.http_client
            .post(&url)
            .header("X-Api-Key", &self.api_key)
            .json(&request)
            .send()
            .await?;

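        // Turn non-2xx statuses into errors, then deserialize the JSON body
        // (assumes the SDK's Error type implements From<reqwest::Error>).
        let body = response
            .error_for_status()?
            .json::<ListRunsResponse>()
            .await?;
        Ok(body)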
    }
}

5.2 Handle Pagination

Follow the cursor-based pagination pattern: pass each response's `cursors.next` value back with the next request, and stop when it is absent.

/// Stream all runs with automatic pagination
pub fn query_runs_stream(&self, request: QueryRunsRequest) -> impl Stream<Item = Result<Run, Error>> {
    // Implementation using cursors.next
}
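The cursor loop behind such a stream can be sketched without HTTP. Below, a hypothetical `fetch_page` function stands in for `query_runs`, and the simplified `Page` type keeps only the runs and the next cursor; a real implementation would be async and yield items through a `Stream` rather than collecting them into a `Vec`.

```rust
/// Simplified stand-in for ListRunsResponse: one page of run IDs plus the next cursor.
struct Page {
    runs: Vec<String>,
    next_cursor: Option<String>,
}

/// Collect every run, re-issuing the query with the previous page's cursor
/// until no next cursor is returned. `fetch_page` stands in for `query_runs`.
fn collect_all<F>(mut fetch_page: F) -> Result<Vec<String>, String>
where
    F: FnMut(Option<&str>) -> Result<Page, String>,
{
    let mut all = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let page = fetch_page(cursor.as_deref())?;
        all.extend(page.runs);
        match page.next_cursor {
            // Guard against an API that keeps returning the same cursor.
            Some(next) if cursor.as_deref() == Some(next.as_str()) => {
                return Err(format!("cursor {next:?} did not advance"));
            }
            Some(next) => cursor = Some(next),
            None => break, // last page reached
        }
    }
    Ok(all)
}

/// Hypothetical three-page API used for demonstration.
fn fake_page(cursor: Option<&str>) -> Result<Page, String> {
    let (runs, next) = match cursor {
        None => (vec!["run-1", "run-2"], Some("c1")),
        Some("c1") => (vec!["run-3"], Some("c2")),
        Some("c2") => (vec!["run-4"], None),
        Some(other) => return Err(format!("unknown cursor {other}")),
    };
    Ok(Page {
        runs: runs.into_iter().map(String::from).collect(),
        next_cursor: next.map(String::from),
    })
}

fn main() {
    let runs = collect_all(fake_page).expect("pagination failed");
    println!("{runs:?}");
}
```

The stuck-cursor guard is a defensive choice for the sketch; it turns a misbehaving server into an error instead of an infinite loop.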

Phase 6: CLI Commands

6.1 Create Command Module

File: cli/src/commands/{feature}.rs

Pattern from existing commands (see cli/src/commands/runs.rs):

use clap::{Args, Subcommand};

#[derive(Debug, Subcommand)]
pub enum RunsCommand {
    /// Query runs with filtering
    Query(QueryArgs),
}

#[derive(Debug, Args)]
pub struct QueryArgs {
    /// Project name or ID to query runs from
    #[arg(short, long)]
    pub project: Option<String>,

    /// Filter expression
    #[arg(short, long)]
    pub filter: Option<String>,

    /// Output format (json, table)
    #[arg(short = 'o', long = "output", default_value = "table")]
    pub format: String,
}
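Because `format` is a free-form `String`, a typo such as `-o jsno` is only caught when the output is rendered. One option is to parse it into an enum up front. With clap this is commonly done by deriving `clap::ValueEnum`; the std-only sketch below uses `FromStr` instead so it runs without dependencies, and the `OutputFormat` type is hypothetical rather than existing langstar code.

```rust
use std::fmt;
use std::str::FromStr;

/// Output formats accepted by `-o/--output` (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputFormat {
    Json,
    Table,
}

impl FromStr for OutputFormat {
    type Err = String;

    /// Parse case-insensitively so `-o JSON` and `-o json` behave the same.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "json" => Ok(OutputFormat::Json),
            "table" => Ok(OutputFormat::Table),
            other => Err(format!("unsupported output format {other:?} (expected json or table)")),
        }
    }
}

impl fmt::Display for OutputFormat {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(match self {
            OutputFormat::Json => "json",
            OutputFormat::Table => "table",
        })
    }
}

fn main() {
    for input in ["table", "JSON", "yaml"] {
        match input.parse::<OutputFormat>() {
            Ok(format) => println!("{input} -> {format}"),
            Err(e) => eprintln!("error: {e}"),
        }
    }
}
```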

6.2 Register in CLI

// cli/src/commands/mod.rs
pub mod runs;

// cli/src/main.rs
use clap::Subcommand;

mod commands;
use commands::runs;

#[derive(Debug, Subcommand)]
enum Commands {
    /// Manage runs
    #[command(subcommand)]
    Runs(runs::RunsCommand),
}

6.3 Follow Configuration Patterns

Use the established config pattern from cli/src/config.rs:

Environment variables take precedence over the config file (explicit CLI flags override both)
Support both LANGSMITH_API_KEY and LANGGRAPH_API_KEY
Support organization/workspace scoping

Phase 7: Test Planning

Before implementing tests, generate a comprehensive test plan that ensures complete coverage of the feature's functionality, error conditions, and edge cases. This phase uses the /gh-milestones:test-plan command to automate test plan generation.

7.1 Review Assets from Prior Phases

Before generating the test plan, review all deliverables from previous phases:

Required review:

Research reports from Phase 1
Design decisions from Phase 2
OpenAPI validation from Phase 3
SDK types implementation (Phase 4)
SDK client methods (Phase 5)
CLI commands implementation (Phase 6)
All merged PRs and their discussions

Why this matters:

Test plans must cover all features documented in prior phases
Design decisions inform test scenarios
OpenAPI validation identifies edge cases
Implementation details reveal error conditions to test

7.2 Generate Test Plan with /gh-milestones:test-plan

Use the test planning command to generate a comprehensive test plan:

/gh-milestones:test-plan <milestone-name-or-number>

Examples:

# Using milestone name
/gh-milestones:test-plan ls-runs-query

# Using milestone number
/gh-milestones:test-plan 8

# Using milestone URL
/gh-milestones:test-plan https://github.com/codekiln/langstar/milestone/8

What the command does:

Loads relevant testing documentation (progressive disclosure)
Reviews all issues and PRs in the milestone
Analyzes implementation from merged PRs
Generates comprehensive test plan document
Identifies gaps in test coverage

7.3 Test Plan Deliverable

The generated test plan should be added to the testing phase issue and should include:

Required sections:

Feature Overview: Summary of what's being tested
Test Scope: What's in scope and out of scope
SDK Unit Tests: Mocked tests for SDK methods
SDK Integration Tests: Real API tests with CRUD lifecycle
CLI Integration Tests: End-to-end CLI command tests
Error Conditions: All error scenarios to test
Edge Cases: Boundary conditions and unusual inputs
Pre-commit Validation: Checklist before implementation

Example structure:

# Test Plan: Runs Query Feature (Milestone ls-runs-query)

## Feature Overview
[Summary of runs query functionality]

## Test Scope
**In scope:**
- SDK query_runs method with all parameters
- CLI runs query command
- Pagination handling
- Error responses

**Out of scope:**
- [Features explicitly not covered]

## SDK Unit Tests (Mocked)
### 8.1.1 test_query_runs_success
- Mock POST /api/v1/runs/query
- Verify request structure
- Verify response parsing

[Additional test cases...]

## SDK Integration Tests (Real API)
### 8.2.1 test_query_runs_crud_lifecycle
- Create test project
- Create test runs
- Query runs with filters
- Verify results
- Clean up resources

[Additional test cases...]

## CLI Integration Tests
### 8.3.1 test_cli_runs_query_basic
- Run: `langstar runs query --project test-project`
- Verify output format
- Verify exit code

[Additional test cases...]

## Error Conditions
- Invalid API key
- Malformed filter expression
- Non-existent project
[Additional scenarios...]

## Edge Cases
- Empty result set
- Very large result set
- Special characters in filters
[Additional scenarios...]

## Pre-commit Validation
- [ ] All unit tests pass
- [ ] All integration tests pass
- [ ] cargo fmt --check passes
- [ ] cargo clippy passes

7.4 Update Testing Ticket with Test Plan

After generating the test plan:

Post test plan to the testing phase issue (e.g., 298.8-runs-testing)
Link test plan in issue description
Use test plan as implementation guide in Phase 8

Example issue update:

## Test Plan

See generated test plan below:

[Generated test plan content]

## Implementation Checklist
- [ ] SDK unit tests implemented
- [ ] SDK integration tests implemented
- [ ] CLI integration tests implemented
- [ ] All error conditions covered
- [ ] All edge cases covered
- [ ] Pre-commit checks passing

7.5 Benefits of Test Planning Phase

Before test implementation:

Comprehensive test coverage plan before writing code
Identifies missing test scenarios early
Ensures alignment between tests and requirements
Provides clear success criteria for Phase 8

Quality assurance:

Test plans are reviewed before implementation begins
Gaps in test coverage are identified before code is written
Testing phase has clear deliverables and acceptance criteria

Efficiency:

Automated test plan generation saves 1-2 hours of manual planning
Progressive disclosure loads only relevant testing docs (~4K tokens vs ~24K for all docs)
Test plan serves as implementation checklist

Phase 8: Testing

8.1 Unit Tests with Mocking

Location: In-module tests or sdk/tests/

Use httpmock for API mocking:

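#[cfg(test)]
mod tests {
    use httpmock::prelude::*;
    use serde_json::json;

    #[tokio::test]
    async fn test_query_runs_success() {
        // Use the async API inside a tokio test to avoid blocking the runtime
        let server = MockServer::start_async().await;

        let mock = server
            .mock_async(|when, then| {
                when.method(POST).path("/api/v1/runs/query");
                then.status(200).json_body(json!({
                    "runs": [],
                    "cursors": null
                }));
            })
            .await;

        // Test client against mock server, e.g. by pointing the client's
        // LangSmith base URL at server.base_url()

        // Fail the test unless the endpoint was called exactly once
        mock.assert_async().await;
    }
}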

8.2 Integration Tests

Location: sdk/tests/{feature}_test.rs or cli/tests/{feature}_command_test.rs

Requirements:

Mark with #[cfg_attr(not(feature = "integration-tests"), ignore)]
Use LANGSMITH_API_KEY and LANGSMITH_WORKSPACE_ID env vars
Clean up any created resources
Document prerequisites in test file header

Pattern (from cli/tests/README.md):

/// Integration test for runs query
///
/// Prerequisites:
/// - LANGSMITH_API_KEY set
/// - LANGSMITH_WORKSPACE_ID set
///
/// Run with: cargo test --features integration-tests
#[cfg_attr(not(feature = "integration-tests"), ignore)]
#[tokio::test]
async fn test_query_runs_integration() {
    // ...
}
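For the requirement to clean up created resources, a common Rust pattern is a drop guard: the cleanup runs in `Drop`, so it also executes when an assertion panics partway through the test. The std-only sketch below is illustrative; in a real integration test the closure would call a hypothetical SDK delete method, and async cleanup needs extra care because `Drop` cannot `.await`.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Runs `cleanup` when dropped, including while unwinding from a panic.
struct CleanupGuard<F: FnMut()> {
    cleanup: F,
}

impl<F: FnMut()> Drop for CleanupGuard<F> {
    fn drop(&mut self) {
        (self.cleanup)();
    }
}

fn main() {
    let deleted = AtomicBool::new(false);
    let outcome = panic::catch_unwind(|| {
        // Bind to `_guard`, not `_`: `let _ = ...` would drop the guard immediately.
        let _guard = CleanupGuard {
            // Hypothetical: this is where the test would delete the resource it created.
            cleanup: || deleted.store(true, Ordering::SeqCst),
        };
        let runs_found = 0;
        assert_eq!(runs_found, 1, "simulated failing assertion");
    });
    // The test body panicked, but the guard still ran its cleanup.
    println!("panicked: {}, cleaned up: {}", outcome.is_err(), deleted.load(Ordering::SeqCst));
}
```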

8.3 Pre-Commit Validation

ALWAYS run before committing:

cargo fmt && \
cargo check --workspace --all-features && \
cargo clippy --workspace --all-features -- -D warnings && \
cargo test --workspace --all-features && \
cargo fmt --check

Phase 9: Test Audit

After implementing tests, verify that the implementation complies with both the test plan (Phase 7) and the project's testing guidelines. This phase catches common issues that slip through even well-intentioned test implementations.

9.1 Why Test Audit is Necessary

Experience has shown that test implementations often deviate from test plans in problematic ways (see Issue #637 post-mortem):

Common problems caught by audit:

Integration tests marked #[ignore] instead of properly conditional
CI not configured with required environment variables
Anemic tests that only verify exit codes, not actual behavior
Missing CRUD lifecycle verification (SDK → CLI → SDK)
Tests that don't clean up resources
Missing error condition coverage

9.2 Run Test Audit Command

Use the /gh-milestones:test-audit command to verify test compliance:

/gh-milestones:test-audit <milestone-name-or-number>

Examples:

# Using milestone name
/gh-milestones:test-audit ls-runs-query

# Using milestone number
/gh-milestones:test-audit 8

What the command does:

Loads the test plan from Phase 7
Loads project testing guidelines (HIGH_LEVEL_TESTING_GUIDELINES.md)
Analyzes implemented tests against the plan
Checks for common anti-patterns
Verifies CI configuration includes required environment variables
Generates compliance report with specific remediation steps

9.3 Audit Checklist

The audit verifies compliance with these requirements:

Test Structure:

Unit tests use #[cfg(test)] module pattern
Integration tests use proper feature flag: #[cfg_attr(not(feature = "integration-tests"), ignore)]
Tests are NOT unconditionally ignored with #[ignore]
Test files follow naming conventions (*_test.rs or *_command_test.rs)

Test Quality (Toyota Andon Cord):

Tests verify actual behavior, not just exit codes
SDK operations are verified through round-trip assertions
CLI tests verify output content, not just success/failure
Error conditions are tested with specific error type verification
Edge cases from test plan are covered

CRUD Lifecycle Pattern:

Integration tests create resources via SDK
Tests operate on resources via CLI or SDK under test
Tests verify results using SDK (not just CLI output)
Tests clean up created resources (even on failure)

CI Configuration:

Required environment variables listed in CI workflow
Integration test job has access to LANGSMITH_API_KEY
Integration test job has access to LANGSMITH_WORKSPACE_ID
Feature flag integration-tests is enabled in CI

9.4 Test Audit Deliverable

The audit produces a compliance report with:

Report Structure:

# Test Audit Report: [Milestone Name]

## Summary
- Tests Planned: [count from test plan]
- Tests Implemented: [count found]
- Compliance Rate: [percentage]
- Critical Issues: [count]
- Warnings: [count]

## Critical Issues (Must Fix)
### Issue 1: [Description]
- Location: [file:line]
- Problem: [specific issue]
- Remediation: [exact fix needed]

## Warnings (Should Fix)
### Warning 1: [Description]
...

## Test Plan Coverage Matrix
| Test Case (from plan) | Implemented? | File:Line | Notes |
|----------------------|--------------|-----------|-------|
| test_create_run      | ✅ Yes       | sdk/tests/runs_test.rs:45 | |
| test_query_runs_empty| ❌ No        | - | Missing |

## CI Configuration Status
- [ ] Environment variables configured
- [ ] Feature flags enabled
- [ ] Job dependencies correct

## Recommendations
1. [Specific action item]
2. [Specific action item]
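The summary numbers follow from the coverage matrix. The template does not define "Compliance Rate" precisely; assuming it means the share of planned test cases that are implemented, the arithmetic looks like this (the `CoverageRow` type is hypothetical):

```rust
/// One row of the "Test Plan Coverage Matrix" (hypothetical type).
struct CoverageRow<'a> {
    test_case: &'a str,
    implemented: bool,
}

/// Share of planned test cases that are implemented, as a percentage.
/// Returns 0.0 for an empty plan rather than dividing by zero.
fn compliance_rate(rows: &[CoverageRow]) -> f64 {
    if rows.is_empty() {
        return 0.0;
    }
    let implemented = rows.iter().filter(|row| row.implemented).count();
    implemented as f64 / rows.len() as f64 * 100.0
}

/// Planned test cases with no implementation, to list in the report.
fn missing_tests<'a>(rows: &[CoverageRow<'a>]) -> Vec<&'a str> {
    rows.iter()
        .filter(|row| !row.implemented)
        .map(|row| row.test_case)
        .collect()
}

fn main() {
    let rows = [
        CoverageRow { test_case: "test_create_run", implemented: true },
        CoverageRow { test_case: "test_query_runs_empty", implemented: false },
    ];
    println!("Tests Planned: {}", rows.len());
    println!("Compliance Rate: {:.0}%", compliance_rate(&rows));
    println!("Missing: {:?}", missing_tests(&rows));
}
```

With the two example rows from the matrix above, one of two planned tests is implemented, giving 50%.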

9.5 Remediation Process

If the audit finds issues:

Critical issues must be fixed before merge
Warnings should be addressed unless explicitly justified
Re-run audit after fixes: /gh-milestones:test-audit <milestone>
Update test plan if new test cases were discovered

9.6 Benefits of Test Audit Phase

Quality Assurance:

Catches gaps between plan and implementation
Enforces Toyota Andon Cord principle
Prevents "tests that don't test anything" anti-pattern

Process Improvement:

Creates feedback loop to improve test planning
Documents common issues for future reference
Builds institutional knowledge about testing patterns

CI Reliability:

Ensures tests actually run in CI (not skipped)
Verifies environment configuration
Prevents "works locally, fails in CI" surprises

Phase 10: Documentation

10.1 Implementation Plan

Create docs/implementation/{issue-num}-{slug}-implementation-plan.md:

Executive summary
Research sources with links
Implementation phases with code snippets
Testing plan
Future enhancements

10.2 Update README

Add new commands to main README:

Command syntax
Example usage
Environment variables

10.3 In-Code Documentation

Rustdoc comments on all public items
Examples in doc comments where helpful
Link to research reports for complex decisions

Phase 11: Milestone Release

When the milestone's features ship in a GitHub release, use the /gh-milestones:release slash command to automate milestone cleanup.

11.1 Prerequisites

Before running milestone release:

All milestone PRs merged to main
GitHub release created and published
All sub-issues closed (or explicitly force release with FORCE_RELEASE=true)
CI/CD passing on main branch
CHANGELOG.md updated (if manual versioning)

11.2 Release Command

/gh-milestones:release <milestone> <version>

Examples:

# Using milestone name
/gh-milestones:release ls-prompt-structured-outputs v0.10.0

# Using milestone URL
/gh-milestones:release https://github.com/codekiln/langstar/milestone/7 v0.10.0

11.3 What Gets Automated

The /gh-milestones:release command performs the following actions:

Validates Release Exists: Confirms GitHub release is published
Checks Sub-Issue Completion: Warns if any sub-issues are still open (requires gh-sub-issue extension)
Updates Milestone Description: Prepends release link to milestone description
Closes Milestone: Marks milestone as closed
Adds Release Comment: Comments on parent issue with release link
Closes Parent Issue: Marks parent issue as closed

Example Output:

✅ **Milestone Release Tracking Complete**

📍 Milestone: ls-prompt-structured-outputs (#7)
🔗 Parent Issue: #402 - Add structured output prompt support
📦 Release: v0.10.0
🔗 Release URL: https://github.com/codekiln/langstar/releases/tag/v0.10.0

**Actions Completed:**
✅ Verified release v0.10.0 exists
✅ Validated sub-issue completion
✅ Milestone marked as closed
✅ Milestone description updated with release information
✅ Parent issue #402 closed with release comment

11.4 Manual Override

If sub-issues are intentionally still open, force the release:

FORCE_RELEASE=true /gh-milestones:release <milestone> <version>

Note: This is not recommended. Best practice is to close all sub-issues before releasing.
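The override reduces to a small gate. This sketch assumes two things. First, the command stops, rather than only warning, when sub-issues are open, which the existence of the override implies. Second, only the exact value `true` enables forcing. `Gate`, `check_sub_issues`, and `check_sub_issues_from_env` are hypothetical names, not part of the actual command.

```rust
use std::env;

/// Outcome of the sub-issue completion check.
#[derive(Debug, PartialEq)]
pub enum Gate {
    Proceed,
    ProceedForced { open: usize },
    Blocked { open: usize },
}

/// Assumption: only the exact string "true" enables forcing.
fn is_forced(value: Option<&str>) -> bool {
    value == Some("true")
}

/// Pure decision logic, testable without touching the environment.
pub fn check_sub_issues(open_sub_issues: usize, force_value: Option<&str>) -> Gate {
    match (open_sub_issues, is_forced(force_value)) {
        (0, _) => Gate::Proceed,
        (open, true) => Gate::ProceedForced { open },
        (open, false) => Gate::Blocked { open },
    }
}

/// Reads FORCE_RELEASE from the process environment and applies the gate.
pub fn check_sub_issues_from_env(open_sub_issues: usize) -> Gate {
    let value = env::var("FORCE_RELEASE").ok();
    check_sub_issues(open_sub_issues, value.as_deref())
}
```

Keeping the decision in a pure function and reading the environment in a thin wrapper means the gate can be unit-tested without mutating process-wide state.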

11.5 Integration with Release Workflow

Typical release workflow:

# 1. Merge final PR for milestone
gh pr merge 385 --squash

# 2. Create the GitHub release (or let CI create it)
gh release create v0.10.0 --generate-notes

# 3. Mark the milestone as released (Claude Code slash command, not a shell command)
/gh-milestones:release "ls-prompt-structured-outputs" v0.10.0

11.6 Benefits

Consistency: Every milestone follows the same release-tracking pattern

Efficiency: Manual milestone updates take 5-10 minutes; the automated command completes in under 10 seconds

Traceability: Clear links from milestone → release → parent issue

Validation: Enforces sub-issue completion and verifies that the release exists

11.7 References

PR #442: /gh-milestones:release command implementation
Command Documentation: .claude/commands/gh-milestones:release.md
Example: Milestone #7 (ls-prompt-structured-outputs) released in v0.10.0

Consistency Checklist

Configuration Consistency

Uses the same env var names as existing features (LANGSMITH_API_KEY, etc.)
Follows the same precedence: env var > config file > defaults
Supports organization/workspace scoping consistently
Uses the same output format options (json, table)
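Here is a minimal sketch of the env var > config file > defaults precedence. It assumes settings are string-valued and that an empty environment variable counts as unset. The environment and config file are passed in as maps, a hypothetical design choice, so the logic can be tested without touching the real process environment. `resolve_setting` and the config key `api_key` are illustrative, not Langstar's actual configuration code.

```rust
use std::collections::HashMap;

/// Resolves one setting: environment variable first, then config file,
/// then the built-in default.
pub fn resolve_setting(
    env_var: &str,
    config_key: &str,
    env: &HashMap<String, String>,
    config: &HashMap<String, String>,
    default: Option<&str>,
) -> Option<String> {
    env.get(env_var)
        // Assumption: an empty env var is treated as unset.
        .filter(|v| !v.is_empty())
        .or_else(|| config.get(config_key))
        .cloned()
        .or_else(|| default.map(String::from))
}
```

In real code, the `env` map could be built with `std::env::vars().collect::<HashMap<_, _>>()`.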

Code Style Consistency

Follows existing module structure
Uses the same error handling patterns
Uses the same serde patterns (rename_all, skip_serializing_if)
Follows the same test organization

Testing Consistency

Unit tests use httpmock
Integration tests are gated behind a feature flag
Tests document their prerequisites
Tests clean up any resources they create
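One way to guarantee cleanup even when an assertion fails is an RAII guard whose `Drop` deletes the created resource. The sketch below uses a hypothetical `Cleanup` trait as a stand-in for the real API client. A real integration test would call the SDK's own delete method, which may be async. `RecordingClient` is an in-memory fake that exists only to show cleanup runs on both normal exit and panic.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for an API client that can delete a resource by id.
pub trait Cleanup {
    fn delete(&self, id: &str);
}

/// Deletes the wrapped resource when dropped, including during panic unwinding.
pub struct ResourceGuard<'a, C: Cleanup> {
    client: &'a C,
    id: String,
}

impl<'a, C: Cleanup> ResourceGuard<'a, C> {
    pub fn new(client: &'a C, id: impl Into<String>) -> Self {
        Self { client, id: id.into() }
    }

    pub fn id(&self) -> &str {
        &self.id
    }
}

impl<'a, C: Cleanup> Drop for ResourceGuard<'a, C> {
    fn drop(&mut self) {
        self.client.delete(&self.id);
    }
}

/// In-memory fake that records which ids were deleted.
#[derive(Default)]
pub struct RecordingClient {
    pub deleted: RefCell<Vec<String>>,
}

impl Cleanup for RecordingClient {
    fn delete(&self, id: &str) {
        self.deleted.borrow_mut().push(id.to_string());
    }
}
```

Because `Drop` runs during unwinding, the resource is removed even when a test assertion panics. It does not run if the process aborts, so a periodic sweep of leftover test resources may still be worthwhile.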

Documentation Consistency

Research reports in reference/research/
Canonical OpenAPI specs in reference/openapi/langchain/{api}/
MANIFEST.md updated with provenance for spec fetches
Extracted fragments in reference/api-specs/{api}/
FRAGMENTS.md updated with the jq query used for each extraction
Implementation plans in docs/implementation/
Documents follow the same markdown structure as existing ones

Tools & Skills Reference

Tool/Skill	Purpose	Location
gh-sub-issue	Manage issue hierarchies	`.claude/skills/gh-sub-issue/SKILL.md`
setup-remote-repo-notes-dir	Research SDK codebases	`.claude/skills/setup-remote-repo-notes-dir/SKILL.md`
jq	OpenAPI spec validation	System tool
httpmock	Rust HTTP mocking	Cargo dependency
langgraph-docs MCP	LangGraph/LangSmith docs	MCP server

Example Sub-Issue Checklist Template

When creating sub-issues, use these templates to define their scope:

Research Sub-Issue

## Scope
1. Set up research workspace with setup-remote-repo-notes-dir
2. Analyze Python SDK implementation
3. Document method signatures and parameters
4. Identify pagination patterns
5. Review tests for usage examples

## Deliverable
Research report at `reference/research/{num}-{slug}-precedent.md`

OpenAPI Validation Sub-Issue

## Scope
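1. Fetch or update the OpenAPI spec in reference/openapi/langchain/{api}/ and record its provenance in MANIFEST.md
2. Extract relevant endpoint and schema definitions with jq into reference/api-specs/{api}/ and record the queries in FRAGMENTS.md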
3. Validate research findings against spec
4. Document confirmations, corrections, discoveries

## Deliverable
Validation report at `reference/research/{num}-openapi-validation.md`

Milestone Lifecycle: From Conception to Release

Full Lifecycle Overview

The complete milestone lifecycle runs from preliminary research through the GitHub release:

Phase	Name	When	Typical Duration
0.0	Pre-Epic Scouting	Before milestone (optional)	1-3 days
0	Epic Setup	Start of milestone	1 day
1-10	Standard Development	Implementation	1-4 weeks
11	Milestone Release	After merge + GitHub release	<1 hour (automated)

Decision Tree: When to Scout

Is this a new API feature needing research and technical context?
├── No → Skip to Phase 0 (Epic Setup)
└── Yes → Start with Phase 0.0 (Scout)
        ↓
    Scout gathers knowledge and insights
        ↓
    Use findings to author Phase 0 parent issue

Milestone States Over Time

Pre-Milestone (Phase 0.0): Scout issue exists, no milestone yet
- Research is exploratory
- No commitment to full implementation
- Scout PR merges directly to main
Milestone Created (Phase 0): Parent issue + milestone + sub-issues created
- Milestone attached to ALL issues
- Sub-issues link to parent via gh-sub-issue
- Development waves may be parallelized
Active Development (Phases 1-10): Sub-issues progress through the standard phases
- PRs typically merge directly to main (not into a parent epic branch)
- Milestone description updated with progress
- Sub-issues closed as PRs merge
Released (Phase 11): Milestone closed, linked to GitHub release
- /gh-milestones:release automates cleanup
- Parent issue closed with release comment
- Milestone description shows release link
- Audit trail: issue → milestone → release
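The states above form a linear progression. As a sketch with hypothetical type and method names, they can be encoded as an enum. Its `initial` constructor follows the "When to Scout" decision tree, and `next` allows only forward transitions.

```rust
/// Milestone lifecycle states, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MilestoneState {
    /// Phase 0.0: a scout issue exists, but no milestone yet.
    PreMilestone,
    /// Phase 0: parent issue, milestone, and sub-issues created.
    Created,
    /// Sub-issues progressing through the standard development phases.
    ActiveDevelopment,
    /// Phase 11: milestone closed and linked to a GitHub release.
    Released,
}

impl MilestoneState {
    /// Scouting is optional, so features that skip it start at `Created`.
    pub fn initial(needs_scouting: bool) -> Self {
        if needs_scouting {
            Self::PreMilestone
        } else {
            Self::Created
        }
    }

    /// The single forward transition from each state; `None` once released.
    pub fn next(self) -> Option<Self> {
        match self {
            Self::PreMilestone => Some(Self::Created),
            Self::Created => Some(Self::ActiveDevelopment),
            Self::ActiveDevelopment => Some(Self::Released),
            Self::Released => None,
        }
    }
}
```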

Milestone Naming Best Practices

Pattern: Use short, hyphenated names for milestone titles

✅ ls-prompt-structured-outputs (clear, grep-able)
✅ ls-evals-basic (scoped)
❌ Structured Output Prompts Feature (spaces, verbose)

Benefits:

Easy to reference in commands: /gh-milestones:release ls-evals-basic v0.10.0
Grep-able in code and documentation
Works well with GitHub API and CLI tools
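The naming pattern can be checked mechanically. This sketch infers its rules from the examples above: lowercase ASCII letters and digits, segments joined by single hyphens, and no spaces. The lowercase requirement is an assumption drawn from the examples, not a stated rule. `is_valid_milestone_name` is a hypothetical helper.

```rust
/// Returns true for lowercase kebab-case names: non-empty segments of ASCII
/// lowercase letters or digits, joined by single hyphens (no spaces, and no
/// leading, trailing, or doubled hyphens).
pub fn is_valid_milestone_name(name: &str) -> bool {
    !name.is_empty()
        && name.split('-').all(|segment| {
            !segment.is_empty()
                && segment
                    .chars()
                    .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        })
}
```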

Milestone Management Anti-Patterns

Avoid:

❌ Creating milestone without parent issue
❌ Attaching milestone only to parent (not sub-issues)
❌ Manually closing milestone without release link
❌ Leaving parent issue open after release ships
❌ Skipping Phase 0.0 scout for unclear API features
❌ Closing milestone before all sub-issues are done
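Most of these anti-patterns can be observed from a milestone's tracking state, so they can be linted. The sketch below uses a hypothetical `MilestoneSnapshot` whose fields would, in practice, be filled from the GitHub API. Skipping the Phase 0.0 scout is a judgment call, so the sketch does not check it.

```rust
/// Hypothetical snapshot of a milestone's tracking state.
#[derive(Debug)]
pub struct MilestoneSnapshot {
    pub has_parent_issue: bool,
    pub sub_issues_total: usize,
    pub sub_issues_with_milestone: usize,
    pub open_sub_issues: usize,
    pub release_published: bool,
    pub milestone_closed: bool,
    pub description_has_release_link: bool,
    pub parent_issue_open: bool,
}

/// Returns a description of each anti-pattern the snapshot exhibits.
pub fn lint(m: &MilestoneSnapshot) -> Vec<&'static str> {
    let mut found = Vec::new();
    if !m.has_parent_issue {
        found.push("milestone created without a parent issue");
    }
    if m.sub_issues_with_milestone < m.sub_issues_total {
        found.push("milestone not attached to every sub-issue");
    }
    if m.milestone_closed && !m.description_has_release_link {
        found.push("milestone closed without a release link");
    }
    if m.release_published && m.parent_issue_open {
        found.push("parent issue left open after release");
    }
    if m.milestone_closed && m.open_sub_issues > 0 {
        found.push("milestone closed before all sub-issues are done");
    }
    found
}
```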

Success Metrics

A feature is complete when:

All sub-issues closed
Research and validation reports committed
SDK types and methods implemented
CLI commands functional
Unit tests passing (100% of new code)
Integration tests passing
Documentation updated
GitHub release published
Milestone closed via /gh-milestones:release (Phase 11)
Parent issue closed with release link
