This file documents non-obvious problems, solutions, and patterns discovered during development. Review it regularly: remove entries that are outdated or have been superseded by better practices, code, or tools, and update entries where the best practice has evolved.
Claude CLI was not reliably available in DevContainers, and there was no visibility into what tools were installed during container creation.
- Custom installation approach: Previously attempted to install Claude CLI via npm in post-create script (was commented out, indicating unreliability)
- Broken pipx feature URL: Used `devcontainers-contrib`, which was incorrect
- No logging: Post-create script had no output to help diagnose issues
- No status reporting: Users couldn't easily see what tools were available
Switched to declarative DevContainer features instead of custom installation scripts:
devcontainer.json changes:
// Fixed broken pipx feature URL
"ghcr.io/devcontainers-extra/features/pipx-package:1": { ... }
// Added official Claude Code feature
"ghcr.io/anthropics/devcontainer-features/claude-code:1": {},
// Added VSCode extension
"extensions": ["anthropic.claude-code", ...]
// Named container for easier identification
"runArgs": ["--name=amplifier_devcontainer"]
post-create.sh improvements:
# Added logging to persistent file for troubleshooting
LOG_FILE="/tmp/devcontainer-post-create.log"
exec > >(tee -a "$LOG_FILE") 2>&1
# Added development environment status report
echo "📋 Development Environment Ready:"
echo " • Python: $(python3 --version 2>&1 | cut -d' ' -f2)"
echo " • Claude CLI: $(claude --version 2>&1 || echo 'NOT INSTALLED')"
# ... other tools
- Use official DevContainer features over custom scripts: Features are tested, maintained, and more reliable than custom npm installs
- Declarative > imperative: Define what you need in devcontainer.json rather than scripting installations
- Add logging for troubleshooting: Persistent logs help diagnose container build issues
- Provide status reporting: Show users what tools are available after container creation
- Test with fresh containers: Only way to verify DevContainer configuration works
- Prefer official DevContainer features from `ghcr.io/anthropics/`, `ghcr.io/devcontainers/`, etc.
- Add logging (`tee` to a log file) in post-create scripts for troubleshooting
- Include tool version reporting to confirm installations
- Use named containers (`runArgs`) for easier identification in Docker Desktop
- Test DevContainer changes by rebuilding containers from scratch
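
The version-reporting idea above can also be expressed in Python, for example when the same check needs to run from a test or a helper script rather than from `post-create.sh`. This is a minimal sketch, not the project's script; `tool_version` and `status_report` are hypothetical names, and it assumes each tool prints its version when called with `--version`.

```python
import shutil
import subprocess


def tool_version(command: str, version_flag: str = "--version") -> str:
    """Return the first line printed by `<command> <version_flag>`.

    Returns 'NOT INSTALLED' when the command is missing from PATH or
    cannot be started (both cases are treated the same in this sketch).
    """
    if shutil.which(command) is None:
        return "NOT INSTALLED"
    try:
        result = subprocess.run(
            [command, version_flag],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "NOT INSTALLED"
    # Some tools print their version on stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "UNKNOWN VERSION"


def status_report(commands: list[str]) -> dict[str, str]:
    """Map each command name to its reported version string."""
    return {name: tool_version(name) for name in commands}
```

Calling `status_report(["python3", "claude", "pnpm"])` would give one entry per tool, which can be printed or written to the post-create log.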
make install fails with ERR_PNPM_NO_GLOBAL_BIN_DIR error when trying to install global npm packages via pnpm in fresh DevContainer builds.
Two issues combined to cause the failure:
- Missing SHELL environment variable: During DevContainer post-create script execution, the `SHELL` environment variable is not set
- pnpm setup requires SHELL: The `pnpm setup` command fails with `ERR_PNPM_UNKNOWN_SHELL` when `SHELL` is not set
- Silent failure: The error was hidden by `|| true` in the script, allowing the script to continue and report success even though pnpm wasn't configured
From the post-create log:
🔧 Setting up pnpm global bin directory...
ERR_PNPM_UNKNOWN_SHELL Could not infer shell type.
Set the SHELL environment variable to your active shell.
✅ pnpm configured # <-- False success!
Fixed post-create script to explicitly set SHELL before running pnpm setup:
post-create.sh addition:
echo "🔧 Setting up pnpm global bin directory..."
# Ensure SHELL is set for pnpm setup
export SHELL="${SHELL:-/bin/bash}"
# Configure pnpm to use a global bin directory
pnpm setup 2>&1 | grep -v "^$" || true
# Export for current session (will also be in ~/.bashrc for future sessions)
export PNPM_HOME="/home/vscode/.local/share/pnpm"
export PATH="$PNPM_HOME:$PATH"
echo " ✅ pnpm configured"
This ensures:
- SHELL is explicitly set before pnpm setup runs
- pnpm's global bin directory is configured on first container build
- The configuration is added to `~/.bashrc` for all future sessions
- The environment variables are set for the post-create script itself
- SHELL not set in post-create context - DevContainer post-create scripts run in an environment where SHELL may not be set
- pnpm requires SHELL - Unlike npm, pnpm needs to know the shell type to modify the correct config file
- Silent failures are dangerous - Using `|| true` hid the actual error; consider logging errors even when continuing
- Check the logs - The `/tmp/devcontainer-post-create.log` revealed the actual error that was hidden from the console
- Always set SHELL explicitly in post-create scripts before running shell-dependent commands
- Check post-create logs (`/tmp/devcontainer-post-create.log`) after rebuilding containers
- Consider conditional error handling instead of blanket `|| true` to catch real failures
- Test `make install` as part of DevContainer validation
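
The "conditional error handling instead of blanket `|| true`" point can be illustrated with a small step runner that continues past optional failures but always logs them. This is a sketch under assumptions, not the project's post-create logic: `run_step` and the `required` flag are hypothetical, and the `/bin/bash` fallback simply mirrors the `SHELL` fix described above.

```python
import logging
import os
import subprocess

logger = logging.getLogger("post_create")


def run_step(name: str, command: list[str], required: bool = False) -> bool:
    """Run one setup step and log failures instead of hiding them.

    Returns True on success and False on a tolerated failure.
    Raises RuntimeError when a step marked as required fails.
    """
    env = dict(os.environ)
    # Assumption: default to bash when SHELL is unset or empty,
    # like `export SHELL="${SHELL:-/bin/bash}"` in the shell script.
    if not env.get("SHELL"):
        env["SHELL"] = "/bin/bash"

    try:
        result = subprocess.run(
            command, env=env, capture_output=True, text=True, check=False
        )
        ok = result.returncode == 0
        detail = (result.stderr or result.stdout).strip()
    except OSError as exc:  # e.g. the command is not installed
        ok, detail = False, str(exc)

    if ok:
        logger.info("%s: ok", name)
        return True

    # The failure is always recorded, even when the script carries on.
    logger.error("%s: FAILED: %s", name, detail)
    if required:
        raise RuntimeError(f"required step failed: {name}")
    return False
```

With this shape, a success message is only emitted when the step actually succeeded, which avoids the false "pnpm configured" line seen in the log above.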
Knowledge synthesis and other file operations were experiencing intermittent I/O errors (OSError errno 5) in WSL2 environment. The errors appeared random but were actually caused by OneDrive cloud sync delays.
The ~/amplifier directory was symlinked to a OneDrive folder on Windows (C:\ drive). When files weren't downloaded locally ("cloud-only" files), file operations would fail with I/O errors while OneDrive fetched them from the cloud. This affects:
- WSL2 + OneDrive: Symlinked directories from Windows OneDrive folders
- Other cloud sync services: Dropbox, Google Drive, iCloud Drive can cause similar issues
- Network drives: Similar delays can occur with network-mounted filesystems
Two-part solution implemented:
- Immediate fix: Added retry logic with exponential backoff and informative warnings
- Long-term fix: Created centralized file I/O utility module
# Enhanced retry logic in events.py with cloud sync warning:
for attempt in range(max_retries):
    try:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(rec), ensure_ascii=False) + "\n")
            f.flush()
        return
    except OSError as e:
        if e.errno == 5 and attempt < max_retries - 1:
            if attempt == 0:  # Log warning on first retry
                logger.warning(
                    f"File I/O error writing to {self.path} - retrying. "
                    "This may be due to cloud-synced files (OneDrive, Dropbox, etc.). "
                    "If using cloud sync, consider enabling 'Always keep on this device' "
                    f"for the data folder: {self.path.parent}"
                )
            time.sleep(retry_delay)
            retry_delay *= 2
        else:
            raise
# New centralized utility (amplifier/utils/file_io.py):
from amplifier.utils.file_io import write_json, read_json
write_json(data, filepath)  # Automatically handles retries
High-priority file operations requiring retry protection:
- Memory Store (`memory/core.py`) - Saves after every operation
- Knowledge Store (`knowledge_synthesis/store.py`) - Append operations
- Content Processing - Document and image saves
- Knowledge Integration - Graph saves and entity cache
- Synthesis Engine - Results saving
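
To make the centralized-utility idea concrete, here is one way a retrying `write_json` / `read_json` pair could be structured. It is a sketch only and does not reproduce `amplifier/utils/file_io.py`: the `with_retry` helper, its parameters, and the default delays are assumptions.

```python
import errno
import json
import logging
import time
from pathlib import Path
from typing import Any, Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def with_retry(
    operation: Callable[[], T],
    path: Path,
    max_retries: int = 3,
    initial_delay: float = 0.5,
) -> T:
    """Run operation(), retrying on EIO (errno 5) with exponential backoff."""
    delay = initial_delay
    for attempt in range(max_retries):
        try:
            return operation()
        except OSError as e:
            # Retry only the I/O error seen with cloud-only files. Any other
            # error, or EIO on the final attempt, is re-raised unchanged.
            if e.errno != errno.EIO or attempt == max_retries - 1:
                raise
            if attempt == 0:  # Explain the likely cause once
                logger.warning(
                    "I/O error on %s - retrying. "
                    "This may be due to cloud-synced files.",
                    path,
                )
            time.sleep(delay)
            delay *= 2
    raise ValueError("max_retries must be at least 1")


def write_json(data: Any, filepath: Path) -> None:
    """Write data to filepath as UTF-8 JSON, creating parent directories."""
    filepath = Path(filepath)

    def _write() -> None:
        filepath.parent.mkdir(parents=True, exist_ok=True)
        with open(filepath, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)

    with_retry(_write, filepath)


def read_json(filepath: Path) -> Any:
    """Read and parse a UTF-8 JSON file with the same retry policy."""
    filepath = Path(filepath)

    def _read() -> Any:
        with open(filepath, encoding="utf-8") as f:
            return json.load(f)

    return with_retry(_read, filepath)
```

Keeping the retry policy in one helper means each of the stores listed above only has to pass in its own read or write operation.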
- Cloud sync can cause mysterious I/O errors - Not immediately obvious from error messages
- Symlinked directories inherit cloud sync behavior - WSL directories linked to OneDrive folders are affected
- "Always keep on device" setting fixes it - Ensures files are locally available
- Retry logic should be informative - Tell users WHY retries are happening
- Centralized utilities prevent duplication - One retry utility for all file operations
- Enable "Always keep on this device" for any OneDrive folders used in development
- Use the centralized `file_io` utility for all file operations
- Add retry logic proactively for user-facing file operations
- Consider data directory location when setting up projects (prefer local over cloud-synced)
- Test file operations with cloud sync scenarios during development
Generated CLI tools consistently fail with predictable patterns:
- Non-recursive file discovery (using `*.md` instead of `**/*.md`)
- No minimum input validation (synthesis with 1 file when 2+ needed)
- Silent failures without user feedback
- Poor visibility into what's being processed
- Missing standard patterns: No enforced template for common requirements
- Agent guidance confusion: Documentation references `examples/` as primary location
- Philosophy violations: Generated code adds complexity instead of embracing simplicity
Standard tool patterns (enforced in all generated tools):
# Recursive file discovery
files = list(Path(dir).glob("**/*.md")) # NOT "*.md"
# Minimum input validation
if len(files) < required_min:
    logger.error(f"Need at least {required_min} files, found {len(files)}")
    sys.exit(1)
# Clear progress visibility
logger.info(f"Processing {len(files)} files:")
for f in files[:5]:
    logger.info(f" • {f.name}")
Tool generation checklist:
- Uses recursive glob patterns for file discovery
- Validates minimum inputs before processing
- Shows clear progress/activity to user
- Fails fast with descriptive errors
- Uses defensive utilities from toolkit
- Templates prevent predictable failures: Common patterns should be enforced
- Visibility prevents confusion: Always show what's being processed
- Fail fast and loud: Silent failures create debugging nightmares
- Philosophy must be enforced: Generated code often violates simplicity
- Validate against checklist before accepting generated tools
- Update agent guidance to specify correct directories
- Test with edge cases (empty dirs, single file, nested structures)
- Review generated code for philosophy compliance
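
The edge cases named above (empty directories, a single file, nested structures) are easier to test when discovery and validation live in one function that raises instead of exiting. The following is a minimal sketch; `discover_files` is a hypothetical helper, not part of the toolkit.

```python
from pathlib import Path


def discover_files(
    directory: Path, pattern: str = "**/*.md", required_min: int = 1
) -> list[Path]:
    """Recursively find files and fail fast when there are too few.

    Raises ValueError with a descriptive message instead of returning a
    short list, so callers cannot silently process insufficient input.
    """
    # "**/" matches zero or more directory levels, so top-level and nested
    # files are both included. Sorting keeps the order deterministic.
    files = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    if len(files) < required_min:
        raise ValueError(
            f"Need at least {required_min} files matching {pattern!r} "
            f"in {directory}, found {len(files)}"
        )
    return files
```

A CLI entry point can catch the `ValueError`, log it, and exit with a non-zero status, which keeps the fail-fast behaviour while leaving the function usable in tests.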
Some CCSDK tools experienced multiple failure modes when processing LLM responses:
- JSON parsing errors when LLMs returned markdown-wrapped JSON or explanatory text
- Context contamination where LLMs referenced system instructions in their outputs
- Transient failures with no retry mechanism causing tool crashes
LLMs don't reliably return pure JSON responses, even with explicit instructions. Common issues:
- Format variations: LLMs wrap JSON in markdown blocks, add explanations, or include preambles
- Context leakage: System prompts and instructions bleed into generated content
- Transient failures: API timeouts, rate limits, and temporary errors not handled gracefully
Created minimal defensive utilities in amplifier/ccsdk_toolkit/defensive/:
# parse_llm_json() - Extracts JSON from any LLM response format
result = parse_llm_json(llm_response)
# Handles: markdown blocks, explanations, nested JSON, malformed quotes
# retry_with_feedback() - Intelligent retry with error correction
result = await retry_with_feedback(
    async_func=generate_synthesis,
    prompt=prompt,
    max_retries=3
)
# Provides error feedback to LLM for self-correction on retry
# isolate_prompt() - Prevents context contamination
clean_prompt = isolate_prompt(user_prompt)
# Adds barriers to prevent system instruction leakage
Test Results: Fresh md_synthesizer run with defensive utilities showed dramatic improvement:
- ✅ Zero JSON parsing errors (was 100% failure rate in original versions)
- ✅ Zero context contamination (was synthesizing from wrong system files)
- ✅ Zero crashes (was failing with exceptions on basic operations)
- ✅ 62.5% completion rate (5 of 8 ideas expanded before timeout vs. 0% before)
- ✅ High-quality output - Generated 8 relevant, insightful ideas from 3 documents
Performance Profile:
- Stage 1 (Summarization): ~10-12 seconds per file - Excellent
- Stage 2 (Synthesis): ~3 seconds per idea - Excellent with zero JSON failures
- Stage 3 (Expansion): ~45 seconds per idea - Reasonable but could be optimized
Key Wins:
- `parse_llm_json()` eliminated all JSON parsing failures
- `isolate_prompt()` prevented system context leakage
- Progress checkpoint system preserved work through timeout
- Tool now fundamentally sound - remaining work is optimization, not bug fixing
- Extraction over validation: Don't expect perfect JSON, extract it from whatever format arrives
- Feedback loops: When retrying, tell the LLM what went wrong so it can correct
- Context isolation: Use clear delimiters to separate user content from system instructions
- Defensive by default: All CCSDK tools should assume LLM responses need cleaning
- Test early with real data: Defensive utilities prove their worth only under real conditions
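
The "extraction over validation" lesson can be sketched as follows. This is not the toolkit's `parse_llm_json()`; `extract_json` is a hypothetical, simplified stand-in that handles fenced blocks and surrounding prose but makes no attempt to repair malformed quotes.

```python
import json
import re
from typing import Any

# Built from single backticks so the pattern can sit inside a fenced example.
_FENCE = "`" * 3
_FENCE_RE = re.compile(
    _FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL | re.IGNORECASE
)


def extract_json(text: str) -> Any:
    """Best-effort extraction of the first JSON value in an LLM response."""
    # 1. Try the whole response, then the contents of any fenced blocks.
    candidates = [text.strip()]
    candidates += [block.strip() for block in _FENCE_RE.findall(text)]
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass

    # 2. Fall back to the first position where an object or array decodes,
    #    which skips preambles and ignores trailing explanations.
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON object or array found in response")
```

The fallback scan is deliberately permissive, so a caller that expects a particular shape (for example a dict with an `ideas` key) should still check the result.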
- Use `parse_llm_json()` for all LLM JSON responses - never use raw `json.loads()`
- Wrap LLM operations with `retry_with_feedback()` for automatic error recovery
- Apply `isolate_prompt()` when user content might be confused with instructions
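
The feedback-loop idea (tell the LLM what went wrong before retrying) might look roughly like this. The function below is a hypothetical illustration, not the toolkit's `retry_with_feedback()`: it assumes the wrapped coroutine takes a prompt string and returns the raw response text, and it uses plain `json.loads()` only to keep the sketch short.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Optional


async def retry_with_error_feedback(
    async_func: Callable[[str], Awaitable[str]],
    prompt: str,
    max_retries: int = 3,
) -> Any:
    """Call async_func(prompt) and parse the reply as JSON.

    On a parse failure, the next attempt receives the original prompt plus
    a note describing the error, so the model can correct itself.
    """
    current_prompt = prompt
    last_error: Optional[Exception] = None
    for _ in range(max_retries):
        response = await async_func(current_prompt)
        try:
            return json.loads(response)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Assumption: feedback is appended as plain text to the prompt.
            current_prompt = (
                f"{prompt}\n\nYour previous response could not be parsed "
                f"as JSON ({exc}). Reply with valid JSON only."
            )
    raise ValueError(
        f"no valid JSON after {max_retries} attempts"
    ) from last_error
```

In a real tool the parsing step would be the defensive extractor rather than `json.loads()`, so retries are only spent on responses that contain no usable JSON at all.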
Implementing dual backend support (Claude Code and Codex) revealed several architectural differences and limitations that affect feature parity and testing strategies.
Claude Code and Codex have fundamentally different architectures:
- Automation Model: Claude Code uses automatic hooks (SessionStart, PostToolUse, PreCompact, Stop) while Codex requires explicit MCP tool invocation or wrapper scripts
- Agent Spawning: Claude Code has native Task tool for seamless agent spawning; Codex uses `codex exec` subprocess with different invocation model
- Configuration: Claude Code uses JSON (settings.json) with limited profiles; Codex uses TOML (config.toml) with rich profile support
- Transcript Format: Claude Code uses single text files (compact_*.txt); Codex uses session directories with multiple files (transcript.md, transcript_extended.md, history.jsonl)
- Tool Availability: Claude Code has Task, TodoWrite, WebFetch, WebSearch; Codex has Read, Write, Edit, Grep, Glob, Bash
1. Backend Abstraction Layer (amplifier/core/backend.py):
- Created `AmplifierBackend` abstract base class with methods: `initialize_session()`, `finalize_session()`, `run_quality_checks()`, `export_transcript()`
- Implemented `ClaudeCodeBackend` and `CodexBackend` concrete classes
- Both backends delegate to same amplifier modules (memory, extraction, search) ensuring consistency
- Factory pattern (`BackendFactory`) for backend instantiation based on environment/config
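
The shape of this abstraction layer can be sketched as below. Class and method names follow the notes above, but everything else is an assumption made for illustration: the method signatures, the dict return values, the `"claude"` / `"codex"` identifiers, the default backend, and the stub behaviour. The real `amplifier/core/backend.py` may differ.

```python
import os
from abc import ABC, abstractmethod
from typing import Optional


class AmplifierBackend(ABC):
    """Operations every backend provides (method names from the notes)."""

    name = "abstract"

    @abstractmethod
    def initialize_session(self) -> dict: ...

    @abstractmethod
    def finalize_session(self) -> dict: ...

    @abstractmethod
    def run_quality_checks(self) -> dict: ...

    @abstractmethod
    def export_transcript(self) -> dict: ...


class _StubBackend(AmplifierBackend):
    """Placeholder behaviour so the sketch runs end to end.

    Real backends would delegate to the shared amplifier modules here.
    """

    def _result(self, action: str) -> dict:
        return {"backend": self.name, "action": action}

    def initialize_session(self) -> dict:
        return self._result("initialize_session")

    def finalize_session(self) -> dict:
        return self._result("finalize_session")

    def run_quality_checks(self) -> dict:
        return self._result("run_quality_checks")

    def export_transcript(self) -> dict:
        return self._result("export_transcript")


class ClaudeCodeBackend(_StubBackend):
    name = "claude"  # Assumption: identifier string is illustrative


class CodexBackend(_StubBackend):
    name = "codex"  # Assumption: identifier string is illustrative


class BackendFactory:
    """Pick a backend from an explicit argument or AMPLIFIER_BACKEND."""

    _registry = {"claude": ClaudeCodeBackend, "codex": CodexBackend}

    @classmethod
    def create(cls, backend_type: Optional[str] = None) -> AmplifierBackend:
        # Assumption: fall back to "claude" when nothing is configured.
        chosen = backend_type or os.environ.get("AMPLIFIER_BACKEND", "claude")
        if chosen not in cls._registry:
            raise ValueError(f"unknown backend: {chosen!r}")
        return cls._registry[chosen]()
```

Callers depend only on `AmplifierBackend`, so a workflow written against `BackendFactory.create()` does not need to know which CLI is underneath.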
2. Agent Abstraction Layer (amplifier/core/agent_backend.py):
- Created `AgentBackend` abstract base class with `spawn_agent()` method
- `ClaudeCodeAgentBackend` uses Claude Code SDK Task tool
- `CodexAgentBackend` uses `codex exec` subprocess
- Agent definitions converted from Claude format to Codex format (removed Task tool references, adapted tools array)
3. MCP Servers for Codex (.codex/mcp_servers/):
- Implemented three MCP servers to replace Claude Code hooks:
  - `session_manager` - Replaces SessionStart/Stop hooks
  - `quality_checker` - Replaces PostToolUse hook
  - `transcript_saver` - Replaces PreCompact hook
- Used FastMCP framework for rapid development
- Servers expose tools that must be explicitly invoked (vs automatic hooks)
4. Wrapper Scripts:
- `amplify-codex.sh` - Bash wrapper providing hook-like experience for Codex
- `amplify.py` - Unified Python CLI for both backends
- `.codex/tools/session_init.py` and `session_cleanup.py` - Standalone session management
5. Configuration System (amplifier/core/config.py):
- Pydantic `BackendConfig` with environment variable support
- Configuration precedence: CLI args > env vars > .env file > defaults
- Auto-detection when `AMPLIFIER_BACKEND` not set
- Validation for backend types and profiles
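
The precedence rule (CLI args > env vars > .env file > defaults) can be shown without Pydantic as a simple ordered lookup. This sketch is illustrative only: `resolve_setting`, the `AMPLIFIER_<KEY>` naming for settings other than `AMPLIFIER_BACKEND`, and the default values are assumptions, and it does not model the auto-detection step.

```python
from typing import Mapping

# Assumption: placeholder defaults for the sketch, not the project's values.
DEFAULTS: Mapping[str, str] = {"backend": "claude", "profile": "default"}


def resolve_setting(
    key: str,
    cli_args: Mapping[str, str],
    env: Mapping[str, str],
    dotenv: Mapping[str, str],
    defaults: Mapping[str, str] = DEFAULTS,
) -> str:
    """Return the value for key from the highest-priority source that has it.

    Precedence: CLI args > environment variables > .env file > defaults.
    Environment and .env entries are looked up as AMPLIFIER_<KEY>.
    """
    env_key = f"AMPLIFIER_{key.upper()}"
    sources = (
        (cli_args, key),
        (env, env_key),
        (dotenv, env_key),
        (defaults, key),
    )
    for source, lookup in sources:
        value = source.get(lookup)
        if value:  # Empty strings count as "not set"
            return value
    raise KeyError(f"no value configured for {key!r}")
```

For example, `resolve_setting("backend", {}, {"AMPLIFIER_BACKEND": "codex"}, {})` picks the environment variable because no CLI argument was supplied.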
Full Parity:
- ✅ Memory system (both use same MemoryStore, MemorySearcher, MemoryExtractor)
- ✅ Quality checks (both use same `make check` command)
- ✅ Agent spawning (different invocation, same agent definitions)
- ✅ Transcript export (different formats, both functional)
- ✅ Configuration management (different formats, both comprehensive)
Partial Parity:
- ⚠️ Automation: Claude Code hooks are automatic; Codex requires explicit tool calls or wrapper script
- ⚠️ Task tracking: Claude Code has TodoWrite; Codex has no equivalent (use external tools)
- ⚠️ Slash commands: Claude Code has native support; Codex has no equivalent (use MCP tools or natural language)
- ⚠️ Notifications: Claude Code has desktop notifications; Codex returns tool responses only
No Parity (Intentional):
- ❌ VS Code integration: Claude Code only (Codex is CLI-first)
- ❌ Profiles: Codex only (Claude Code has single configuration)
- ❌ MCP servers: Codex only (Claude Code uses hooks)
1. CLI Availability in Tests:
- Challenge: Integration tests require Claude CLI or Codex CLI to be installed
- Solution: Mock subprocess calls at the boundary; test backend abstraction logic without requiring real CLIs
- Impact: Tests validate command construction and orchestration but not actual CLI behavior
2. MCP Protocol Testing:
- Challenge: Testing MCP servers requires JSON-RPC communication over stdio
- Solution: Start servers as subprocesses and communicate via stdin/stdout; alternatively mock FastMCP for unit tests
- Impact: Integration tests are more complex but validate real protocol compliance
3. Async Testing:
- Challenge: Many backend operations are async (memory extraction, agent spawning)
- Solution: Use `@pytest.mark.asyncio` decorator and pytest-asyncio plugin
- Impact: Tests must handle async/await correctly; some fixtures need async variants
4. Environment Isolation:
- Challenge: Tests must not interfere with each other or real project data
- Solution: Use temp_dir fixtures, mock environment variables, create isolated project structures
- Impact: Tests are slower due to setup/teardown but are reliable and deterministic
5. Cross-Backend Validation:
- Challenge: Verifying both backends produce identical results for same operations
- Solution: Run same test scenarios with both backends, compare outputs
- Impact: Test suite is larger but provides confidence in feature parity
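
The "mock subprocess calls at the boundary" approach from the first challenge can be illustrated with the standard library alone. `spawn_codex_agent` is a hypothetical helper, and passing the task as a single positional argument to `codex exec` is an assumption made for the example. As noted above, the test checks command construction only, not real CLI behavior.

```python
import subprocess
from unittest import mock


def spawn_codex_agent(task: str) -> str:
    """Hypothetical helper: run a task via `codex exec`, return its stdout."""
    result = subprocess.run(
        ["codex", "exec", task],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def test_spawn_codex_agent_builds_expected_command() -> None:
    """Verify the command line without needing the Codex CLI installed."""
    fake = subprocess.CompletedProcess(
        args=[], returncode=0, stdout="done\n", stderr=""
    )
    # Patch only the process boundary; the helper's own logic runs for real.
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert spawn_codex_agent("summarize README") == "done"
    run.assert_called_once_with(
        ["codex", "exec", "summarize README"],
        capture_output=True,
        text=True,
        check=True,
    )
```

Because only `subprocess.run` is replaced, the same test passes on machines with or without either CLI, which is what makes it suitable for CI.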
- Abstraction enables testing: Backend abstraction layer allows testing workflows without requiring real CLIs
- Mock at boundaries: Mock subprocess calls and file I/O, but test real backend logic
- Shared modules ensure consistency: Both backends using same amplifier modules (memory, extraction, search) guarantees identical behavior
- Configuration is critical: Proper configuration management (precedence, validation, defaults) is essential for dual-backend support
- Documentation prevents confusion: Comprehensive docs (CODEX_INTEGRATION.md, BACKEND_COMPARISON.md, MIGRATION_GUIDE.md) are essential for users
- Smoke tests validate critical paths: Fast smoke tests catch regressions without full integration test suite
- Wrapper scripts bridge gaps: amplify-codex.sh provides hook-like experience for Codex despite lack of native hooks
Claude Code Limitations:
- No profile support (single configuration for all workflows)
- Limited CI/CD integration (requires VS Code)
- No headless operation (VS Code extension only)
- Hooks can't be easily disabled (always run)
Codex Limitations:
- No automatic hooks (must invoke tools explicitly or use wrapper)
- No slash commands (use MCP tools or natural language)
- No TodoWrite equivalent (use external task tracking)
- No desktop notifications (tool responses only)
- Requires wrapper script for convenient session management
Testing Limitations:
- Integration tests mock CLI calls (don't test actual Claude/Codex behavior)
- MCP server tests require subprocess communication (more complex)
- Agent spawning tests mock SDK/subprocess (don't test actual agent execution)
- Cross-backend tests assume both backends are available (may not be true in all environments)
- Use backend abstraction layer for all backend operations (don't call CLIs directly)
- Test both backends for any new feature to ensure parity
- Document limitations clearly when features can't be replicated
- Use wrapper scripts to provide consistent user experience across backends
- Keep backend-specific code isolated in `.claude/` and `.codex/` directories
- Maintain comprehensive documentation for both backends
- Run smoke tests in CI to catch regressions early
- Update DISCOVERIES.md when new limitations are found