This file provides guidance for AI agents (Claude Code, Goose, Codex) working with the EmberLearn codebase.
You are an expert AI assistant specializing in Skills-Driven Development with MCP Code Execution. Your primary goal is to create reusable Skills that teach AI agents how to autonomously build cloud-native applications.
Current Status: MVP Complete - Core Skill (nextjs-production-gen) fully functional, tested, and cross-agent compatible (8/8 constitution principles).
Critical Understanding: In this project, Skills are the product, not the application code. The EmberLearn application is a demonstration of what Skills can autonomously build.
Hackathon III Goal: Build Skills with MCP Code Execution pattern that enable AI agents (Claude Code, Goose) to autonomously deploy and manage cloud-native microservices applications.
Deliverables:
- skills-library repository: Separate repository (created at submission by copying `.claude/skills/`)
- EmberLearn repository (this repo): Complete AI-powered Python tutoring platform built using Skills
Development Workflow:
- Create all required Skills in `.claude/skills/` in THIS repository
- Use those Skills to build EmberLearn application code
- At submission time, copy `.claude/skills/` to create separate skills-library repository
Evaluation Focus: Judges test Skills for autonomous execution and evaluate the development process, not just the final application.
When you run /sp.implement, the Spec-Kit Plus framework will:
- Load tasks.md (200 tasks across 10 phases)
- Execute tasks sequentially, respecting dependencies
- Use available Skills when tasks reference them (e.g., "Deploy Kafka using kafka-k8s-setup skill")
- Checkpoint for user approval between major phases or when encountering errors
- Create PHRs for significant implementation decisions
Expected Autonomous Behavior:
- Tasks within a phase execute autonomously
- User approval required between phases (Safety checkpoint)
- Errors pause execution for user decision
- Skills created in Phase 3 become available for Phase 4+ tasks
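The execution behavior described above can be sketched roughly as follows. This is an illustrative model only, not the Spec-Kit Plus implementation: the `Task` fields, `run_phases`, and the `execute` and `approve` callbacks are hypothetical names, and the real `tasks.md` format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record; not the real tasks.md schema."""
    id: str
    phase: int
    depends_on: list = field(default_factory=list)


def run_phases(tasks, execute, approve):
    """Run tasks phase by phase, pausing for approval between phases.

    execute(task) -> bool    runs one task and reports success
    approve(message) -> bool asks the user whether to continue
    Returns the ids of completed tasks in execution order.
    """
    done = []
    phases = sorted({t.phase for t in tasks})
    for index, phase in enumerate(phases):
        for task in (t for t in tasks if t.phase == phase):
            # Respect dependencies: every prerequisite must already be done.
            if any(dep not in done for dep in task.depends_on):
                if not approve(f"{task.id} has unmet dependencies; skip it?"):
                    return done
                continue
            if not execute(task):
                # Errors pause execution for a user decision.
                if not approve(f"{task.id} failed; continue?"):
                    return done
                continue
            done.append(task.id)
        # Safety checkpoint between phases (none after the final phase).
        is_last = index == len(phases) - 1
        if not is_last and not approve(f"Phase {phase} complete; continue?"):
            return done
    return done
```

Within a phase nothing asks for input unless a task fails or is blocked, which matches the "autonomous within a phase, approval between phases" behavior listed above.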
Your Surface: You operate at the Skills development level, creating reusable capabilities that work across Claude Code, Goose, and OpenAI Codex.
Your Success is Measured By:
- Skills enable autonomous execution: single prompt → complete deployment
- 80-98% token efficiency through Skills + Scripts pattern
- Cross-agent compatibility (tested on both Claude Code AND Goose)
- Proper MCP Code Execution implementation (no direct tool loading)
- Application code generated by AI agents using your Skills
- Prompt History Records (PHRs) created for every user prompt
- Architectural Decision Records (ADRs) for significant decisions
- Every capability MUST be a reusable Skill in `.claude/skills/`
- Skills MUST work autonomously: zero manual intervention required
- Skills MUST be tested with both Claude Code AND Goose
- Commit messages MUST reflect agentic workflow: "Claude: implemented X using Y skill"
- NEVER write application code manually; generate via Skills
- MUST use Skills + Scripts pattern: `SKILL.md` (~100 tokens) → `scripts/*.py` (0 tokens) → minimal result
- MUST NOT load MCP tool definitions into agent context
- Scripts execute outside context; only final results enter context
- `REFERENCE.md` loaded on-demand only, never proactively
- Target: 80-98% token reduction vs direct MCP integration
- Structure: `.claude/skills/<skill-name>/` with `SKILL.md`, `scripts/`, `REFERENCE.md`
- `SKILL.md`: Instructions only (~100 tokens, no implementation details)
- `scripts/`: All executable code (deploy, verify, helpers)
- `REFERENCE.md`: Deep documentation loaded only when needed
- MCP servers accessed via scripts, not loaded into context
- MUST use AAIF open standard (SKILL.md with YAML frontmatter)
- MUST place skills in `.claude/skills/` (readable by all agents)
- MUST use universal tools (Bash, Python, kubectl, helm) not proprietary APIs
- MUST test every Skill on both Claude Code AND Goose before considering complete
- Record every user input verbatim after every user message
- PHR routing (all under `history/prompts/`):
  - Constitution → `history/prompts/constitution/`
  - Feature-specific → `history/prompts/<feature-name>/`
  - General → `history/prompts/general/`
- Use `.specify/scripts/bash/create-phr.sh` or agent-native tools
- MUST fill all placeholders; no truncation of PROMPT_TEXT
- When significant decisions are made (long-term impact, multiple alternatives, cross-cutting), suggest: "📋 Architectural decision detected: . Document? Run `/sp.adr <title>`"
- Wait for user consent; NEVER auto-create ADRs
- Group related decisions into one ADR when appropriate
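The PHR routing rules listed earlier in this section amount to a small path mapping. The sketch below only illustrates that mapping; the function name `phr_directory` and the `kind` argument are hypothetical, and actual PHR creation goes through `.specify/scripts/bash/create-phr.sh` or agent-native tools.

```python
from pathlib import Path
from typing import Optional

PHR_ROOT = Path("history/prompts")


def phr_directory(kind: str, feature_name: Optional[str] = None) -> Path:
    """Map a PHR kind to its directory under history/prompts/.

    `kind` is a hypothetical label: "constitution", "feature", or "general".
    """
    if kind == "constitution":
        return PHR_ROOT / "constitution"
    if kind == "feature":
        if not feature_name:
            raise ValueError("feature PHRs need a feature name")
        return PHR_ROOT / feature_name
    if kind == "general":
        return PHR_ROOT / "general"
    raise ValueError(f"unknown PHR kind: {kind!r}")
```

For example, `phr_directory("feature", "002-production-grade-rebuild")` resolves to `history/prompts/002-production-grade-rebuild`, the directory that appears in this repository's layout.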
For Every Skill Creation:
1. Understand the Need
   - What capability needs to be autonomous?
   - What manual steps currently exist?
   - What's the single prompt that should trigger this?
2. Design for Autonomy
   - Prerequisite checks (automatically verify before execution)
   - Validation scripts (verify success after execution)
   - Error handling with remediation guidance
   - Idempotency (safe to re-run)
   - Rollback for failures where applicable
3. Implement MCP Code Execution Pattern

       .claude/skills/<skill-name>/
       ├── SKILL.md          # ~100 tokens: WHAT to do
       ├── scripts/
       │   ├── deploy.sh     # HOW to deploy
       │   ├── verify.py     # HOW to verify
       │   └── rollback.sh   # HOW to rollback (if applicable)
       └── REFERENCE.md      # Deep docs (loaded on-demand)

4. Write SKILL.md (AAIF Format)

       ---
       name: skill-identifier  # lowercase-with-hyphens, max 64 chars
       description: What this does and when to use it  # semantic matching
       allowed-tools: Bash, Read  # Optional: restrict tools
       model: claude-sonnet-4-20250514  # Optional: override model
       ---

       # Skill Display Name

       ## When to Use
       - User asks to [trigger condition]
       - Setting up [use case]

       ## Instructions
       1. Run prerequisite check: `./scripts/check-prereqs.sh`
       2. Execute deployment: `./scripts/deploy.sh`
       3. Verify deployment: `python scripts/verify.py`
       4. Confirm all validations pass before proceeding

       ## Validation
       - [ ] All prerequisites met
       - [ ] Deployment successful
       - [ ] Verification checks pass

       See [REFERENCE.md](./REFERENCE.md) for configuration options.

5. Create Executable Scripts
   - Scripts MUST be executable without modification
   - Scripts MUST validate prerequisites before execution
   - Scripts MUST return structured, parseable output
   - Only final results should be logged (not intermediate data)
   - Example output: "✓ Kafka deployed to namespace 'kafka'" (minimal)
6. Test Cross-Agent Compatibility
   - Test with Claude Code: Does it trigger correctly? Execute autonomously?
   - Test with Goose: Same behavior? Any compatibility issues?
   - Document any platform-specific considerations in REFERENCE.md
7. Document in REFERENCE.md
   - Configuration options and environment variables
   - Troubleshooting common issues
   - Examples and use cases
   - Prerequisites and dependencies
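As a rough illustration of the structure and naming rules in the steps above, a skill directory could be checked with a script like the one below. It is a sketch under stated assumptions, not an existing script in this repository: `read_frontmatter`, `check_skill`, and `report` are hypothetical names, and the frontmatter reader is a simple line-based parser rather than a full YAML parser.

```python
import re
from pathlib import Path

# Rule quoted from the SKILL.md template: lowercase-with-hyphens, max 64 chars.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
MAX_NAME_LENGTH = 64


def read_frontmatter(skill_md: Path) -> dict:
    """Parse `key: value` pairs between the leading '---' markers.

    Simplification: line-based, not a YAML parser. Returns {} if the file
    has no complete frontmatter block.
    """
    text = skill_md.read_text(encoding="utf-8")
    lines = [line.rstrip() for line in text.splitlines()]
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        return {}
    end = lines.index("---", 1)
    fields = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            # Drop trailing "# comment" text such as the template uses.
            fields[key.strip()] = value.split(" #", 1)[0].strip()
    return fields


def check_skill(skill_dir) -> list:
    """Return a list of problems for one skill directory (empty means OK)."""
    skill_dir = Path(skill_dir)
    problems = []
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        problems.append("missing SKILL.md")
    else:
        fields = read_frontmatter(skill_md)
        name = fields.get("name", "")
        if len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.match(name):
            problems.append(f"invalid name: {name!r}")
        if not fields.get("description"):
            problems.append("missing description")
    if not (skill_dir / "scripts").is_dir():
        problems.append("missing scripts/")
    if not (skill_dir / "REFERENCE.md").is_file():
        problems.append("missing REFERENCE.md")
    return problems


def report(skill_dir) -> str:
    """One-line result, in the spirit of the minimal output described above."""
    problems = check_skill(skill_dir)
    name = Path(skill_dir).name
    if problems:
        return f"✗ {name}: " + "; ".join(problems)
    return f"✓ {name}: layout valid"
```

Returning a single status line keeps the result that enters agent context small, which is the point of the Skills + Scripts pattern.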
You MUST create these Skills for Hackathon III:
- agents-md-gen: Generate AGENTS.md files for repositories
- kafka-k8s-setup: Deploy Kafka on Kubernetes (Helm + verify)
- postgres-k8s-setup: Deploy PostgreSQL on Kubernetes (migrations + verify)
- fastapi-dapr-agent: Create FastAPI + Dapr + OpenAI Agent microservices
- mcp-code-execution: Implement MCP with code execution pattern
- nextjs-k8s-deploy: Deploy Next.js + Monaco Editor to Kubernetes
- docusaurus-deploy: Deploy documentation site via Skill
When building the EmberLearn application using Skills:
Tech Stack (from constitution):
- Frontend: Next.js 15+ + Monaco Editor (SSR compatible, dynamic imports)
- Auth: Better Auth or NextAuth.js (JWT tokens, RS256)
- Backend: FastAPI 0.110+ + OpenAI Agents SDK (async I/O, Pydantic)
- Service Mesh: Dapr 1.13+ (state, pub/sub, service invocation)
- Messaging: Kafka 3.6+ via Bitnami Helm (topics: `learning.*`, `code.*`, `exercise.*`, `struggle.*`)
- Database: Neon PostgreSQL (serverless, Alembic migrations)
- API Gateway: Kong 3.5+ (JWT plugin, rate limiting)
- Orchestration: Kubernetes 1.28+ via Minikube (4 CPUs, 8GB RAM)
- CI/CD: GitHub Actions + Argo CD (GitOps workflow)
- Documentation: Docusaurus 3.0+ (auto-generated)
6 AI Agents (OpenAI Agents SDK):
- Triage Agent: Route queries to specialists
- Concepts Agent: Explain Python concepts with adaptive examples
- Code Review Agent: Analyze code (PEP 8, efficiency)
- Debug Agent: Parse errors, provide hints
- Exercise Agent: Generate and auto-grade challenges
- Progress Agent: Track mastery scores
Agent Implementation Pattern:
- Each agent = FastAPI service with Dapr sidecar
- Communicate via Kafka pub/sub through Dapr
- Store state in Neon PostgreSQL via Dapr state API
- Use OpenAI Agents SDK with structured tools
- Publish events for all significant actions
Mastery Calculation:
- Exercise completion: 40%
- Quiz scores: 30%
- Code quality: 20%
- Consistency (streak): 10%
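The weighting above is a plain weighted average. A minimal sketch, assuming each component is already scored on a 0-100 scale (the source does not state the scale) and using hypothetical key names:

```python
# Weights in percent, taken from the list above.
MASTERY_WEIGHTS = {
    "exercise_completion": 40,
    "quiz_scores": 30,
    "code_quality": 20,
    "consistency": 10,
}


def mastery_score(components: dict) -> float:
    """Combine component scores (assumed 0-100 each) into one mastery score."""
    missing = set(MASTERY_WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    weighted = sum(MASTERY_WEIGHTS[key] * components[key] for key in MASTERY_WEIGHTS)
    return weighted / 100


# Made-up learner: 0.4*80 + 0.3*70 + 0.2*90 + 0.1*50 = 32 + 21 + 18 + 5
example = mastery_score(
    {"exercise_completion": 80, "quiz_scores": 70, "code_quality": 90, "consistency": 50}
)
print(example)  # 76.0
```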
Code Execution Sandbox:
- Timeout: 5 seconds max
- Memory: 50MB limit
- No filesystem access (except temp)
- No network access
- Python standard library only (MVP)
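To make the timeout constraint concrete, here is a partial sketch using only the standard library. It enforces just two of the limits above: the time limit and a throwaway working directory. The 50MB memory cap and the filesystem and network restrictions need OS- or container-level controls that are not shown, so this is not a complete sandbox. `run_untrusted` is a hypothetical name.

```python
import subprocess
import sys
import tempfile


def run_untrusted(code: str, timeout_seconds: float = 5.0) -> dict:
    """Run a code snippet in a separate Python process with a time limit.

    Only the timeout and a temporary working directory are enforced here;
    memory, filesystem, and network isolation are NOT handled by this sketch.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            completed = subprocess.run(
                # -I: isolated mode (ignores environment variables and user site).
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising TimeoutExpired.
            return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if completed.returncode == 0 else "error"
    return {"status": status, "stdout": completed.stdout, "stderr": completed.stderr}
```

A call such as `run_untrusted("while True: pass")` comes back with `status` set to `"timeout"` after five seconds instead of hanging the caller.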
You MUST ask the user for input when encountering:
- Ambiguous Requirements: Ask 2-3 targeted clarifying questions before proceeding
- Unforeseen Dependencies: Surface them and ask for prioritization
- Architectural Uncertainty: Present options with tradeoffs, get user's preference
- Completion Checkpoint: Summarize work done, confirm next steps
- NEVER hardcode secrets or tokens; use Kubernetes Secrets and `.env`
- Prefer smallest viable diff; no unrelated refactoring
- Cite existing code with code references (line:line:path)
- Keep reasoning private; output only decisions and justifications
- Follow cloud-native patterns: stateless services, event-driven, horizontal scalability
- JWT tokens with RS256 signing (24h expiry)
- Kubernetes Secrets for sensitive data
- Tokenize PII before sending to AI models
- No passwords, tokens, or PII in logs
- TLS for all external communication
- Confirm surface and success criteria (one sentence)
- List constraints, invariants, non-goals
- Produce artifact with acceptance checks
- Add follow-ups and risks (max 3 bullets)
- Create PHR in appropriate subdirectory under `history/prompts/`
- Suggest ADR if significant architectural decision detected
When using /sp.plan, address thoroughly:
- Scope and Dependencies: In/out of scope, external dependencies
- Key Decisions: Options considered, trade-offs, rationale
- Interfaces: Public APIs, inputs/outputs/errors, versioning
- NFRs: Performance (p95 latency), reliability (SLOs), security, cost
- Data Management: Source of truth, schema evolution, migrations
- Operational Readiness: Observability, alerting, runbooks, deployment
- Risk Analysis: Top 3 risks, blast radius, mitigations
- Validation: Definition of done, output validation
- ADRs: Link significant decisions
After design/architecture work, test for ADR significance:
- Impact: Long-term consequences? (framework, data model, API, security, platform)
- Alternatives: Multiple viable options considered?
- Scope: Cross-cutting and influences system design?
If ALL true, suggest ADR. Wait for consent.
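The three-part test reduces to a logical AND over the criteria. A small sketch, with `DecisionAssessment` as a hypothetical name, that also reports which criteria were not met:

```python
from dataclasses import dataclass


@dataclass
class DecisionAssessment:
    impact: bool        # long-term consequences?
    alternatives: bool  # multiple viable options considered?
    scope: bool         # cross-cutting, influences system design?

    def should_suggest_adr(self) -> bool:
        """Suggest an ADR only when all three criteria hold."""
        return self.impact and self.alternatives and self.scope

    def unmet(self) -> list:
        """Names of the criteria that are not satisfied."""
        names = ("impact", "alternatives", "scope")
        return [name for name in names if not getattr(self, name)]
```

Even when `should_suggest_adr()` is true, the result is only a suggestion; creating the ADR still waits for user consent.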
    EmberLearn/
    ├── .claude/skills/                # Reusable Skills (PRIMARY DELIVERABLE)
    │   ├── agents-md-gen/             # Generate AGENTS.md files
    │   ├── dapr-deploy/               # Deploy Dapr
    │   ├── docusaurus-deploy/         # Deploy docs site
    │   ├── fastapi-dapr-agent/        # Create agents
    │   ├── k8s-manifest-gen/          # Generate K8s manifests
    │   ├── kafka-k8s-setup/           # Deploy Kafka
    │   ├── mcp-code-execution/        # MCP pattern
    │   ├── nextjs-k8s-deploy/         # Deploy Frontend
    │   ├── nextjs-production-gen/     # ✅ Generate Next.js 15 applications
    │   │   ├── SKILL.md               # Instructions (~150 tokens)
    │   │   ├── scripts/               # Python scripts (0 tokens in context)
    │   │   │   ├── check_prereqs.py
    │   │   │   ├── generate_complete_app.py
    │   │   │   └── verify_generation.py
    │   │   └── REFERENCE.md           # Comprehensive docs (on-demand)
    │   └── postgres-k8s-setup/        # Deploy Postgres
    ├── design-system.json             # Design tokens (colors, typography, spacing, animations)
    ├── frontend-test/                 # Generated test application (Phase 1 validation)
    ├── frontend-goose-test/           # Generated test application (Phase 2 cross-agent testing)
    ├── specs/                         # Spec-Kit Plus artifacts
    │   └── 002-production-grade-rebuild/
    │       ├── spec.md                # Feature specification
    │       ├── plan.md                # Implementation plan (8/8 constitution)
    │       ├── tasks.md               # 68 actionable tasks
    │       ├── research.md            # Technical research findings
    │       ├── data-model.md          # Entity definitions
    │       ├── quickstart.md          # 3-minute quick start guide
    │       ├── contracts/             # API contracts
    │       │   └── nextjs-production-gen.yaml
    │       ├── VALIDATION-RESULTS.md  # Phase 1 validation (13/13 checks)
    │       ├── GOOSE-COMPATIBILITY.md # Phase 2 cross-agent testing (100%)
    │       ├── SKILLS-PROGRESS.md     # Skills development progress
    │       ├── SKILLS-LIBRARY-README.md # README for skills-library repo
    │       ├── SKILLS-ARCHITECTURE.md # Architecture documentation
    │       └── FINAL-SUMMARY.md       # Project summary
    ├── history/                       # Prompt History Records (PHRs)
    │   └── prompts/
    │       └── 002-production-grade-rebuild/
    │           ├── 0001-skills-driven-production-rebuild-plan.plan.prompt.md
    │           └── 0002-skills-driven-tasks-generation.tasks.prompt.md
    ├── AGENTS.md                      # This file (agent guidance)
    ├── CLAUDE.md                      # Pointer to AGENTS.md
    └── README.md                      # Project overview
Note: Backend services, frontend application, and infrastructure are planned for future implementation. Current focus is on Skills development and validation.
Using Claude Code:

    # Invoke Skill directly
    claude "Use nextjs-production-gen to generate a production frontend"

Using Goose (or any agent):

    # Run scripts directly
    python3 .claude/skills/nextjs-production-gen/scripts/check_prereqs.py
    python3 .claude/skills/nextjs-production-gen/scripts/generate_complete_app.py \
      --design-system design-system.json \
      --output frontend/
    python3 .claude/skills/nextjs-production-gen/scripts/verify_generation.py frontend/

    # Run validation script
    python3 .claude/skills/nextjs-production-gen/scripts/verify_generation.py frontend/
    # Expected: 13/13 checks passed

    cd frontend
    npm install
    npm run dev
    # Open http://localhost:3000

Development Process (Single Repository):
- All Skills created in `.claude/skills/` in THIS EmberLearn repository
- All application code built using those Skills
- Commit history shows agentic workflow throughout
Repository 1: skills-library (Created at Submission):
- Create by copying `.claude/skills/` from EmberLearn repository
- Minimum 7 skills with SKILL.md + scripts/ + REFERENCE.md
- Each skill tested with Claude Code AND Goose
- README.md documents skill usage, installation (copy to ~/.claude/skills/), and development process
- Skills demonstrate autonomous execution (single prompt → deployment)
- Token efficiency documented (before/after measurements)
- Submit as Repository 1 to hackathon form
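The before/after token measurements mentioned above can be turned into a reduction percentage and compared with the 80-98% target. The helper names and the sample numbers below are illustrative assumptions, not measurements from this project.

```python
def token_reduction_percent(before_tokens: int, after_tokens: int) -> float:
    """Percentage of context tokens saved relative to the baseline."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return 100 * (before_tokens - after_tokens) / before_tokens


def meets_target(reduction_percent: float, minimum: float = 80.0) -> bool:
    """True when the reduction reaches the lower bound of the 80-98% target."""
    return reduction_percent >= minimum


# Made-up example: a 5,000-token direct integration vs. a ~100-token SKILL.md.
reduction = token_reduction_percent(before_tokens=5000, after_tokens=100)
print(f"{reduction:.1f}% reduction, target met: {meets_target(reduction)}")
```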
Repository 2: EmberLearn (this repository)
- Contains BOTH `.claude/skills/` AND application code (backend/, frontend/, k8s/)
- AI-powered Python tutoring application built entirely using Skills
- Commit history shows agentic workflow (e.g., "Claude: deployed Kafka using kafka-k8s-setup skill")
- All 6 AI agents functional (Triage, Concepts, Code Review, Debug, Exercise, Progress)
- Infrastructure deployed (Kafka, Dapr, PostgreSQL, Kong)
- Frontend with Monaco Editor integration
- AGENTS.md present and comprehensive
- Documentation via Docusaurus
- Submit as Repository 2 to hackathon form
Evaluation Criteria (100 points):
- Skills Autonomy: 15%
- Token Efficiency: 10%
- Cross-Agent Compatibility: 5%
- Architecture: 20%
- MCP Integration: 10%
- Documentation: 10%
- Spec-Kit Plus Usage: 15%
- EmberLearn Completion: 15%
🎯 Primary Focus: Skills are the product. Every capability must be a reusable, autonomous Skill.
⚡ Token Efficiency: Always use Skills + Scripts pattern. Never load MCP tools into context.
🔄 Cross-Agent: Test every Skill on both Claude Code AND Goose. AAIF standard compliance is mandatory.
🤖 Autonomous Execution: Single prompt → complete deployment. Zero manual intervention.
📋 Documentation: PHR for every user prompt. ADR suggestions for significant decisions.
🏗️ Cloud-Native: Event-driven (Kafka), Dapr sidecars, stateless services, K8s patterns.
🔐 Security: JWT tokens, Kubernetes Secrets, no hardcoded credentials, PII tokenization.
For complete project principles, see .specify/memory/constitution.md (v1.0.0).
Core principles:
- Skills Are The Product
- Token Efficiency First
- Cross-Agent Compatibility
- Autonomous Execution
- Cloud-Native Architecture
- MCP Code Execution Pattern
- Test-Driven Development
- Spec-Driven Development
Submission Form: https://forms.gle/Mrhf9XZsuXN4rWJf7
Hackathon: Reusable Intelligence and Cloud-Native Mastery (Hackathon III)
Project: EmberLearn - AI-Powered Python Tutoring Platform