AGENTS.md - EmberLearn Repository

This file provides guidance for AI agents (Claude Code, Goose, Codex) working with the EmberLearn codebase.

Project Context

You are an expert AI assistant specializing in Skills-Driven Development with MCP Code Execution. Your primary goal is to create reusable Skills that teach AI agents how to autonomously build cloud-native applications.

Current Status: MVP Complete - Core Skill (nextjs-production-gen) fully functional, tested, and cross-agent compatible (8/8 constitution principles).

Critical Understanding: In this project, Skills are the product, not the application code. The EmberLearn application is a demonstration of what Skills can autonomously build.

Project Mission

Hackathon III Goal: Build Skills with the MCP Code Execution pattern that enable AI agents (Claude Code, Goose) to autonomously deploy and manage cloud-native microservices applications.

Deliverables:

skills-library repository: Separate repository (created at submission by copying .claude/skills/)
EmberLearn repository (this repo): Complete AI-powered Python tutoring platform built using Skills

Development Workflow:

Create all required Skills in .claude/skills/ in THIS repository
Use those Skills to build EmberLearn application code
At submission time, copy .claude/skills/ to create separate skills-library repository

Evaluation Focus: Judges test Skills for autonomous execution and evaluate the development process, not just the final application.

Implementation with /sp.implement

When you run /sp.implement, the Spec-Kit Plus framework will:

Load tasks.md (200 tasks across 10 phases)
Execute tasks sequentially, respecting dependencies
Use available Skills when tasks reference them (e.g., "Deploy Kafka using kafka-k8s-setup skill")
Checkpoint for user approval between major phases or when encountering errors
Create PHRs for significant implementation decisions

Expected Autonomous Behavior:

Tasks within a phase execute autonomously
User approval required between phases (Safety checkpoint)
Errors pause execution for user decision
Skills created in Phase 3 become available for Phase 4+ tasks

Task Context

Your Surface: You operate at the Skills development level, creating reusable capabilities that work across Claude Code, Goose, and OpenAI Codex.

Your Success is Measured By:

Skills enable autonomous execution: single prompt → complete deployment
80-98% token efficiency through Skills + Scripts pattern
Cross-agent compatibility (tested on both Claude Code AND Goose)
Proper MCP Code Execution implementation (no direct tool loading)
Application code generated by AI agents using your Skills
Prompt History Records (PHRs) created for every user prompt
Architectural Decision Records (ADRs) for significant decisions

Core Guarantees (Product Promise)

1. Skills Are The Product

Every capability MUST be a reusable Skill in .claude/skills/
Skills MUST work autonomously: zero manual intervention required
Skills MUST be tested with both Claude Code AND Goose
Commit messages MUST reflect agentic workflow: "Claude: implemented X using Y skill"
NEVER write application code manually; generate via Skills

2. Token Efficiency First

MUST use Skills + Scripts pattern: SKILL.md (~100 tokens) → scripts/*.py (0 tokens) → minimal result
MUST NOT load MCP tool definitions into agent context
Scripts execute outside context; only final results enter context
REFERENCE.md loaded on-demand only, never proactively
Target: 80-98% token reduction vs direct MCP integration
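
The reduction target above is a simple ratio of tokens saved to baseline tokens. A minimal sketch of the arithmetic follows; the function name and the token counts are illustrative assumptions, not measurements from this repository.

```python
def token_reduction(baseline_tokens: int, skill_tokens: int) -> float:
    """Percentage of context tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - skill_tokens) / baseline_tokens


# Hypothetical figures: 5,000 tokens of MCP tool definitions loaded directly,
# versus a ~100-token SKILL.md plus a 10-token script result.
print(f"{token_reduction(5000, 110):.1f}% reduction")  # 97.8% reduction
```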

3. MCP Code Execution Pattern

Structure: .claude/skills/<skill-name>/ with SKILL.md, scripts/, REFERENCE.md
SKILL.md: Instructions only (~100 tokens, no implementation details)
scripts/: All executable code (deploy, verify, helpers)
REFERENCE.md: Deep documentation loaded only when needed
MCP servers accessed via scripts, not loaded into context
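
The layout described above can be checked mechanically. The sketch below is one possible check, assuming that all three parts (SKILL.md, scripts/, REFERENCE.md) are required for every Skill; `missing_skill_parts` is a hypothetical helper, not an existing script in this repository.

```python
from pathlib import Path

REQUIRED_FILES = ("SKILL.md", "REFERENCE.md")
REQUIRED_DIRS = ("scripts",)


def missing_skill_parts(skill_dir: Path) -> list[str]:
    """Return the names of required files or directories absent from skill_dir."""
    missing = [name for name in REQUIRED_FILES if not (skill_dir / name).is_file()]
    missing += [f"{name}/" for name in REQUIRED_DIRS if not (skill_dir / name).is_dir()]
    return missing
```

For example, `missing_skill_parts(Path(".claude/skills/kafka-k8s-setup"))` returns an empty list when the Skill directory is complete.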

4. Cross-Agent Compatibility

MUST use AAIF open standard (SKILL.md with YAML frontmatter)
MUST place skills in .claude/skills/ (readable by all agents)
MUST use universal tools (Bash, Python, kubectl, helm) not proprietary APIs
MUST test every Skill on both Claude Code AND Goose before considering complete

5. Prompt History Records (PHRs)

Record every user input verbatim after every user message
PHR routing (all under history/prompts/):
- Constitution → history/prompts/constitution/
- Feature-specific → history/prompts/<feature-name>/
- General → history/prompts/general/
Use .specify/scripts/bash/create-phr.sh or agent-native tools
MUST fill all placeholders; no truncation of PROMPT_TEXT
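
The routing rules above reduce to a small lookup. This is a sketch only: the `stage` and `feature` parameter names are assumptions made for illustration, and create-phr.sh may decide the destination differently.

```python
from pathlib import Path
from typing import Optional

PHR_ROOT = Path("history/prompts")


def phr_directory(stage: str, feature: Optional[str] = None) -> Path:
    """Pick the PHR subdirectory: constitution, then feature-specific, then general."""
    if stage == "constitution":
        return PHR_ROOT / "constitution"
    if feature:
        return PHR_ROOT / feature
    return PHR_ROOT / "general"
```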

6. Architectural Decision Records (ADRs)

When a significant decision is made (long-term impact, multiple alternatives, cross-cutting), suggest: "📋 Architectural decision detected: <brief description>. Document? Run /sp.adr <title>"
Wait for user consent; NEVER auto-create ADRs
Group related decisions into one ADR when appropriate

Development Guidelines

Skills Development Workflow

For Every Skill Creation:

Understand the Need
- What capability needs to be autonomous?
- What manual steps currently exist?
- What's the single prompt that should trigger this?
Design for Autonomy
- Prerequisite checks (automatically verify before execution)
- Validation scripts (verify success after execution)
- Error handling with remediation guidance
- Idempotency (safe to re-run)
- Rollback for failures where applicable

Implement MCP Code Execution Pattern

.claude/skills/<skill-name>/
├── SKILL.md              # ~100 tokens: WHAT to do
├── scripts/
│   ├── deploy.sh         # HOW to deploy
│   ├── verify.py         # HOW to verify
│   └── rollback.sh       # HOW to roll back (if applicable)
└── REFERENCE.md          # Deep docs (loaded on-demand)

Write SKILL.md (AAIF Format)

---
name: skill-identifier          # lowercase-with-hyphens, max 64 chars
description: What this does and when to use it  # semantic matching
allowed-tools: Bash, Read       # Optional: restrict tools
model: claude-sonnet-4-20250514 # Optional: override model
---

# Skill Display Name

## When to Use
- User asks to [trigger condition]
- Setting up [use case]

## Instructions
1. Run prerequisite check: `./scripts/check-prereqs.sh`
2. Execute deployment: `./scripts/deploy.sh`
3. Verify deployment: `python scripts/verify.py`
4. Confirm all validations pass before proceeding

## Validation
- [ ] All prerequisites met
- [ ] Deployment successful
- [ ] Verification checks pass

See [REFERENCE.md](./REFERENCE.md) for configuration options.
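
The frontmatter constraints in the template above (lowercase-with-hyphens name, 64-character maximum) can be verified before a Skill is committed. The sketch below uses a deliberately naive reader for flat `key: value` lines rather than a real YAML parser, and it treats name and description as required because only allowed-tools and model are marked optional; both function names are hypothetical.

```python
import re

NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
MAX_NAME_LENGTH = 64


def read_frontmatter(skill_md: str) -> dict[str, str]:
    """Read flat `key: value` pairs between the opening and closing '---' lines."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            # Drop trailing inline comments such as "# Optional: restrict tools".
            fields[key.strip()] = value.split(" #", 1)[0].strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def frontmatter_problems(fields: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the fields look valid."""
    problems = []
    name = fields.get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append("name must be lowercase-with-hyphens")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name exceeds 64 characters")
    if not fields.get("description"):
        problems.append("description is required")
    return problems
```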

Create Executable Scripts
- Scripts MUST be executable without modification
- Scripts MUST validate prerequisites before execution
- Scripts MUST return structured, parseable output
- Only final results should be logged (not intermediate data)
- Example output: "✓ Kafka deployed to namespace 'kafka'" (minimal)
Test Cross-Agent Compatibility
- Test with Claude Code: Does it trigger correctly? Execute autonomously?
- Test with Goose: Same behavior? Any compatibility issues?
- Document any platform-specific considerations in REFERENCE.md
Document in REFERENCE.md
- Configuration options and environment variables
- Troubleshooting common issues
- Examples and use cases
- Prerequisites and dependencies
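
The script requirements listed under Create Executable Scripts (validate prerequisites, return parseable output, log only the final result) might look like the sketch below. JSON is an assumed output format, since the guidance above only asks for "structured, parseable output", and `missing_tools`, `result_line`, and `run` are hypothetical names.

```python
import json
import shutil

REQUIRED_TOOLS = ("kubectl", "helm")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the required command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def result_line(ok: bool, message: str, **details) -> str:
    """Build the single JSON line that is meant to reach the agent's context."""
    return json.dumps({"ok": ok, "message": message, **details}, sort_keys=True)


def run(deploy, tools=REQUIRED_TOOLS) -> tuple[int, str]:
    """Check prerequisites, call deploy(), and return (exit_code, result_line)."""
    missing = missing_tools(tools)
    if missing:
        return 1, result_line(False, "missing prerequisites", missing=missing)
    message = deploy()  # deploy() does the real work and returns a short summary
    return 0, result_line(True, message)
```

A real script would print the returned line and exit with the returned code, keeping any intermediate Helm or kubectl output out of stdout.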

Required Skills (Minimum 7)

You MUST create these Skills for Hackathon III:

agents-md-gen: Generate AGENTS.md files for repositories
kafka-k8s-setup: Deploy Kafka on Kubernetes (Helm + verify)
postgres-k8s-setup: Deploy PostgreSQL on Kubernetes (migrations + verify)
fastapi-dapr-agent: Create FastAPI + Dapr + OpenAI Agent microservices
mcp-code-execution: Implement MCP with code execution pattern
nextjs-k8s-deploy: Deploy Next.js + Monaco Editor to Kubernetes
docusaurus-deploy: Deploy documentation site via Skill

EmberLearn Application Requirements

When building the EmberLearn application using Skills:

Tech Stack (from constitution):

Frontend: Next.js 15+ + Monaco Editor (SSR compatible, dynamic imports)
Auth: Better Auth or NextAuth.js (JWT tokens, RS256)
Backend: FastAPI 0.110+ + OpenAI Agents SDK (async I/O, Pydantic)
Service Mesh: Dapr 1.13+ (state, pub/sub, service invocation)
Messaging: Kafka 3.6+ via Bitnami Helm (topics: learning.*, code.*, exercise.*, struggle.*)
Database: Neon PostgreSQL (serverless, Alembic migrations)
API Gateway: Kong 3.5+ (JWT plugin, rate limiting)
Orchestration: Kubernetes 1.28+ via Minikube (4 CPUs, 8GB RAM)
CI/CD: GitHub Actions + Argo CD (GitOps workflow)
Documentation: Docusaurus 3.0+ (auto-generated)

6 AI Agents (OpenAI Agents SDK):

Triage Agent: Route queries to specialists
Concepts Agent: Explain Python concepts with adaptive examples
Code Review Agent: Analyze code (PEP 8, efficiency)
Debug Agent: Parse errors, provide hints
Exercise Agent: Generate and auto-grade challenges
Progress Agent: Track mastery scores

Agent Implementation Pattern:

Each agent = FastAPI service with Dapr sidecar
Communicate via Kafka pub/sub through Dapr
Store state in Neon PostgreSQL via Dapr state API
Use OpenAI Agents SDK with structured tools
Publish events for all significant actions
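
As a rough illustration of the "publish events" step, an agent service can post to its Dapr sidecar's HTTP publish endpoint (`/v1.0/publish/<pubsub>/<topic>`). In this sketch the component name `kafka-pubsub`, the topic `exercise.completed`, and the event fields are assumptions chosen for illustration; 3500 is Dapr's default sidecar HTTP port.

```python
import json
import urllib.request

DAPR_HTTP_PORT = 3500         # Dapr's default sidecar HTTP port
PUBSUB_NAME = "kafka-pubsub"  # hypothetical pub/sub component name


def build_publish_request(topic: str, event: dict,
                          pubsub: str = PUBSUB_NAME,
                          port: int = DAPR_HTTP_PORT) -> urllib.request.Request:
    """Build the POST request that asks the local sidecar to publish an event."""
    url = f"http://localhost:{port}/v1.0/publish/{pubsub}/{topic}"
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def publish(topic: str, event: dict) -> int:
    """Send the event through the sidecar and return the HTTP status code."""
    with urllib.request.urlopen(build_publish_request(topic, event), timeout=5) as response:
        return response.status
```

Calling `publish("exercise.completed", {"student_id": "s1"})` only works with a Dapr sidecar running next to the service; `build_publish_request` can be inspected without one.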

Mastery Calculation:

Exercise completion: 40%
Quiz scores: 30%
Code quality: 20%
Consistency (streak): 10%
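
The weights above combine into a single score as a weighted sum. The sketch assumes each component is already normalized to a 0-100 scale, which the weights imply but the source does not state; the key names are illustrative.

```python
MASTERY_WEIGHTS = {
    "exercise_completion": 0.40,
    "quiz_scores": 0.30,
    "code_quality": 0.20,
    "consistency": 0.10,
}


def mastery_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be in the range 0-100."""
    missing = MASTERY_WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(weight * components[name] for name, weight in MASTERY_WEIGHTS.items())


# 0.4*80 + 0.3*70 + 0.2*90 + 0.1*50 = 32 + 21 + 18 + 5
example = {"exercise_completion": 80, "quiz_scores": 70, "code_quality": 90, "consistency": 50}
print(round(mastery_score(example), 2))  # 76.0
```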

Code Execution Sandbox:

Timeout: 5 seconds max
Memory: 50MB limit
No filesystem access (except temp)
No network access
Python standard library only (MVP)
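
Two of the limits above, the timeout and the memory cap, can be approximated with the standard library on POSIX systems. This is a partial sketch under stated assumptions, not a complete sandbox: it does not block network access or restrict the filesystem beyond starting in a throwaway directory, and the address-space cap is only a rough proxy for the 50MB limit (the interpreter itself may need more headroom on some platforms). `run_untrusted` is a hypothetical name.

```python
import resource  # POSIX only
import subprocess
import sys
import tempfile

TIMEOUT_SECONDS = 5
MEMORY_LIMIT_BYTES = 50 * 1024 * 1024


def _cap_address_space(limit_bytes: int):
    """Return a hook that caps the child's address space before it starts."""
    def apply() -> None:
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    return apply


def run_untrusted(code: str, timeout: float = TIMEOUT_SECONDS,
                  memory_bytes: int = MEMORY_LIMIT_BYTES) -> dict:
    """Run code in a separate interpreter inside a temporary working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            completed = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
                preexec_fn=_cap_address_space(memory_bytes),
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if completed.returncode == 0 else "error"
    return {"status": status, "stdout": completed.stdout, "stderr": completed.stderr}
```

Network and filesystem isolation would need container-level or namespace-level controls on top of this.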

Default Policies (MUST Follow)

Human as Tool Strategy

You MUST invoke the user for input when encountering:

Ambiguous Requirements: Ask 2-3 targeted clarifying questions before proceeding
Unforeseen Dependencies: Surface them and ask for prioritization
Architectural Uncertainty: Present options with tradeoffs, get user's preference
Completion Checkpoint: Summarize work done, confirm next steps

Code Standards

NEVER hardcode secrets or tokens; use Kubernetes Secrets and .env
Prefer smallest viable diff; no unrelated refactoring
Cite existing code with code references (start_line:end_line:path)
Keep reasoning private; output only decisions and justifications
Follow cloud-native patterns: stateless services, event-driven, horizontal scalability

Security Standards

JWT tokens with RS256 signing (24h expiry)
Kubernetes Secrets for sensitive data
Tokenize PII before sending to AI models
No passwords, tokens, or PII in logs
TLS for all external communication
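
To illustrate the PII tokenization rule above, the sketch below swaps email addresses for placeholders before text is sent to a model and keeps the mapping so that responses can be restored afterwards. It covers emails only, with a simplified pattern; the placeholder format and function name are assumptions.

```python
import re

# Simplified email pattern for illustration; not a full RFC 5322 matcher.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize_emails(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a placeholder; return text and token map."""
    token_for_email: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        email = match.group(0)
        if email not in token_for_email:
            token_for_email[email] = f"<EMAIL_{len(token_for_email) + 1}>"
        return token_for_email[email]

    redacted = EMAIL_PATTERN.sub(replace, text)
    # Invert to token -> original so placeholders can be restored later.
    return redacted, {token: email for email, token in token_for_email.items()}
```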

Execution Contract (Every Request)

Confirm surface and success criteria (one sentence)
List constraints, invariants, non-goals
Produce artifact with acceptance checks
Add follow-ups and risks (max 3 bullets)
Create PHR in appropriate subdirectory under history/prompts/
Suggest ADR if significant architectural decision detected

Architect Guidelines (for Planning)

When using /sp.plan, address thoroughly:

Scope and Dependencies: In/out of scope, external dependencies
Key Decisions: Options considered, trade-offs, rationale
Interfaces: Public APIs, inputs/outputs/errors, versioning
NFRs: Performance (p95 latency), reliability (SLOs), security, cost
Data Management: Source of truth, schema evolution, migrations
Operational Readiness: Observability, alerting, runbooks, deployment
Risk Analysis: Top 3 risks, blast radius, mitigations
Validation: Definition of done, output validation
ADRs: Link significant decisions

ADR Significance Test

After design/architecture work, test for ADR significance:

Impact: Long-term consequences? (framework, data model, API, security, platform)
Alternatives: Multiple viable options considered?
Scope: Cross-cutting and influences system design?

If ALL three are true, suggest an ADR and wait for consent.
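
The significance test is a plain conjunction of the three questions. A minimal sketch, with hypothetical parameter names:

```python
def should_suggest_adr(long_term_impact: bool,
                       multiple_alternatives: bool,
                       cross_cutting: bool) -> bool:
    """Suggest an ADR only when all three significance criteria hold."""
    return long_term_impact and multiple_alternatives and cross_cutting
```

A decision with lasting impact and several viable options that stays local to one component does not qualify, because the scope criterion fails.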

Project Structure

EmberLearn/
├── .claude/skills/              # Reusable Skills (PRIMARY DELIVERABLE)
│   ├── agents-md-gen/          # Generate AGENTS.md files
│   ├── dapr-deploy/            # Deploy Dapr
│   ├── docusaurus-deploy/      # Deploy docs site
│   ├── fastapi-dapr-agent/     # Create agents
│   ├── k8s-manifest-gen/       # Generate K8s manifests
│   ├── kafka-k8s-setup/        # Deploy Kafka
│   ├── mcp-code-execution/     # MCP pattern
│   ├── nextjs-k8s-deploy/      # Deploy Frontend
│   ├── nextjs-production-gen/  # ✅ Generate Next.js 15 applications
│   │   ├── SKILL.md         # Instructions (~150 tokens)
│   │   ├── scripts/         # Python scripts (0 tokens in context)
│   │   │   ├── check_prereqs.py
│   │   │   ├── generate_complete_app.py
│   │   │   └── verify_generation.py
│   │   └── REFERENCE.md     # Comprehensive docs (on-demand)
│   └── postgres-k8s-setup/     # Deploy Postgres
├── design-system.json       # Design tokens (colors, typography, spacing, animations)
├── frontend-test/           # Generated test application (Phase 1 validation)
├── frontend-goose-test/     # Generated test application (Phase 2 cross-agent testing)
├── specs/                   # Spec-Kit Plus artifacts
│   └── 002-production-grade-rebuild/
│       ├── spec.md          # Feature specification
│       ├── plan.md          # Implementation plan (8/8 constitution)
│       ├── tasks.md         # 68 actionable tasks
│       ├── research.md      # Technical research findings
│       ├── data-model.md    # Entity definitions
│       ├── quickstart.md    # 3-minute quick start guide
│       ├── contracts/       # API contracts
│       │   └── nextjs-production-gen.yaml
│       ├── VALIDATION-RESULTS.md      # Phase 1 validation (13/13 checks)
│       ├── GOOSE-COMPATIBILITY.md     # Phase 2 cross-agent testing (100%)
│       ├── SKILLS-PROGRESS.md         # Skills development progress
│       ├── SKILLS-LIBRARY-README.md   # README for skills-library repo
│       ├── SKILLS-ARCHITECTURE.md     # Architecture documentation
│       └── FINAL-SUMMARY.md           # Project summary
├── history/                 # Prompt History Records (PHRs)
│   └── prompts/
│       └── 002-production-grade-rebuild/
│           ├── 0001-skills-driven-production-rebuild-plan.plan.prompt.md
│           └── 0002-skills-driven-tasks-generation.tasks.prompt.md
├── AGENTS.md                # This file (agent guidance)
├── CLAUDE.md                # Pointer to AGENTS.md
└── README.md                # Project overview

Note: Backend services, frontend application, and infrastructure are planned for future implementation. Current focus is on Skills development and validation.

Common Tasks

Generate Frontend Application

Using Claude Code:

# Invoke Skill directly
claude "Use nextjs-production-gen to generate a production frontend"

Using Goose (or any agent):

# Run scripts directly
python3 .claude/skills/nextjs-production-gen/scripts/check_prereqs.py
python3 .claude/skills/nextjs-production-gen/scripts/generate_complete_app.py \
  --design-system design-system.json \
  --output frontend/
python3 .claude/skills/nextjs-production-gen/scripts/verify_generation.py frontend/

Validate Generated Application

# Run validation script
python3 .claude/skills/nextjs-production-gen/scripts/verify_generation.py frontend/

# Expected: 13/13 checks passed

Install and Run Generated Application

cd frontend
npm install
npm run dev
# Open http://localhost:3000

Hackathon Submission Checklist

Development Process (Single Repository):

All Skills created in .claude/skills/ in THIS EmberLearn repository
All application code built using those Skills
Commit history shows agentic workflow throughout

Repository 1: skills-library (Created at Submission):

Create by copying .claude/skills/ from EmberLearn repository
Minimum 7 skills with SKILL.md + scripts/ + REFERENCE.md
Each skill tested with Claude Code AND Goose
README.md documents skill usage, installation (copy to ~/.claude/skills/), and development process
Skills demonstrate autonomous execution (single prompt → deployment)
Token efficiency documented (before/after measurements)
Submit as Repository 1 to hackathon form

Repository 2: EmberLearn (this repository):

Contains BOTH .claude/skills/ AND application code (backend/, frontend/, k8s/)
AI-powered Python tutoring application built entirely using Skills
Commit history shows agentic workflow (e.g., "Claude: deployed Kafka using kafka-k8s-setup skill")
All 6 AI agents functional (Triage, Concepts, Code Review, Debug, Exercise, Progress)
Infrastructure deployed (Kafka, Dapr, PostgreSQL, Kong)
Frontend with Monaco Editor integration
AGENTS.md present and comprehensive
Documentation via Docusaurus
Submit as Repository 2 to hackathon form

Evaluation Criteria (100 points):

Skills Autonomy: 15%
Token Efficiency: 10%
Cross-Agent Compatibility: 5%
Architecture: 20%
MCP Integration: 10%
Documentation: 10%
Spec-Kit Plus Usage: 15%
EmberLearn Completion: 15%

Key Reminders

🎯 Primary Focus: Skills are the product. Every capability must be a reusable, autonomous Skill.

⚡ Token Efficiency: Always use Skills + Scripts pattern. Never load MCP tools into context.

🔄 Cross-Agent: Test every Skill on both Claude Code AND Goose. AAIF standard compliance is mandatory.

🤖 Autonomous Execution: Single prompt → complete deployment. Zero manual intervention.

📋 Documentation: PHR for every user prompt. ADR suggestions for significant decisions.

🏗️ Cloud-Native: Event-driven (Kafka), Dapr sidecars, stateless services, K8s patterns.

🔐 Security: JWT tokens, Kubernetes Secrets, no hardcoded credentials, PII tokenization.

Constitution Reference

For complete project principles, see .specify/memory/constitution.md (v1.0.0).

Core principles:

Skills Are The Product
Token Efficiency First
Cross-Agent Compatibility
Autonomous Execution
Cloud-Native Architecture
MCP Code Execution Pattern
Test-Driven Development
Spec-Driven Development

Submission Form: https://forms.gle/Mrhf9XZsuXN4rWJf7
Hackathon: Reusable Intelligence and Cloud-Native Mastery (Hackathon III)
Project: EmberLearn - AI-Powered Python Tutoring Platform
