Combining ChatGPT's Safety Framework + Gemini's Implementation Strategy
Transform StillMe into a "Living Codebase" where it understands its own source code, can explain it, generate tests, and assist developers, all while maintaining strict safety boundaries.
- ✅ Code Q&A (explain code)
- ✅ Generate unit tests
- ✅ Code review suggestions
- ✅ Onboarding mentor
- ✅ Documentation generation
Boundary: Read-only, explain-only, suggest-only. NO code modification.
- ⚠️ Deep code analysis
- ⚠️ Architectural suggestions
- ⚠️ Edge case detection
Boundary: All suggestions must be reviewed by humans. Tests must be verified.
- ❌ Auto-modify source code
- ❌ Auto-fix production bugs
- ❌ Self-improvement loop without human approval
- ❌ Auto-merge decisions
Boundary: These features will NOT be implemented in this project.
Goal: StillMe can answer questions about its own codebase.
Tasks:
- Setup Codebase Indexing Infrastructure
  - Create `backend/services/codebase_indexer.py`
  - Integrate with ChromaDB (create `stillme_codebase` collection)
  - Implement code chunking: by file, by class, by function
- Code Embedding & Storage
  - Use the existing embedding model (paraphrase-multilingual-MiniLM-L12-v2)
  - Index all Python files in `backend/`, `stillme_core/`, `frontend/`
  - Store metadata: file_path, line_range, function_name, class_name, docstring
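The chunking and metadata steps above can be sketched with the standard `ast` module. This is a minimal illustration, not the project's actual indexer: the function name `chunk_source` and the flat metadata keys are assumptions for this sketch; the real `codebase_indexer.py` would also embed each chunk and write it to ChromaDB.

```python
# Sketch: split a Python source file into one chunk per top-level
# function or class, keeping file/line metadata for later citations.
# chunk_source and its output shape are illustrative assumptions.
import ast

def chunk_source(source: str, file_path: str) -> list[dict]:
    """Return one metadata dict per top-level function or class."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "file_path": file_path,
                # line_range backs the file:line citations in answers
                "line_range": f"{node.lineno}-{node.end_lineno}",
                "name": node.name,
                "code_type": "class" if isinstance(node, ast.ClassDef) else "function",
                "docstring": ast.get_docstring(node) or "",
                "text": ast.get_source_segment(source, node),
            })
    return chunks

example = '''def validate_citation(ref):
    "Check that the reference resolves."
    return bool(ref)
'''
chunks = chunk_source(example, "backend/validators/citation_validator.py")
```

Chunking on AST boundaries (rather than fixed token windows) keeps each chunk a self-contained unit of meaning, which is what makes the file:line citations trustworthy.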
- Code Q&A API Endpoint
  - Create `/api/codebase/query` endpoint
  - Accept questions such as "How does citation_validator work?"
  - Retrieve relevant code chunks using RAG
  - Generate explanations with code citations (file:line)
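The retrieval half of this endpoint can be sketched as below. A toy keyword-overlap scorer stands in for the real ChromaDB vector search, so the scoring is an assumption; the chunk shape and the file:line citation format come from the plan.

```python
# Sketch of the /api/codebase/query retrieval step. retrieve_top_k uses
# keyword overlap as a stand-in for embedding similarity search.
import re

def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve_top_k(question: str, chunks: list[dict], k: int = 3) -> list[dict]:
    q = _tokens(question)
    return sorted(chunks, key=lambda c: len(q & _tokens(c["text"])), reverse=True)[:k]

def build_context(chunks: list[dict]) -> str:
    # Prefix every chunk with file:line so the LLM's answer can cite
    # real source locations.
    return "\n\n".join(
        f"# {c['file_path']}:{c['line_range']}\n{c['text']}" for c in chunks
    )

chunks = [
    {"file_path": "backend/validators/citation_validator.py",
     "line_range": "45-78",
     "text": "def validate_citation(ref): return ref in known_refs"},
    {"file_path": "backend/services/cache.py",
     "line_range": "10-30",
     "text": "def get_cached(key): return redis.get(key)"},
]
top = retrieve_top_k("How does validate_citation work?", chunks, k=1)
context = build_context(top)
```

In the real service, `retrieve_top_k` would be replaced by a ChromaDB query against the `stillme_codebase` collection; `build_context` would stay essentially the same.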
- Code Explanation Prompt Engineering
  - Add a code explanation prompt in `backend/identity/prompt_builder.py`
  - Safety: "Explain code, do not modify or suggest modifications"
  - Support Vietnamese and English
Testing:
- Test with 10+ questions about different parts of the codebase
- Verify accuracy: explanations match actual code
- Verify citations: file:line references are correct
- Measure performance: response time, token usage
Success Criteria:
- ✅ StillMe can explain any function/class in the codebase
- ✅ Citations are accurate (file:line)
- ✅ No hallucinations about code structure
- ✅ Response time < 5 seconds
README Update:
- Add "StillMe Codebase Assistant" section
- Document Phase 1 capabilities
- Include API endpoint documentation
- Examples of questions StillMe can answer
Goal: StillMe can generate tests and review code (suggestions only, no auto-fix).
Tasks:
- Test Generation Service
  - Create `backend/services/test_generator.py`
  - Accept a code file or its content as input
  - Generate unit tests using the LLM with code context
  - Support pytest format
  - Include: happy path, edge cases, error handling
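Two pieces of this service can be sketched without the LLM call itself: assembling the prompt, and a cheap pre-flight check that generated test code at least parses before a user sees it. The prompt wording and function names below are illustrative assumptions, not the project's actual code.

```python
# Sketch of test_generator.py internals: prompt assembly plus a syntax
# sanity check on LLM output. build_test_prompt's wording is illustrative.
import ast

def build_test_prompt(code: str, file_path: str) -> str:
    return (
        "Write pytest unit tests for the code below. Cover the happy path, "
        "edge cases, and error handling. Do NOT modify the code itself.\n\n"
        f"# {file_path}\n{code}"
    )

def generated_tests_parse(test_source: str) -> bool:
    # Reject LLM output that is not even valid Python before handing it
    # to the user for review (actually running it is a separate step).
    try:
        ast.parse(test_source)
        return True
    except SyntaxError:
        return False

prompt = build_test_prompt("def add(a, b): return a + b", "backend/utils/math.py")
ok = generated_tests_parse("def test_add():\n    assert add(1, 2) == 3\n")
bad = generated_tests_parse("def test_add(:\n")
```

A parse check like this filters the worst failures cheaply; the "tests actually run" success criterion still requires executing them under pytest in a sandbox.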
- Code Review Assistant
  - Create `backend/services/code_reviewer.py`
  - Analyze code for:
    - Unused imports
    - Unreachable code
    - Missing error handling
    - Naming inconsistencies
    - Potential bugs
  - Generate review comments with suggestions
  - Safety: review only, no auto-fix
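The first check on that list, unused imports, can be done deterministically with `ast` rather than an LLM. A minimal sketch, assuming the reviewer returns plain comment strings (the real `code_reviewer.py` output format is not specified here):

```python
# Sketch of one code_reviewer.py check: flag imports whose name is never
# referenced. Returns review comments only -- it never edits the file.
import ast

def find_unused_imports(source: str) -> list[str]:
    tree = ast.parse(source)
    imported: dict[str, int] = {}   # bound name -> line number
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                imported[alias.asname or alias.name.split(".")[0]] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = node.lineno
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return [
        f"line {line}: import '{name}' appears unused"
        for name, line in imported.items() if name not in used
    ]

comments = find_unused_imports("import os\nimport sys\nprint(sys.argv)\n")
```

Keeping mechanical checks like this outside the LLM reduces false positives and makes the "catches real issues" metric easier to hit; the LLM is better reserved for the fuzzier items (naming, missing error handling, potential bugs).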
- API Endpoints
  - `/api/codebase/generate-tests` - Generate a test file
  - `/api/codebase/review` - Review code and return suggestions
Testing:
- Generate tests for 5 different validators
- Verify test quality: tests actually run, cover main logic
- Review 10 code snippets, verify accuracy
- Measure false positive/negative rates
Success Criteria:
- ✅ Generated tests run successfully
- ✅ Test coverage > 70% for generated tests
- ✅ Code review catches real issues (low false positives)
- ✅ All suggestions are actionable
README Update:
- Update with Phase 2 capabilities
- Document safety boundaries (review only, no auto-fix)
- Examples of generated tests and review comments
Goal: StillMe becomes a "living documentation" with Git history and architecture understanding.
Tasks:
- Git History Integration
  - Create `backend/services/git_history_retriever.py`
  - Index commit messages, PR descriptions, issue discussions
  - Store in a `git_history` ChromaDB collection
  - Support queries such as "Why did we choose Redis for caching?"
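The indexing step can be sketched as parsing an exported log into records ready for embedding. The export format is an assumption for this sketch: it presumes commits are dumped with something like `git log --pretty=format:"%H|%ad|%s" --date=short`, and `parse_git_log` is a hypothetical helper.

```python
# Sketch of git_history_retriever.py's parsing step. Assumes a
# pipe-delimited git log export (sha|date|subject), one commit per line.
def parse_git_log(raw: str) -> list[dict]:
    records = []
    for line in raw.strip().splitlines():
        # maxsplit=2 keeps any "|" inside the commit subject intact
        sha, date, subject = line.split("|", 2)
        records.append({"sha": sha, "date": date, "message": subject})
    return records

sample = (
    "a1b2c3d|2024-05-01|feat: add Redis caching layer\n"
    "d4e5f6a|2024-04-20|fix: citation validator off-by-one"
)
commits = parse_git_log(sample)
```

Each record's `message` would then be embedded and stored in the `git_history` collection, so a "why" question retrieves the commits (and PR/issue text) that explain the decision.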
- Architecture Understanding
  - Enhance codebase_indexer to understand module dependencies
  - Create a dependency graph (optional, for docs)
  - Support architecture queries: "How does the validation chain work?"
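The dependency-graph step above can be sketched by extracting import edges with `ast`. The module names in the example are illustrative, and `module_dependencies` is a hypothetical helper name:

```python
# Sketch: map each module to the modules it imports, giving the raw
# edges of a dependency graph for architecture queries and docs.
import ast

def module_dependencies(sources: dict[str, str]) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = {}
    for module, source in sources.items():
        deps = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[module] = deps
    return graph

graph = module_dependencies({
    "backend.api": "from backend.services import indexer\n",
    "backend.services": "import chromadb\n",
})
```

Storing these edges alongside each chunk's metadata lets an architecture query like "How does the validation chain work?" pull in not just the matching chunk but the modules it depends on.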
- Onboarding Mentor Mode
  - Create `/api/codebase/onboarding` endpoint
  - Generate a personalized onboarding guide
  - Suggest starting points, important files, first issues
  - Include code examples and explanations
Testing:
- Test Git history queries: Answer "why" questions
- Test architecture queries: Explain complex interactions
- Test onboarding: Generate guides for 3 contributor profiles
- Verify accuracy and usefulness
Success Criteria:
- ✅ StillMe can answer "why" questions using Git history
- ✅ Architecture explanations are accurate
- ✅ Onboarding guides are helpful for new contributors
- ✅ All features remain in Safe/Controlled zones
README Update:
- Update with Phase 3 capabilities
- Document full feature set
- Add "Living Codebase" section explaining vision
- Include safety boundaries and limitations
# Chunking Strategy:
1. By File: each file = 1 chunk (for small files)
2. By Class: each class = 1 chunk (for medium files)
3. By Function: each function = 1 chunk (for large files)
4. Max chunk size: 1000 tokens

Example chunk metadata:

```json
{
  "file_path": "backend/validators/citation_validator.py",
  "line_range": "45-78",
  "function_name": "validate_citation",
  "class_name": "CitationValidator",
  "docstring": "...",
  "code_type": "function",
  "dependencies": ["..."]
}
```

`code_type` may also be `"class"` or `"file"`; `dependencies` lists imports and used classes.

Query pipeline:
User Question → Embed Query → Search ChromaDB → Retrieve Top-K Chunks → Build Context → LLM Generate Response → Return with Citations
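The pipeline ends with an answer that cites file:line locations, and the success criteria call for citation accuracy above 95%. A sketch of a post-hoc check supporting that metric: verify every citation in a response points at an indexed file. The citation pattern and `check_citations` helper are assumptions for this sketch.

```python
# Sketch: validate file:line citations in an LLM answer against the set
# of indexed file paths, flagging any citation that cannot be resolved.
import re

def check_citations(answer: str, indexed_paths: set[str]) -> list[str]:
    # Matches citations like backend/foo.py:12 or backend/foo.py:12-30
    bad = []
    for path, lines in re.findall(r"([\w./]+\.py):(\d+(?:-\d+)?)", answer):
        if path not in indexed_paths:
            bad.append(f"{path}:{lines}")
    return bad

bad = check_citations(
    "See backend/validators/citation_validator.py:45-78 and ghost/file.py:1.",
    {"backend/validators/citation_validator.py"},
)
```

A stricter version would also check the cited line range against the chunk's stored `line_range`; even this path-only check catches hallucinated files, directly supporting the "no hallucinations about code structure" criterion.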
- ✅ Explain code
- ✅ Generate tests (user reviews before using)
- ✅ Suggest improvements (user decides)
- ✅ Review code (user fixes)
- ❌ Auto-modify source code
- ❌ Auto-commit changes
- ❌ Bypass human review
- ❌ Make architectural decisions alone
- Accuracy: > 90% (explanations match code)
- Citation accuracy: > 95% (file:line correct)
- Response time: < 5 seconds
- Test generation success: > 80% (tests run)
- Test coverage: > 70% for generated tests
- Code review accuracy: > 75% (catches real issues)
- Git history query accuracy: > 85%
- Architecture explanation accuracy: > 90%
- Onboarding guide usefulness: > 80% (user satisfaction)
- ChromaDB already set up
- Embedding service already initialized
- LLM API keys configured
- Follow TODO list items: `codebase-assistant-phase1-*`
- Test thoroughly
- Update README
- Commit: `feat: Phase 1 - Codebase Q&A Assistant`
- Follow TODO list items: `codebase-assistant-phase2-*`
- Test thoroughly
- Update README
- Commit: `feat: Phase 2 - Test Generator & Code Review`
- Follow TODO list items: `codebase-assistant-phase3-*`
- Test thoroughly
- Update README
- Commit: `feat: Phase 3 - Complete Codebase Assistant`
- Always test before committing
- Always update README after each phase
- Maintain safety boundaries strictly
- Measure metrics to track progress
- Iterate based on feedback
StillMe becomes a "Living Codebase" where:
- Developers can chat with StillMe about code
- StillMe understands its own architecture
- StillMe helps onboard new contributors
- StillMe generates tests and reviews code
- StillMe remembers design decisions (via Git history)
But always with human oversight and safety boundaries.