Minimalist tool for deep codebase analysis via Telegram bot using RAG (Retrieval-Augmented Generation) with multiple LLM providers. Follows KISS principles and Occam's razor for maximum simplicity and development efficiency.
ALWAYS RESPOND IN ENGLISH
- Do what has been asked; nothing more, nothing less.
- Before you finish, please verify your solution.
- NEVER create files unless they're absolutely necessary for achieving your goal.
- ALWAYS prefer editing an existing file to creating a new one.
- NEVER proactively create documentation files (*.md) or README files.
- Node.js 18+ - runtime environment
- grammY ^1.37.0 - modern Telegram Bot framework (TypeScript-first)
- TypeScript ^5.9.2 - static typing for reliability
- Zod ^4.0.15 - runtime validation and type-safe schemas
- LLM Providers - OpenAI, Gemini, Anthropic, Perplexity, Jina for embeddings and completions
- dotenv ^17.2.1 - environment variables management
- tsx ^4.20.3 - TypeScript execution for development
"As simple as possible, but not simpler"
- KISS + Occam's Razor: every new entity must justify its existence
- Pragmatism: working solution is more important than "correct" architecture
- Minimalism: only what is actually needed
- File system first: avoid databases unless absolutely necessary
- RAG Pipeline - semantic code search with LLM reranking
- Multi-LLM Support - OpenAI, Gemini, Anthropic, Perplexity, Jina
- Telegram Interface - natural communication with the bot
- Simple Authorization - whitelist access system
- Structured Responses - brief summary + detailed .md file
telegram-code-analyzer/
├── 📄 .env # 🔐 Configuration (tokens, users)
├── 📄 package.json # 📦 Dependencies (grammy, dotenv, tsx)
├── 📄 tsconfig.json # ⚙️ TypeScript configuration
├── 📂 src/
│ ├── 📄 index.ts # 🚀 Application entry point
│ ├── 📄 bot.ts # 🤖 Telegram bot + handlers
│ ├── 📄 auth.ts # 🔐 Whitelist authorization
│ ├── 📄 utils.ts # 🛠️ Utilities (logging, config)
│ ├── 📄 validation.ts # 🔒 Input validation & security
│ ├── 📄 types.ts # 🏷️ TypeScript types
│ ├── 📂 rag/ # 🔍 RAG system
│ │ ├── 📄 parser.ts # TypeScript AST parser
│ │ ├── 📄 chunker.ts # Semantic chunking
│ │ ├── 📄 store.ts # Vector store
│ │ ├── 📄 retriever.ts # Search + ranking
│ │ ├── 📄 pipeline.ts # Orchestrator
│ │ └── 📄 embedding-cache.ts # LRU cache for embeddings
│ ├── 📂 llm/ # 🤖 LLM providers
│ │ ├── 📄 *.ts # OpenAI, Gemini, Anthropic, Perplexity, Jina
│ │ ├── 📄 retry.ts # Retry with exponential backoff
│ │ └── 📄 fallback.ts # Provider fallback chain
│ ├── 📂 errors/ # ❌ Error handling
│ │ ├── 📄 index.ts # Error handling & messages
│ │ └── 📄 types.ts # Error type definitions
│ └── 📂 __tests__/ # 🧪 Integration tests
│ ├── 📄 setup.ts # Test configuration
│ ├── 📄 bot.integration.test.ts # Bot tests
│ └── 📄 integration.test.ts # Integration tests
└── 📂 temp/ # 🗂️ Temporary .md responses
📖 Detailed Architecture: Complete component structure in PROJECT_STRUCTURE.md
All queries are processed through RAG:
/ask <question> → Embedding → Vector Search → LLM Reranking → Answer
| Component | File | Purpose |
|---|---|---|
| Parser | rag/parser.ts | TypeScript AST parsing |
| Chunker | rag/chunker.ts | Splitting code into chunks |
| Store | rag/store.ts | Vector store |
| Retriever | rag/retriever.ts | Search + ranking (batch size: 5) |
| Pipeline | rag/pipeline.ts | Orchestrator (reranking timeout: 90s) |
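The vector search step of the flow above can be illustrated with a minimal sketch. It assumes embeddings are plain number arrays compared by cosine similarity; the `IndexedChunk` shape and the `searchChunks` name are hypothetical and not taken from rag/store.ts or rag/retriever.ts. LLM reranking is left out.

```typescript
// Hypothetical chunk shape; the real store may keep more metadata.
interface IndexedChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the topK chunks most similar to the query embedding, best first.
function searchChunks(
  query: number[],
  chunks: IndexedChunk[],
  topK: number
): IndexedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((entry) => entry.chunk);
}
```

In a pipeline like the one described, the candidates returned here would then be passed to the reranking step.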
The /index command supports incremental indexing:
- By default, only changed files are reindexed
- Changes are detected via SHA256 hashes and mtime
- /index --full - forces a full reindex
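One way to combine the two signals is sketched below. The text only says that SHA256 hashes and mtime are used, so the order of checks here is an assumption: mtime acts as a cheap first filter and the hash as the final word. `FileRecord` and `needsReindex` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical per-file record kept from the previous indexing run.
interface FileRecord {
  mtimeMs: number;
  sha256: string;
}

// Hex-encoded SHA256 digest of a file's content.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Decides whether a file must be reindexed.
// readContent is only called when the mtime differs, so unchanged files
// are never read from disk.
function needsReindex(
  previous: FileRecord | undefined,
  mtimeMs: number,
  readContent: () => string
): boolean {
  if (previous === undefined) return true; // new file
  if (previous.mtimeMs === mtimeMs) return false; // untouched since last run
  return previous.sha256 !== sha256Hex(readContent()); // touched, maybe unchanged
}
```

A file that was merely touched (new mtime, same content) is skipped under this scheme, which keeps the incremental run small.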
| Provider | Embeddings | Completions |
|---|---|---|
| OpenAI | ✓ | ✓ |
| Gemini | ✓ | ✓ |
| Jina | ✓ | ✗ |
| Anthropic | ✗ | ✓ |
| Perplexity | ✗ | ✓ |
- retryWithBackoff<T>(fn, options) - exponential backoff for all providers
  - Options: maxRetries (3), baseDelayMs (1000), maxDelayMs (30000), signal, onRetry
  - Retries on: 429, 500/502/503/504, timeouts, network errors
- CompletionProviderWithFallback - tries providers in order until one succeeds
  - Factory: createFallbackProvider([provider1, provider2, ...])
- CLI Fallback - Claude Code CLI (haiku) used as primary provider when available
  - Falls back to configured API provider (Perplexity, OpenAI, etc.)
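The retry and fallback behaviour listed above could look roughly like the sketch below. It is not the code in llm/retry.ts or llm/fallback.ts: the `CompletionProvider` interface and the `isRetryable` helper are hypothetical, the `signal` option is omitted, and only HTTP-like status codes are classified (timeouts and network errors are not handled here).

```typescript
interface RetryOptions {
  maxRetries?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
  onRetry?: (attempt: number, error: unknown) => void;
}

// Hypothetical classification: the error carries a numeric `status` field.
function isRetryable(error: unknown): boolean {
  if (typeof error !== "object" || error === null || !("status" in error)) {
    return false;
  }
  const status = error.status;
  return typeof status === "number" && [429, 500, 502, 503, 504].includes(status);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const { maxRetries = 3, baseDelayMs = 1000, maxDelayMs = 30000, onRetry } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      onRetry?.(attempt + 1, error);
      // Delay doubles on each attempt (base, 2x, 4x, ...) up to maxDelayMs.
      await sleep(Math.min(baseDelayMs * 2 ** attempt, maxDelayMs));
    }
  }
}

// Hypothetical minimal provider contract.
interface CompletionProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Tries each provider in order and returns the first successful answer.
function createFallbackProvider(providers: CompletionProvider[]): CompletionProvider {
  return {
    name: "fallback",
    async complete(prompt: string): Promise<string> {
      let lastError: unknown = new Error("No providers configured");
      for (const provider of providers) {
        try {
          return await provider.complete(prompt);
        } catch (error) {
          lastError = error; // remember the failure and try the next provider
        }
      }
      throw lastError;
    },
  };
}
```

The two pieces compose naturally: each provider's `complete` can wrap its API call in `retryWithBackoff`, so the chain only moves on after retries are exhausted.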
- LRU cache for query embeddings (maxSize: 1000)
- Single-flight pattern prevents duplicate API calls
- getStats() returns { size, hits, misses, hitRate }
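A minimal sketch of how an LRU cache with single-flight deduplication might be built is shown below. It is an illustration, not the code in rag/embedding-cache.ts: the `EmbeddingCache` class name, the `EmbedFn` type, and the choice to count every cache lookup without a stored value as a miss are assumptions.

```typescript
type EmbedFn = (text: string) => Promise<number[]>;

class EmbeddingCache {
  private readonly entries = new Map<string, number[]>();
  private readonly inFlight = new Map<string, Promise<number[]>>();
  private readonly embed: EmbedFn;
  private readonly maxSize: number;
  private hits = 0;
  private misses = 0;

  constructor(embed: EmbedFn, maxSize: number = 1000) {
    this.embed = embed;
    this.maxSize = maxSize;
  }

  async get(text: string): Promise<number[]> {
    const cached = this.entries.get(text);
    if (cached !== undefined) {
      this.hits++;
      // Re-insert so this key becomes the most recently used
      // (a Map iterates in insertion order).
      this.entries.delete(text);
      this.entries.set(text, cached);
      return cached;
    }
    this.misses++;
    // Single-flight: concurrent callers for the same text share one request.
    const pending = this.inFlight.get(text);
    if (pending !== undefined) return pending;

    const request = this.embed(text)
      .then((vector) => {
        this.store(text, vector);
        return vector;
      })
      .finally(() => {
        this.inFlight.delete(text);
      });
    this.inFlight.set(text, request);
    return request;
  }

  private store(text: string, vector: number[]): void {
    if (!this.entries.has(text) && this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(text, vector);
  }

  getStats(): { size: number; hits: number; misses: number; hitRate: number } {
    const total = this.hits + this.misses;
    return {
      size: this.entries.size,
      hits: this.hits,
      misses: this.misses,
      hitRate: total === 0 ? 0 : this.hits / total,
    };
  }
}
```

Using a `Map` for recency ordering avoids a separate linked list, which fits the project's preference for the simplest structure that works.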
Stop and verify at these moments:
- After implementing a complete function
- Before starting a new component
- When something seems wrong
- Before declaring "done"
Run verification: npm run build && npm run type-check && npm run dev
Why: This prevents error accumulation and ensures code stability.
- NO any type - always use specific TypeScript types!
- NO hardcoded values - use constants and config!
- NO code duplication - use functions and utilities!
- NO ignoring errors - always handle exceptions!
- NO TODOs in final code
- NO exec() calls - use spawn() for security!
- NO unvalidated input - always use Zod schemas!
- NO backwards-compatibility hacks - don't rename unused _vars, re-export types, or add // removed comments
- Type Guards instead of any type assertions - create isType() functions
- Zod schemas for all external data validation
- Custom error classes extending Error with proper typing
- Pure functions for testability and modularity
- Async/await instead of raw promise chains where possible
- Interfaces for all API contracts and configuration
- Meaningful names with predicates (isAuthorized, hasAccess)
- Early returns to reduce nesting
- Typed configurations - no process.env without validation
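Several of the practices above (type guards, custom error classes, early returns, typed configuration) can be shown together in one short sketch. It uses a hand-written type guard rather than a Zod schema so that it stays dependency-free. The `AppConfig` shape, the `BOT_TOKEN` and `REQUEST_TIMEOUT_MS` variable names, and the default timeout are hypothetical.

```typescript
// Custom error class with a machine-readable code.
class ConfigError extends Error {
  readonly code: string = "CONFIG_INVALID";

  constructor(message: string) {
    super(message);
    this.name = "ConfigError";
  }
}

// Hypothetical config shape; field and variable names are illustrative only.
interface AppConfig {
  botToken: string;
  requestTimeoutMs: number;
}

// Hypothetical default, kept as a named constant instead of an inline literal.
const DEFAULT_REQUEST_TIMEOUT_MS = "90000";

// Type guard: narrows unknown input without `any` or a type assertion.
function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Validates raw environment values once, so the rest of the code
// only ever sees a typed AppConfig.
function parseConfig(env: Record<string, string | undefined>): AppConfig {
  const botToken = env["BOT_TOKEN"];
  if (!isNonEmptyString(botToken)) {
    throw new ConfigError("BOT_TOKEN is required");
  }

  const requestTimeoutMs = Number(env["REQUEST_TIMEOUT_MS"] ?? DEFAULT_REQUEST_TIMEOUT_MS);
  if (!Number.isInteger(requestTimeoutMs) || requestTimeoutMs <= 0) {
    throw new ConfigError("REQUEST_TIMEOUT_MS must be a positive integer");
  }

  return { botToken, requestTimeoutMs };
}
```

Calling `parseConfig(process.env)` once at startup would keep unvalidated `process.env` access out of the rest of the codebase.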
- Don't add features, refactor code, or make "improvements" beyond what was asked
- Don't add error handling for scenarios that can't happen
- Don't create helpers or abstractions for one-time operations
- Don't design for hypothetical future requirements
- Three similar lines of code is better than a premature abstraction
- ✓ npm run build compiles without errors
- ✓ TypeScript compiles without errors and warnings
- ✓ All Zod schemas validate correctly
- ✓ Error handling follows custom error class pattern
- ✓ No spawn/exec security vulnerabilities
- ✓ Function works end-to-end through Telegram
- ✓ Graceful error handling implemented
- ✓ Code is clear and simple to understand
- ✓ Old/unused code removed
- ✓ Complexity stayed same or reduced (where possible)
- ✓ Code is understandable by junior developer
When stuck or confused:
- Stop - Don't complicate the solution
- Step back - Re-read requirements in PRD
- Simplify - Simple solution is usually correct
- Ask - "I see two approaches: [A] vs [B]. Which is preferable?"
Your improvement ideas are welcome - ask away!
- Whitelist authorization via Telegram ID
- Input sanitization before processing
- Timeout limits for all operations
- Never log sensitive data (tokens, user IDs)
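Whitelist authorization by Telegram ID can be sketched as follows. This is an assumption about how such a check might look, not the code in auth.ts; `parseWhitelist` and the comma-separated input format are hypothetical, while `isAuthorized` follows the predicate naming used elsewhere in this document.

```typescript
// Parses a comma-separated list of Telegram user IDs, e.g. "123456, 789012".
// Rejects anything that is not a plain sequence of digits.
function parseWhitelist(raw: string): ReadonlySet<number> {
  const parts = raw
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);

  if (parts.some((part) => !/^\d+$/.test(part))) {
    throw new Error("Whitelist must contain only numeric Telegram IDs");
  }
  return new Set(parts.map((part) => Number(part)));
}

// Denies by default: a message without a sender ID is never authorized.
function isAuthorized(
  userId: number | undefined,
  whitelist: ReadonlySet<number>
): boolean {
  return userId !== undefined && whitelist.has(userId);
}
```

In line with the logging rule above, the error message deliberately avoids echoing the offending value.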
- npm run build - TypeScript compilation
- npm run dev - Development mode with tsx
- npm start - Production start
- npm run type-check - TypeScript type checking
- npm run lint - Code formatting check with Prettier
- npm run lint:fix - Auto-fix code formatting
- npm run test - Run tests in watch mode
- npm run test:run - Run tests once (CI mode)
One message, multiple tools:
- Multiple Edit tools → One message → All parallel
- Parallel Read → Multiple files simultaneously
- Batch independent operations together
rg -n "pattern" --glob '!node_modules/*' # Pattern search
fd filename # File finding
tree -L 2 src/ # Project structure
- Semantic search: vector embeddings + LLM reranking
- Multi-provider: OpenAI, Gemini, Jina for embeddings; OpenAI, Gemini, Anthropic, Perplexity for completions
- AST parsing: TypeScript-aware code chunking
- grammY framework: modern TypeScript-first approach
- Simple middleware: only authorization and error handling
- File delivery: sending detailed .md analyses as documents
- Auto-text handling: users can send questions directly without /ask
- Progress animation: 3-stage progress indicator during query processing
- InlineKeyboard: buttons in /start command for quick navigation
- Graceful shutdown: SIGINT/SIGTERM handling, waits for indexing to complete
- IndexingLock: atomic lock prevents concurrent indexing (TOCTOU fix)
- All messages in English: user-facing messages are in English
- File system: analysis results saved to temp/
- Vector store: JSON persistence for RAG index
- Environment config: all configuration via .env
- No database: avoiding database complexity
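The IndexingLock item listed earlier describes an atomic lock that closes a check-then-act (TOCTOU) gap. A minimal sketch of that idea, assuming a single Node.js process, is given below. The method names `tryAcquire` and `release` and the `runExclusive` helper are hypothetical and may not match the project's class.

```typescript
class IndexingLock {
  private locked = false;

  // The check and the set happen in one synchronous step, so no other
  // callback can run between them on Node's single-threaded event loop.
  tryAcquire(): boolean {
    if (this.locked) return false;
    this.locked = true;
    return true;
  }

  release(): void {
    this.locked = false;
  }
}

// Runs the task only if no other indexing run holds the lock.
// Returns undefined when indexing is already in progress.
async function runExclusive<T>(
  lock: IndexingLock,
  task: () => Promise<T>
): Promise<T | undefined> {
  if (!lock.tryAcquire()) return undefined;
  try {
    return await task();
  } finally {
    lock.release(); // always released, even if the task throws
  }
}
```

The race this avoids is a separate "is indexing running?" check followed by an await before the flag is set, which would let two /index commands both pass the check.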