Commit f55dd75
feat: unified model pipeline with decoupled tool calling (#49)
Implements the unified inference pipeline for SKaiNET Transformers,
resolving #49 and building on the tool calling foundation from #46.
## Summary
This branch decouples tool calling from the kllama runner, creates a
unified model pipeline with architecture auto-detection, and adds
comprehensive Antora documentation.
### Phase 1: Decouple Tool Calling
- Enhance Tokenizer interface with eosTokenId, bosTokenId, vocabSize
- Create ChatSession abstraction in llm-agent (any runner gets tool
calling for free)
- Refactor ToolCallingDemo and AgentCli to accept Tokenizer, not
GGUFTokenizer
- Fix JavaAgentLoop instanceof hack
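The Phase 1 split can be sketched as follows. All method bodies and the chat template are illustrative assumptions, not the actual SKaiNET implementation: the point is that `ChatSession` depends only on the `Tokenizer` interface (with the new `eosTokenId`/`bosTokenId`/`vocabSize` accessors), so any runner's tokenizer gets tool calling without touching kllama.

```java
import java.nio.charset.StandardCharsets;

interface Tokenizer {
    int[] encode(String text);
    String decode(int[] ids);
    int eosTokenId();  // accessors added to the interface in Phase 1
    int bosTokenId();
    int vocabSize();
}

// Toy byte-level implementation standing in for GGUFTokenizer.
final class ByteTokenizer implements Tokenizer {
    public int[] encode(String text) {
        byte[] b = text.getBytes(StandardCharsets.UTF_8);
        int[] ids = new int[b.length];
        for (int i = 0; i < b.length; i++) ids[i] = b[i] & 0xFF;
        return ids;
    }
    public String decode(int[] ids) {
        byte[] b = new byte[ids.length];
        for (int i = 0; i < ids.length; i++) b[i] = (byte) ids[i];
        return new String(b, StandardCharsets.UTF_8);
    }
    public int eosTokenId() { return 256; }
    public int bosTokenId() { return 257; }
    public int vocabSize() { return 258; }
}

// Runner-agnostic session: everything it needs comes from the interface,
// so it works the same against any Tokenizer implementation.
final class ChatSession {
    private final Tokenizer tokenizer;
    ChatSession(Tokenizer tokenizer) { this.tokenizer = tokenizer; }

    // Token ids the runner should feed for one user turn (toy template).
    int[] tokensForUserTurn(String message) {
        return tokenizer.encode("<user>" + message + "</user>");
    }
}
```

With this shape, ToolCallingDemo and AgentCli can take any `Tokenizer`, which is what removes the need for the `instanceof GGUFTokenizer` checks.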
### Phase 2: Model Registry
- Add ModelFamily enum (LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL)
- Add ModelRegistry.detect() for GGUF architecture auto-detection
- Add UnifiedModelLoader.peek() to extract model info without loading
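A minimal sketch of the registry's detection step, assuming it keys off the GGUF `general.architecture` metadata string. The prefix matching and the `UNKNOWN` fallback here are illustrative, not the real `ModelRegistry` logic:

```java
import java.util.Locale;

enum ModelFamily { LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL, UNKNOWN }

final class ModelRegistry {
    // Map a GGUF architecture string (e.g. "llama", "qwen3", "gemma2")
    // to a model family by prefix; UNKNOWN is a hypothetical fallback.
    static ModelFamily detect(String architecture) {
        String a = architecture.toLowerCase(Locale.ROOT);
        if (a.startsWith("llama"))   return ModelFamily.LLAMA;
        if (a.startsWith("qwen"))    return ModelFamily.QWEN;
        if (a.startsWith("gemma"))   return ModelFamily.GEMMA;
        if (a.startsWith("apertus")) return ModelFamily.APERTUS;
        if (a.startsWith("bert"))    return ModelFamily.BERT;
        if (a.startsWith("voxtral")) return ModelFamily.VOXTRAL;
        return ModelFamily.UNKNOWN;
    }
}
```

`UnifiedModelLoader.peek()` would then only need to read the GGUF header to call this, without materializing any tensors.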
### Phase 3: Tokenization Pipeline
- Move GGUFTokenizer from kllama to llm-core (all runners can use it)
- Create TokenizerFactory with fromGGUF(), fromTokenizerJson(),
fromHuggingFace()
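One way the factory's three entry points could be selected is by inspecting the vocabulary source; this dispatch rule is a guess at the design, with a hypothetical `loaderFor` helper standing in for the real factory methods:

```java
final class TokenizerFactory {
    // Hypothetical dispatcher mirroring the three factory methods:
    // a .gguf file carries its own vocab, a tokenizer.json is the HF
    // fast-tokenizer format, anything else is treated as a hub repo id.
    static String loaderFor(String source) {
        if (source.endsWith(".gguf"))          return "fromGGUF";
        if (source.endsWith("tokenizer.json")) return "fromTokenizerJson";
        return "fromHuggingFace";
    }
}
```

Whichever path is taken, the result is the shared `Tokenizer` interface, which is why moving GGUFTokenizer into llm-core lets every runner reuse it.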
### Phase 4: Unified CLI
- New skainet-cli module: single entry point for all GGUF models
- Auto-detects architecture, supports --chat/--agent/--demo modes
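The mode flags can be handled with a simple first-match scan. The flag names come from this commit; the parsing logic and the chat default below are illustrative assumptions about skainet-cli, not its actual argument parser:

```java
enum Mode { CHAT, AGENT, DEMO }

final class SkainetCli {
    // Return the first recognized mode flag; default to chat when the
    // user passes only a model path (assumed default, for illustration).
    static Mode parseMode(String[] args) {
        for (String arg : args) {
            switch (arg) {
                case "--chat":  return Mode.CHAT;
                case "--agent": return Mode.AGENT;
                case "--demo":  return Mode.DEMO;
            }
        }
        return Mode.CHAT;
    }
}
```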
### Smoke Tests
- Add tool calling test phase with [Tool Call] detection
- Add ToolCallingDemo.runSingleShot() for non-interactive testing
- Add Qwen3-8B-Q4 to smoke test config
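The `[Tool Call]` detection could look like the sketch below; the exact transcript format, and the assumption that a tool name follows the marker, are guesses for illustration:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SmokeCheck {
    // Matches the "[Tool Call]" marker followed by a tool name,
    // e.g. "[Tool Call] get_weather({...})" (format is assumed).
    private static final Pattern TOOL_CALL =
        Pattern.compile("\\[Tool Call\\]\\s*(\\w+)");

    // First tool name in the transcript, or empty if none appeared,
    // letting the smoke test both detect and identify the call.
    static Optional<String> firstToolCall(String transcript) {
        Matcher m = TOOL_CALL.matcher(transcript);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

`ToolCallingDemo.runSingleShot()` then only needs to emit one transcript for a check like this to assert against, with no interactive session.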
### Documentation (Antora + Divio)
- 19 AsciiDoc pages: tutorials, how-to, reference, explanation
- Mermaid diagrams via Kroki for pipeline, architecture, agent loop
- GitHub Actions workflow for docs build and GitHub Pages deployment
Refs: #46
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>