
Commit f55dd75

michalharakal and claude committed
feat: unified model pipeline with decoupled tool calling (#49)
Implements the unified inference pipeline for SKaiNET Transformers, resolving #49 and building on the tool calling foundation from #46.

## Summary

This branch decouples tool calling from the kllama runner, creates a unified model pipeline with architecture auto-detection, and adds comprehensive Antora documentation.

### Phase 1: Decouple Tool Calling

- Enhance the Tokenizer interface with eosTokenId, bosTokenId, and vocabSize
- Create a ChatSession abstraction in llm-agent (any runner gets tool calling for free)
- Refactor ToolCallingDemo and AgentCli to accept a Tokenizer, not a GGUFTokenizer
- Fix the JavaAgentLoop instanceof hack

### Phase 2: Model Registry

- Add a ModelFamily enum (LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL)
- Add ModelRegistry.detect() for GGUF architecture auto-detection
- Add UnifiedModelLoader.peek() to extract model info without loading weights

### Phase 3: Tokenization Pipeline

- Move GGUFTokenizer from kllama to llm-core so all runners can use it
- Create TokenizerFactory with fromGGUF(), fromTokenizerJson(), and fromHuggingFace()

### Phase 4: Unified CLI

- New skainet-cli module: a single entry point for all GGUF models
- Auto-detects the architecture; supports --chat, --agent, and --demo modes

### Smoke Tests

- Add a tool calling test phase with [Tool Call] detection
- Add ToolCallingDemo.runSingleShot() for non-interactive testing
- Add Qwen3-8B-Q4 to the smoke test config

### Documentation (Antora + Divio)

- 19 AsciiDoc pages: tutorials, how-to guides, reference, and explanation
- Mermaid diagrams rendered via Kroki for the pipeline, architecture, and agent loop
- GitHub Actions workflow for the docs build and GitHub Pages deployment

Refs: #46

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
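The architecture auto-detection from Phase 2 could look roughly like the following sketch. The names `ModelFamily` and `ModelRegistry.detect()` come from the commit message; the prefix-matching logic, the `UNKNOWN` fallback, and the assumption that the caller passes the GGUF `general.architecture` metadata string (e.g. `"llama"`, `"qwen2"`) are illustrative guesses, not the actual implementation.

```java
import java.util.Locale;
import java.util.Map;

// Families listed in the commit message, plus a hypothetical fallback.
enum ModelFamily { LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL, UNKNOWN }

final class ModelRegistry {

    // Map GGUF architecture-string prefixes onto known families.
    // The exact keys and matching strategy here are assumptions.
    private static final Map<String, ModelFamily> PREFIXES = Map.of(
            "llama",   ModelFamily.LLAMA,
            "qwen",    ModelFamily.QWEN,     // matches "qwen2", "qwen3", ...
            "gemma",   ModelFamily.GEMMA,
            "apertus", ModelFamily.APERTUS,
            "bert",    ModelFamily.BERT,
            "voxtral", ModelFamily.VOXTRAL);

    /** Detect the model family from a GGUF architecture string. */
    static ModelFamily detect(String architecture) {
        String arch = architecture.toLowerCase(Locale.ROOT);
        return PREFIXES.entrySet().stream()
                .filter(e -> arch.startsWith(e.getKey()))
                .map(Map.Entry::getValue)
                .findFirst()
                .orElse(ModelFamily.UNKNOWN);
    }
}
```

With a scheme like this, `UnifiedModelLoader.peek()` only needs to read the GGUF header metadata before deciding which runner to construct, which is what lets the CLI auto-detect the architecture without loading any weights.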
1 parent f8caaf8 · commit f55dd75
