Commit f9110fb

Merge pull request #50 from SKaiNET-developers/feature/49-tool-calling-pipeline
Feature/49 tool calling pipeline
2 parents fa7ab73 + a448468 commit f9110fb

46 files changed

Lines changed: 3431 additions & 824 deletions

.github/workflows/docs.yml

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
name: Docs

on:
  push:
    branches: [ main, develop ]
    paths:
      - 'docs/**'
      - '.github/workflows/docs.yml'
  pull_request:
    paths:
      - 'docs/**'
      - '.github/workflows/docs.yml'
  workflow_dispatch:

concurrency:
  group: docs-${{ github.ref }}
  cancel-in-progress: true

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  build-docs:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - name: Checkout
        uses: actions/checkout@v6

      - name: Build custom Antora image
        run: |
          docker build \
            -t skainet-antora:local \
            -f docs/.docker/Dockerfile \
            docs/.docker/

      - name: Build Antora site
        run: |
          docker run --rm \
            -v "${{ github.workspace }}:/antora" \
            --workdir /antora/docs \
            skainet-antora:local \
            --stacktrace \
            antora-playbook.yml

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs/build/site

  deploy-docs:
    if: github.ref == 'refs/heads/develop' && github.event_name == 'push'
    needs: build-docs
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}

    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

PLAN-unified-pipeline.md

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
# Plan: Unified Model Pipeline with Decoupled Tool Calling

## Context

Currently SKaiNET-transformers has:
- **5+ hand-coded runtimes** (LlamaRuntime, Qwen35Runtime, Gemma3nRuntime, ApertusRuntime, VoxtralRuntimes): each reimplements the forward pass, weight loading, and layer execution
- **Tool calling tightly coupled to kllama**: the AgentLoop, ToolCallingDemo, and chat modes only exist in the kllama runner. Other models (Gemma, Apertus) cannot use tool calling without duplicating code
- **Two execution paths**: legacy hand-coded runtimes AND the newer `OptimizedLLMRuntime` with DSL/compute-graph/AOT. LlamaRuntime and ApertusRuntime are already marked deprecated

The goal: converge on **one unified pipeline** where model definition, weight loading, tokenization, and tool calling are cleanly separated pipeline stages.

## Architecture Overview

```
GGUF/SafeTensors File
        |
WeightLoader (parse metadata + tensors)
        |
DSL Network Definition (model-specific, declarative)
        |
ComputeGraph (DAG)
        |
Optimization Pipeline (TransposeElim -> WeightDedup -> LLMFusion -> DCE)
        |
ComputeGraphExecutor (fused kernels)
        |
InferenceRuntime (unified: forward + generate)
        |
TokenizationPipeline (encode/decode, special tokens, byte-level BPE)
        |
ChatPipeline (template formatting, tool calling, agent loop)
```
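
The optimization stage in the diagram is a chain of graph-rewriting passes. As a rough illustration of that idea only, the sketch below folds a list of passes over a toy graph. `Op`, `Pass`, `weightDedup`, `deadCodeElimination`, and `optimize` are hypothetical names invented here; they are not the SKaiNET APIs, and the toy passes only mimic what `WeightDedup` and `DCE` presumably do (`TransposeElim` and `LLMFusion` are not modelled).

```kotlin
// Hypothetical toy graph node: NOT a SKaiNET type. Ops are listed in execution order.
data class Op(
    val name: String,
    val kind: String,
    val inputs: List<String> = emptyList(),
    val payload: String? = null, // stands in for weight contents
)

// A pass rewrites the whole graph and returns the new one.
typealias Pass = (List<Op>) -> List<Op>

// Toy weight dedup: ops with identical payloads collapse onto the first one seen.
val weightDedup: Pass = { ops ->
    val firstByPayload = mutableMapOf<String, String>() // payload -> canonical op name
    val alias = mutableMapOf<String, String>()          // duplicate name -> canonical name
    for (op in ops) {
        val payload = op.payload ?: continue
        val first = firstByPayload.getOrPut(payload) { op.name }
        if (first != op.name) alias[op.name] = first
    }
    ops.filter { it.name !in alias }
        .map { op -> op.copy(inputs = op.inputs.map { alias[it] ?: it }) }
}

// Toy dead-code elimination: keep only ops reachable backwards from the outputs.
fun deadCodeElimination(outputs: Set<String>): Pass = { ops ->
    val byName = ops.associateBy { it.name }
    val live = mutableSetOf<String>()
    val pending = ArrayDeque(outputs)
    while (pending.isNotEmpty()) {
        val name = pending.removeLast()
        if (live.add(name)) byName[name]?.inputs?.forEach { pending.addLast(it) }
    }
    ops.filter { it.name in live }
}

// The pipeline itself: apply each pass to the result of the previous one.
fun optimize(graph: List<Op>, passes: List<Pass>): List<Op> =
    passes.fold(graph) { current, pass -> pass(current) }
```

Calling `optimize(graph, listOf(weightDedup, deadCodeElimination(setOf("logits"))))` applies the passes left to right, which mirrors the arrow order in the diagram. Order matters in this toy version too: deduplicating first lets dead-code elimination drop the now-unreferenced duplicates.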

## Phase 1: Decouple Tool Calling from kllama (immediate value) -- DONE

**What was done:**

1. **Enhanced `Tokenizer` interface** with `eosTokenId`, `bosTokenId`, `vocabSize`
   - Updated all implementations: `GGUFTokenizer`, `TokenizerImpl`, `HuggingFaceBPETokenizer`, `TekkenTokenizerAdapter`, `HuggingFaceTokenizer` (BERT)

2. **Created `ChatSession` abstraction** in `llm-agent`
   - File: `llm-agent/.../chat/ChatSession.kt`
   - Bundles `InferenceRuntime` + `Tokenizer` + `ModelMetadata`
   - Provides `createAgentLoop()` and `runSingleTurn()` for any runner

3. **Refactored `ToolCallingDemo` and `AgentCli`** to use `Tokenizer` interface instead of `GGUFTokenizer`
   - Both now accept any `Tokenizer`, not just `GGUFTokenizer`
   - Both use `ChatSession` internally for agent loop creation

4. **Removed `GGUFTokenizer` cast from kllama Main.kt** dispatch
   - Chat/agent/demo modes now work with any `Tokenizer`

5. **Fixed `JavaAgentLoop`**: replaced `GGUFTokenizer` instanceof hack with `tokenizer.eosTokenId`
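
To illustrate why exposing `eosTokenId` on the interface removes the need for a `GGUFTokenizer` cast, here is a minimal sketch. Only the property names `eosTokenId`, `bosTokenId`, and `vocabSize` come from the plan; the `encode`/`decode` signatures, `CharTokenizer`, and `generate` are assumptions made for this example and may not match the real `Tokenizer` interface.

```kotlin
// Assumed shape of the interface; only the three property names come from the plan.
interface Tokenizer {
    val eosTokenId: Int
    val bosTokenId: Int
    val vocabSize: Int
    fun encode(text: String): List<Int>
    fun decode(tokens: List<Int>): String
}

// Hypothetical toy tokenizer: ids 0 and 1 are BOS/EOS, ASCII characters map to code + 2.
class CharTokenizer : Tokenizer {
    override val bosTokenId: Int = 0
    override val eosTokenId: Int = 1
    override val vocabSize: Int = 2 + 128
    override fun encode(text: String): List<Int> =
        listOf(bosTokenId) + text.map { it.code + 2 }
    override fun decode(tokens: List<Int>): String =
        tokens.filter { it >= 2 }.map { (it - 2).toChar() }.joinToString("")
}

// Generation loop that depends only on the interface: it stops on eosTokenId
// without knowing which tokenizer implementation it was given.
fun generate(
    tokenizer: Tokenizer,
    prompt: String,
    maxNewTokens: Int,
    nextToken: (List<Int>) -> Int, // stand-in for a model forward pass
): String {
    val context = tokenizer.encode(prompt).toMutableList()
    val generated = mutableListOf<Int>()
    repeat(maxNewTokens) {
        val token = nextToken(context)
        if (token == tokenizer.eosTokenId) return tokenizer.decode(generated)
        context += token
        generated += token
    }
    return tokenizer.decode(generated)
}
```

Any runner that can supply a `Tokenizer` and a next-token function can reuse this loop unchanged, which is the same property the plan attributes to `ChatSession`.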

## Phase 2: Unified DSL-Based Model Definition (converge on OptimizedLLMRuntime) -- PARTIAL

**What was done:**

1. **Created `ModelRegistry`** in `llm-core/.../ModelRegistry.kt`
   - `ModelFamily` enum: LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL, UNKNOWN
   - `ModelRegistry.detect(architecture)` maps GGUF arch strings to families
   - Tracks capabilities (supportsToolCalling, chatTemplateFamily)

2. **Created `UnifiedModelLoader`** in `llm-core/.../UnifiedModelLoader.kt`
   - `UnifiedModelLoader.peek(source)` extracts `GGUFModelInfo` from GGUF metadata
   - Returns architecture, family, dimensions without loading weights

**Already existing (no changes needed):**
- DSL networks: `llamaNetwork()`, `qwenNetwork()`, `apertusNetwork()`, `bertNetwork()`, `voxtralBackboneNetwork()`, `voxtralAcousticNetwork()`
- `OptimizedLLMRuntime` with DIRECT/OPTIMIZED/HYBRID modes
- Per-model `NetworkLoader` classes (LlamaNetworkLoader, ApertusNetworkLoader, etc.)

**Remaining (future work):**
- `gemmaNetwork()` DSL definition (Gemma3n has unique features: GELU, MatFormer variable FFN, sliding window)
- Migrate CLI runners from deprecated runtimes to OptimizedLLMRuntime
- Remove deprecated LlamaRuntime and ApertusRuntime
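
A minimal sketch of how an architecture string could be mapped to a family follows. The enum values are taken from the plan; the prefix table, the matching rule, and the name `ModelRegistrySketch` are assumptions for illustration, and the real `ModelRegistry.detect` may match strings differently.

```kotlin
// Enum values as listed in the plan.
enum class ModelFamily { LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL, UNKNOWN }

// Hypothetical stand-in for ModelRegistry; the prefix table below is an assumption.
object ModelRegistrySketch {
    private val prefixes = listOf(
        "llama" to ModelFamily.LLAMA,
        "qwen" to ModelFamily.QWEN,
        "gemma" to ModelFamily.GEMMA,
        "apertus" to ModelFamily.APERTUS,
        "bert" to ModelFamily.BERT,
        "voxtral" to ModelFamily.VOXTRAL,
    )

    // Case-insensitive prefix match; anything unrecognised falls back to UNKNOWN.
    fun detect(architecture: String): ModelFamily {
        val arch = architecture.trim().lowercase()
        return prefixes.firstOrNull { (prefix, _) -> arch.startsWith(prefix) }?.second
            ?: ModelFamily.UNKNOWN
    }
}
```

Returning `UNKNOWN` instead of throwing lets a caller such as a CLI decide how to report unsupported models. Prefix matching is a deliberate simplification here so that versioned strings like `qwen2` resolve to the same family.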

## Phase 3: Tokenization as Pipeline Stage -- DONE

**What was done:**

1. **Enhanced `Tokenizer` interface** with `eosTokenId`, `bosTokenId`, `vocabSize` (done in Phase 1)

2. **Moved `GGUFTokenizer` from kllama to `llm-core`**
   - New location: `llm-core/.../tokenizer/GGUFTokenizer.kt`
   - Old location has a typealias for backwards compatibility
   - Added `skainet-io-gguf` and `kotlinx-io-core` dependencies to `llm-core`

3. **Created `TokenizerFactory`** in `llm-core/.../tokenizer/TokenizerFactory.kt`
   - `TokenizerFactory.fromGGUF(source)`: from GGUF file metadata
   - `TokenizerFactory.fromTokenizerJson(json)`: from HuggingFace tokenizer.json
   - `TokenizerFactory.fromHuggingFace(json, config)`: full HF BPE tokenizer

4. All runners can now use `GGUFTokenizer` and `TokenizerFactory` directly from `llm-core`
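
To make "from GGUF file metadata" concrete, the sketch below pulls the vocabulary and special-token ids out of an already-parsed metadata map. The keys `tokenizer.ggml.tokens`, `tokenizer.ggml.bos_token_id`, and `tokenizer.ggml.eos_token_id` are standard GGUF metadata keys; `VocabInfo`, `vocabFromGgufMetadata`, and the `Map<String, Any?>` representation are hypothetical and not how `TokenizerFactory.fromGGUF` is necessarily implemented.

```kotlin
// Hypothetical result type: just the fields a Tokenizer needs to expose.
data class VocabInfo(
    val tokens: List<String>,
    val bosTokenId: Int,
    val eosTokenId: Int,
) {
    val vocabSize: Int get() = tokens.size
}

// Assumes the GGUF key/value section has already been parsed into a plain map.
fun vocabFromGgufMetadata(metadata: Map<String, Any?>): VocabInfo {
    val tokens = (metadata["tokenizer.ggml.tokens"] as? List<*>)
        ?.map { it.toString() }
        ?: error("missing tokenizer.ggml.tokens")

    // Token ids may arrive as any numeric type, so normalise to Int.
    fun id(key: String): Int =
        (metadata[key] as? Number)?.toInt() ?: error("missing $key")

    return VocabInfo(
        tokens = tokens,
        bosTokenId = id("tokenizer.ggml.bos_token_id"),
        eosTokenId = id("tokenizer.ggml.eos_token_id"),
    )
}
```

Failing fast on a missing key keeps a malformed file from producing a tokenizer with silently wrong special tokens.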

## Phase 4: Unified Runner (single CLI entry point) -- DONE

**What was done:**

1. **Created `llm-apps/skainet-cli`**: new unified CLI module
   - Auto-detects architecture from GGUF metadata via `UnifiedModelLoader.peek()`
   - Loads any LLaMA-compatible model (LLaMA, Qwen, Mistral)
   - Supports `--chat`, `--agent`, `--demo` modes with tool calling
   - Uses `TokenizerFactory.fromGGUF()` for tokenizer loading
   - Registered as `skainet` runner in smoke test script

2. **Usage:**
   ```bash
   skainet -m model.gguf "The capital of France is"   # auto-detect, generate
   skainet -m model.gguf --chat                       # interactive chat
   skainet -m model.gguf --demo "What is 2+2?"        # tool calling demo
   ```

3. **Existing per-model CLIs are preserved**: no breaking changes

**Remaining (future work):**
- Add Gemma3n loading path to unified CLI (requires gemmaNetwork() DSL)
- Add Apertus loading path to unified CLI
- Eventually deprecate per-model CLIs
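
The mode flags above suggest a small dispatch step before any model is loaded. The following is a sketch of such argument parsing. Only the flags `-m`, `--chat`, `--agent`, and `--demo` come from the plan; `Mode`, `CliOptions`, and `parseArgs` are hypothetical names, and the real `skainet-cli` may parse its arguments differently.

```kotlin
// Hypothetical types; only the flag names come from the plan.
enum class Mode { GENERATE, CHAT, AGENT, DEMO }

data class CliOptions(val modelPath: String, val mode: Mode, val prompt: String?)

fun parseArgs(args: List<String>): CliOptions {
    var model: String? = null
    var mode = Mode.GENERATE // default when no mode flag is given
    val positional = mutableListOf<String>()
    var i = 0
    while (i < args.size) {
        when (val arg = args[i]) {
            "-m" -> {
                model = args.getOrNull(i + 1) ?: error("-m requires a model path")
                i++ // skip the value we just consumed
            }
            "--chat" -> mode = Mode.CHAT
            "--agent" -> mode = Mode.AGENT
            "--demo" -> mode = Mode.DEMO
            else -> positional += arg
        }
        i++
    }
    return CliOptions(
        modelPath = model ?: error("missing -m <model.gguf>"),
        mode = mode,
        prompt = positional.firstOrNull(), // first free argument is treated as the prompt
    )
}
```

With this shape, `parseArgs(listOf("-m", "model.gguf", "--demo", "What is 2+2?"))` yields the `DEMO` mode with the question as the prompt, and a caller can then branch on `mode` after detecting the model family.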

## Phase Status

| Phase | Status | Summary |
|-------|--------|---------|
| 1. Decouple tool calling | DONE | ChatSession, Tokenizer interface, no GGUFTokenizer coupling |
| 2. Model registry | PARTIAL | ModelRegistry, UnifiedModelLoader, ModelFamily enum; gemmaNetwork() DSL and runtime migration remain |
| 3. Tokenization pipeline | DONE | GGUFTokenizer in llm-core, TokenizerFactory |
| 4. Unified runner | DONE | skainet-cli with auto-detection; Gemma3n and Apertus loading paths remain |

3. **Phase 2** then: biggest refactor, needs per-model validation
4. **Phase 4** last: depends on all other phases

docs/.docker/.dockerignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
node_modules
build

docs/.docker/Dockerfile

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
FROM node:20-alpine

LABEL org.opencontainers.image.title="SKaiNET Antora" \
      org.opencontainers.image.description="Antora site generator with built-in Mermaid rendering" \
      org.opencontainers.image.source="https://github.com/SKaiNET-developers/SKaiNET-transformers"

# Chromium for mermaid-cli (puppeteer)
RUN apk add --no-cache chromium font-noto

ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser \
    PUPPETEER_SKIP_DOWNLOAD=true

WORKDIR /antora

# Install Antora + extensions + mermaid-cli in one layer
RUN npm i --save-exact \
    @antora/cli@3.1 \
    @antora/site-generator@3.1 \
    asciidoctor-kroki@0.18 \
    @mermaid-js/mermaid-cli@11 \
    && npm cache clean --force

# Mermaid-cli config: use installed Chromium, no sandbox (container)
RUN echo '{ \
    "executablePath": "/usr/bin/chromium-browser", \
    "args": ["--no-sandbox", "--disable-gpu", "--disable-dev-shm-usage"] \
}' > /antora/puppeteer-config.json

# Pre-generate a simple diagram to warm up and verify the stack works
RUN echo 'graph TD; A-->B;' > /tmp/test.mmd \
    && npx mmdc -i /tmp/test.mmd -o /tmp/test.svg -p /antora/puppeteer-config.json \
    && rm /tmp/test.mmd /tmp/test.svg

ENTRYPOINT ["npx", "antora"]
CMD ["--stacktrace", "antora-playbook.yml"]

docs/antora-playbook.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
site:
  title: SKaiNET Transformers
  start_page: skainet-transformers::index.adoc

content:
  sources:
    - url: .
      start_path: docs
      branches: HEAD

asciidoc:
  extensions:
    - asciidoctor-kroki
  attributes:
    # Use local mermaid-cli via Kroki (no external server needed when
    # built with the custom Docker image in docs/.docker/Dockerfile)
    kroki-fetch-diagram: true

ui:
  bundle:
    url: https://gitlab.com/antora/antora-ui-default/-/jobs/artifacts/HEAD/raw/build/ui-bundle.zip?job=bundle-stable
    snapshot: true

output:
  dir: ./build/site

docs/antora.yml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
name: skainet-transformers
title: SKaiNET Transformers
version: ~
nav:
  - modules/ROOT/nav.adoc

docs/modules/ROOT/nav.adoc

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
* xref:index.adoc[Overview]

.Tutorials
* xref:tutorials/getting-started.adoc[Getting Started]
* xref:tutorials/tool-calling.adoc[Tool Calling with Any Model]
* xref:tutorials/smoke-tests.adoc[Running Smoke Tests]

.How-to Guides
* xref:how-to/add-model.adoc[Add a New Model Architecture]
* xref:how-to/add-compute-backend.adoc[Add a Compute Backend]
* xref:how-to/add-tool.adoc[Add a Custom Tool]
* xref:how-to/run-unified-cli.adoc[Use the Unified CLI]

.Reference
* xref:reference/architecture.adoc[Architecture Overview]
* xref:reference/pipeline.adoc[Inference Pipeline]
* xref:reference/tokenizer-api.adoc[Tokenizer API]
* xref:reference/chat-session-api.adoc[ChatSession API]
* xref:reference/model-registry.adoc[Model Registry]
* xref:reference/cli-reference.adoc[CLI Reference]

.Explanation
* xref:explanation/pipeline-design.adoc[Pipeline Design Decisions]
* xref:explanation/dsl-vs-handcoded.adoc[DSL Networks vs Hand-Coded Runtimes]
* xref:explanation/tokenizer-internals.adoc[Tokenizer Internals]
