.github/workflows/docs.yml (new file):

name: Docs

on:
  push:
    branches: [ main, develop ]
    paths:
      - 'docs/**'
      - '.github/workflows/docs.yml'
  pull_request:
    paths:
      - 'docs/**'
      - '.github/workflows/docs.yml'
  workflow_dispatch:

concurrency:
  group: docs-${{ github.ref }}
  cancel-in-progress: true

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  build-docs:
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - name: Checkout
        uses: actions/checkout@v6

      - name: Build Antora site
        run: |
          docker run --rm \
            -v "${{ github.workspace }}:/antora" \
            --workdir /antora/docs \
            --entrypoint sh \
            docker.io/antora/antora:3.1 \
            -c "npm i asciidoctor-kroki && antora --stacktrace antora-playbook.yml"

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs/build/site

  deploy-docs:
    if: github.ref == 'refs/heads/develop' && github.event_name == 'push'
    needs: build-docs
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}

    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
docs/antora-playbook.yml (new file):

site:
  title: SKaiNET Transformers
  start_page: skainet-transformers::index.adoc

content:
  sources:
    - url: .
      start_path: docs
      branches: HEAD

asciidoc:
  extensions:
    - asciidoctor-kroki
  attributes:
    kroki-server-url: https://kroki.io
    kroki-fetch-diagram: true

ui:
  bundle:
    url: https://gitlab.com/antora/antora-ui-default/-/jobs/artifacts/HEAD/raw/build/ui-bundle.zip?job=bundle-stable
    snapshot: true

output:
  dir: ./build/site
@@ -14,7 +14,29 @@ This led to:

 The pipeline is split into stages that are independently replaceable:

-[horizontal]
+[mermaid]
+....
+graph LR
+    subgraph "Model Format"
+        A[Weight Loading]
+    end
+    subgraph "Architecture"
+        B[Network Definition]
+    end
+    subgraph "Framework"
+        C[Graph Compilation]
+        D[Inference Runtime]
+    end
+    subgraph "I/O"
+        E[Tokenization]
+    end
+    subgraph "Application"
+        F[Chat Pipeline]
+    end
+
+    A --> B --> C --> D --> E --> F
+....
+
 Weight Loading:: Parse GGUF/SafeTensors into typed tensor maps. Model-format concern, not architecture concern.
 Network Definition:: Pure functions (`llamaNetwork()`, `apertusNetwork()`) that return a `Module` tree. Architecture concern only.
 Graph Compilation:: Trace the module tree into a DAG, apply optimization passes. Framework concern.
2828
2929== Dependency Graph
3030
31- ----
32- llm-apps/skainet-cli
33- -> llm-runtime/kllama -> llm-inference/llama -> llm-core
34- -> llm-agent -> llm-core
35- -> skainet-backend-cpu (SIMD tensor ops)
36- -> skainet-io-gguf (GGUF parsing)
31+ [mermaid]
32+ ....
33+ graph LR
34+ subgraph Apps
35+ skainet-cli
36+ kllama-cli
37+ end
3738
38- llm-agent
39- -> llm-core (InferenceRuntime, Tokenizer)
39+ subgraph Runtime
40+ kllama
41+ kqwen
42+ kgemma
43+ kapertus
44+ end
4045
41- llm-core
42- -> skainet-lang-core (tensor types, DSL)
43- -> skainet-compile-dag (compute graph)
44- -> skainet-compile-opt (optimization passes)
45- -> skainet-io-core (I/O abstractions)
46- -> skainet-io-gguf (GGUF reader)
47- ----
46+ subgraph Inference
47+ llama
48+ gemma
49+ apertus
50+ bert
51+ voxtral
52+ end
53+
54+ subgraph Core
55+ llm-core
56+ llm-agent
57+ end
58+
59+ subgraph SKaiNET
60+ skainet-lang-core
61+ skainet-compile-dag
62+ skainet-compile-opt
63+ skainet-io-gguf
64+ skainet-backend-cpu
65+ end
66+
67+ skainet-cli --> kllama
68+ skainet-cli --> llm-agent
69+ kllama-cli --> kllama
70+ kllama --> llama
71+ kllama --> llm-agent
72+ kqwen --> llama
73+ kgemma --> gemma
74+ kapertus --> apertus
75+ llama --> llm-core
76+ gemma --> llm-core
77+ apertus --> llm-core
78+ bert --> llm-core
79+ voxtral --> llm-core
80+ llm-agent --> llm-core
81+ llm-core --> skainet-lang-core
82+ llm-core --> skainet-compile-dag
83+ llm-core --> skainet-compile-opt
84+ llm-core --> skainet-io-gguf
85+ kllama --> skainet-backend-cpu
86+ ....
4887
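Editorial aside: the fan-out from `llm-core` in the graph above maps naturally onto a multi-project Gradle build. A hypothetical `llm-core/build.gradle.kts` fragment — the project paths are guessed from the module names in the graph, and the api/implementation split is an editorial suggestion, not taken from the repository:

```kotlin
// llm-core/build.gradle.kts — illustrative sketch, not the repository's
// actual build file. Project paths are assumptions derived from the
// dependency graph above.
dependencies {
    api(project(":skainet-lang-core"))              // tensor types, DSL
    api(project(":skainet-compile-dag"))            // compute graph
    implementation(project(":skainet-compile-opt")) // optimization passes
    implementation(project(":skainet-io-gguf"))     // GGUF reader
}
```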
 == Key Interfaces

@@ -3,37 +3,22 @@

 == Pipeline Stages

-[source]
-----
-GGUF/SafeTensors File
-         |
-         v
-[1] WeightLoader        Parse metadata + tensor data
-         |
-         v
-[2] DSL Network Def     llamaNetwork(), qwenNetwork(), apertusNetwork()
-         |
-         v
-[3] ComputeGraph (DAG)  Record forward pass into directed acyclic graph
-         |
-         v
-[4] Optimization        TransposeElim -> WeightDedup -> LLMFusion -> DCE
-         |
-         v
-[5] Executor            ComputeGraphExecutor with fused kernels
-         |
-         v
-[6] InferenceRuntime    forward(tokenId) -> logits, generate(), sample()
-         |
-         v
-[7] Tokenizer           encode(text) -> IntArray, decode(token) -> String
-         |
-         v
-[8] ChatPipeline        ChatTemplate + AgentLoop + ToolRegistry
-         |
-         v
-Generated text / Tool call results
-----
+[mermaid]
+....
+graph TD
+    A[GGUF / SafeTensors File] --> B["[1] WeightLoader<br/>Parse metadata + tensor data"]
+    B --> C["[2] DSL Network Def<br/>llamaNetwork(), qwenNetwork()"]
+    C --> D["[3] ComputeGraph (DAG)<br/>Record forward pass"]
+    D --> E["[4] Optimization<br/>TransposeElim → WeightDedup → LLMFusion → DCE"]
+    E --> F["[5] Executor<br/>ComputeGraphExecutor with fused kernels"]
+    F --> G["[6] InferenceRuntime<br/>forward(tokenId) → logits"]
+    G --> H["[7] Tokenizer<br/>encode(text) → IntArray"]
+    H --> I["[8] ChatPipeline<br/>ChatTemplate + AgentLoop + ToolRegistry"]
+    I --> J[Generated text / Tool call results]
+
+    style A fill:#f9f,stroke:#333
+    style J fill:#9f9,stroke:#333
+....

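Editorial aside: read end to end, the eight stages above compose like ordinary functions. A toy Kotlin skeleton of that hand-off — every type, function, and pass here is invented for illustration (only stage names such as TransposeElim come from the diagram; none of this is the real SKaiNET API):

```kotlin
// Toy skeleton of the pipeline as plain function hand-offs.
// All types and names are illustrative stand-ins, not the real API.
data class Weights(val tensors: Map<String, FloatArray>)  // [1] WeightLoader output
data class Graph(val ops: List<String>)                   // [3]/[4] compute graph

fun loadWeights(path: String) = Weights(mapOf("wte" to FloatArray(8)))
fun defineNetwork() = listOf("embed", "transpose", "transpose", "attn", "mlp", "logits")
fun trace(layers: List<String>) = Graph(layers)           // [3] record the forward pass
fun optimize(g: Graph) =                                  // [4] e.g. a TransposeElim-style pass
    Graph(g.ops.filterNot { it == "transpose" })

// [5]/[6] toy "executor + runtime": stand-in arithmetic, not real inference.
fun execute(g: Graph, w: Weights, tokenId: Int): Int = (tokenId + g.ops.size) % 100

fun main() {
    val w = loadWeights("model.gguf")
    val g = optimize(trace(defineNetwork()))
    println(g.ops)                 // transposes eliminated before execution
    println(execute(g, w, tokenId = 1))
}
```

The point of the sketch: because each stage consumes only the previous stage's output type, any one of them can be replaced without touching the others.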
 == Stage Details

@@ -7,17 +7,19 @@ This tutorial shows how to use `ChatSession` to add tool calling to any model ru

 The tool calling pipeline is decoupled from the model runtime:

-----
-InferenceRuntime + Tokenizer + ModelMetadata
-         |
-    ChatSession
-         |
-    AgentLoop (generate -> parse -> execute -> re-prompt)
-         |
-    ChatTemplate (format messages, parse tool calls)
-         |
-    ToolRegistry (execute tool functions)
-----
+[mermaid]
+....
+graph TD
+    A[InferenceRuntime + Tokenizer + ModelMetadata] --> B[ChatSession]
+    B --> C[AgentLoop]
+    C --> D{Tool calls?}
+    D -->|Yes| E[ToolRegistry<br/>execute tool functions]
+    E --> F[Append result to messages]
+    F --> C
+    D -->|No| G[Final response]
+    B --> H[ChatTemplate<br/>format messages, parse tool calls]
+    H --> C
+....

 Any model that implements `InferenceRuntime` and has a `Tokenizer` can use tool calling.

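Editorial aside: the Yes branch of the diagram above is the classic agent loop — generate, parse, execute, re-prompt until no tool call remains. A minimal Kotlin sketch of that control flow; every signature here is an assumption made for illustration, not the actual `ChatSession`/`AgentLoop` API:

```kotlin
// Minimal agent-loop sketch. All types and signatures are illustrative
// assumptions, not the repository's API.
data class ToolCall(val name: String, val arg: String)

// Toy "model": asks for a tool once, then produces a final answer.
class ToyRuntime {
    private var turn = 0
    fun generate(prompt: String): String =
        if (turn++ == 0) "CALL time()" else "It is $prompt"
}

// Stand-in for ChatTemplate's tool-call parsing.
fun parseToolCall(reply: String): ToolCall? =
    if (reply.startsWith("CALL "))
        ToolCall(reply.removePrefix("CALL ").substringBefore("("), "")
    else null

// Stand-in for ToolRegistry dispatch.
fun executeTool(call: ToolCall): String = when (call.name) {
    "time" -> "12:00"
    else -> "unknown tool"
}

fun agentLoop(runtime: ToyRuntime, userMessage: String, maxTurns: Int = 4): String {
    var prompt = userMessage
    repeat(maxTurns) {
        val reply = runtime.generate(prompt)
        val call = parseToolCall(reply) ?: return reply  // No tool call: final response
        prompt = executeTool(call)                       // Append result, re-prompt
    }
    return "max turns exceeded"
}

fun main() {
    println(agentLoop(ToyRuntime(), "What time is it?"))  // It is 12:00
}
```

Note the `maxTurns` guard: a real loop needs a cap so a model that keeps emitting tool calls cannot spin forever.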