Commit e766e8c

refactor : configurations, and introduce CUDA features (#53) (#53)
## Summary
## Causal Multi-Head Attention Forward Pass (CUDA)
This PR implements the CUDA forward pass for causal multi-head attention (attention_forward). It includes the core GPU kernel, custom block-level reduction primitives, and tensor validation helpers.
## Core Attention Kernel
attention_forward_kernel:
- Computes scaled dot-product attention on an interleaved QKV input tensor structured as [Batch, Time, 3 * Channels].
- Causal Masking: Enforces autoregressive constraints by preventing tokens from attending to future time steps ($t2 > t$).
- Implements parallelized block_max and block_sum device functions.
- Leverages cooperative warp shuffles (warp_max, warp_sum) and shared memory to handle stable online softmax normalization.

#52 #11 #12 #14 #29
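
The computation described in the summary can be illustrated with a small CPU reference. This is a minimal pure-Python sketch, not the commit's CUDA kernel. It assumes that each position stores Q, K and V back to back, that heads are contiguous inside each C-wide slice, and that scores are scaled by 1/sqrt(head_size). The name `causal_attention_reference` is hypothetical, and a plain two-pass max-subtraction softmax stands in for the warp/block reductions and the online softmax.

```python
import math


def causal_attention_reference(qkv, n_head):
    """CPU reference for causal multi-head attention (illustrative only).

    qkv: nested list shaped [B][T][3*C]; assumed layout per position is
         [Q (C values) | K (C values) | V (C values)], C = n_head * head_size.
    Returns a nested list shaped [B][T][C].
    """
    B = len(qkv)
    T = len(qkv[0])
    C = len(qkv[0][0]) // 3
    assert C % n_head == 0, "C must be divisible by n_head"
    hs = C // n_head
    scale = 1.0 / math.sqrt(hs)

    out = [[[0.0] * C for _ in range(T)] for _ in range(B)]
    for b in range(B):
        for h in range(n_head):
            q_off = h * hs          # start of this head's query slice
            k_off = C + h * hs      # start of this head's key slice
            v_off = 2 * C + h * hs  # start of this head's value slice
            for t in range(T):
                q = qkv[b][t][q_off:q_off + hs]
                # Causal mask: only positions t2 <= t contribute.
                scores = []
                for t2 in range(t + 1):
                    k = qkv[b][t2][k_off:k_off + hs]
                    scores.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
                # Stable softmax: subtract the row maximum before exponentiating.
                row_max = max(scores)
                exps = [math.exp(s - row_max) for s in scores]
                denom = sum(exps)
                # Weighted sum of the value vectors.
                for t2, e in enumerate(exps):
                    w = e / denom
                    v = qkv[b][t2][v_off:v_off + hs]
                    for i in range(hs):
                        out[b][t][q_off + i] += w * v[i]
    return out
```

A reference like this is mainly useful for comparing against GPU output on small inputs: the first time step can only attend to itself, so its output must equal its own value vector.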
2 parents 8451d4a + c7a1e01 commit e766e8c

50 files changed

Lines changed: 2614 additions & 2778 deletions

.dockerignore

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+.git
+.gitignore
+.github
+.venv
+**/__pycache__
+**/*.pyc
+**/*.pyo
+**/*.pyd
+engine/logs/
+node_modules
+frontend/node_modules
+.npm-cache
+frontend/.vite
+frontend/dist
+
+# Model weights
+*.pt
+*.bin
+models/
+
+# Windows build artifacts
+*.exe
+quadtrix.exe
+*.png
+*.jpg
+*.jpeg
+*.md
+LICENSE
+contributing.md
+SECURITY.md
+run.md
+.DS_Store
+Thumbs.db
+.idea
+.vscode

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ jobs:
           pip install fastapi "uvicorn[standard]" pydantic pydantic-settings httpx redis
 
       - name: Compile Python sources
-        run: python -m compileall backend engine iGPU
+        run: python -m compileall backend engine
 
       - name: Import FastAPI application
         working-directory: backend
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+name: Publish Docker image
+on:
+  push:
+    branches:
+      - master
+    tags:
+      - "v*.*.*"
+    paths-ignore:
+      - 'cuda/**'
+      - 'docs/**'
+      - '**.md'
+  pull_request:
+    branches:
+      - master
+    paths-ignore:
+      - 'cuda/**'
+      - 'docs/**'
+      - '**.md'
+
+env:
+  REGISTRY: ghcr.io
+
+jobs:
+  build-and-push:
+    name: Build & push to ghcr.io
+    runs-on: ubuntu-latest
+
+    permissions:
+      contents: read
+      packages: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+      - name: Set lowercase image name
+        id: image
+        run: |
+          echo "name=$(echo '${{ github.repository }}' | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+      - name: Log in to ghcr.io
+        if: github.event_name != 'pull_request'
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Extract Docker metadata
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ${{ env.REGISTRY }}/${{ steps.image.outputs.name }}
+          tags: |
+            type=raw,value=latest,enable={{is_default_branch}}
+            type=semver,pattern={{version}}
+            type=semver,pattern={{major}}.{{minor}}
+            type=ref,event=pr
+      - name: Build and push Docker image (CPU)
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          file: ./Dockerfile
+          push: ${{ github.event_name != 'pull_request' }}
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+          build-args: |
+            BASE_IMAGE=ubuntu:24.04
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+      - name: Image published
+        if: github.event_name != 'pull_request'
+        run: |
+          echo "Image published to GitHub Packages"
+          echo ""
+          echo "Pull with:"
+          echo " docker pull ${{ env.REGISTRY }}/${{ steps.image.outputs.name }}:latest"
+          echo ""
+          echo "Or via docker-compose:"
+          echo " image: ${{ env.REGISTRY }}/${{ steps.image.outputs.name }}:latest"
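
To make the tagging rules in this workflow easier to follow, here is a simplified Python sketch of how the four `tags:` rules and the lowercase image name step would resolve for different events. It is an approximation for illustration, not the logic of `docker/metadata-action` itself: the function names are hypothetical, only plain `vMAJOR.MINOR.PATCH` tags are handled, and the action's `flavor` settings (which can add `latest` on their own) are not modeled.

```python
import re


def lowercase_image_name(repository: str) -> str:
    # Mirrors the `tr '[:upper:]' '[:lower:]'` step; registry image names must be lowercase.
    return repository.lower()


def derive_tags(event, ref, default_branch="master", pr_number=None):
    """Approximate the workflow's four tag rules (hypothetical helper)."""
    tags = []
    # type=raw,value=latest,enable={{is_default_branch}}
    if event == "push" and ref == f"refs/heads/{default_branch}":
        tags.append("latest")
    # type=semver,pattern={{version}} and pattern={{major}}.{{minor}}
    match = re.fullmatch(r"refs/tags/v(\d+)\.(\d+)\.(\d+)", ref)
    if event == "push" and match:
        major, minor, patch = match.groups()
        tags.append(f"{major}.{minor}.{patch}")
        tags.append(f"{major}.{minor}")
    # type=ref,event=pr
    if event == "pull_request" and pr_number is not None:
        tags.append(f"pr-{pr_number}")
    return tags


def full_image_refs(registry, repository, tags):
    # Combine registry, lowercase repository name and each tag.
    name = lowercase_image_name(repository)
    return [f"{registry}/{name}:{tag}" for tag in tags]
```

For example, `full_image_refs("ghcr.io", "Eamon2009/Quadtrix.cpp", derive_tags("push", "refs/heads/master"))` yields a single reference ending in `eamon2009/quadtrix.cpp:latest`. Pull request builds still compute a `pr-N` tag, but the workflow sets `push` to false for them, so nothing is uploaded.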

.github/workflows/github-package.yml

Lines changed: 0 additions & 44 deletions
This file was deleted.

.github/workflows/release.yml

Lines changed: 0 additions & 57 deletions
This file was deleted.

.npmignore

Lines changed: 0 additions & 25 deletions
This file was deleted.

Dockerfile

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+FROM ubuntu:24.04 AS builder
+
+ENV DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    g++ \
+    python3 \
+    python3-pip \
+    python3-venv \
+    curl \
+    ca-certificates \
+    && curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
+    && apt-get install -y --no-install-recommends nodejs \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /build
+COPY . .
+RUN g++ -std=c++17 -O2 -I. -Iinclude -o quadtrix main.cpp
+RUN cd frontend \
+    && npm ci \
+    && npm run build
+RUN python3 -m venv /venv \
+    && /venv/bin/pip install --upgrade pip --quiet \
+    && /venv/bin/pip install -r backend/requirements.txt --quiet
+
+ARG BASE_IMAGE=ubuntu:24.04
+FROM ${BASE_IMAGE:-ubuntu:24.04} AS runtime
+
+LABEL org.opencontainers.image.title="Quadtrix.cpp"
+LABEL org.opencontainers.image.description="Local LLM with C++/PyTorch backends and React UI"
+LABEL org.opencontainers.image.source="https://github.com/Eamon2009/Quadtrix.cpp"
+LABEL org.opencontainers.image.version="1.1.0"
+LABEL org.opencontainers.image.licenses="MIT"
+
+ENV DEBIAN_FRONTEND=noninteractive \
+    PYTHONUNBUFFERED=1 \
+    PATH="/venv/bin:$PATH"
+
+# Runtime system packages
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3 \
+    supervisor \
+    curl \
+    ca-certificates \
+    && curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
+    && apt-get install -y --no-install-recommends nodejs \
+    && npm install -g serve --quiet \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+COPY --from=builder /venv /venv
+COPY --from=builder /build/quadtrix /app/quadtrix
+COPY --from=builder /build/frontend/dist /app/frontend/dist
+COPY --from=builder /build/backend /app/backend
+COPY --from=builder /build/engine /app/engine
+COPY supervisord.conf /etc/supervisor/conf.d/quadtrix.conf
+COPY docker-entrypoint.sh /app/entrypoint.sh
+
+RUN chmod +x /app/entrypoint.sh /app/quadtrix \
+    && mkdir -p /var/log/supervisor /app/models
+VOLUME ["/app/models"]
+ENV TORCH_CHECKPOINT_PATH=/app/models/best_model.pt \
+    GPT_MODEL_PATH=/app/models/best_model.bin \
+    API_PORT=3001 \
+    CORS_ORIGINS=http://localhost:8080 \
+    LOG_LEVEL=INFO \
+    MAX_SESSIONS=1000 \
+    SESSION_TTL_HOURS=24
+EXPOSE 3001 8080
+
+ENTRYPOINT ["/app/entrypoint.sh"]

README.md

Lines changed: 11 additions & 10 deletions
@@ -1,23 +1,24 @@
 # Quadtrix.cpp
-<img width="2442" height="1586" alt="run_20260508_110726" src="https://github.com/user-attachments/assets/ef51d1c3-e28e-4674-8a71-5513e753b174" />
-
-Quadtrix.cpp is a local language model project with several execution paths:
-
-- A dependency-free C++17 transformer implementation with manual forward and backward passes.
-- A PyTorch training and inference path for faster experimentation on CPU, CUDA, or supported accelerator backends.
-- A FastAPI middleware layer for chat sessions, health checks, backend selection, and feedback.
-- A React + TypeScript frontend for local chat, settings, session history, and model status.
-- Optional package/CLI support through `bin/quadtrix.js`.
+---
+Quadtrix.cpp is a local large language model project built around a modular, multi-path architecture that allows to choose the right execution strategy for their hardware and workflow. Whether you are working on a bare-metal embedded environment, running experiments on a GPU cluster, serving a REST API, or interacting through a browser-based chat interface, Quadtrix.cpp provides a coherent and composable foundation for each of those scenarios. This is designed to be approachable for people who want to read and modify every layer of the stack, while remaining practical enough for people who simply want to spin up a working local model quickly.
+> For full technical reference, check the documentation — <a href="https://eamon2009.github.io/LLMs/" style="color:#1a73e8;text-decoration:underline;" target="_blank"> Docs</a>
+
 
 
 > [!IMPORTANT]
 > Please be aware that several commands listed in this documentation—specifically those involving file paths and directory navigation—should not be directly copied and pasted into your terminal. Because file structures and path syntax (such as / vs \) vary significantly across operating systems like Windows, macOS, and Linux, you must manually adjust these arguments to match your local environment. Ensure you verify your current working directory and replace any placeholder paths with the absolute or relative path specific to your machine to avoid execution errors.
 
+---
+## Architecture
 
-The project is designed as a technical learning implementation. The C++ path exposes the transformer internals directly: tensor operations, attention, layer normalization, cross-entropy, analytical gradients, AdamW, checkpointing, and autoregressive generation.
+<img width="1016" height="684" alt="image" src="https://github.com/user-attachments/assets/0e9faad4-71a9-4c7f-80e9-1136dfea6e57" />
+The diagram shows how tokens enter at the bottom as raw IDs, get converted into vector embeddings with positional information added, then pass upward through a repeated stack of decoder blocks - each block applying masked attention followed by a feed-forward layer, with normalisation wrapping both. At the very top, a linear projection maps those representations to output logits across the vocabulary. The right-hand side zooms into the attention mechanism itself, showing how queries, keys, and values are linearly projected, fed into a scaled dot-product with an optional causal mask and softmax, then concatenated across all heads before being projected back out. The training flow panel on the far right shows this running as a five-step cycle per batch: data loading, forward pass, loss computation, backward pass for gradients, and a weight update. The bottom section confirms the behaviour through training loss, validation loss, and perplexity plots - all three curves descending and converging steadily as steps increase, indicating the model is learning as expected.
+
 
 ## v1.1.0
 <img width="2185" height="829" alt="run_20260430_192930" src="https://github.com/user-attachments/assets/c6db061a-aa8d-4d8d-a1e2-1a81418bb613" />
+<img width="2442" height="1586" alt="run_20260508_110726" src="https://github.com/user-attachments/assets/ef51d1c3-e28e-4674-8a71-5513e753b174" />
+
 
 ---
 
backend/.env.example

Lines changed: 1 addition & 1 deletion
@@ -5,5 +5,5 @@ LOG_LEVEL=INFO
 MAX_SESSIONS=1000
 SESSION_TTL_HOURS=24
 CPP_SERVER_URL=http://localhost:8080
-TORCH_CHECKPOINT_PATH=../engine/best_model .pt
+TORCH_CHECKPOINT_PATH=../engine/best_model.pt
 REQUEST_TIMEOUT_SECONDS=60

backend/config.py

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ class Settings(BaseSettings):
     max_sessions: int = Field(default=1000, alias="MAX_SESSIONS")
     session_ttl_hours: int = Field(default=24, alias="SESSION_TTL_HOURS")
     cpp_server_url: str = Field(default="http://localhost:8080", alias="CPP_SERVER_URL")
-    torch_checkpoint_path: str = Field(default="../engine/best_model .pt", alias="TORCH_CHECKPOINT_PATH")
+    torch_checkpoint_path: str = Field(default="../engine/best_model.pt", alias="TORCH_CHECKPOINT_PATH")
     request_timeout_seconds: float = Field(default=60.0, alias="REQUEST_TIMEOUT_SECONDS")
 
     model_config = SettingsConfigDict(env_file=".env", extra="ignore", populate_by_name=True)
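
The one-character fix in this hunk and in `backend/.env.example` removes a stray space from the default checkpoint path. The sketch below mirrors the five fields visible in the hunk using only the standard library, so the fallback behaviour can be checked without installing pydantic. It is an assumption-laden stand-in, not the project's `Settings` class: `SettingsSketch`, `load_settings` and `has_stray_whitespace` are hypothetical names, and `.env` file loading is not modeled.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SettingsSketch:
    """Stdlib stand-in for the fields shown in the diff (not the real Settings class)."""
    max_sessions: int
    session_ttl_hours: int
    cpp_server_url: str
    torch_checkpoint_path: str
    request_timeout_seconds: float


def load_settings(env=None):
    # Read each value from the environment mapping, falling back to the
    # defaults shown on the "+" side of the diff.
    env = os.environ if env is None else env
    return SettingsSketch(
        max_sessions=int(env.get("MAX_SESSIONS", "1000")),
        session_ttl_hours=int(env.get("SESSION_TTL_HOURS", "24")),
        cpp_server_url=env.get("CPP_SERVER_URL", "http://localhost:8080"),
        torch_checkpoint_path=env.get("TORCH_CHECKPOINT_PATH", "../engine/best_model.pt"),
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "60")),
    )


def has_stray_whitespace(path: str) -> bool:
    # A path such as "../engine/best_model .pt" names a different file than
    # "../engine/best_model.pt"; this flags any whitespace character in it.
    return any(ch.isspace() for ch in path)
```

A check like `has_stray_whitespace(load_settings().torch_checkpoint_path)` at startup would have flagged the old default before the checkpoint load failed.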
