
🧠 LLM Memory Calculator

A comprehensive web application for calculating Large Language Model (LLM) memory requirements and performance metrics for various GPU configurations and unified memory systems (Apple Silicon).

✨ Features

🚀 Latest Features (v2.0)

  • 🔄 Auto-Updating Model Database: Fetches the latest Ollama models, including Gemma3, DeepSeek-R1, Qwen3, Llama4, and Phi4
  • 🍎 Enhanced Apple Silicon Support: M4 series support, adjustable unified memory (8GB-512GB)
  • 🎛️ Advanced Quantization: 1-bit to 32-bit precision options (INT1, INT2, INT3, INT4, INT5, INT6, INT8, FP16, BF16, FP32)
  • 🧮 Modular Calculations: Granular control over memory and performance calculations
  • 💾 Comprehensive Memory Analysis: KV cache, activation memory, framework overhead, system overhead, peak memory

Core Features

  • Memory Footprint Calculation: Sizing formulas based on VMware's published LLM sizing methodology
  • Performance Metrics: Latency, throughput, time-to-first-token, prefill time estimation
  • GPU Database: 80+ GPUs including NVIDIA H100/H200, AMD MI300X, Apple M4 series
  • LLM Model Support: 200+ models from Ollama + proprietary APIs (Claude, GPT, Gemini)
  • Real-time Analysis: OOM detection, optimization recommendations, warnings
  • Multi-GPU Support: Tensor parallelism calculations for enterprise deployments

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • npm/yarn/pnpm

Installation

# Clone the repository
git clone <repository-url>
cd llm-memory-calculator

# Install dependencies
npm install

# Start development server
npm run dev

# Open http://localhost:3000

🎯 Usage Guide

Basic Workflow

  1. 📱 Select Model: Choose from auto-updated Ollama models or proprietary APIs
  2. 🖥️ Choose Hardware: Select from comprehensive GPU/processor database
  3. ⚙️ Configure Parameters: Set context size, concurrent requests, quantization
  4. 🎛️ Customize Calculations: Toggle memory components and performance metrics
  5. 📊 Analyze Results: Review memory usage, performance, warnings, and recommendations

Advanced Features

Memory Calculation Options

  • ✅ KV Cache Memory: Attention cache for inference contexts
  • ✅ Activation Memory: Intermediate computation memory
  • ✅ Framework Overhead: PyTorch/CUDA overhead (15% default)
  • ✅ System Overhead: OS/driver reserved memory (10% default)
  • ✅ Peak Memory Factor: Model loading peak usage (1.5x multiplier)

Performance Calculation Options

  • ✅ Prefill Time: Input processing latency
  • ✅ Generation Time (TPOT): Time per output token
  • ✅ Throughput: Tokens per second output rate
  • ✅ End-to-End Latency: Complete request processing time

Analysis Options

  • ✅ Warnings: Performance bottlenecks, OOM conditions
  • ✅ Recommendations: Optimization suggestions, hardware advice

🧮 Calculation Methodology

Memory Footprint Formula

Total Memory = Model_Weights + KV_Cache + Activation_Memory + Framework_Overhead + System_Overhead

Model_Weights = parameters × quantization_bytes_per_param
KV_Cache = 2 × 2 × n_layers × d_model × context_window × concurrent_requests / (1024³)
Activation_Memory = batch_size × seq_len × n_layers × d_model × bytes_per_activation / (1024³)
Framework_Overhead = Model_Weights × 0.15  (configurable)
System_Overhead = Total_GPU_Memory × 0.10  (configurable)
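
Reading the two leading factors of 2 in the KV cache term as keys plus values stored at 2 bytes per element (FP16), a minimal TypeScript sketch of these formulas looks like this (illustrative names and defaults; the actual calculator.ts may differ):

// A minimal sketch of the footprint formula above. Interprets the "2 × 2" KV-cache
// factor as keys + values at 2 bytes per element (an assumption, not taken from the source).
interface MemoryInputs {
  parameters: number;             // total parameter count, e.g. 8e9
  bytesPerParam: number;          // from quantization: 2 for FP16, 0.5 for INT4, ...
  nLayers: number;
  dModel: number;
  contextWindow: number;
  concurrentRequests: number;
  batchSize: number;
  seqLen: number;
  bytesPerActivation: number;     // typically 2 (FP16)
  totalGpuMemoryGB: number;
  frameworkOverheadRate?: number; // default 0.15
  systemOverheadRate?: number;    // default 0.10
}

const GB = 1024 ** 3;

function estimateMemoryGB(m: MemoryInputs): number {
  const modelWeightsGB = (m.parameters * m.bytesPerParam) / GB;
  const kvCacheGB =
    (2 * 2 * m.nLayers * m.dModel * m.contextWindow * m.concurrentRequests) / GB;
  const activationGB =
    (m.batchSize * m.seqLen * m.nLayers * m.dModel * m.bytesPerActivation) / GB;
  const frameworkOverheadGB = modelWeightsGB * (m.frameworkOverheadRate ?? 0.15);
  const systemOverheadGB = m.totalGpuMemoryGB * (m.systemOverheadRate ?? 0.10);
  // The optional 1.5x peak-memory factor from the calculation options can be
  // applied on top of this total for the model-loading phase.
  return modelWeightsGB + kvCacheGB + activationGB + frameworkOverheadGB + systemOverheadGB;
}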

Performance Metrics

Prefill_Time = (2 × Model_Parameters / num_GPUs) / GPU_TFLOPS
Time_per_Output_Token = (2 × Model_Parameters / num_GPUs) / Memory_Bandwidth × 1000
TTFT = Prefill_Time + TPOT
E2E_Latency = Prompt_Size × Prefill_Time + Response_Size × TPOT
Throughput = Response_Size / E2E_Latency
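
Under the common convention that parameters are expressed in billions, compute in TFLOPS, and bandwidth in GB/s (so the results come out in milliseconds), these formulas translate roughly to the following TypeScript sketch (an illustration of the formulas above, not the project's actual code):

// Illustrative translation of the performance formulas. Assumed units:
// parameters in billions, GPU_TFLOPS in TFLOPS, bandwidth in GB/s, times in ms.
interface PerfInputs {
  paramsBillions: number;
  numGpus: number;
  gpuTflops: number;         // effective compute per GPU
  memBandwidthGBps: number;  // memory bandwidth per GPU
  promptTokens: number;      // Prompt_Size
  responseTokens: number;    // Response_Size
}

function estimatePerformance(p: PerfInputs) {
  const prefillMsPerToken = (2 * p.paramsBillions / p.numGpus) / p.gpuTflops;           // Prefill_Time
  const tpotMs = ((2 * p.paramsBillions / p.numGpus) / p.memBandwidthGBps) * 1000;      // TPOT
  const ttftMs = prefillMsPerToken + tpotMs;                                            // TTFT
  const e2eLatencyMs = p.promptTokens * prefillMsPerToken + p.responseTokens * tpotMs;  // E2E_Latency
  const throughputTokensPerSec = p.responseTokens / (e2eLatencyMs / 1000);              // Throughput (ms -> s)
  return { prefillMsPerToken, tpotMs, ttftMs, e2eLatencyMs, throughputTokensPerSec };
}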

🖥️ Supported Hardware

NVIDIA GPUs (Consumer & Data Center)

Series | Models | Memory Range
RTX 40 | 4090, 4080 SUPER, 4070 Ti, 4060 Ti | 8GB - 24GB
RTX 30 | 3090 Ti, 3090, 3080 Ti, 3070, 3060 | 8GB - 24GB
H-Series | H100 SXM/PCIe/NVL, H200 SXM/NVL | 80GB - 188GB
A-Series | A100 80GB/40GB, A30, A10 | 24GB - 80GB
L-Series | L40S, L40 | 48GB

AMD GPUs (Consumer & Data Center)

Series | Models | Memory Range
RX 7000 | 7900 XTX, 7900 XT, 7800 XT, 7700 XT | 12GB - 24GB
RX 6000 | 6900 XT, 6800 XT, 6700 XT | 12GB - 16GB
MI Series | MI300X, MI250X | 128GB - 192GB

Intel GPUs

Series | Models | Memory Range
Arc A | A770, A750 | 8GB - 16GB

Apple Silicon (Unified Memory)

Generation | Models | Memory Range | Bandwidth
M4 (2024) | M4, M4 Pro, M4 Max | 16GB - 128GB | 120GB/s - 546GB/s
M3 (2023) | M3, M3 Pro, M3 Max, M3 Ultra | 24GB - 512GB | 100GB/s - 800GB/s
M2 (2022) | M2, M2 Pro, M2 Max, M2 Ultra | 24GB - 192GB | 100GB/s - 800GB/s
M1 (2020) | M1, M1 Pro, M1 Max, M1 Ultra | 16GB - 128GB | 68GB/s - 800GB/s

Note: Apple Silicon supports adjustable unified memory configurations

🤖 Supported Models

🏠 Local Models (Auto-Updated from Ollama)

  • Meta Llama: 3.1 (8B, 70B, 405B), 3.2 (1B, 3B, 11B, 90B), 3.3 (70B), 4 (expected)
  • Mistral AI: 7B v0.3, Nemo 12B, Small 22B, Large 123B
  • Mixtral: 8x7B, 8x22B (Mixture of Experts)
  • Alibaba Qwen: 2.5 (0.5B-72B), 2.5-Coder (7B-32B), Qwen3 series
  • Google Gemma: 2B, 7B, 9B, 27B, Gemma3 series
  • Microsoft Phi: 3-Mini (3.8B), 3-Medium (14B), Phi4 (14B)
  • DeepSeek: Coder (6.7B, 33B), DeepSeek-R1 (1.5B-67B), DeepSeek-V3
  • Code Models: CodeLlama, StarCoder2, CodeGemma, Granite-Code
  • Lightweight: TinyLlama (1.1B), SmolLM2 (135M-1.7B), MiniCPM
  • Specialized: Nomic-Embed, BGE, Moondream (vision), LLaVA

☁️ Proprietary Models (API Only)

  • Anthropic Claude: 3 Haiku, 3 Sonnet, 3 Opus
  • OpenAI GPT: 3.5-Turbo, 4, 4-Turbo
  • Google Gemini: 1.5 Flash, 1.5 Pro

🔧 Quantization Support

Format | Bits | Memory Usage | Quality | Use Case
FP32 | 32 | 100% | Highest | Research, training
FP16 | 16 | 50% | High | Production inference
BF16 | 16 | 50% | High | Stable training
INT8 | 8 | 25% | Good | Efficient inference
INT6 | 6 | 18.75% | Moderate | Memory-constrained
INT5 | 5 | 15.625% | Moderate | Extreme efficiency
INT4 | 4 | 12.5% | Acceptable | Maximum practical
INT3 | 3 | 9.375% | Poor | Research
INT2 | 2 | 6.25% | Very Poor | Experimental
INT1 | 1 | 3.125% | Unusable | Binary networks
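
Because each format's cost is simply its bit width, the bytes-per-parameter and relative-memory figures above follow directly, as in this small illustrative snippet (identifiers are hypothetical, not necessarily the names used in quantizationConfigs.ts):

// Bytes per parameter and relative memory usage derived from the bit widths above.
const QUANTIZATION_BITS = {
  FP32: 32, FP16: 16, BF16: 16,
  INT8: 8, INT6: 6, INT5: 5, INT4: 4, INT3: 3, INT2: 2, INT1: 1,
} as const;

type QuantFormat = keyof typeof QUANTIZATION_BITS;

function bytesPerParam(format: QuantFormat): number {
  return QUANTIZATION_BITS[format] / 8;            // e.g. INT4 -> 0.5 bytes per parameter
}

function relativeMemoryUsagePercent(format: QuantFormat): number {
  return (QUANTIZATION_BITS[format] / 32) * 100;   // e.g. INT4 -> 12.5% of FP32
}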

๐Ÿ—๏ธ Architecture

src/
├── components/                # React components
│   ├── Calculator.tsx         # Main calculator interface
│   ├── DebugModels.tsx        # Model debugging tools
│   ├── DatabaseStatus.tsx     # Database status display
│   └── FeatureHighlights.tsx  # Feature showcase
├── data/                      # Static configuration
│   ├── gpuSpecs.ts            # GPU specifications
│   ├── quantizationConfigs.ts # Quantization options
│   └── ollamaModels.ts        # Fallback model data
├── hooks/                     # React hooks
│   └── useDataUpdater.ts      # Auto-updating data hook
├── types/                     # TypeScript definitions
│   └── index.ts               # Shared type definitions
├── utils/                     # Core logic
│   ├── calculator.ts          # Calculation engine
│   ├── dataUpdater.ts         # Dynamic model fetching
│   └── __tests__/             # Test suites
└── App.tsx                    # Main application

🛠️ Development

Scripts

npm run dev          # Development server with HMR
npm run build        # Production build
npm run preview      # Preview production build
npm run test         # Run test suite
npm run test:watch   # Tests in watch mode
npm run test:coverage # Coverage report
npm run lint         # Code linting
npm run lint:fix     # Auto-fix linting issues

Testing

Comprehensive test coverage for:

  • ✅ Memory calculation accuracy
  • ✅ Performance metric calculations
  • ✅ Quantization conversions
  • ✅ Multi-GPU configurations
  • ✅ OOM detection logic
  • ✅ Edge cases and error handling
  • ✅ Data fetching and parsing

# Run tests
npm run test

# Watch mode during development
npm run test:watch

# Generate coverage report
npm run test:coverage

Technical Stack

  • Framework: React 18 + TypeScript
  • UI Library: Material-UI (MUI) v5
  • Build Tool: Vite (fast HMR, modern bundling)
  • Testing: Jest + React Testing Library
  • Code Quality: ESLint + TypeScript ESLint
  • Charts: Recharts for data visualization

🔄 Auto-Update System

The application automatically fetches the latest model information from Ollama's model registry:

Features

  • 🔄 24-Hour Auto-Updates: Checks for new models daily
  • 🚀 Force Update: Manual refresh for immediate updates
  • 📦 CORS Proxy: Bypasses browser restrictions via Vite proxy
  • 🛡️ Fallback System: Uses cached data if updates fail
  • 🐛 Debug Tools: Built-in model fetching diagnostics
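
A minimal sketch of how a daily refresh with a cached fallback and a manual "force update" could be wired up as a React hook (hypothetical names and shapes; the project's useDataUpdater.ts may be structured differently):

import { useCallback, useEffect, useState } from "react";

const UPDATE_INTERVAL_MS = 24 * 60 * 60 * 1000; // 24-hour auto-update window

// Hypothetical model shape; the project's real types live in src/types.
interface ModelInfo { name: string; parameters: number; }

function useModelDatabase(
  fetchModels: () => Promise<ModelInfo[]>,  // e.g. a fetch routed through the Vite CORS proxy
  fallback: ModelInfo[],                    // bundled/cached model data
) {
  const [models, setModels] = useState<ModelInfo[]>(fallback);

  const refresh = useCallback(async () => {
    try {
      setModels(await fetchModels());       // latest registry data
    } catch {
      setModels(fallback);                  // fallback system: keep cached data on failure
    }
  }, [fetchModels, fallback]);

  useEffect(() => {
    refresh();                                           // initial load
    const id = setInterval(refresh, UPDATE_INTERVAL_MS); // daily check
    return () => clearInterval(id);
  }, [refresh]);

  return { models, forceUpdate: refresh };               // manual refresh for immediate updates
}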

Model Detection

Automatically detects and adds new models including:

  • Latest Llama, Gemma, Qwen releases
  • Emerging models from Mistral, DeepSeek
  • Specialized models (code, vision, embedding)
  • Community-contributed models

🚀 Deployment

Production Build

# Create optimized build
npm run build

# Preview production build
npm run preview

# Deploy dist/ folder to your hosting platform

Environment Configuration

Create .env.local for environment-specific settings:

VITE_API_BASE_URL=https://your-api-domain.com
VITE_ENABLE_DEBUG=false
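
Only variables prefixed with VITE_ are exposed to client code, where they can be read through Vite's import.meta.env (standard Vite behavior; the variable names are the ones shown above):

// Reading the settings above in application code (standard Vite pattern).
const apiBaseUrl = import.meta.env.VITE_API_BASE_URL;
const debugEnabled = import.meta.env.VITE_ENABLE_DEBUG === "true";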

Hosting Recommendations

  • Vercel: Zero-config deployment with automatic HTTPS
  • Netlify: Easy deployment with form handling
  • GitHub Pages: Free hosting for open-source projects
  • Docker: Containerized deployment for enterprise

🤝 Contributing

Development Setup

# Fork and clone the repository
git clone https://github.com/your-username/llm-memory-calculator.git
cd llm-memory-calculator

# Install dependencies
npm install

# Start development server
npm run dev

Contribution Guidelines

  1. 🔀 Fork the repository
  2. 🌿 Create a feature branch: git checkout -b feature/amazing-feature
  3. ✨ Develop your changes with tests
  4. ✅ Test your changes: npm run test
  5. 📝 Commit with clear messages: git commit -m 'Add amazing feature'
  6. 🚀 Push to your branch: git push origin feature/amazing-feature
  7. 📋 Submit a pull request

Areas for Contribution

  • 🆕 New GPU/processor support
  • 🤖 Additional LLM model support
  • 📊 Enhanced visualization features
  • 🧮 Advanced calculation options
  • 🌐 Internationalization
  • 📱 Mobile responsiveness
  • ⚡ Performance optimizations

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • VMware: Sizing methodology and performance formulas
  • qoofyk: Original Python calculator inspiration
  • Ollama Team: Model registry and local inference platform
  • TechPowerUp: Comprehensive GPU specification database
  • React Community: Exceptional development ecosystem


📧 Support

For questions, issues, or feature requests, please open an issue on GitHub.


⭐ Star this repository if it helped you! ⭐

Made with ❤️ for the LLM community
