This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
# Configure with Visual Studio generator (recommended for Windows)
cmake -B build -G "Visual Studio 17 2022" -A x64
# Build main executable
cmake --build build --config Release --target onnxtest
# Build test suite
cmake --build build --config Release --target test_suite
# Build everything
cmake --build build --config Release

# Run all tests via CTest
cd build && ctest -C Release
# Run test executable directly (better output)
./build/Release/test_suite.exe
# Run tests using custom target (convenience)
cmake --build build --target run-tests
# Run specific test
./build/Release/test_suite.exe --gtest_filter=TokenizerTests.*
# Run filtered tests via CMake (-D options are only accepted at configure time)
cmake -B build -DGTEST_FILTER="TokenizerTests.*"
cmake --build build --target run-tests-filtered
# List available tests
./build/Release/test_suite.exe --gtest_list_tests

# Configure with coverage enabled
cmake -B build-coverage -G "Visual Studio 17 2022" -A x64 -DCODE_COVERAGE=ON
# Build and generate coverage
cmake --build build-coverage --config Debug --target test_suite
cmake --build build-coverage --target coverage
# Open HTML report
cmake --build build-coverage --target coverage-html

# Check code style issues
cmake --build build --target clang-tidy
# Auto-fix style issues
cmake --build build --target clang-tidy-fix

src/
├── inference/                     # ONNX inference module (static library)
│   ├── include/                   # Public API headers
│   │   ├── inference_api.hpp      # Main inference API
│   │   └── tokenizer_api.hpp      # Tokenizer public interface
│   └── src/                       # Implementation
│       ├── tokenizer.cpp/hpp      # BPE tokenizer for Phi-3
│       ├── inference_helpers.cpp  # CUDA setup, token selection, generation loop
│       ├── inference_session.cpp  # Session management and text generation
│       ├── conversation.cpp       # Multi-turn conversation handling
│       ├── code_extractor.cpp     # Extract code blocks from AI responses
│       ├── config.hpp             # Model constants (max tokens, temperature, etc.)
│       └── console_output.hpp     # Windows console Unicode handling
├── lua/                           # Lua interpreter module (static library)
│   ├── include/                   # Public API headers
│   │   └── lua_api.hpp
│   └── src/                       # Implementation
│       ├── lua_runtime.cpp        # Core Lua VM management
│       ├── lua_executor.cpp       # High-level execution interface
│       ├── lua_extractor.cpp      # Extract Lua code from text
│       └── win32_bindings.cpp     # Windows API bindings for Lua
├── app/                           # Main application
│   └── main.cpp                   # Entry point, orchestrates inference + Lua execution
└── tests/                         # All unit tests
    ├── inference/                 # Inference module tests
    ├── lua/                       # Lua module tests
    └── test_main.cpp              # Test runner entry point

Inference Module Design:
The inference module (src/inference/) is built as a static library that encapsulates:
- Tokenization logic with BPE (Byte Pair Encoding) for Phi-3 models
- ONNX Runtime session management with automatic CUDA/CPU fallback
- Text generation loop with KV-cache management for efficient inference
- Conversation management for multi-turn interactions
- Code extraction utilities to parse AI-generated code blocks
- Windows-specific console handling for Unicode I/O
Lua Module Design:
The Lua module (src/lua/) provides:
- Sandboxed Lua 5.4 runtime with Sol2 C++ bindings
- Win32 API bindings accessible via the win32 namespace in Lua
- Automatic extraction and execution of Lua code from AI responses
- Resource management and error handling
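
The extraction step can be illustrated with a minimal sketch. It assumes the model wraps Lua code in Markdown-style fences tagged `lua`; the function name `extract_lua_blocks` is hypothetical, and the project's actual lua_extractor.cpp may work differently.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: returns the body of every ```lua fenced block in `text`.
std::vector<std::string> extract_lua_blocks(std::string_view text) {
    constexpr std::string_view open_fence = "```lua";
    constexpr std::string_view close_fence = "```";
    std::vector<std::string> blocks;
    std::size_t pos = 0;
    while ((pos = text.find(open_fence, pos)) != std::string_view::npos) {
        // The code starts on the line after the opening fence.
        std::size_t body_start = text.find('\n', pos + open_fence.size());
        if (body_start == std::string_view::npos) break;
        ++body_start;
        const std::size_t body_end = text.find(close_fence, body_start);
        if (body_end == std::string_view::npos) break;  // unterminated block: ignore it
        blocks.emplace_back(text.substr(body_start, body_end - body_start));
        pos = body_end + close_fence.size();
    }
    return blocks;
}
```

Each returned string could then be handed to the executor. Unterminated blocks are dropped rather than executed, which seems the safer default for AI-generated code.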
Model Integration Flow:
1. main.cpp initializes ONNX Runtime environment and session
2. setup_cuda_provider() attempts GPU setup, falls back to CPU
3. Tokenizer loads vocabulary from JSON and handles prompt formatting
4. InferenceSession manages the conversation and generation loop:
   - Encodes prompt → runs model → samples tokens → updates KV-cache → decodes output
   - Manages past_key_values tensors across iterations for context retention
5. LuaExtractor parses AI response for code blocks
6. LuaExecutor runs extracted Lua code in sandboxed environment
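
The generation loop in this flow can be sketched roughly as follows. This is a simplified illustration with greedy token selection and a stand-in model callback, not the project's implementation: `KvCache`, `StepFn`, `select_greedy`, and `generate_greedy` are hypothetical names, and the real module would run an ONNX Runtime session and carry past_key_values tensors instead of a counter.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Placeholder for the cached keys/values kept between forward passes.
struct KvCache {
    std::size_t tokens_seen = 0;
};

// Stand-in for one forward pass: consumes the tokens the model has not yet
// seen, updates the cache, and returns logits for the next token.
using StepFn =
    std::function<std::vector<float>(const std::vector<int>& new_tokens, KvCache& cache)>;

// Greedy selection: index of the largest logit.
int select_greedy(const std::vector<float>& logits) {
    return static_cast<int>(std::distance(
        logits.begin(), std::max_element(logits.begin(), logits.end())));
}

std::vector<int> generate_greedy(const std::vector<int>& prompt, const StepFn& step,
                                 int eos_token, std::size_t max_new_tokens) {
    KvCache cache;
    std::vector<int> output;
    std::vector<int> pending = prompt;  // the first pass feeds the whole prompt
    for (std::size_t i = 0; i < max_new_tokens; ++i) {
        const std::vector<float> logits = step(pending, cache);
        if (logits.empty()) break;
        const int next = select_greedy(logits);
        if (next == eos_token) break;
        output.push_back(next);
        pending = {next};  // later passes feed only the new token; the cache holds the rest
    }
    return output;
}
```

Feeding only the newest token after the first pass is what makes the cache worthwhile: the work per step stays roughly constant instead of growing with the conversation length.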
Critical Paths in Code:
- Model path hardcoded in src/app/main.cpp:18
- Tokenizer path hardcoded in src/app/main.cpp:19
- Update these before running!
Place in lib/onnxruntime/:
- All .lib files (for linking)
- onnxruntime_providers_cuda.dll (GPU acceleration)
- onnxruntime_providers_shared.dll (required dependency)
- onnxruntime-genai.dll (generation features)
The main onnxruntime.dll is loaded from the system installation in C:\Windows\System32 by default.
Add to PATH:
$env:PATH = $env:PATH + ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin;C:\Program Files\NVIDIA\CUDNN\v9.12\bin\12.9"

- Create test file in appropriate subdirectory:
  - src/tests/inference/ for inference tests
  - src/tests/lua/ for Lua tests
  - src/tests/ for core/utility tests
- Add to src/tests/CMakeLists.txt in the TEST_SOURCES list:
      set(TEST_SOURCES
          test_main.cpp
          # ... existing files ...
          your_new_test.cpp  # or inference/your_new_test.cpp
      )
- Use GoogleTest macros: TEST(), TEST_F(), EXPECT_EQ(), etc.

- CODE_COVERAGE - Enable code coverage analysis (OFF by default)
- ENABLE_CLANG_TIDY - Enable clang-tidy static analysis (ON by default)
- USE_SYSTEM_ONNXRUNTIME - Use system ONNX Runtime DLL from System32 (ON by default)

CMake is typically installed at C:\Program Files\CMake\bin\cmake.exe and may not be in PATH by default. Use the full path:
"C:\Program Files\CMake\bin\cmake.exe" -B build -G "Visual Studio 17 2022" -A x64

When running tests from bash-like environments on Windows, use PowerShell or cmd wrappers:
# Using PowerShell (recommended)
powershell -Command "& './build/Release/test_suite.exe' --gtest_brief=1"
# Using cmd
cmd /c build\Release\test_suite.exe --gtest_brief=1
# Filter out problematic tests if needed
powershell -Command "& './build/Release/test_suite.exe' --gtest_filter=-TokenizerAdvancedTest.* --gtest_brief=1"

When upgrading from C++17 to C++20, be aware of:
- u8 string literals now create char8_t* instead of char*. Fix with:
      // C++20 fix for u8 literals
      const char8_t* u8_literal = u8"UTF-8 text";
      std::string str(reinterpret_cast<const char*>(u8_literal));
- Update both CMAKE_CXX_STANDARD and any hardcoded -std=c++17 flags in CMakeLists.txt
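
If the cast is needed in more than one place, it may be worth wrapping it once. The sketch below is one possible shape for such a helper; the name `to_std_string` is hypothetical and not part of the codebase. The feature-test macro `__cpp_char8_t` lets the same source compile before and after the switch to C++20.

```cpp
#include <string>

// Hypothetical helper: copies a u8 string literal into a std::string.
// Under C++20 the literal is an array of char8_t; before C++20 it is an array of char.
#if defined(__cpp_char8_t)
inline std::string to_std_string(const char8_t* text) {
    // char8_t and char have the same size, so the cast only relabels the bytes.
    return std::string(reinterpret_cast<const char*>(text));
}
#endif

inline std::string to_std_string(const char* text) {
    return std::string(text);
}
```

Call sites then read `to_std_string(u8"UTF-8 text")` under either standard, and the `reinterpret_cast` lives in a single place.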
- C++20: Requires Visual Studio 2022 (v17) or later
- std::filesystem for path handling (replaces Windows-specific APIs)
- std::string_view for non-owning string parameters
- [[nodiscard]] attributes for important return values
- constexpr for compile-time constants (already used extensively)
- std::span for array views (C++20)
- Concepts for template constraints (C++20)

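
A small sketch of how several of these features combine in practice. The names `kModelExtension`, `has_extension`, and `model_path` are made up for illustration and do not come from the codebase; the example sticks to C++17 features so it compiles under either standard.

```cpp
#include <filesystem>
#include <string_view>

// Hypothetical compile-time constant, in the spirit of config.hpp.
constexpr std::string_view kModelExtension = ".onnx";

// string_view parameters accept literals and std::string without copying.
[[nodiscard]] constexpr bool has_extension(std::string_view file_name, std::string_view ext) {
    return file_name.size() >= ext.size() &&
           file_name.substr(file_name.size() - ext.size()) == ext;
}

// constexpr functions can be checked at compile time.
static_assert(has_extension("phi3.onnx", kModelExtension));

// std::filesystem::path joins components with the platform's preferred separator.
[[nodiscard]] inline std::filesystem::path model_path(const std::filesystem::path& dir,
                                                      std::string_view file_name) {
    return dir / std::filesystem::path(file_name);
}
```

With `[[nodiscard]]` in place, a call such as `has_extension(name, ".onnx");` whose result is ignored produces a compiler warning.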
Some tests may fail or need to be skipped:
- TokenizerAdvancedTest.MixedSpecialAndRegularTokens - SEH exception on some systems
- InferenceTest.* - May fail if model files are not present
- Console thread safety tests may cause buffer corruption on Windows