All notable changes to GPULlama3.java will be documented in this file.
- Add JDK 25 support with TornadoVM JDK25 and dual-JDK build profiles (#97)
- [models] Support for IBM Granite Models 3.2, 3.3 & 4.0 with FP16 and Q8 (#92)
- [docs] Update docs to use SDKMAN! and point to TornadoVM 2.2.0 (#93)
- Add JBang catalog and local usage examples to README.md (#91)
- Add `jbang` script and configuration to make it easy to run (#90)
- Add compatibility method for langchain4j and quarkus in ModelLoader (#87)
- [refactor] Generalize the design of the `tornadovm` package to support multiple new models and types for GPU execution (#62)
- Refactor/cleanup model loaders (#58)
- Add Support for Q8_0 Models (#59)
- [fix] Normalization compute step for non-nvidia hardware (#84)
- Update README to enhance TornadoVM performance section and clarify GP… (#85)
- Simplify installation by replacing TornadoVM submodule with pre-built SDK (#82)
- [FP16] Improved performance by fusing dequantize with compute in kernels: 20-30% Inference Speedup (#78)
- [cicd] Prevent workflows from running on forks (#83)
- [CI][packaging] Automate process of deploying a new release with GitHub Actions (#81)
- [Opt] Manipulation of Q8_0 tensors with Tornado `ByteArray`s (#79)
- Optimization in Q8_0 loading (#74)
- [opt] GGUF Load Optimization for tensors in TornadoVM layout (#71)
- Add `SchedulerType` support to all TornadoVM layer planners and layer… (#66)
- Weight Abstractions (#65)
- Bug fixes in sizes and names of GridScheduler (#64)
- Add Maven wrapper support (#56)
- Add changes used in Devoxx Demo (#54)