[test] Evaluate model performance and accuracy with UCM #642
Merged
mag1c-h merged 3 commits into ModelEngine-Group:develop on Jan 23, 2026
Wwwzff reviewed on Jan 19, 2026
Wwwzff previously approved these changes on Jan 19, 2026
mag1c-h approved these changes on Jan 23, 2026
Purpose
This PR introduces a model validation test suite that evaluates both performance (latency metrics) and accuracy (F1-score) under three UCM caching scenarios:
Naive: No cache hits (hit rate = 0%). This is the baseline, with full recomputation.
Sparse: Evaluated at hit rate = 0% to assess the performance and accuracy gains enabled by sparsity-aware mechanisms.
Prefix Caching (PC): Evaluated at hit rates of 0%, 30%, 50%, 80%, and 100% to show the impact of prefix reuse on inference latency.
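The three scenarios above amount to a small test matrix. The following is a minimal sketch of how that matrix could be enumerated, not the code added in this PR: the `Scenario` class, `build_scenarios`, and `cached_prefix_len` are hypothetical names, and treating the hit rate as the share of prompt tokens reused from cache is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One (mode, hit rate) combination to evaluate. Hypothetical helper type."""
    name: str
    hit_rate_pct: int  # assumed meaning: percentage of prompt tokens served from cache


def build_scenarios() -> list[Scenario]:
    # Naive and Sparse are both evaluated at a 0% hit rate.
    scenarios = [Scenario("naive", 0), Scenario("sparse", 0)]
    # Prefix Caching is swept over the hit rates listed in the PR description.
    scenarios += [Scenario("pc", pct) for pct in (0, 30, 50, 80, 100)]
    return scenarios


def cached_prefix_len(total_tokens: int, hit_rate_pct: int) -> int:
    """Number of leading prompt tokens that would be reused at a given hit rate."""
    return total_tokens * hit_rate_pct // 100


SCENARIOS = build_scenarios()
```

A list like `SCENARIOS` could be passed to `pytest.mark.parametrize` so that each combination is reported as its own test case.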
Modifications
Added test cases to verify whether the model is compatible with PC and sparsification.
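The description names F1-score as the accuracy metric but does not say how it is computed. As an illustration only, here is a sketch of token-level F1 as commonly used for QA-style evaluation; the function name `token_f1` and the lowercase, whitespace-split tokenization are assumptions, not taken from this PR.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a model answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A compatibility check could then compare the mean F1 of a PC or sparse run against the naive baseline on the same prompts and fail if the drop exceeds a chosen tolerance.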
Test