feat: add mem usage #651
Merged
jorgeantonio21 requested changes Jun 2, 2025
jorgeantonio21 (Contributor) left a comment
Looking good, but I believe some parts of the logic need to be refactored.
const DEFAULT_MAX_TOKENS: u64 = 8_192;

/// The ceiling for memory usage, above which the service will not accept new requests
const MEMORY_USAGE_CEILING: f64 = 0.9;
Contributor
I reckon these values should be set in config.toml, since we will most likely need to tweak them.
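Following the suggestion to move these values into config.toml, one possible shape is a config struct whose defaults match the current constants. With that shape, behavior stays the same when the keys are absent. This is a std-only sketch: `RequestLimitsConfig`, its field names, and `validate` are hypothetical and not taken from the PR. In practice the struct would likely derive serde's `Deserialize` so it can be read from the TOML file; that part is assumed and not shown.

```rust
/// Hypothetical config section replacing the hard-coded constants.
/// Field names are illustrative, not the PR's actual names.
#[derive(Debug, Clone, PartialEq)]
pub struct RequestLimitsConfig {
    /// Default max tokens per request (currently `DEFAULT_MAX_TOKENS`).
    pub default_max_tokens: u64,
    /// Memory usage fraction above which new requests are rejected
    /// (currently `MEMORY_USAGE_CEILING`).
    pub memory_usage_ceiling: f64,
}

impl Default for RequestLimitsConfig {
    /// Defaults mirror the existing constants, so an absent config key
    /// keeps today's behavior.
    fn default() -> Self {
        Self {
            default_max_tokens: 8_192,
            memory_usage_ceiling: 0.9,
        }
    }
}

impl RequestLimitsConfig {
    /// Rejects values that would make the limits meaningless.
    /// A NaN ceiling fails the range check and is rejected too.
    pub fn validate(&self) -> Result<(), String> {
        if self.default_max_tokens == 0 {
            return Err("default_max_tokens must be > 0".to_string());
        }
        if !(self.memory_usage_ceiling > 0.0 && self.memory_usage_ceiling <= 1.0) {
            return Err(format!(
                "memory_usage_ceiling must be in (0, 1], got {}",
                self.memory_usage_ceiling
            ));
        }
        Ok(())
    }
}
```

Calling `validate()` once at startup makes a typo such as `memory_usage_ceiling = 9.0` fail fast. Without that check, such a typo would silently disable the gate.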
jorgeantonio21 approved these changes Jun 2, 2025
jorgeantonio21 added a commit that referenced this pull request Jun 5, 2025
* fix: ensure client errors are correctly tracked (#635)
* fix: ensure client errors are correctly tracked
* chore: update error tracking
* chore: adjust clippy
* chore: grammatical error
* ci: use stable toolchain (#645)
* ci: use stable toolchain
* chore: fix clippy issues
* revert to use prometheus for queued requests (#646)
* revert to use prometheus for queued requests
* add start metrics collector
* update logs
* feat: turn on too many requests for a period of time (#647)
* feat: add request running cap (#649)
* feat: add request running cap
* fix clippy

---------

Co-authored-by: Jorge Antonio <matroid@outlook.com>

* refactor num running requests for prometheus check
* logs
* handle deadlock for too many requests timeout trigger check (#650)
* feat: add mem usage (#651)
* feat: add memusage to get_metrics
* add lower threshold for disabling the flag
* fix clippy
* address 2 comments
* add values to config
* fix
* fix tests
* fix name
* feat: update sui dependencies (#654)
* resolve compilation issues
* ci: add caching strategy for ci
* ci: optimize coverage job
* ci: adjust coverage job
* ci: update deny action
* ci: use grcov
* ci: use stable toolchain
* ci: only run tests once
* ci: move coverage to test file
* ci: use --codecov flag & stable toolchain
* ci: discard p2p tester

---------

Co-authored-by: chad <chad.nehemiah94@gmail.com>

* feat: add max number of queued requests configuration and update request handling (#656)
* fix: correct deadlock in `check_if_too_many_requests` (#658)
* correct deadlock in check_if_too_many_requests method
* resolve tests
* add changes
* add changes
* continue improving logic
* add changes
* fix: normalize model strings to lowercase in request handlers (#661)
* fix: normalize model strings to lowercase in request handlers
* fix test
* fix

---------

Co-authored-by: Chad Nehemiah <chad.nehemiah94@gmail.com>
Co-authored-by: Martin Stefcek <35243812+Cifko@users.noreply.github.com>
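The "add lower threshold for disabling the flag" commit in #651 suggests that the rejection flag uses two thresholds (hysteresis) rather than one. Under that design, the flag turns on above the ceiling and only turns off once usage drops below a lower bound, so it does not flap when usage hovers around 0.9. The Rust sketch below shows that idea, assuming memory usage is sampled as a fraction in [0, 1]. `MemoryGate`, `update`, and any particular lower threshold (for example 0.8) are hypothetical and not taken from the PR.

```rust
/// Hypothetical hysteresis gate for the "too much memory" flag.
#[derive(Debug)]
pub struct MemoryGate {
    /// Usage above this sets the flag (e.g. MEMORY_USAGE_CEILING = 0.9).
    ceiling: f64,
    /// Usage below this clears the flag; must be lower than `ceiling`.
    floor: f64,
    /// True while new requests should be rejected.
    overloaded: bool,
}

impl MemoryGate {
    pub fn new(ceiling: f64, floor: f64) -> Self {
        assert!(floor < ceiling, "floor must be below ceiling");
        Self { ceiling, floor, overloaded: false }
    }

    /// Feeds one memory-usage sample and returns whether new requests
    /// should currently be rejected. Between `floor` and `ceiling`
    /// the previous state is kept.
    pub fn update(&mut self, usage: f64) -> bool {
        if !self.overloaded && usage > self.ceiling {
            self.overloaded = true;
        } else if self.overloaded && usage < self.floor {
            self.overloaded = false;
        }
        self.overloaded
    }
}
```

With a single threshold, a sample sequence like 0.91, 0.89, 0.91 would toggle the flag on every check. The lower bound absorbs that noise at the cost of rejecting requests slightly longer after a spike.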
Add memory usage cap at 0.9