Prefetch optimize, adding three prefetch strategies #163

Open: Luis-xu wants to merge 1 commit into taco-project:feat/layerwise_rebase from Luis-xu:prefetch_optim

Luis-xu (Collaborator) commented May 12, 2026

This PR adds a preliminary implementation of prefetching, with two-level flow control and three prefetch strategies.

Two-level flow control:

  1. If the number of tokens to prefetch is below the threshold, skip prefetching.
  2. If the total number of tokens in currently executing prefetches exceeds a configured ratio (FLEXKV_PREFETCH_CAPACITY_RATIO) of the current total number of CPU blocks, skip prefetching.
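
The two checks above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name, `min_prefetch_tokens`, `tokens_per_block`, and `inflight_tokens` are hypothetical, and converting the CPU block total to tokens via `tokens_per_block` is an assumption made so that the two quantities are comparable. Only `FLEXKV_PREFETCH_CAPACITY_RATIO` comes from the description.

```python
from dataclasses import dataclass


@dataclass
class PrefetchFlowControl:
    """Hypothetical two-level admission check for prefetch requests."""

    min_prefetch_tokens: int   # hypothetical name for the level-1 threshold
    capacity_ratio: float      # stands in for FLEXKV_PREFETCH_CAPACITY_RATIO
    total_cpu_blocks: int      # current total number of CPU blocks
    tokens_per_block: int      # assumption: used to express blocks in tokens
    inflight_tokens: int = 0   # tokens in currently executing prefetches

    def should_prefetch(self, num_tokens: int) -> bool:
        # Level 1: requests below the token threshold are not prefetched.
        if num_tokens < self.min_prefetch_tokens:
            return False
        # Level 2: skip when in-flight prefetch tokens already exceed the
        # configured share of CPU capacity.
        capacity_tokens = (
            self.capacity_ratio * self.total_cpu_blocks * self.tokens_per_block
        )
        if self.inflight_tokens > capacity_tokens:
            return False
        return True


fc = PrefetchFlowControl(
    min_prefetch_tokens=64,
    capacity_ratio=0.5,
    total_cpu_blocks=100,
    tokens_per_block=16,
)
small_ok = fc.should_prefetch(32)     # below the threshold
normal_ok = fc.should_prefetch(128)   # passes both levels
fc.inflight_tokens = 801              # above 0.5 * 100 * 16 = 800
busy_ok = fc.should_prefetch(128)
```

As in the description, the second check looks only at the tokens already in flight; whether the incoming request's tokens should also count toward the limit is a design choice this sketch does not settle.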

Three prefetch strategies:

  • wait_complete: wait until the current request's prefetch operation has fully completed.
  • best_effort: prefetch as much as possible within the current prefetch cycle.
  • timeout: prefetch as much as possible within the current request's timeout duration.
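
One way to read the three strategies is as different time budgets for the same prefetch work. The sketch below is illustrative only and does not reflect the actual implementation: `PrefetchStrategy`, `wait_budget_ms`, `blocks_prefetched`, and the fixed per-block cost are all hypothetical, and treating a "prefetch cycle" as a fixed time budget is an assumption.

```python
import enum
from typing import Optional


class PrefetchStrategy(enum.Enum):
    WAIT_COMPLETE = "wait_complete"
    BEST_EFFORT = "best_effort"
    TIMEOUT = "timeout"


def wait_budget_ms(
    strategy: PrefetchStrategy,
    cycle_remaining_ms: int,
    request_timeout_ms: int,
) -> Optional[int]:
    """Return the time budget for prefetching; None means no limit."""
    if strategy is PrefetchStrategy.WAIT_COMPLETE:
        return None                  # block until the prefetch finishes
    if strategy is PrefetchStrategy.BEST_EFFORT:
        return cycle_remaining_ms    # bounded by the current prefetch cycle
    if strategy is PrefetchStrategy.TIMEOUT:
        return request_timeout_ms    # bounded by the request's own timeout
    raise ValueError(f"unknown strategy: {strategy}")


def blocks_prefetched(
    total_blocks: int, ms_per_block: int, budget_ms: Optional[int]
) -> int:
    """Blocks that fit in the budget, assuming a fixed cost per block."""
    if budget_ms is None:
        return total_blocks
    return min(total_blocks, budget_ms // ms_per_block)


# Example: 100 blocks at 10 ms each, 50 ms left in the cycle, 200 ms timeout.
results = {
    s.value: blocks_prefetched(100, 10, wait_budget_ms(s, 50, 200))
    for s in PrefetchStrategy
}
```

Under these assumed numbers, wait_complete prefetches all 100 blocks, best_effort prefetches 5, and timeout prefetches 20.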

linhu-nv requested a review from zhuofan1123 May 14, 2026 01:44