Disable prefix cache for kimi vllm configs #926

Open
Oseltamivir wants to merge 4 commits into main from disable-prefix

Oseltamivir (Collaborator) commented Mar 23, 2026

Disable prefix caching for Kimi K2.5 vLLM benchmarks

Adds --no-enable-prefix-caching to all Kimi K2.5 vLLM benchmark scripts. Prefix caching brings no benefit to benchmarks that use random datasets: with no shared prefixes between prompts, cache hits would be zero, and maintaining the cache only adds overhead.

Affected configs:

  • kimik2.5-int4-mi325x-vllm
  • kimik2.5-int4-mi355x-vllm
  • kimik2.5-int4-h200-vllm
  • kimik2.5-fp4-mi355x-vllm
  • kimik2.5-fp4-b200-vllm
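
As a rough illustration of where the flag sits in a server launch command, here is a minimal sketch. The model path and port are hypothetical placeholders, since the conversation does not show the other arguments the benchmark scripts pass. Only --no-enable-prefix-caching comes from this PR.

```shell
# Sketch only: MODEL and PORT are hypothetical placeholders, not values from the PR's scripts.
MODEL="path/to/kimi-k2.5"
PORT="8000"

# Assemble the launch command; --no-enable-prefix-caching is the flag this PR adds.
CMD="vllm serve $MODEL --port $PORT --no-enable-prefix-caching"

# Print the command rather than running it, so the sketch works without vLLM installed.
echo "$CMD"
```

In a real script the command would be executed rather than echoed, alongside whatever parallelism and memory arguments each config already uses.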

github-actions (Contributor) commented

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first, before we can merge your PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you.

claude bot (Contributor) left a comment

LGTM: straightforward addition of --no-enable-prefix-caching across all Kimi K2.5 vLLM benchmark scripts.

Extended reasoning...

Overview

This PR adds a single flag --no-enable-prefix-caching to 6 benchmark shell scripts (kimik2.5_fp4_b200.sh, kimik2.5_fp4_mi355x.sh, kimik2.5_int4_b200.sh, kimik2.5_int4_h200.sh, kimik2.5_int4_mi325x.sh, kimik2.5_int4_mi355x.sh) and a corresponding perf-changelog.yaml entry documenting the change.

Security Risks

No security concerns. These are benchmark configuration scripts with no auth, crypto, or permission-sensitive logic.

Level of Scrutiny

This is a minimal, low-risk config change. The flag disables prefix caching, which is the correct setting for benchmarks using random datasets, where prefix cache hits would be zero anyway and the cache only adds overhead. The change is mechanically consistent across all 6 affected scripts.
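
One way to spot-check that kind of consistency is to list any script that lacks the flag. The helper below is a hypothetical sketch and not part of the PR. The function name and the example glob in the usage note are assumptions.

```shell
FLAG='--no-enable-prefix-caching'

# Print each given script that does NOT contain the flag.
# Returns non-zero if at least one script is missing it.
missing_flag() {
  status=0
  for script in "$@"; do
    # -F matches the flag literally; -e keeps grep from parsing it as an option.
    if ! grep -qF -e "$FLAG" "$script"; then
      echo "$script"
      status=1
    fi
  done
  return "$status"
}
```

It could be called as `missing_flag kimik2.5_*.sh` from the directory that holds the benchmark scripts. Empty output means every script carries the flag.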

Other Factors

No bugs were found by the automated system. The change is self-contained and does not touch any production logic, only benchmark scripts. The changelog entry is accurate and complete.

cquil11 (Collaborator) commented Mar 24, 2026

@claude add pr description based on perf changelog

Klaud-Cold (Collaborator) commented Mar 24, 2026

Claude finished @cquil11's task in 40s

Adding PR description from perf-changelog

  • Read perf-changelog.yaml for relevant entry
  • Update PR description

PR description updated based on the perf-changelog entry. Added a summary of the change (disabling prefix caching for random-dataset benchmarks) and listed all 5 affected config keys.
