Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you!
LGTM — straightforward addition of --no-enable-prefix-caching across all Kimi K2.5 vLLM benchmark scripts.
Extended reasoning...
Overview
This PR adds a single flag --no-enable-prefix-caching to 6 benchmark shell scripts (kimik2.5_fp4_b200.sh, kimik2.5_fp4_mi355x.sh, kimik2.5_int4_b200.sh, kimik2.5_int4_h200.sh, kimik2.5_int4_mi325x.sh, kimik2.5_int4_mi355x.sh) and a corresponding perf-changelog.yaml entry documenting the change.
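As a rough illustration of what such a one-flag change looks like in a launch script, here is a minimal sketch. The `MODEL` and `PORT` values are placeholders, and the command layout is an assumption rather than a copy of the actual benchmark scripts; only the `--no-enable-prefix-caching` flag comes from the PR. The sketch prints the command instead of running it, so it works without vLLM installed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: MODEL and PORT are placeholders, not values from the PR.
MODEL="${MODEL:-placeholder/model-id}"
PORT="${PORT:-8000}"

# Build the server command as an array so each argument stays separately quoted.
# The last element is the flag this PR adds; it turns prefix caching off.
serve_cmd=(
  vllm serve "$MODEL"
  --port "$PORT"
  --no-enable-prefix-caching
)

# Print the command rather than executing it.
printf '%s ' "${serve_cmd[@]}"
echo
```

To actually launch the server, the final two lines would be replaced with `"${serve_cmd[@]}"`.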
Security Risks
No security concerns. These are benchmark configuration scripts with no auth, crypto, or permission-sensitive logic.
Level of Scrutiny
This is a minimal, low-risk config change. The flag disables prefix caching, which is the appropriate setting for benchmarks that use random datasets: random prompts are not expected to share prefixes, so cache hits would be zero anyway and the cache only adds overhead. The change is mechanically consistent across all 6 affected scripts.
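One way to spot-check that kind of mechanical consistency is a small grep loop. The helper below is hypothetical and not part of the PR; `missing_flag` is an illustrative name, and the directory in the commented invocation is an assumed layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print each given script that lacks the flag.
FLAG="--no-enable-prefix-caching"

missing_flag() {
  local f
  for f in "$@"; do
    # -q: quiet, -F: match a fixed string, --: stop grep from parsing FLAG as an option
    grep -qF -- "$FLAG" "$f" || echo "$f"
  done
}

# Example invocation; the path is an assumption, adjust it to the real script location.
# missing_flag benchmarks/kimik2.5_*.sh
```

Empty output would mean every script passed on the command line contains the flag.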
Other Factors
No bugs were found by the automated system. The change is self-contained and does not touch any production logic, only benchmark scripts. The changelog entry is accurate and complete.
4548199 to 097040e
@claude add pr description based on perf changelog
Claude finished @cquil11's task in 40s
Adding PR description from perf-changelog
PR description updated based on the perf-changelog entry. Added a summary of the change (disabling prefix caching for random-dataset benchmarks) and listed all 5 affected config keys.
Disable prefix caching for Kimi K2.5 vLLM benchmarks
Adds --no-enable-prefix-caching to all Kimi K2.5 vLLM benchmark scripts. Prefix caching is unnecessary for benchmarks using random datasets (cache hits would be zero, and the overhead is wasteful).
Affected configs:
- kimik2.5-int4-mi325x-vllm
- kimik2.5-int4-mi355x-vllm
- kimik2.5-int4-h200-vllm
- kimik2.5-fp4-mi355x-vllm
- kimik2.5-fp4-b200-vllm