fix: add --exclusive to MI300X SLURM salloc for accurate benchmarks #930

Closed
cquil11 wants to merge 2 commits into main from fix/mi300x-exclusive

@cquil11 (Collaborator) commented Mar 23, 2026

Summary

  • Add --exclusive flag to salloc in runners/launch_mi300x-amds.sh to prevent node sharing during benchmarks
  • Only non-TP8 configs are affected (TP8 already uses all GPUs); perf-changelog updated for gptoss-fp4-mi300x-vllm and minimaxm2.5-fp8-mi300x-vllm
  • NVIDIA runners already use --exclusive; this brings MI300X to parity
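A minimal sketch of the kind of change the summary describes, not the actual contents of `runners/launch_mi300x-amds.sh`. The helper name `build_salloc_args`, the partition name, and the use of `--gres=gpu:N` to request GPUs are assumptions made for illustration; only the `--exclusive` flag comes from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds an salloc argument list for a single-node
# benchmark allocation. All names and values here are illustrative.
build_salloc_args() {
    local tp="$1"          # tensor-parallel size, used as the GPU count
    local partition="$2"   # hypothetical SLURM partition name

    local args=(
        --partition="$partition"
        --nodes=1
        --gres=gpu:"$tp"
        --exclusive          # ask SLURM not to share the node with other jobs
    )

    # Print one argument per line so callers can inspect or splat them.
    printf '%s\n' "${args[@]}"
}
```

Under these assumptions, a TP4 run could be launched with something like `salloc $(build_salloc_args 4 my-partition)`, where `my-partition` is a placeholder.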

Test plan

  • Run a non-TP8 MI300X benchmark (e.g. gptoss-fp4-mi300x-vllm TP4) and verify results are consistent with exclusive node access

🤖 Generated with Claude Code

Without --exclusive, SLURM can co-schedule other jobs on the same node,
causing memory bandwidth, CPU, and NUMA contention that degrades
benchmark results — especially for non-TP8 configs where not all GPUs
are requested.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cquil11 requested a review from a team March 23, 2026 14:56
@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first, before we merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

@cquil11 (Collaborator, Author) commented Mar 25, 2026

Not needed; oversubscription is disabled by default.
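One way to check the partition setting referred to here is to read the `OverSubscribe=` field from `scontrol show partition` output. The sketch below is illustrative only: the helper name `oversubscribe_setting` is hypothetical, and it assumes a SLURM version that prints the field under that name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `scontrol show partition` output on stdin and
# prints the value of the first OverSubscribe= field (e.g. NO, EXCLUSIVE).
oversubscribe_setting() {
    grep -o 'OverSubscribe=[^ ]*' | head -n 1 | cut -d= -f2
}
```

For example, `scontrol show partition my-partition | oversubscribe_setting`, where `my-partition` is a placeholder for the real partition name.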
